Re: [PATCH RFC] vhost: basic device IOTLB support
On 01/05/2016 11:18 AM, Yang Zhang wrote: > On 2016/1/4 14:22, Jason Wang wrote: >> >> >> On 01/04/2016 09:39 AM, Yang Zhang wrote: >>> On 2015/12/31 15:13, Jason Wang wrote: This patch tries to implement an device IOTLB for vhost. This could be used with for co-operation with userspace(qemu) implementation of iommu for a secure DMA environment in guest. The idea is simple. When vhost meets an IOTLB miss, it will request the assistance of userspace to do the translation, this is done through: - Fill the translation request in a preset userspace address (This address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). - Notify userspace through eventfd (This eventfd was set through ioctl VHOST_SET_IOTLB_FD). When userspace finishes the translation, it will update the vhost IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of snooping the IOTLB invalidation of IOMMU IOTLB and use VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. >>> >>> Is there any performance data shows the difference with IOTLB >>> supporting? >> >> Basic testing show it was slower than without IOTLB. >> >>> I doubt we may see performance decrease since the flush code path is >>> longer than before. >>> >> >> Yes, it also depend on the TLB hit rate. >> >> If lots of dynamic mappings and unmappings are used in guest (e.g normal >> Linux driver). This method should be much more slower since: >> >> - lots of invalidation and its path is slow. >> - the hit rate is low and the high price of userspace assisted address >> translation. >> - limitation of userspace IOMMU/IOTLB implementation (qemu's vtd >> emulation simply empty all entries when it's full). >> >> Another method is to implement kernel IOMMU (e.g vtd). But I'm not sure >> vhost is the best place to do this, since vhost should be architecture >> independent. Maybe we'd better to do it in kvm or have a pv IOMMU >> implementation in vhost. > > Actually, i have the kernel IOMMU(virtual vtd) patch which can pass > though the physical device to L2 guest on hand. A little bit confused, I believe the first step is to exporting an IOMMU to L1 guest for it to use for a assigned device? > But it is just a draft patch which was written several years ago. If > there is real requirement for it, I can rebase it and send out it for > review. Interesting but I think the goal is different. This patch tries to make vhost/virtio works with emulated IOMMU. > >> >> Another side, if fixed mappings were used in guest, (e.g dpdk in guest). >> We have the possibility to have 100% hit rate with almost no >> invalidation, the performance penalty should be ignorable, this should >> be the main use case for this patch. >> >> The patch is just a prototype for discussion. Any other ideas are >> welcomed. >> >> Thanks >> > > -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] vhost: basic device IOTLB support
On 2016/1/4 14:22, Jason Wang wrote: On 01/04/2016 09:39 AM, Yang Zhang wrote: On 2015/12/31 15:13, Jason Wang wrote: This patch tries to implement an device IOTLB for vhost. This could be used with for co-operation with userspace(qemu) implementation of iommu for a secure DMA environment in guest. The idea is simple. When vhost meets an IOTLB miss, it will request the assistance of userspace to do the translation, this is done through: - Fill the translation request in a preset userspace address (This address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). - Notify userspace through eventfd (This eventfd was set through ioctl VHOST_SET_IOTLB_FD). When userspace finishes the translation, it will update the vhost IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of snooping the IOTLB invalidation of IOMMU IOTLB and use VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. Is there any performance data shows the difference with IOTLB supporting? Basic testing show it was slower than without IOTLB. I doubt we may see performance decrease since the flush code path is longer than before. Yes, it also depend on the TLB hit rate. If lots of dynamic mappings and unmappings are used in guest (e.g normal Linux driver). This method should be much more slower since: - lots of invalidation and its path is slow. - the hit rate is low and the high price of userspace assisted address translation. - limitation of userspace IOMMU/IOTLB implementation (qemu's vtd emulation simply empty all entries when it's full). Another method is to implement kernel IOMMU (e.g vtd). But I'm not sure vhost is the best place to do this, since vhost should be architecture independent. Maybe we'd better to do it in kvm or have a pv IOMMU implementation in vhost. Actually, i have the kernel IOMMU(virtual vtd) patch which can pass though the physical device to L2 guest on hand. But it is just a draft patch which was written several years ago. If there is real requirement for it, I can rebase it and send out it for review. Another side, if fixed mappings were used in guest, (e.g dpdk in guest). We have the possibility to have 100% hit rate with almost no invalidation, the performance penalty should be ignorable, this should be the main use case for this patch. The patch is just a prototype for discussion. Any other ideas are welcomed. Thanks -- best regards yang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] vhost: basic device IOTLB support
On 01/04/2016 09:39 AM, Yang Zhang wrote: > On 2015/12/31 15:13, Jason Wang wrote: >> This patch tries to implement an device IOTLB for vhost. This could be >> used with for co-operation with userspace(qemu) implementation of >> iommu for a secure DMA environment in guest. >> >> The idea is simple. When vhost meets an IOTLB miss, it will request >> the assistance of userspace to do the translation, this is done >> through: >> >> - Fill the translation request in a preset userspace address (This >>address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). >> - Notify userspace through eventfd (This eventfd was set through ioctl >>VHOST_SET_IOTLB_FD). >> >> When userspace finishes the translation, it will update the vhost >> IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of >> snooping the IOTLB invalidation of IOMMU IOTLB and use >> VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. > > Is there any performance data shows the difference with IOTLB supporting? Basic testing show it was slower than without IOTLB. > I doubt we may see performance decrease since the flush code path is > longer than before. > Yes, it also depend on the TLB hit rate. If lots of dynamic mappings and unmappings are used in guest (e.g normal Linux driver). This method should be much more slower since: - lots of invalidation and its path is slow. - the hit rate is low and the high price of userspace assisted address translation. - limitation of userspace IOMMU/IOTLB implementation (qemu's vtd emulation simply empty all entries when it's full). Another method is to implement kernel IOMMU (e.g vtd). But I'm not sure vhost is the best place to do this, since vhost should be architecture independent. Maybe we'd better to do it in kvm or have a pv IOMMU implementation in vhost. Another side, if fixed mappings were used in guest, (e.g dpdk in guest). We have the possibility to have 100% hit rate with almost no invalidation, the performance penalty should be ignorable, this should be the main use case for this patch. The patch is just a prototype for discussion. Any other ideas are welcomed. Thanks -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] vhost: basic device IOTLB support
On 12/31/2015 07:17 PM, Michael S. Tsirkin wrote: > On Thu, Dec 31, 2015 at 03:13:45PM +0800, Jason Wang wrote: >> This patch tries to implement an device IOTLB for vhost. This could be >> used with for co-operation with userspace(qemu) implementation of >> iommu for a secure DMA environment in guest. >> >> The idea is simple. When vhost meets an IOTLB miss, it will request >> the assistance of userspace to do the translation, this is done >> through: >> >> - Fill the translation request in a preset userspace address (This >> address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). >> - Notify userspace through eventfd (This eventfd was set through ioctl >> VHOST_SET_IOTLB_FD). >> >> When userspace finishes the translation, it will update the vhost >> IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of >> snooping the IOTLB invalidation of IOMMU IOTLB and use >> VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. >> >> For simplicity, IOTLB was implemented with a simple hash array. The >> index were calculated from IOVA page frame number which can only works >> at PAGE_SIZE level. >> >> An qemu implementation (for reference) is available at: >> g...@github.com:jasowang/qemu.git iommu >> >> TODO & Known issues: >> >> - read/write permission validation was not implemented. >> - no feature negotiation. >> - VHOST_SET_MEM_TABLE is not reused (maybe there's a chance). >> - working at PAGE_SIZE level, don't support large mappings. >> - better data structure for IOTLB instead of simple hash array. >> - better API, e.g using mmap() instead of preset userspace address. >> >> Signed-off-by: Jason Wang > Interesting. I'm working on a slightly different approach > which is direct vt-d support in vhost. I've considered this approach. May have advantages but the issues here is vt-d emulation is even in-complete in qemu and I believe we don't want to duplicate the code in both vhost-kernel and vhost-user? > This one has the advantage of being more portable. Right, the patch tries to be architecture independent. > >> --- >> drivers/vhost/net.c| 2 +- >> drivers/vhost/vhost.c | 190 >> - >> drivers/vhost/vhost.h | 13 >> include/uapi/linux/vhost.h | 26 +++ >> 4 files changed, 229 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c >> index 9eda69e..a172be9 100644 >> --- a/drivers/vhost/net.c >> +++ b/drivers/vhost/net.c >> @@ -1083,7 +1083,7 @@ static long vhost_net_ioctl(struct file *f, unsigned >> int ioctl, >> r = vhost_dev_ioctl(&n->dev, ioctl, argp); >> if (r == -ENOIOCTLCMD) >> r = vhost_vring_ioctl(&n->dev, ioctl, argp); >> -else >> +else if (ioctl != VHOST_UPDATE_IOTLB) >> vhost_net_flush(n); >> mutex_unlock(&n->dev.mutex); >> return r; >> diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c >> index eec2f11..729fe05 100644 >> --- a/drivers/vhost/vhost.c >> +++ b/drivers/vhost/vhost.c >> @@ -113,6 +113,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) >> } >> #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ >> >> +static inline int vhost_iotlb_hash(u64 iova) >> +{ >> +return (iova >> PAGE_SHIFT) & (VHOST_IOTLB_SIZE - 1); >> +} >> + >> static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, >> poll_table *pt) >> { >> @@ -384,8 +389,14 @@ void vhost_dev_init(struct vhost_dev *dev, >> dev->memory = NULL; >> dev->mm = NULL; >> spin_lock_init(&dev->work_lock); >> +spin_lock_init(&dev->iotlb_lock); >> +mutex_init(&dev->iotlb_req_mutex); >> INIT_LIST_HEAD(&dev->work_list); >> dev->worker = NULL; >> +dev->iotlb_request = NULL; >> +dev->iotlb_ctx = NULL; >> +dev->iotlb_file = NULL; >> +dev->pending_request.flags.type = VHOST_IOTLB_INVALIDATE; >> >> for (i = 0; i < dev->nvqs; ++i) { >> vq = dev->vqs[i]; >> @@ -393,12 +404,17 @@ void vhost_dev_init(struct vhost_dev *dev, >> vq->indirect = NULL; >> vq->heads = NULL; >> vq->dev = dev; >> +vq->iotlb_request = NULL; >> mutex_init(&vq->mutex); >> vhost_vq_reset(dev, vq); >> if (vq->handle_kick) >> vhost_poll_init(&vq->poll, vq->handle_kick, >> POLLIN, dev); >> } >> + >> +init_completion(&dev->iotlb_completion); >> +for (i = 0; i < VHOST_IOTLB_SIZE; i++) >> +dev->iotlb[i].flags.valid = VHOST_IOTLB_INVALID; >> } >> EXPORT_SYMBOL_GPL(vhost_dev_init); >> >> @@ -940,9 +956,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int >> ioctl, void __user *argp) >> { >> struct file *eventfp, *filep = NULL; >> struct eventfd_ctx *ctx = NULL; >> +struct vhost_iotlb_entry entry; >> u64 p;
Re: [PATCH RFC] vhost: basic device IOTLB support
On 2015/12/31 15:13, Jason Wang wrote: This patch tries to implement an device IOTLB for vhost. This could be used with for co-operation with userspace(qemu) implementation of iommu for a secure DMA environment in guest. The idea is simple. When vhost meets an IOTLB miss, it will request the assistance of userspace to do the translation, this is done through: - Fill the translation request in a preset userspace address (This address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). - Notify userspace through eventfd (This eventfd was set through ioctl VHOST_SET_IOTLB_FD). When userspace finishes the translation, it will update the vhost IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of snooping the IOTLB invalidation of IOMMU IOTLB and use VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. Is there any performance data shows the difference with IOTLB supporting? I doubt we may see performance decrease since the flush code path is longer than before. -- best regards yang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH RFC] vhost: basic device IOTLB support
On Thu, Dec 31, 2015 at 03:13:45PM +0800, Jason Wang wrote: > This patch tries to implement an device IOTLB for vhost. This could be > used with for co-operation with userspace(qemu) implementation of > iommu for a secure DMA environment in guest. > > The idea is simple. When vhost meets an IOTLB miss, it will request > the assistance of userspace to do the translation, this is done > through: > > - Fill the translation request in a preset userspace address (This > address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). > - Notify userspace through eventfd (This eventfd was set through ioctl > VHOST_SET_IOTLB_FD). > > When userspace finishes the translation, it will update the vhost > IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of > snooping the IOTLB invalidation of IOMMU IOTLB and use > VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. > > For simplicity, IOTLB was implemented with a simple hash array. The > index were calculated from IOVA page frame number which can only works > at PAGE_SIZE level. > > An qemu implementation (for reference) is available at: > g...@github.com:jasowang/qemu.git iommu > > TODO & Known issues: > > - read/write permission validation was not implemented. > - no feature negotiation. > - VHOST_SET_MEM_TABLE is not reused (maybe there's a chance). > - working at PAGE_SIZE level, don't support large mappings. > - better data structure for IOTLB instead of simple hash array. > - better API, e.g using mmap() instead of preset userspace address. > > Signed-off-by: Jason Wang Interesting. I'm working on a slightly different approach which is direct vt-d support in vhost. This one has the advantage of being more portable. > --- > drivers/vhost/net.c| 2 +- > drivers/vhost/vhost.c | 190 > - > drivers/vhost/vhost.h | 13 > include/uapi/linux/vhost.h | 26 +++ > 4 files changed, 229 insertions(+), 2 deletions(-) > > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c > index 9eda69e..a172be9 100644 > --- a/drivers/vhost/net.c > +++ b/drivers/vhost/net.c > @@ -1083,7 +1083,7 @@ static long vhost_net_ioctl(struct file *f, unsigned > int ioctl, > r = vhost_dev_ioctl(&n->dev, ioctl, argp); > if (r == -ENOIOCTLCMD) > r = vhost_vring_ioctl(&n->dev, ioctl, argp); > - else > + else if (ioctl != VHOST_UPDATE_IOTLB) > vhost_net_flush(n); > mutex_unlock(&n->dev.mutex); > return r; > diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c > index eec2f11..729fe05 100644 > --- a/drivers/vhost/vhost.c > +++ b/drivers/vhost/vhost.c > @@ -113,6 +113,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) > } > #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ > > +static inline int vhost_iotlb_hash(u64 iova) > +{ > + return (iova >> PAGE_SHIFT) & (VHOST_IOTLB_SIZE - 1); > +} > + > static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, > poll_table *pt) > { > @@ -384,8 +389,14 @@ void vhost_dev_init(struct vhost_dev *dev, > dev->memory = NULL; > dev->mm = NULL; > spin_lock_init(&dev->work_lock); > + spin_lock_init(&dev->iotlb_lock); > + mutex_init(&dev->iotlb_req_mutex); > INIT_LIST_HEAD(&dev->work_list); > dev->worker = NULL; > + dev->iotlb_request = NULL; > + dev->iotlb_ctx = NULL; > + dev->iotlb_file = NULL; > + dev->pending_request.flags.type = VHOST_IOTLB_INVALIDATE; > > for (i = 0; i < dev->nvqs; ++i) { > vq = dev->vqs[i]; > @@ -393,12 +404,17 @@ void vhost_dev_init(struct vhost_dev *dev, > vq->indirect = NULL; > vq->heads = NULL; > vq->dev = dev; > + vq->iotlb_request = NULL; > mutex_init(&vq->mutex); > vhost_vq_reset(dev, vq); > if (vq->handle_kick) > vhost_poll_init(&vq->poll, vq->handle_kick, > POLLIN, dev); > } > + > + init_completion(&dev->iotlb_completion); > + for (i = 0; i < VHOST_IOTLB_SIZE; i++) > + dev->iotlb[i].flags.valid = VHOST_IOTLB_INVALID; > } > EXPORT_SYMBOL_GPL(vhost_dev_init); > > @@ -940,9 +956,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int > ioctl, void __user *argp) > { > struct file *eventfp, *filep = NULL; > struct eventfd_ctx *ctx = NULL; > + struct vhost_iotlb_entry entry; > u64 p; > long r; > - int i, fd; > + int index, i, fd; > > /* If you are not the owner, you can become one */ > if (ioctl == VHOST_SET_OWNER) { > @@ -1008,6 +1025,80 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int > ioctl, void __user *argp) > if (filep) > fput(filep); > break; > + case VHOST_SET_IOT
[PATCH RFC] vhost: basic device IOTLB support
This patch tries to implement an device IOTLB for vhost. This could be used with for co-operation with userspace(qemu) implementation of iommu for a secure DMA environment in guest. The idea is simple. When vhost meets an IOTLB miss, it will request the assistance of userspace to do the translation, this is done through: - Fill the translation request in a preset userspace address (This address is set through ioctl VHOST_SET_IOTLB_REQUEST_ENTRY). - Notify userspace through eventfd (This eventfd was set through ioctl VHOST_SET_IOTLB_FD). When userspace finishes the translation, it will update the vhost IOTLB through VHOST_UPDATE_IOTLB ioctl. Userspace is also in charge of snooping the IOTLB invalidation of IOMMU IOTLB and use VHOST_UPDATE_IOTLB to invalidate the possible entry in vhost. For simplicity, IOTLB was implemented with a simple hash array. The index were calculated from IOVA page frame number which can only works at PAGE_SIZE level. An qemu implementation (for reference) is available at: g...@github.com:jasowang/qemu.git iommu TODO & Known issues: - read/write permission validation was not implemented. - no feature negotiation. - VHOST_SET_MEM_TABLE is not reused (maybe there's a chance). - working at PAGE_SIZE level, don't support large mappings. - better data structure for IOTLB instead of simple hash array. - better API, e.g using mmap() instead of preset userspace address. Signed-off-by: Jason Wang --- drivers/vhost/net.c| 2 +- drivers/vhost/vhost.c | 190 - drivers/vhost/vhost.h | 13 include/uapi/linux/vhost.h | 26 +++ 4 files changed, 229 insertions(+), 2 deletions(-) diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c index 9eda69e..a172be9 100644 --- a/drivers/vhost/net.c +++ b/drivers/vhost/net.c @@ -1083,7 +1083,7 @@ static long vhost_net_ioctl(struct file *f, unsigned int ioctl, r = vhost_dev_ioctl(&n->dev, ioctl, argp); if (r == -ENOIOCTLCMD) r = vhost_vring_ioctl(&n->dev, ioctl, argp); - else + else if (ioctl != VHOST_UPDATE_IOTLB) vhost_net_flush(n); mutex_unlock(&n->dev.mutex); return r; diff --git a/drivers/vhost/vhost.c b/drivers/vhost/vhost.c index eec2f11..729fe05 100644 --- a/drivers/vhost/vhost.c +++ b/drivers/vhost/vhost.c @@ -113,6 +113,11 @@ static void vhost_init_is_le(struct vhost_virtqueue *vq) } #endif /* CONFIG_VHOST_CROSS_ENDIAN_LEGACY */ +static inline int vhost_iotlb_hash(u64 iova) +{ + return (iova >> PAGE_SHIFT) & (VHOST_IOTLB_SIZE - 1); +} + static void vhost_poll_func(struct file *file, wait_queue_head_t *wqh, poll_table *pt) { @@ -384,8 +389,14 @@ void vhost_dev_init(struct vhost_dev *dev, dev->memory = NULL; dev->mm = NULL; spin_lock_init(&dev->work_lock); + spin_lock_init(&dev->iotlb_lock); + mutex_init(&dev->iotlb_req_mutex); INIT_LIST_HEAD(&dev->work_list); dev->worker = NULL; + dev->iotlb_request = NULL; + dev->iotlb_ctx = NULL; + dev->iotlb_file = NULL; + dev->pending_request.flags.type = VHOST_IOTLB_INVALIDATE; for (i = 0; i < dev->nvqs; ++i) { vq = dev->vqs[i]; @@ -393,12 +404,17 @@ void vhost_dev_init(struct vhost_dev *dev, vq->indirect = NULL; vq->heads = NULL; vq->dev = dev; + vq->iotlb_request = NULL; mutex_init(&vq->mutex); vhost_vq_reset(dev, vq); if (vq->handle_kick) vhost_poll_init(&vq->poll, vq->handle_kick, POLLIN, dev); } + + init_completion(&dev->iotlb_completion); + for (i = 0; i < VHOST_IOTLB_SIZE; i++) + dev->iotlb[i].flags.valid = VHOST_IOTLB_INVALID; } EXPORT_SYMBOL_GPL(vhost_dev_init); @@ -940,9 +956,10 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) { struct file *eventfp, *filep = NULL; struct eventfd_ctx *ctx = NULL; + struct vhost_iotlb_entry entry; u64 p; long r; - int i, fd; + int index, i, fd; /* If you are not the owner, you can become one */ if (ioctl == VHOST_SET_OWNER) { @@ -1008,6 +1025,80 @@ long vhost_dev_ioctl(struct vhost_dev *d, unsigned int ioctl, void __user *argp) if (filep) fput(filep); break; + case VHOST_SET_IOTLB_FD: + r = get_user(fd, (int __user *)argp); + if (r < 0) + break; + eventfp = fd == -1 ? NULL : eventfd_fget(fd); + if (IS_ERR(eventfp)) { + r = PTR_ERR(eventfp); + break; + } + if (eventfp != d->iotlb_file) {