Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On Wed, Feb 07, 2018 at 10:02:24AM -0800, Alexander Duyck wrote: > On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin wrote: > > On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote: > >> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie wrote: > >> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote: > >> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote: > >> >> > > The virtual IOMMU isn't supported by the accelerators for now. > >> >> > > Because vhost-user currently lacks of an efficient way to share > >> >> > > the IOMMU table in VM to vhost backend. That's why the software > >> >> > > implementation of virtual IOMMU support in vhost-user backend > >> >> > > can't support dynamic mapping well. > >> >> > What exactly is meant by that? vIOMMU seems to work for people, > >> >> > it's not that fast if you change mappings all the time, > >> >> > but e.g. dpdk within guest doesn't. > >> >> > >> >> Yes, software implementation support dynamic mapping for sure. I think > >> >> the > >> >> point is, current vhost-user backend can not program hardware IOMMU. So > >> >> it > >> >> can not let hardware accelerator to cowork with software vIOMMU. > >> > > >> > Vhost-user backend can program hardware IOMMU. Currently > >> > vhost-user backend (or more precisely the vDPA driver in > >> > vhost-user backend) will use the memory table (delivered > >> > by the VHOST_USER_SET_MEM_TABLE message) to program the > >> > IOMMU via vfio, and that's why accelerators can use the > >> > GPA (guest physical address) in descriptors directly. > >> > > >> > Theoretically, we can use the IOVA mapping info (delivered > >> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU, > >> > and accelerators will be able to use IOVA. But the problem > >> > is that in vhost-user QEMU won't push all the IOVA mappings > >> > to backend directly. Backend needs to ask for those info > >> > when it meets a new IOVA. 
Such design and implementation > >> > won't work well for dynamic mappings anyway and couldn't > >> > be supported by hardware accelerators. > >> > > >> >> I think > >> >> that's another call to implement the offloaded path inside qemu which > >> >> has > >> >> complete support for vIOMMU co-operated VFIO. > >> > > >> > Yes, that's exactly what we want. After revisiting the > >> > last paragraph in the commit message, I found it's not > >> > really accurate. The practicability of dynamic mappings > >> > support is a common issue for QEMU. It also exists for > >> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the > >> > map/unmap events, the data path performance couldn't be > >> > high. If we want to thoroughly fix this issue especially > >> > for vfio (hw/vfio in QEMU), we need to have the offload > >> > path Jason mentioned in QEMU. And I think accelerators > >> > could use it too. > >> > > >> > Best regards, > >> > Tiwei Bie > >> > >> I wonder if we couldn't look at coming up with an altered security > >> model for the IOMMU drivers to address some of the performance issues > >> seen with typical hardware IOMMU? > >> > >> In the case of most network devices, we seem to be moving toward a > >> model where the Rx pages are mapped for an extended period of time and > >> see a fairly high rate of reuse. As such pages mapped as being > >> writable or read/write by the device are left mapped for an extended > >> period of time while Tx pages, which are read only, are often > >> mapped/unmapped since they are coming from some other location in the > >> kernel beyond the driver's control. 
> >> > >> If we were to somehow come up with a model where the read-only(Tx) > >> pages had access to a pre-allocated memory mapped address, and the > >> read/write(descriptor rings), write-only(Rx) pages were provided with > >> dynamic addresses we might be able to come up with a solution that > >> would allow for fairly high network performance while at least > >> protecting from memory corruption. The only issue it would open up is > >> that the device would have the ability to read any/all memory on the > >> guest. I was wondering about doing something like this with the vIOMMU > >> with VFIO for the Intel NICs this way since an interface like igb, > >> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good > >> performance under such a model and as long as the writable pages were > >> being tracked by the vIOMMU. It could even allow for live migration > >> support if the vIOMMU provided the info needed for migratable/dirty > >> page tracking and we held off on migrating any of the dynamically > >> mapped pages until after they were either unmapped or an FLR reset the > >> device. > >> > >> Thanks. > >> > >> - Alex > > > > > > > > It might be a good idea to change the iommu instead - how about a > > variant of strict in intel iommu which forces an IOTLB flush after > > invalidating a writeable mapping but not a RO mapping? Not sure what the > > name would be - relaxed-ro? > > >
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On Wed, Feb 7, 2018 at 8:43 AM, Michael S. Tsirkin wrote: > On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote: >> On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie wrote: >> > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote: >> >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote: >> >> > > The virtual IOMMU isn't supported by the accelerators for now. >> >> > > Because vhost-user currently lacks of an efficient way to share >> >> > > the IOMMU table in VM to vhost backend. That's why the software >> >> > > implementation of virtual IOMMU support in vhost-user backend >> >> > > can't support dynamic mapping well. >> >> > What exactly is meant by that? vIOMMU seems to work for people, >> >> > it's not that fast if you change mappings all the time, >> >> > but e.g. dpdk within guest doesn't. >> >> >> >> Yes, software implementation support dynamic mapping for sure. I think the >> >> point is, current vhost-user backend can not program hardware IOMMU. So it >> >> can not let hardware accelerator to cowork with software vIOMMU. >> > >> > Vhost-user backend can program hardware IOMMU. Currently >> > vhost-user backend (or more precisely the vDPA driver in >> > vhost-user backend) will use the memory table (delivered >> > by the VHOST_USER_SET_MEM_TABLE message) to program the >> > IOMMU via vfio, and that's why accelerators can use the >> > GPA (guest physical address) in descriptors directly. >> > >> > Theoretically, we can use the IOVA mapping info (delivered >> > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU, >> > and accelerators will be able to use IOVA. But the problem >> > is that in vhost-user QEMU won't push all the IOVA mappings >> > to backend directly. Backend needs to ask for those info >> > when it meets a new IOVA. Such design and implementation >> > won't work well for dynamic mappings anyway and couldn't >> > be supported by hardware accelerators. 
>> > >> >> I think >> >> that's another call to implement the offloaded path inside qemu which has >> >> complete support for vIOMMU co-operated VFIO. >> > >> > Yes, that's exactly what we want. After revisiting the >> > last paragraph in the commit message, I found it's not >> > really accurate. The practicability of dynamic mappings >> > support is a common issue for QEMU. It also exists for >> > vfio (hw/vfio in QEMU). If QEMU needs to trap all the >> > map/unmap events, the data path performance couldn't be >> > high. If we want to thoroughly fix this issue especially >> > for vfio (hw/vfio in QEMU), we need to have the offload >> > path Jason mentioned in QEMU. And I think accelerators >> > could use it too. >> > >> > Best regards, >> > Tiwei Bie >> >> I wonder if we couldn't look at coming up with an altered security >> model for the IOMMU drivers to address some of the performance issues >> seen with typical hardware IOMMU? >> >> In the case of most network devices, we seem to be moving toward a >> model where the Rx pages are mapped for an extended period of time and >> see a fairly high rate of reuse. As such pages mapped as being >> writable or read/write by the device are left mapped for an extended >> period of time while Tx pages, which are read only, are often >> mapped/unmapped since they are coming from some other location in the >> kernel beyond the driver's control. >> >> If we were to somehow come up with a model where the read-only(Tx) >> pages had access to a pre-allocated memory mapped address, and the >> read/write(descriptor rings), write-only(Rx) pages were provided with >> dynamic addresses we might be able to come up with a solution that >> would allow for fairly high network performance while at least >> protecting from memory corruption. The only issue it would open up is >> that the device would have the ability to read any/all memory on the >> guest. 
I was wondering about doing something like this with the vIOMMU >> with VFIO for the Intel NICs this way since an interface like igb, >> ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good >> performance under such a model and as long as the writable pages were >> being tracked by the vIOMMU. It could even allow for live migration >> support if the vIOMMU provided the info needed for migratable/dirty >> page tracking and we held off on migrating any of the dynamically >> mapped pages until after they were either unmapped or an FLR reset the >> device. >> >> Thanks. >> >> - Alex > > > > It might be a good idea to change the iommu instead - how about a > variant of strict in intel iommu which forces an IOTLB flush after > invalidating a writeable mapping but not a RO mapping? Not sure what the > name would be - relaxed-ro? > > This is probably easier than poking at the drivers and net core. > > Keeping the RX pages mapped in the IOMMU was envisioned for XDP. > That might be a good place to start. My plan is to update the Intel IOMMU driver first since it seems like something that shouldn't
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On Sun, Feb 04, 2018 at 01:49:46PM -0800, Alexander Duyck wrote: > On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie wrote: > > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote: > >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote: > >> > > The virtual IOMMU isn't supported by the accelerators for now. > >> > > Because vhost-user currently lacks of an efficient way to share > >> > > the IOMMU table in VM to vhost backend. That's why the software > >> > > implementation of virtual IOMMU support in vhost-user backend > >> > > can't support dynamic mapping well. > >> > What exactly is meant by that? vIOMMU seems to work for people, > >> > it's not that fast if you change mappings all the time, > >> > but e.g. dpdk within guest doesn't. > >> > >> Yes, software implementation support dynamic mapping for sure. I think the > >> point is, current vhost-user backend can not program hardware IOMMU. So it > >> can not let hardware accelerator to cowork with software vIOMMU. > > > > Vhost-user backend can program hardware IOMMU. Currently > > vhost-user backend (or more precisely the vDPA driver in > > vhost-user backend) will use the memory table (delivered > > by the VHOST_USER_SET_MEM_TABLE message) to program the > > IOMMU via vfio, and that's why accelerators can use the > > GPA (guest physical address) in descriptors directly. > > > > Theoretically, we can use the IOVA mapping info (delivered > > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU, > > and accelerators will be able to use IOVA. But the problem > > is that in vhost-user QEMU won't push all the IOVA mappings > > to backend directly. Backend needs to ask for those info > > when it meets a new IOVA. Such design and implementation > > won't work well for dynamic mappings anyway and couldn't > > be supported by hardware accelerators. > > > >> I think > >> that's another call to implement the offloaded path inside qemu which has > >> complete support for vIOMMU co-operated VFIO. 
> > > > Yes, that's exactly what we want. After revisiting the > > last paragraph in the commit message, I found it's not > > really accurate. The practicability of dynamic mappings > > support is a common issue for QEMU. It also exists for > > vfio (hw/vfio in QEMU). If QEMU needs to trap all the > > map/unmap events, the data path performance couldn't be > > high. If we want to thoroughly fix this issue especially > > for vfio (hw/vfio in QEMU), we need to have the offload > > path Jason mentioned in QEMU. And I think accelerators > > could use it too. > > > > Best regards, > > Tiwei Bie > > I wonder if we couldn't look at coming up with an altered security > model for the IOMMU drivers to address some of the performance issues > seen with typical hardware IOMMU? > > In the case of most network devices, we seem to be moving toward a > model where the Rx pages are mapped for an extended period of time and > see a fairly high rate of reuse. As such pages mapped as being > writable or read/write by the device are left mapped for an extended > period of time while Tx pages, which are read only, are often > mapped/unmapped since they are coming from some other location in the > kernel beyond the driver's control. > > If we were to somehow come up with a model where the read-only(Tx) > pages had access to a pre-allocated memory mapped address, and the > read/write(descriptor rings), write-only(Rx) pages were provided with > dynamic addresses we might be able to come up with a solution that > would allow for fairly high network performance while at least > protecting from memory corruption. The only issue it would open up is > that the device would have the ability to read any/all memory on the > guest. 
I was wondering about doing something like this with the vIOMMU > with VFIO for the Intel NICs this way since an interface like igb, > ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good > performance under such a model and as long as the writable pages were > being tracked by the vIOMMU. It could even allow for live migration > support if the vIOMMU provided the info needed for migratable/dirty > page tracking and we held off on migrating any of the dynamically > mapped pages until after they were either unmapped or an FLR reset the > device. > > Thanks. > > - Alex It might be a good idea to change the iommu instead - how about a variant of strict in intel iommu which forces an IOTLB flush after invalidating a writeable mapping but not a RO mapping? Not sure what the name would be - relaxed-ro? This is probably easier than poking at the drivers and net core. Keeping the RX pages mapped in the IOMMU was envisioned for XDP. That might be a good place to start. -- MST
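The "relaxed-ro" idea reduces, in code terms, to a per-permission flush policy on unmap. A minimal sketch follows; the option does not exist in the intel-iommu driver, and all identifiers here are hypothetical illustrations of the policy, not real kernel code:

```c
#include <stdbool.h>

/* Hypothetical "relaxed-ro" strictness policy: tearing down a mapping
 * forces a synchronous IOTLB flush only if the device could write
 * through it. A stale read-only entry merely lets the device re-read
 * pages it already saw (the information leak this model accepts);
 * a stale writable entry could corrupt reused memory, so it must be
 * flushed before the unmap is considered complete. */
enum dma_perm { DMA_PERM_RO, DMA_PERM_WO, DMA_PERM_RW };

bool unmap_needs_sync_flush(enum dma_perm perm)
{
    /* Tx buffers (read-only to the device) may batch or defer their
     * flushes; Rx buffers and descriptor rings flush eagerly. */
    return perm != DMA_PERM_RO;
}
```

This keeps the fast path (frequent Tx map/unmap) cheap while preserving the memory-corruption guarantee for writable mappings.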
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On Thu, Jan 25, 2018 at 9:57 PM, Tiwei Bie wrote: > On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote: >> On 2018年01月26日 07:59, Michael S. Tsirkin wrote: >> > > The virtual IOMMU isn't supported by the accelerators for now. >> > > Because vhost-user currently lacks of an efficient way to share >> > > the IOMMU table in VM to vhost backend. That's why the software >> > > implementation of virtual IOMMU support in vhost-user backend >> > > can't support dynamic mapping well. >> > What exactly is meant by that? vIOMMU seems to work for people, >> > it's not that fast if you change mappings all the time, >> > but e.g. dpdk within guest doesn't. >> >> Yes, software implementation support dynamic mapping for sure. I think the >> point is, current vhost-user backend can not program hardware IOMMU. So it >> can not let hardware accelerator to cowork with software vIOMMU. > > Vhost-user backend can program hardware IOMMU. Currently > vhost-user backend (or more precisely the vDPA driver in > vhost-user backend) will use the memory table (delivered > by the VHOST_USER_SET_MEM_TABLE message) to program the > IOMMU via vfio, and that's why accelerators can use the > GPA (guest physical address) in descriptors directly. > > Theoretically, we can use the IOVA mapping info (delivered > by the VHOST_USER_IOTLB_MSG message) to program the IOMMU, > and accelerators will be able to use IOVA. But the problem > is that in vhost-user QEMU won't push all the IOVA mappings > to backend directly. Backend needs to ask for those info > when it meets a new IOVA. Such design and implementation > won't work well for dynamic mappings anyway and couldn't > be supported by hardware accelerators. > >> I think >> that's another call to implement the offloaded path inside qemu which has >> complete support for vIOMMU co-operated VFIO. > > Yes, that's exactly what we want. After revisiting the > last paragraph in the commit message, I found it's not > really accurate. 
The practicability of dynamic mappings > support is a common issue for QEMU. It also exists for > vfio (hw/vfio in QEMU). If QEMU needs to trap all the > map/unmap events, the data path performance couldn't be > high. If we want to thoroughly fix this issue especially > for vfio (hw/vfio in QEMU), we need to have the offload > path Jason mentioned in QEMU. And I think accelerators > could use it too. > > Best regards, > Tiwei Bie I wonder if we couldn't look at coming up with an altered security model for the IOMMU drivers to address some of the performance issues seen with typical hardware IOMMU? In the case of most network devices, we seem to be moving toward a model where the Rx pages are mapped for an extended period of time and see a fairly high rate of reuse. As such pages mapped as being writable or read/write by the device are left mapped for an extended period of time while Tx pages, which are read only, are often mapped/unmapped since they are coming from some other location in the kernel beyond the driver's control. If we were to somehow come up with a model where the read-only(Tx) pages had access to a pre-allocated memory mapped address, and the read/write(descriptor rings), write-only(Rx) pages were provided with dynamic addresses we might be able to come up with a solution that would allow for fairly high network performance while at least protecting from memory corruption. The only issue it would open up is that the device would have the ability to read any/all memory on the guest. I was wondering about doing something like this with the vIOMMU with VFIO for the Intel NICs this way since an interface like igb, ixgbe, ixgbevf, i40e, or i40evf would probably show pretty good performance under such a model and as long as the writable pages were being tracked by the vIOMMU. 
It could even allow for live migration support if the vIOMMU provided the info needed for migratable/dirty page tracking and we held off on migrating any of the dynamically mapped pages until after they were either unmapped or an FLR reset the device. Thanks. - Alex
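The split model described above can be sketched as a mapping decision by DMA direction: read-only (Tx) traffic uses one long-lived, premapped window, while writable mappings (descriptor rings, Rx buffers) still receive individually tracked dynamic IOVAs. All names, the window layout, and the toy allocator below are illustrative, not taken from igb/ixgbe/i40e or any real driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Premapped read-only window covering guest memory at a fixed IOVA
 * offset (illustrative base address). */
#define RO_WINDOW_BASE 0x8000000000ULL

enum dma_dir { DMA_TO_DEVICE, DMA_FROM_DEVICE, DMA_BIDIRECTIONAL };

static uint64_t next_dynamic_iova = 0x1000; /* toy bump allocator */

/* Returns the IOVA the device should use for a guest page; *tracked
 * tells the caller whether the mapping must be recorded so it can be
 * dirtied for migration and unmapped later. */
uint64_t map_for_device(uint64_t gpa, enum dma_dir dir, bool *tracked)
{
    if (dir == DMA_TO_DEVICE) {
        /* Tx: no IOMMU update at all, just an offset into the
         * long-lived read-only window. */
        *tracked = false;
        return RO_WINDOW_BASE + gpa;
    }
    /* Rx buffers and rings: dynamic, tracked IOVA (one page per
     * mapping in this toy allocator). */
    *tracked = true;
    uint64_t iova = next_dynamic_iova;
    next_dynamic_iova += 0x1000;
    return iova;
}
```

Under this policy only the tracked (writable) set needs dirty-page bookkeeping, which is what would make the migration scheme above feasible.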
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On Fri, Jan 26, 2018 at 11:41:27AM +0800, Jason Wang wrote: > On 2018年01月26日 07:59, Michael S. Tsirkin wrote: > > > The virtual IOMMU isn't supported by the accelerators for now. > > > Because vhost-user currently lacks of an efficient way to share > > > the IOMMU table in VM to vhost backend. That's why the software > > > implementation of virtual IOMMU support in vhost-user backend > > > can't support dynamic mapping well. > > What exactly is meant by that? vIOMMU seems to work for people, > > it's not that fast if you change mappings all the time, > > but e.g. dpdk within guest doesn't. > > Yes, software implementation support dynamic mapping for sure. I think the > point is, current vhost-user backend can not program hardware IOMMU. So it > can not let hardware accelerator to cowork with software vIOMMU. Vhost-user backend can program hardware IOMMU. Currently vhost-user backend (or more precisely the vDPA driver in vhost-user backend) will use the memory table (delivered by the VHOST_USER_SET_MEM_TABLE message) to program the IOMMU via vfio, and that's why accelerators can use the GPA (guest physical address) in descriptors directly. Theoretically, we can use the IOVA mapping info (delivered by the VHOST_USER_IOTLB_MSG message) to program the IOMMU, and accelerators will be able to use IOVA. But the problem is that in vhost-user QEMU won't push all the IOVA mappings to backend directly. Backend needs to ask for those info when it meets a new IOVA. Such design and implementation won't work well for dynamic mappings anyway and couldn't be supported by hardware accelerators. > I think > that's another call to implement the offloaded path inside qemu which has > complete support for vIOMMU co-operated VFIO. Yes, that's exactly what we want. After revisiting the last paragraph in the commit message, I found it's not really accurate. The practicability of dynamic mappings support is a common issue for QEMU. It also exists for vfio (hw/vfio in QEMU). 
If QEMU needs to trap all the map/unmap events, the data path performance couldn't be high. If we want to thoroughly fix this issue especially for vfio (hw/vfio in QEMU), we need to have the offload path Jason mentioned in QEMU. And I think accelerators could use it too. Best regards, Tiwei Bie > > Thanks > > > > > > Once this problem is solved > > > in vhost-user, virtual IOMMU can be supported by accelerators > > > too, and the IOMMU feature bit checking in this patch can be > > > removed. > > Given it works with software backends right now, I suspect > > this will be up to you guys to address. > > >
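The SET_MEM_TABLE path Tiwei describes amounts to turning each memory-table region into one VFIO DMA mapping keyed by GPA. A rough sketch, assuming a Linux host with VFIO headers; `struct mem_region` is a simplified stand-in for the vhost-user memory-region layout, and `program_region` is a hypothetical helper name:

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Simplified stand-in for one region of the vhost-user memory table
 * (VHOST_USER_SET_MEM_TABLE). */
struct mem_region {
    uint64_t guest_phys_addr; /* GPA, reused as the IOVA below      */
    uint64_t memory_size;
    uint64_t userspace_addr;  /* backend's mmap of the guest region */
};

/* Build the VFIO mapping request: IOVA = GPA, which is why the
 * accelerator can consume guest physical addresses from descriptors
 * directly. */
struct vfio_iommu_type1_dma_map
region_to_dma_map(const struct mem_region *r)
{
    struct vfio_iommu_type1_dma_map map;
    memset(&map, 0, sizeof(map));
    map.argsz = sizeof(map);
    map.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE;
    map.vaddr = r->userspace_addr;
    map.iova  = r->guest_phys_addr;
    map.size  = r->memory_size;
    return map;
}

/* Issue the mapping against a real VFIO container fd. */
int program_region(int container_fd, const struct mem_region *r)
{
    struct vfio_iommu_type1_dma_map map = region_to_dma_map(r);
    return ioctl(container_fd, VFIO_IOMMU_MAP_DMA, &map);
}
```

Because every region is mapped up front, the data path never has to consult QEMU for a translation, unlike the on-demand IOTLB scheme.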
Re: [Qemu-devel] [virtio-dev] Re: [PATCH v1 6/6] vhost-user: add VFIO based accelerators support
On 2018年01月26日 07:59, Michael S. Tsirkin wrote:
> > The virtual IOMMU isn't supported by the accelerators for now.
> > Because vhost-user currently lacks of an efficient way to share
> > the IOMMU table in VM to vhost backend. That's why the software
> > implementation of virtual IOMMU support in vhost-user backend
> > can't support dynamic mapping well.
> What exactly is meant by that? vIOMMU seems to work for people,
> it's not that fast if you change mappings all the time,
> but e.g. dpdk within guest doesn't.

Yes, software implementation support dynamic mapping for sure. I think the
point is, current vhost-user backend can not program hardware IOMMU. So it
can not let hardware accelerator to cowork with software vIOMMU.

I think that's another call to implement the offloaded path inside qemu
which has complete support for vIOMMU co-operated VFIO.

Thanks

> > Once this problem is solved in vhost-user, virtual IOMMU can be
> > supported by accelerators too, and the IOMMU feature bit checking
> > in this patch can be removed.
> Given it works with software backends right now, I suspect
> this will be up to you guys to address.
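The on-demand translation flow that makes vhost-user dynamic mapping slow can be sketched as follows: on an IOTLB miss the backend must send a request and stall until QEMU replies with the mapping. The struct below mirrors the shape of vhost's IOTLB message (iova/size/uaddr/perm/type), but it is redeclared locally and the constant values are illustrative, not the kernel's:

```c
#include <stdint.h>

/* Illustrative message-type and permission values (not the actual
 * VHOST_IOTLB_* constants). */
#define IOTLB_MISS      1 /* backend -> QEMU: translate this IOVA  */
#define IOTLB_UPDATE    2 /* QEMU -> backend: here is the mapping  */
#define IOTLB_ACCESS_RO 1
#define IOTLB_ACCESS_RW 3

struct iotlb_msg {
    uint64_t iova;  /* IOVA the backend has no translation for     */
    uint64_t size;
    uint64_t uaddr; /* backend-side virtual address (in replies)   */
    uint8_t  perm;
    uint8_t  type;
};

/* Build the miss request for an untranslated IOVA. Each such round
 * trip blocks the data path, which is why trapping every map/unmap
 * in QEMU cannot give high performance. */
struct iotlb_msg make_miss_request(uint64_t iova, uint8_t perm)
{
    struct iotlb_msg m = {0};
    m.iova = iova;
    m.perm = perm;
    m.type = IOTLB_MISS;
    return m;
}
```

A hardware accelerator cannot issue such a request mid-DMA at all, which is the core of the argument that on-demand IOTLB filling cannot back a hardware IOMMU.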