Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:14:29PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote:
> > I think the same PCI driver with a small flag to support the PF or
> > VF is not the same as two completely different drivers in different
> > subsystems
>
> There are counter-examples: ixgbe vs. ixgbevf.
>
> Note that also a single driver can support both, an SVA device and an
> mdev device, sharing code for accessing parts of the device like queues
> and handling interrupts.

Needing an mdev device at all is the larger issue: mdev means the kernel must carry a lot of emulation code, depending on how the SVA device is designed. E.g. creating queues may require an emulated BAR.

Shifting that code to userspace and having a single clean 'SVA' interface from the kernel for the device makes a lot more sense, especially from a security perspective.

Forcing all vIOMMU stuff to only use VFIO permanently closes this as an option.

Jason

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote:
> I think the same PCI driver with a small flag to support the PF or
> VF is not the same as two completely different drivers in different
> subsystems

There are counter-examples: ixgbe vs. ixgbevf.

Note that a single driver can also support both an SVA device and an mdev device, sharing code for accessing parts of the device like queues and handling interrupts.

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 05:55:40PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote:
> > This whole thread was brought up by IDXD which has a SVA driver and
> > now wants to add a vfio-mdev driver too. SVA devices that want to be
> > plugged into VMs are going to be common - this architecture that a SVA
> > driver cannot cover the kvm case seems problematic.
>
> Isn't that the same pattern as having separate drivers for VFs and the
> parent device in SR-IOV?

I think the same PCI driver with a small flag to support the PF or the VF is not the same as two completely different drivers in different subsystems.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote:
> This whole thread was brought up by IDXD which has a SVA driver and
> now wants to add a vfio-mdev driver too. SVA devices that want to be
> plugged into VMs are going to be common - this architecture that a SVA
> driver cannot cover the kvm case seems problematic.

Isn't that the same pattern as having separate drivers for VFs and the parent device in SR-IOV?

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:35:32PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote:
> > The point is that other places beyond VFIO need this
>
> Which and why?
>
> > Sure, but sometimes it is necessary, and in those cases the answer
> > can't be "rewrite a SVA driver to use vfio"
>
> This is getting too abstract. Can you come up with an example where
> handling this in VFIO or an endpoint device kernel driver does not work?

This whole thread was brought up by IDXD, which has an SVA driver and now wants to add a vfio-mdev driver too. SVA devices that want to be plugged into VMs are going to be common; an architecture in which an SVA driver cannot cover the kvm case seems problematic.

Yes, everything can have both an SVA driver and a vfio-mdev, and it works just fine, but it is not very clean or simple.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote:
> The point is that other places beyond VFIO need this

Which and why?

> Sure, but sometimes it is necessary, and in those cases the answer
> can't be "rewrite a SVA driver to use vfio"

This is getting too abstract. Can you come up with an example where handling this in VFIO or an endpoint device kernel driver does not work?

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 03:03:18PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote:
> > Userspace needs fine grained control over the composition of the page
> > table behind the PASID, 1:1 with the mm_struct is only one use case.
>
> VFIO already offers an interface for that. It shouldn't be too
> complicated to expand that for PASID-bound page-tables.
>
> > Userspace needs to be able to handle IOMMU faults, apparently
>
> Could be implemented by a fault-fd handed out by VFIO.

The point is that other places beyond VFIO need this

> I really don't think that user-space should have to deal with details
> like PASIDs or other IOMMU internals, unless absolutely necessary. This
> is an OS we work on, and the idea behind an OS is to abstract the
> hardware away.

Sure, but sometimes it is necessary, and in those cases the answer can't be "rewrite a SVA driver to use vfio"

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote:
> Userspace needs fine grained control over the composition of the page
> table behind the PASID, 1:1 with the mm_struct is only one use case.

VFIO already offers an interface for that. It shouldn't be too complicated to expand that for PASID-bound page-tables.

> Userspace needs to be able to handle IOMMU faults, apparently

Could be implemented by a fault-fd handed out by VFIO.

> The Intel guys had a bunch of other stuff too, looking through the new
> API they are proposing for vfio gives some flavour what they think is
> needed..

I really don't think that user-space should have to deal with details like PASIDs or other IOMMU internals, unless absolutely necessary. This is an OS we work on, and the idea behind an OS is to abstract the hardware away.

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 02:18:52PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> > > So having said this, what is the benefit of exposing those SVA internals
> > > to user-space?
> >
> > Only the device use of the PASID is device specific, the actual PASID
> > and everything on the IOMMU side is generic.
> >
> > There is enough API there it doesn't make sense to duplicate it into
> > every single SVA driver.
>
> What generic things have to be done by the drivers besides
> allocating/deallocating PASIDs and binding an address space to it?
>
> Is there anything which isn't better handled in a kernel-internal
> library which drivers just use?

Userspace needs fine grained control over the composition of the page table behind the PASID, 1:1 with the mm_struct is only one use case.

Userspace needs to be able to handle IOMMU faults, apparently

The Intel guys had a bunch of other stuff too, looking through the new API they are proposing for vfio gives some flavour what they think is needed..

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> > So having said this, what is the benefit of exposing those SVA internals
> > to user-space?
>
> Only the device use of the PASID is device specific, the actual PASID
> and everything on the IOMMU side is generic.
>
> There is enough API there it doesn't make sense to duplicate it into
> every single SVA driver.

What generic things have to be done by the drivers besides allocating/deallocating PASIDs and binding an address space to it?

Is there anything which isn't better handled in a kernel-internal library which drivers just use?

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> So having said this, what is the benefit of exposing those SVA internals
> to user-space?

Only the device use of the PASID is device specific; the actual PASID and everything on the IOMMU side is generic.

There is enough API there that it doesn't make sense to duplicate it into every single SVA driver.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote:
> > From: Jason Wang
> > Jason suggested something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).

Honestly, I fail to see the benefit of offloading these IOMMU-specific setup tasks to user-space. The ways PASID and the device partitioning it allows are used are very device-specific. A GPU will be partitioned completely differently than a network card. So the device drivers should use the (v)SVA APIs to set up the partitioning in a way which makes sense for the device.

And VFIO is of course a user by itself, as it allows assigning device partitions to guests. Or it can even allow assigning complete devices and let the guests partition them themselves.

So having said this, what is the benefit of exposing those SVA internals to user-space?

Regards,

Joerg
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 11:54 AM, Liu, Yi L wrote:
> Hi Jason,
>
> > From: Jason Wang
> > Sent: Thursday, October 22, 2020 10:56 AM
> > [...]
> > If you (Intel) don't have a plan to do vDPA, you should not prevent other
> > vendors from implementing PASID-capable hardware through a non-VFIO
> > subsystem/uAPI on top of your SIOV architecture. Isn't it?
>
> yes, that's true.
>
> > So if Intel has the willingness to collaborate on the POC, I'd be happy
> > to help. E.g. it's not hard to have a PASID-capable virtio device through
> > qemu, and we can start from there.
>
> actually, I'm already doing a PoC to move the PASID allocation/free
> interface out of VFIO, so that other users could use it as well. I think
> this is also what you replied previously. :-) I'll send it out when it's
> ready and seek your help on maturing it. does it sound good to you?

Yes, fine with me.

Thanks

> Regards,
> Yi Liu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason,

> From: Jason Wang
> Sent: Thursday, October 22, 2020 10:56 AM
> [...]
> If you (Intel) don't have a plan to do vDPA, you should not prevent other
> vendors from implementing PASID-capable hardware through a non-VFIO
> subsystem/uAPI on top of your SIOV architecture. Isn't it?

yes, that's true.

> So if Intel has the willingness to collaborate on the POC, I'd be happy
> to help. E.g. it's not hard to have a PASID-capable virtio device through
> qemu, and we can start from there.

actually, I'm already doing a PoC to move the PASID allocation/free interface out of VFIO, so that other users could use it as well. I think this is also what you replied previously. :-) I'll send it out when it's ready and seek your help on maturing it. does it sound good to you?

Regards,
Yi Liu

> Thanks
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/22 1:51 AM, Raj, Ashok wrote:
> On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> > > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > > > I think we agreed (or agree to disagree and commit) for device
> > > > > > > types that we have for SIOV, the VFIO-based approach works well
> > > > > > > without having to re-invent another way to do the same things.
> > > > > > > Not looking for a shortcut by any means, but we need to plan
> > > > > > > around existing hardware though. Looks like vDPA took some
> > > > > > > shortcuts then to not abstract the iommu uAPI instead :-)?
> > > > > > > When all necessary hardware was available, this would be a
> > > > > > > solved puzzle.
> > > > > >
> > > > > > I think it is the opposite: vIOMMU and related has outgrown VFIO
> > > > > > as the "home" and needs to stand alone.
> > > > > >
> > > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > > > >
> > > > > So just to make this clear, I did check internally if there are any
> > > > > plans for vDPA + SVM. There are none at the moment.
> > > >
> > > > Not SVM, SIOV.
> > >
> > > ... And that included.. I should have said vDPA + PASID, no current
> > > plans. I have no idea who set expectations with you. Can you please
> > > put me in touch with that person, privately is fine.
> >
> > It was the team that argued vDPA had to be done through VFIO - SIOV
> > and PASID was one of their reasons it had to be VFIO, check the list
> > archives.
>
> Humm... I could search the archives, but the point is I'm confirming
> that there is no forward-looking plan! And whoever made the argument,
> it was probably a strawman, hypothetical one that wasn't grounded in
> reality.
>
> > If they didn't plan to use it, bit of a strawman argument, right?
>
> This doesn't need to continue like the debates :-) Pun intended :-)
>
> I don't think it makes any sense to have an abstract strawman-argument
> design discussion. Yi is looking into pasid management alone. The rest
> of the IOMMU-related topics should wait until we have another *real*
> use requiring consolidation.
>
> Contrary to your argument, vDPA went with a half-blown, device-only
> iommu user without considering existing abstractions like containers
> and such in VFIO, which is part of the reason the gap is big at the
> moment. And you might not agree, but that's beside the point.

Can you explain why it must care about VFIO abstractions? vDPA is trying to hide device details, which is fundamentally different from what VFIO wants to do. vDPA allows the parent to deal with the IOMMU stuff, and if necessary, the parent can talk with IOMMU drivers directly via the IOMMU APIs.

> Rather than pivot ourselves around hypothetical, strawman,
> non-intersecting suggestions, proposing an architecture without having
> done a proof of concept to validate it should stop. We have to ground
> ourselves in reality.

The reality is that VFIO should not be the only user of (v)SVA/SIOV/PASID. The kernel already has users like the GPU drivers or uacce.

> The use cases we have so far for SIOV, VFIO and mdev seem to be the
> right candidates and address them well. Now you might disagree, but as
> noted we all agreed to move past this.

The mdev framework is not perfect for sure, but that's another topic.

If you (Intel) don't have a plan to do vDPA, you should not prevent other vendors from implementing PASID-capable hardware through a non-VFIO subsystem/uAPI on top of your SIOV architecture. Isn't it?

So if Intel has the willingness to collaborate on the POC, I'd be happy to help. E.g. it's not hard to have a PASID-capable virtio device through qemu, and we can start from there.

Thanks
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:32:18PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:
> > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> > SVM is orthogonal to how we achieve mdev passthrough to guest and
> > vSVM.
>
> Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed
> on the VDPA side as well, I think that is why JasonW brought this up
> in the first place.

True. To that effect, we are working on trying to move PASID allocation outside of VFIO, so that both agents, VFIO and vDPA with PASID, when that becomes available, can support one way to allocate and manage PASIDs from user space. Since the IOASID is almost standalone, this is possible.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:
> I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> SVM is orthogonal to how we achieve mdev passthrough to guest and
> vSVM.

Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed on the VDPA side as well, I think that is why JasonW brought this up in the first place.

We may not see vSVA for VDPA, but that seems like some special sub-mode of all the other vIOMMU and PASID stuff, and not a completely distinct thing.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 03:24:42PM -0300, Jason Gunthorpe wrote:
> > Contrary to your argument, vDPA went with a half blown device only
> > iommu user without considering existing abstractions like containers
>
> VDPA IOMMU was done *for Intel*, as the kind of half-architected thing
> you are advocating should be allowed for IDXD here. Not sure why you
> think bashing that work is going to help your case here.

I'm not bashing that work, sorry if it comes out that way, but it just feels like double standards.

I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native SVM is orthogonal to how we achieve mdev passthrough to guest and vSVM. We visited that exact thing multiple times. Doing SVM is quite simple and doesn't carry the weight of the long list of other things we need to accomplish for mdev passthrough (Kevin explained this in detail not too long ago). For SVM, the driver just needs access to the hw, mmio, and bind_mm to get a PASID bound with the IOMMU.

For IDXD, that creates passthrough devices for guest access, and vSVM goes through the VFIO path. For guest SVM, we expose mdevs to the guest OS; idxd in the guest provides vSVM services. vSVM is *not* built around the native SVM interfaces.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > > I think we agreed (or agree to disagree and commit) for device
> > > > > > types that we have for SIOV, the VFIO-based approach works well
> > > > > > without having to re-invent another way to do the same things.
> > > > > > Not looking for a shortcut by any means, but we need to plan
> > > > > > around existing hardware though. Looks like vDPA took some
> > > > > > shortcuts then to not abstract the iommu uAPI instead :-)?
> > > > > > When all necessary hardware was available, this would be a
> > > > > > solved puzzle.
> > > > >
> > > > > I think it is the opposite: vIOMMU and related has outgrown VFIO
> > > > > as the "home" and needs to stand alone.
> > > > >
> > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > > >
> > > > So just to make this clear, I did check internally if there are any
> > > > plans for vDPA + SVM. There are none at the moment.
> > >
> > > Not SVM, SIOV.
> >
> > ... And that included.. I should have said vDPA + PASID, no current
> > plans. I have no idea who set expectations with you. Can you please
> > put me in touch with that person, privately is fine.
>
> It was the team that argued VDPA had to be done through VFIO - SIOV
> and PASID was one of their reasons it had to be VFIO, check the list
> archives.

Humm... I could search the archives, but the point is I'm confirming that there is no forward-looking plan! And whoever made the argument, it was probably a strawman, hypothetical one that wasn't grounded in reality.

> If they didn't plan to use it, bit of a strawman argument, right?

This doesn't need to continue like the debates :-) Pun intended :-)

I don't think it makes any sense to have an abstract strawman-argument design discussion. Yi is looking into pasid management alone. The rest of the IOMMU-related topics should wait until we have another *real* use requiring consolidation.

Contrary to your argument, vDPA went with a half-blown, device-only iommu user without considering existing abstractions like containers and such in VFIO, which is part of the reason the gap is big at the moment. And you might not agree, but that's beside the point.

Rather than pivot ourselves around hypothetical, strawman, non-intersecting suggestions, proposing an architecture without having done a proof of concept to validate it should stop. We have to ground ourselves in reality.

The use cases we have so far for SIOV, VFIO and mdev seem to be the right candidates and address them well. Now you might disagree, but as noted we all agreed to move past this.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 21, 2020 at 10:51:46AM -0700, Raj, Ashok wrote:
> > If they didn't plan to use it, bit of a strawman argument, right?
>
> This doesn't need to continue like the debates :-) Pun intended :-)
>
> I don't think it makes any sense to have an abstract strawman argument
> design discussion. Yi is looking into for pasid management alone. Rest
> of the IOMMU related topics should wait until we have another
> *real* use requiring consolidation.

Actually, I'm really annoyed right now that the other Intel team wasted quite a lot of the rest of our time arguing about vDPA and vfio with no actual interest in this technology. So you'll excuse me if I'm not particularly enamored with this discussion right now.

> Contrary to your argument, vDPA went with a half blown device only
> iommu user without considering existing abstractions like containers

VDPA IOMMU was done *for Intel*, as the kind of half-architected thing you are advocating should be allowed for IDXD here. Not sure why you think bashing that work is going to help your case here.

I'm saying Intel needs to get its architecture together and stop creating this mess across the kernel to support Intel devices.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > I think we agreed (or agree to disagree and commit) for device
> > > > > types that we have for SIOV, the VFIO-based approach works well
> > > > > without having to re-invent another way to do the same things.
> > > > > Not looking for a shortcut by any means, but we need to plan
> > > > > around existing hardware though. Looks like vDPA took some
> > > > > shortcuts then to not abstract the iommu uAPI instead :-)? When
> > > > > all necessary hardware was available, this would be a solved
> > > > > puzzle.
> > > >
> > > > I think it is the opposite: vIOMMU and related has outgrown VFIO
> > > > as the "home" and needs to stand alone.
> > > >
> > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > >
> > > So just to make this clear, I did check internally if there are any
> > > plans for vDPA + SVM. There are none at the moment.
> >
> > Not SVM, SIOV.
>
> ... And that included.. I should have said vDPA + PASID, no current
> plans. I have no idea who set expectations with you. Can you please
> put me in touch with that person, privately is fine.

It was the team that argued VDPA had to be done through VFIO - SIOV and PASID was one of their reasons it had to be VFIO, check the list archives.

If they didn't plan to use it, bit of a strawman argument, right?

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/20 10:19 PM, Liu, Yi L wrote:
> > From: Jason Gunthorpe
> > Sent: Tuesday, October 20, 2020 10:02 PM
> > [...]
> > Whoever provides the vIOMMU emulation and relays the page fault to the
> > guest has to translate the RID - that's the point.
>
> But the device info (especially the sub-device info) is within the
> passthru framework (e.g. VFIO). So page fault reporting needs to go
> through the passthru framework.
>
> > what does that have to do with VFIO?
> >
> > How will VDPA provide the vIOMMU emulation?
>
> a pardon here. I believe vIOMMU emulation should be based on the IOMMU
> vendor specification, right? you may correct me if I'm missing anything.
>
> > I'm asking how will VDPA translate the RID when VDPA triggers a page
> > fault that has to be relayed to the guest. VDPA also has virtual PCI
> > devices it creates.
>
> I've got a question. Does vDPA work with vIOMMU so far? e.g. Intel
> vIOMMU or another type of vIOMMU.

The kernel code is ready. Note that vhost support for vIOMMU is even earlier than VFIO's. The API is designed to be generic and is not limited to any specific type of vIOMMU.

For qemu, it just needs a patch to implement the map/unmap notifier as VFIO did.

Thanks

> Regards,
> Yi Liu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > I think we agreed (or agree to disagree and commit) for device
> > > > types that we have for SIOV, the VFIO-based approach works well
> > > > without having to re-invent another way to do the same things.
> > > > Not looking for a shortcut by any means, but we need to plan
> > > > around existing hardware though. Looks like vDPA took some
> > > > shortcuts then to not abstract the iommu uAPI instead :-)? When
> > > > all necessary hardware was available, this would be a solved
> > > > puzzle.
> > >
> > > I think it is the opposite: vIOMMU and related has outgrown VFIO as
> > > the "home" and needs to stand alone.
> > >
> > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> >
> > So just to make this clear, I did check internally if there are any
> > plans for vDPA + SVM. There are none at the moment.
>
> Not SVM, SIOV.

... And that included.. I should have said vDPA + PASID, no current plans. I have no idea who set expectations with you. Can you please put me in touch with that person, privately is fine.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > I think we agreed (or agree to disagree and commit) for device
> > > types that we have for SIOV, the VFIO-based approach works well
> > > without having to re-invent another way to do the same things. Not
> > > looking for a shortcut by any means, but we need to plan around
> > > existing hardware though. Looks like vDPA took some shortcuts then
> > > to not abstract the iommu uAPI instead :-)? When all necessary
> > > hardware was available, this would be a solved puzzle.
> >
> > I think it is the opposite: vIOMMU and related has outgrown VFIO as
> > the "home" and needs to stand alone.
> >
> > Apparently the HW that will need PASID for vDPA is Intel HW, so if
>
> So just to make this clear, I did check internally if there are any
> plans for vDPA + SVM. There are none at the moment.

Not SVM, SIOV.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > I think we agreed (or agree to disagree and commit) for device types
> > that we have for SIOV, the VFIO-based approach works well without
> > having to re-invent another way to do the same things. Not looking
> > for a shortcut by any means, but we need to plan around existing
> > hardware though. Looks like vDPA took some shortcuts then to not
> > abstract the iommu uAPI instead :-)? When all necessary hardware was
> > available, this would be a solved puzzle.
>
> I think it is the opposite: vIOMMU and related has outgrown VFIO as
> the "home" and needs to stand alone.
>
> Apparently the HW that will need PASID for vDPA is Intel HW, so if

So just to make this clear, I did check internally if there are any plans for vDPA + SVM. There are none at the moment.

It seems like you have better insight into our plans ;-). Please do let me know who confirmed the vDPA roadmap with you and I would love to talk to them to clear the air.

Cheers,
Ashok
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> I think we agreed (or agree to disagree and commit) for device types
> that we have for SIOV, the VFIO-based approach works well without
> having to re-invent another way to do the same things. Not looking for
> a shortcut by any means, but we need to plan around existing hardware
> though. Looks like vDPA took some shortcuts then to not abstract the
> iommu uAPI instead :-)? When all necessary hardware was available,
> this would be a solved puzzle.

I think it is the opposite: vIOMMU and related has outgrown VFIO as the "home" and needs to stand alone.

Apparently the HW that will need PASID for vDPA is Intel HW, so if more is needed to do a good design you are probably the only one that can get it/do it.

Jason
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:03:36PM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > > Hi Jason, > > > > > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > > > I'm sure there will be some > > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > > but > > > > > that > > > > > should not be a blocker. > > > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > > overlap, it may be a re-definition of VFIO container. As I mentioned, > > > > VFIO > > > > container is IOMMU context from the day it was defined. It could be the > > > > blocker. :-( > > > > > > So maybe you have to broaden the VFIO container to be usable by other > > > subsystems. The discussion here is about what the uAPI should look > > > like in a fairly abstract way. When we say 'dev/sva' it just some > > > placeholder for a shared cdev that provides the necessary > > > dis-aggregated functionality > > > > > > It could be an existing cdev with broader functionaltiy, it could > > > really be /dev/iommu, etc. This is up to the folks building it to > > > decide. > > > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > > veterans have a similar idea for a place to cover IOMMU handling, > > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > > sure about Alex's opinion on such idea. But the sure thing is this > > > > idea may introduce weird overlap even re-definition of existing > > > > thing as I replied above. We need to evaluate the impact and mature > > > > the idea step by step. > > > > > > This has happened before, uAPIs do get obsoleted and replaced with > > > more general/better versions. It is often too hard to create a uAPI > > > that lasts for decades when the HW landscape is constantly changing > > > and sometime a reset is needed. 
> > > > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > > > So we have been disussing this for months now, with some high level vision > > trying to get the uAPI's solidified with a vDPA hardware that might > > potentially have SIOV/SVM like extensions in hardware which actualy doesn't > > exist today. Understood people have plans. > > > Given that vDPA today has diverged already with duplicating use of IOMMU > > api's without making an effort to gravitate to /dev/iommu as how you are > > proposing. > > I see it more like, given that we already know we have multiple users > of IOMMU, adding new IOMMU focused features has to gravitate toward > some kind of convergance. > > Currently things are not so bad, VDPA is just getting started and the > current IOMMU feature set is not so big. > > PASID/vIOMMU/etc/et are all stressing this more, I think the > responsibility falls to the people proposing these features to do the > architecture work. > > > The question is should we hold hostage the current vSVM/vIOMMU efforts > > without even having made an effort for current vDPA/VFIO convergence. > > I don't think it is "held hostage" it is a "no shortcuts" approach, > there was always a recognition that future VDPA drivers will need some > work to integrated with vIOMMU realted stuff. I think we agreed (or agree to disagree and commit) for device types that we have for SIOV, VFIO based approach works well without having to re-invent another way to do the same things. Not looking for a shortcut by any means, but we need to plan around existing hardware though. Looks like vDPA took some shortcuts then to not abstract iommu uAPI instead :-)? When all necessary hardware was available.. This would be a solved puzzle. > > This is no different than the IMS discussion. The first proposed patch > was really simple, but a layering violation. > > The correct solution was some wild 20 patch series modernizing how x86 That was more like 48 patches, not 20 :-). 
But we had a real device with IMS to model and create these new abstractions and test them against. For vDPA+SVM we have non-intersecting conversations at the moment with no real hardware to model our discussion around. > interrupts works because it had outgrown itself. This general approach > to use the shared MSI infrastructure was pointed out at the very > beginning of IMS, BTW. Agreed, and thankfully Thomas worked hard and made it a lot easier :-). Today IMS only deals with on-device store, although IMS could also simply mean having system memory hold the interrupt attributes. This is how some graphics devices would work, with the context holding the interrupt attributes. But we are certainly not rushing this, since we need a REAL user to be there before we support DEV_MSI that uses msg_addr/msg_data held in system memory. ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote: > Hi Jason, > > > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > > > I'm sure there will be some > > > > weird overlaps because we can't delete any of the existing VFIO APIs, > > > > but > > > > that > > > > should not be a blocker. > > > > > > but the weird thing is what we should consider. And it perhaps not just > > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > > container is IOMMU context from the day it was defined. It could be the > > > blocker. :-( > > > > So maybe you have to broaden the VFIO container to be usable by other > > subsystems. The discussion here is about what the uAPI should look > > like in a fairly abstract way. When we say 'dev/sva' it just some > > placeholder for a shared cdev that provides the necessary > > dis-aggregated functionality > > > > It could be an existing cdev with broader functionaltiy, it could > > really be /dev/iommu, etc. This is up to the folks building it to > > decide. > > > > > I'm not expert on vDPA for now, but I saw you three open source > > > veterans have a similar idea for a place to cover IOMMU handling, > > > I think it may be a valuable thing to do. I said "may be" as I'm not > > > sure about Alex's opinion on such idea. But the sure thing is this > > > idea may introduce weird overlap even re-definition of existing > > > thing as I replied above. We need to evaluate the impact and mature > > > the idea step by step. > > > > This has happened before, uAPIs do get obsoleted and replaced with > > more general/better versions. It is often too hard to create a uAPI > > that lasts for decades when the HW landscape is constantly changing > > and sometime a reset is needed. 
> > I'm throwing this out with a lot of hesitation, but I'm going to :-) > > So we have been discussing this for months now, with some high level vision > trying to get the uAPIs solidified with a vDPA hardware that might > potentially have SIOV/SVM-like extensions in hardware, which actually doesn't > exist today. Understood people have plans. > Given that vDPA today has diverged already with duplicating use of IOMMU > APIs without making an effort to gravitate to /dev/iommu as how you are > proposing. I see it more like: given that we already know we have multiple users of the IOMMU, adding new IOMMU-focused features has to gravitate toward some kind of convergence. Currently things are not so bad, VDPA is just getting started and the current IOMMU feature set is not so big. PASID/vIOMMU/etc. are all stressing this more; I think the responsibility falls to the people proposing these features to do the architecture work. > The question is should we hold hostage the current vSVM/vIOMMU efforts > without even having made an effort for current vDPA/VFIO convergence. I don't think it is "held hostage", it is a "no shortcuts" approach; there was always a recognition that future VDPA drivers will need some work to integrate with vIOMMU-related stuff. This is no different than the IMS discussion. The first proposed patch was really simple, but a layering violation. The correct solution was some wild 20-patch series modernizing how x86 interrupts work because it had outgrown itself. This general approach to use the shared MSI infrastructure was pointed out at the very beginning of IMS, BTW. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote: > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > > > I'm sure there will be some > > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > > that > > > should not be a blocker. > > > > but the weird thing is what we should consider. And it perhaps not just > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > > container is IOMMU context from the day it was defined. It could be the > > blocker. :-( > > So maybe you have to broaden the VFIO container to be usable by other > subsystems. The discussion here is about what the uAPI should look > like in a fairly abstract way. When we say 'dev/sva' it just some > placeholder for a shared cdev that provides the necessary > dis-aggregated functionality > > It could be an existing cdev with broader functionaltiy, it could > really be /dev/iommu, etc. This is up to the folks building it to > decide. > > > I'm not expert on vDPA for now, but I saw you three open source > > veterans have a similar idea for a place to cover IOMMU handling, > > I think it may be a valuable thing to do. I said "may be" as I'm not > > sure about Alex's opinion on such idea. But the sure thing is this > > idea may introduce weird overlap even re-definition of existing > > thing as I replied above. We need to evaluate the impact and mature > > the idea step by step. > > This has happened before, uAPIs do get obsoleted and replaced with > more general/better versions. It is often too hard to create a uAPI > that lasts for decades when the HW landscape is constantly changing > and sometime a reset is needed. I'm throwing this out with a lot of hesitation, but I'm going to :-) So we have been disussing this for months now, with some high level vision trying to get the uAPI's solidified with a vDPA hardware that might potentially have SIOV/SVM like extensions in hardware which actualy doesn't exist today. 
Understood people have plans. Given that vDPA today has diverged already with duplicating use of IOMMU api's without making an effort to gravitate to /dev/iommu as how you are proposing. I think we all understand creating a permanent uAPI is hard, and they can evolve in future. Maybe we should start work on how to converge on generalizing the IOMMU story first with what we have today (vDPA + VFIO) convergence and let it evolve with real hardware and new features like SVM/SIOV in mind. This is going to take time and we can start with what we have today for pulling vDPA and VFIO pieces first. The question is should we hold hostage the current vSVM/vIOMMU efforts without even having made an effort for current vDPA/VFIO convergence. > > The jump to shared PASID based IOMMU feels like one of those moments here. As we have all noted, even without PASID we have divergence today? > > > > Whoever provides the vIOMMU emulation and relays the page fault to the > > > guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > > specification, right? you may correct me if I'm missing anything. > > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. > > We can't rely on VFIO to be the place that the vIOMMU lives because it > excludes/complicates everything that is not VFIO from using that > stuff. > > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:02 PM [...] > > > Whoever provides the vIOMMU emulation and relays the page fault to the > guest > > > has to translate the RID - > > > > that's the point. But the device info (especially the sub-device info) is > > within the passthru framework (e.g. VFIO). So page fault reporting needs > > to go through passthru framework. > > > > > what does that have to do with VFIO? > > > > > > How will VPDA provide the vIOMMU emulation? > > > > a pardon here. I believe vIOMMU emulation should be based on IOMMU > vendor > > specification, right? you may correct me if I'm missing anything. > > I'm asking how will VDPA translate the RID when VDPA triggers a page > fault that has to be relayed to the guest. VDPA also has virtual PCI > devices it creates. I've got a question. Does vDPA work with vIOMMU so far? e.g. Intel vIOMMU or other type vIOMMU. Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 10:05 PM > To: Liu, Yi L > > On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > > From: Jason Gunthorpe > > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > > > See previous discussion with Kevin. If I understand correctly, > > > > > you expect a > > > shared > > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no > > > > need to assume L2 tables are shared. Especially for VFIO/vDPA > > > > devices. Even within a passthru framework, like VFIO, if the > > > > attributes of backend IOMMU are not the same, the L2 page table are not > shared, but the mapping is the same. > > > > > > I think not being able to share the PASID shows exactly why this > > > VFIO centric approach is bad. > > > > no, I didn't say PASID is not sharable. My point is sharing L2 page > > table is not mandatory. > > IMHO a PASID should be 1:1 with a page table, what does it even mean to share > a PASID but have different page tables? PASID is actually 1:1 with an address space. Not really needs to be 1:1 with page table. :-) Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, October 20, 2020 9:55 PM > > > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > > > See previous discussion with Kevin. If I understand correctly, you > > > > expect a > > shared > > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > > a passthru framework, like VFIO, if the attributes of backend IOMMU are > > > not > > > the same, the L2 page table are not shared, but the mapping is the same. > > > > I think not being able to share the PASID shows exactly why this VFIO > > centric approach is bad. > > no, I didn't say PASID is not sharable. My point is sharing L2 page table is > not mandatory. IMHO a PASID should be 1:1 with a page table, what does it even mean to share a PASID but have different page tables? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote: > > I'm sure there will be some > > weird overlaps because we can't delete any of the existing VFIO APIs, but > > that > > should not be a blocker. > > but the weird thing is what we should consider. And it perhaps not just > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO > container is IOMMU context from the day it was defined. It could be the > blocker. :-( So maybe you have to broaden the VFIO container to be usable by other subsystems. The discussion here is about what the uAPI should look like in a fairly abstract way. When we say 'dev/sva' it just some placeholder for a shared cdev that provides the necessary dis-aggregated functionality It could be an existing cdev with broader functionaltiy, it could really be /dev/iommu, etc. This is up to the folks building it to decide. > I'm not expert on vDPA for now, but I saw you three open source > veterans have a similar idea for a place to cover IOMMU handling, > I think it may be a valuable thing to do. I said "may be" as I'm not > sure about Alex's opinion on such idea. But the sure thing is this > idea may introduce weird overlap even re-definition of existing > thing as I replied above. We need to evaluate the impact and mature > the idea step by step. This has happened before, uAPIs do get obsoleted and replaced with more general/better versions. It is often too hard to create a uAPI that lasts for decades when the HW landscape is constantly changing and sometime a reset is needed. The jump to shared PASID based IOMMU feels like one of those moments here. > > Whoever provides the vIOMMU emulation and relays the page fault to the guest > > has to translate the RID - > > that's the point. But the device info (especially the sub-device info) is > within the passthru framework (e.g. VFIO). So page fault reporting needs > to go through passthru framework. > > > what does that have to do with VFIO? 
> > > > How will VPDA provide the vIOMMU emulation? > > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor > specification, right? you may correct me if I'm missing anything. I'm asking how will VDPA translate the RID when VDPA triggers a page fault that has to be relayed to the guest. VDPA also has virtual PCI devices it creates. We can't rely on VFIO to be the place that the vIOMMU lives because it excludes/complicates everything that is not VFIO from using that stuff. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, October 20, 2020 9:55 PM > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > > > See previous discussion with Kevin. If I understand correctly, you expect > > > a > shared > > > L2 table if vDPA and VFIO device are using the same PASID. > > > > L2 table sharing is not mandatory. The mapping is the same, but no need to > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > > the same, the L2 page table are not shared, but the mapping is the same. > > I think not being able to share the PASID shows exactly why this VFIO > centric approach is bad. no, I didn't say PASID is not sharable. My point is sharing L2 page table is not mandatory. Regards, Yi Liu > Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote: > > See previous discussion with Kevin. If I understand correctly, you expect a > > shared > L2 table if vDPA and VFIO device are using the same PASID. > > L2 table sharing is not mandatory. The mapping is the same, but no need to > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within > a passthru framework, like VFIO, if the attributes of backend IOMMU are not > the same, the L2 page tables are not shared, but the mapping is the same. I think not being able to share the PASID shows exactly why this VFIO-centric approach is bad. Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Monday, October 19, 2020 10:25 PM > > On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > > Hi Jason, > > > > Good to see your response. > > Ah, I was away got it. :-) > > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > > (alloc/free domain, attach/detach device, set/get domain > > > > > attribute, etc.), reporting/enabling the nesting capability is > > > > > an natural extension to the domain uAPI of existing passthrough > frameworks. > > > > > Actually, VFIO already includes a nesting enable interface even > > > > > before this series. So it doesn't make sense to generalize this > > > > > uAPI out. > > > > > > The subsystem that obtains an IOMMU domain for a device would have > > > to register it with an open FD of the '/dev/sva'. That is the > > > connection between the two subsystems. It would be some simple > > > kernel internal > > > stuff: > > > > > > sva = get_sva_from_file(fd); > > > > Is this fd provided by userspace? I suppose the /dev/sva has a set of > > uAPIs which will finally program page table to host iommu driver. As > > far as I know, it's weird for VFIO user. Why should VFIO user connect > > to a /dev/sva fd after it sets a proper iommu type to the opened > > container. VFIO container already stands for an iommu context with > > which userspace could program page mapping to host iommu. > > Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it > can > be shared between more subsystems that need it. I understand you here. :-) > I'm sure there will be some > weird overlaps because we can't delete any of the existing VFIO APIs, but > that > should not be a blocker. but the weird thing is what we should consider. And it perhaps not just overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO container is IOMMU context from the day it was defined. It could be the blocker. 
:-( > Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is > a possible path. This looks to be similar with the proposal from Jason Wang and Kevin Tian. It is an idea to add "/dev/iommu" and delegate the IOMMU domain alloc, device attach/detach which is no in passthru framework to an independent kernel driver. Just as Jason Wang said replace vfio iommu type1 driver. Jason Wang: "And all the proposal in this series is to reuse the container fd. It should be possible to replace e.g type1 IOMMU with a unified module." link: https://lore.kernel.org/kvm/20201019142526.gj6...@nvidia.com/T/#md49fe9ac9d9eff6ddf5b8c2ee2f27eb2766f66f3 Kevin Tian: "Based on above, I feel a more reasonable way is to first make a /dev/iommu uAPI supporting DMA map/unmap usages and then introduce vSVA to it. Doing this order is because DMA map/unmap is widely used thus can better help verify the core logic with many existing devices." link: https://lore.kernel.org/kvm/mwhpr11mb1645c702d148a2852b41fca08c...@mwhpr11mb1645.namprd11.prod.outlook.com/ > > If your plan is to just opencode everything into VFIO then I don't > see how VDPA will work well, and if proper in-kernel abstractions are built I > fail to see how > routing some of it through userspace is a fundamental problem. I'm not expert on vDPA for now, but I saw you three open source veterans have a similar idea for a place to cover IOMMU handling, I think it may be a valuable thing to do. I said "may be" as I'm not sure about Alex's opinion on such idea. But the sure thing is this idea may introduce weird overlap even re-definition of existing thing as I replied above. We need to evaluate the impact and mature the idea step by step. That means it would take time, so perhaps we may do it in a staging way. 
First having a "/dev/iommu" be ready to handle page MAP/UNMAP which can be used by both VFIO and vDPA, mean- while let VFIO grow up (adding features) by itself and consider to adopt the new /dev/iommu later once /dev/iommu is competent. Of course this needs Alex's approval. And then adding new features to /dev/iommu, like SVA. > > > > sva_register_device_to_pasid(sva, pasid, pci_device, > > > iommu_domain); > > > > So this is supposed to be called by VFIO/VDPA to register the info to > > /dev/sva. > > right? And in dev/sva, it will also maintain the device/iommu_domain > > and pasid info? will it be duplicated with VFIO/VDPA? > > Each part needs to have the information it needs? yeah, but it's the duplication which I'm not very much in. Perhaps the idea from Jason Wang and Kevin would avoid such duplication. > > > > > Moreover, mapping page fault to subdevice requires pre- > > > > > registering subdevice fault data to IOMMU layer when binding > > > > > guest page table, while such fault data can be only retrieved > > > > > from parent driver through VFIO/VDPA. > > > > > > Not sure what this means, page fault should be tied to the PASID, > > > any hookup needed for that
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 20, 2020 5:20 PM > > Hi Yi: > > On 2020/10/20 下午4:19, Liu, Yi L wrote: > >> Yes, but since PASID is a global identifier now, I think kernel > >> should track a device list per PASID? > > We have such tracking. It's done in the iommu driver. You can refer to the > > struct intel_svm. PASID is a global identifier, but it doesn’t affect > > that the PASID table is per-device. > > > >> So for such binding, PASID should be > >> sufficient for uAPI. > > not quite get it. PASID may be bound to multiple devices, how do you > > figure out the target device if you don’t provide such info. > > > I may miss something but is there any reason that userspace needs to figure out > the target device? PASID is about the address space, not a specific device, I think. If you have multiple devices assigned to a VM, you won't expect to bind all of them to a PASID in a single bind operation, right? You may want to bind only the devices you really mean. This manner should be more flexible and reasonable. :-) > > > > > The binding request is initiated by the virtual IOMMU, when > > capturing a guest attempt of binding a page table to a virtual PASID > > entry for a given device. > And for L2 page table programming, if a PASID is used by both e.g. VFIO > and vDPA, does the user need to choose one of the uAPIs to build L2 mappings? > >>> for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I > >>> guess it is tlb flush. so you are right. Keeping L1/L2 page table > >>> management in a single uAPI set is also a reason for my current > >>> series which extends VFIO for L1 management. > >> I'm afraid that would introduce confusion to userspace. E.g: > >> > >> 1) when having only a vDPA device, it uses the vDPA uAPI to do L2 > >> management > >> 2) when vDPA shares a PASID with VFIO, will it use the VFIO uAPI to do the > >> L2 management? > > I think vDPA will still use its own l2 for the l2 mappings. Not sure > > why you need vDPA to use VFIO's l2 management.
I don't think it is the case. > > > See previous discussion with Kevin. If I understand correctly, you expect a > shared > L2 table if vDPA and VFIO device are using the same PASID. L2 table sharing is not mandatory. The mapping is the same, but no need to assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within a passthru framework, like VFIO, if the attributes of backend IOMMU are not the same, the L2 page table are not shared, but the mapping is the same. > In this case, if l2 is still managed separately, there will be duplicated > request of > map and unmap. yes, but this is not a functional issue, right? If we want to solve it, we should have a single uAPI set which can handle both L1 and L2 management. That's also why you proposed to replace type1 driver. right? Regards, Yi Liu > > Thanks > > > > > > Regards, > > Yi Liu > > ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Yi: On 2020/10/20 下午4:19, Liu, Yi L wrote: Yes, but since PASID is a global identifier now, I think kernel should track a device list per PASID? We have such tracking. It's done in the iommu driver. You can refer to the struct intel_svm. PASID is a global identifier, but it doesn’t affect that the PASID table is per-device. So for such binding, PASID should be sufficient for uAPI. not quite get it. PASID may be bound to multiple devices, how do you figure out the target device if you don’t provide such info. I may miss something but is there any reason that userspace needs to figure out the target device? PASID is about the address space, not a specific device, I think. The binding request is initiated by the virtual IOMMU, when capturing a guest attempt of binding a page table to a virtual PASID entry for a given device. And for L2 page table programming, if a PASID is used by both e.g. VFIO and vDPA, does the user need to choose one of the uAPIs to build L2 mappings? for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I guess it is tlb flush. so you are right. Keeping L1/L2 page table management in a single uAPI set is also a reason for my current series which extends VFIO for L1 management. I'm afraid that would introduce confusion to userspace. E.g: 1) when having only a vDPA device, it uses the vDPA uAPI to do L2 management 2) when vDPA shares a PASID with VFIO, will it use the VFIO uAPI to do the L2 management? I think vDPA will still use its own l2 for the l2 mappings. Not sure why you need vDPA to use VFIO's l2 management. I don't think it is the case. See previous discussion with Kevin. If I understand correctly, you expect a shared L2 table if vDPA and VFIO device are using the same PASID. In this case, if L2 is still managed separately, there will be duplicated requests of map and unmap. Thanks Regards, Yi Liu ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hey Jason, > From: Jason Wang > Sent: Tuesday, October 20, 2020 2:18 PM > > On 2020/10/15 ??6:14, Liu, Yi L wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 4:41 PM > >> > >> > >> On 2020/10/15 ??3:58, Tian, Kevin wrote: > From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 ??11:08, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 ??4:38, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>> > If it's possible, I would suggest a generic uAPI instead of > >>> a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of > other subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o > >>> losing necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through below > >>> new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations > >>> can be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good > >>> thing both in concept and regarding to complexity. 
> >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> A question here, is IOASID expected to be the single management > >> interface for PASID? > > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > If there's no exception it should be fixed. > > > >>> From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> I think we need a definition of "global" here. It looks to me for > >> vt-d the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed in > > system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > Any reason for such requirement? (I'm not familiar with ENQCMD, but > my understanding is that vSVA, SIOV or SR-IOV doesn't have the > requirement for system-wide PASID). > >>> ENQCMD is a new instruction to allow multiple processes submitting > >>> workload to one shared workqueue. Each process has an unique PASID > >>> saved in a MSR, which is included in the ENQCMD payload to indicate > >>> the address space when the CPU sends to the device. As one process > >>> might issue ENQCMD to multiple devices, OS-wide PASID allocation is > >>> required both in host and guest side. 
> >>> > >>> When executing ENQCMD in the guest to a SIOV device, the guest > >>> programmed value in the PASID_MSR must be translated to a host PASID > >>> value for proper function/isolation as PASID represents the address > >>> space. The translation is done through a new VMCS PASID translation > >>> structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > >>> must be allocated 'globally' cross all assigned devices otherwise it > >>> may lead to 1:N mapping when a guest process issues ENQCMD to multiple > >>> assigned devices/subdevices. > >>> > >>> There will be a KVM forum session for this topic btw. > >> > >> Thanks for the background. Now I see the restriction comes from ENQCMD.
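Kevin's per-VM, 1:1 PASID translation can be pictured with a small standalone model. Everything below (names, table layout, sizes) is invented for illustration; only the behavior it models, mapping a guest-programmed PASID to a single host PASID before the command reaches a device, comes from the mail:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy model (NOT the real VMCS layout): a per-VM table that
 * maps a guest-programmed PASID to the host PASID actually carried on the
 * bus. Because the table is per-VM and 1:1, every device a guest process
 * targets with ENQCMD sees the same host PASID. */
#define MAX_GUEST_PASIDS 64

struct vm_pasid_xlate {
    uint32_t host_pasid[MAX_GUEST_PASIDS]; /* index = guest PASID */
    uint8_t  valid[MAX_GUEST_PASIDS];
};

static void xlate_init(struct vm_pasid_xlate *t)
{
    memset(t, 0, sizeof(*t));
}

/* Install a 1:1 guest->host mapping (done once per guest PASID). */
static int xlate_set(struct vm_pasid_xlate *t, uint32_t gpasid, uint32_t hpasid)
{
    if (gpasid >= MAX_GUEST_PASIDS)
        return -1;
    t->host_pasid[gpasid] = hpasid;
    t->valid[gpasid] = 1;
    return 0;
}

/* On ENQCMD emulation: translate the PASID carried in the payload. */
static int xlate_lookup(const struct vm_pasid_xlate *t, uint32_t gpasid,
                        uint32_t *hpasid)
{
    if (gpasid >= MAX_GUEST_PASIDS || !t->valid[gpasid])
        return -1;
    *hpasid = t->host_pasid[gpasid];
    return 0;
}
```

Since the lookup is keyed only by guest PASID and not by target device, the host PASID must be unique across all assigned devices, which is the argument for a global host allocation.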
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 6:14 PM, Liu, Yi L wrote: From: Jason Wang Sent: Thursday, October 15, 2020 4:41 PM On 2020/10/15 3:58 PM, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 11:08 AM, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 4:38 PM, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. 
I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. 
Now I see the restriction comes from ENQCMD. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices?
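The "global resource" point above, one OS-wide PASID namespace shared by every caller regardless of which DMAR unit or subsystem (VFIO, vDPA) a device sits behind, can be sketched as a toy allocator. This is not the kernel's real IOASID API, just a minimal model of a single shared pool:

```c
#include <assert.h>
#include <stdint.h>

/* Toy sketch in the spirit of a system-wide PASID namespace: all
 * subsystems draw from the same pool, so a PASID handed out for one
 * device can never collide with one handed out for another. */
#define PASID_MAX 128

static uint8_t pasid_used[PASID_MAX]; /* shared by all callers */

static int pasid_alloc(void)
{
    for (int p = 1; p < PASID_MAX; p++) {   /* PASID 0 reserved */
        if (!pasid_used[p]) {
            pasid_used[p] = 1;
            return p;
        }
    }
    return -1; /* namespace exhausted */
}

static void pasid_free(int p)
{
    if (p > 0 && p < PASID_MAX)
        pasid_used[p] = 0;
}
```

A centralized pool like this is also what makes a per-process PASID quota enforceable, as the mail notes, since there is one place to count allocations.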
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote: > Hi Jason, > > Good to see your response. Ah, I was away > > > > Second, IOMMU nested translation is a per IOMMU domain > > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > > etc.), reporting/enabling the nesting capability is an natural > > > > extension to the domain uAPI of existing passthrough frameworks. > > > > Actually, VFIO already includes a nesting enable interface even > > > > before this series. So it doesn't make sense to generalize this uAPI > > > > out. > > > > The subsystem that obtains an IOMMU domain for a device would have to > > register it with an open FD of the '/dev/sva'. That is the connection > > between the two subsystems. It would be some simple kernel internal > > stuff: > > > > sva = get_sva_from_file(fd); > > Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs > which will finally program page table to host iommu driver. As far as I know, > it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after > it sets a proper iommu type to the opened container. VFIO container already > stands for an iommu context with which userspace could program page mapping > to host iommu. Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it can be shared between more subsystems that need it. I'm sure there will be some weird overlaps because we can't delete any of the existing VFIO APIs, but that should not be a blocker. Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is a possible path. If your plan is to just opencode everything into VFIO then I don't see how VDPA will work well, and if proper in-kernel abstractions are built I fail to see how routing some of it through userspace is a fundamental problem. 
> > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); > > So this is supposed to be called by VFIO/VDPA to register the info to > /dev/sva. > right? And in dev/sva, it will also maintain the device/iommu_domain and pasid > info? will it be duplicated with VFIO/VDPA? Each part needs to have the information it needs? > > > > Moreover, mapping page fault to subdevice requires pre- > > > > registering subdevice fault data to IOMMU layer when binding > > > > guest page table, while such fault data can be only retrieved from > > > > parent driver through VFIO/VDPA. > > > > Not sure what this means, page fault should be tied to the PASID, any > > hookup needed for that should be done in-kernel when the device is > > connected to the PASID. > > you may refer to chapter 7.4.1.1 of VT-d spec. Page request is reported to > software together with the requestor id of the device. For the page request > injects to guest, it should have the device info. Whoever provides the vIOMMU emulation and relays the page fault to the guest has to translate the RID - what does that have to do with VFIO? How will VPDA provide the vIOMMU emulation? Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
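The RID translation Jason mentions at the end can be illustrated with a minimal stand-in: a page request arrives tagged with the host requestor ID, and whatever emulates the vIOMMU must map it to the guest-visible RID before injecting the fault. The table and all values below are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Invented structures, not a real kernel interface: a lookup from the
 * host bus/devfn (requestor ID) on an incoming page request to the RID
 * the guest knows the device by. */
struct rid_map {
    uint16_t host_rid;
    uint16_t guest_rid;
};

static const struct rid_map assigned[] = {
    { 0x1000, 0x0008 },  /* assumed assignments for the example */
    { 0x2100, 0x0010 },
};

static int host_to_guest_rid(uint16_t host_rid, uint16_t *guest_rid)
{
    for (unsigned i = 0; i < sizeof(assigned) / sizeof(assigned[0]); i++) {
        if (assigned[i].host_rid == host_rid) {
            *guest_rid = assigned[i].guest_rid;
            return 0;
        }
    }
    return -1; /* fault from a device not assigned to this guest */
}
```

The open question in the thread is which component owns this table: the VFIO/VDPA parent driver, or whoever provides the vIOMMU emulation.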
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, Good to see your response. > From: Jason Gunthorpe > Sent: Friday, October 16, 2020 11:37 PM > > On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > > Hi, Alex and Jason (G), > > > > How about your opinion for this new proposal? For now looks both > > Jason (W) and Jean are OK with this direction and more discussions > > are possibly required for the new /dev/ioasid interface. Internally > > we're doing a quick prototype to see any unforeseen issue with this > > separation. > > Assuming VDPA and VFIO will be the only two users so duplicating > everything only twice sounds pretty restricting to me. > > > > Second, IOMMU nested translation is a per IOMMU domain > > > capability. Since IOMMU domains are managed by VFIO/VDPA > > > (alloc/free domain, attach/detach device, set/get domain attribute, > > > etc.), reporting/enabling the nesting capability is an natural > > > extension to the domain uAPI of existing passthrough frameworks. > > > Actually, VFIO already includes a nesting enable interface even > > > before this series. So it doesn't make sense to generalize this uAPI > > > out. > > The subsystem that obtains an IOMMU domain for a device would have to > register it with an open FD of the '/dev/sva'. That is the connection > between the two subsystems. It would be some simple kernel internal > stuff: > > sva = get_sva_from_file(fd); Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs which will finally program page table to host iommu driver. As far as I know, it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after it sets a proper iommu type to the opened container. VFIO container already stands for an iommu context with which userspace could program page mapping to host iommu. > sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); So this is supposed to be called by VFIO/VDPA to register the info to /dev/sva. right? 
And in dev/sva, it will also maintain the device/iommu_domain and pasid info? will it be duplicated with VFIO/VDPA? > Not sure why this is a roadblock? > > How would this be any different from having some kernel libsva that > VDPA and VFIO would both rely on? > > You don't plan to just open code all this stuff in VFIO, do you? > > > > Then the tricky part comes with the remaining operations (3/4/5), > > > which are all backed by iommu_ops thus effective only within an > > > IOMMU domain. To generalize them, the first thing is to find a way > > > to associate the sva_FD (opened through generic /dev/sva) with an > > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > > to replicate {domain<->device/subdevice} association in /dev/sva > > > path because some operations (e.g. page fault) is triggered/handled > > > per device/subdevice. Therefore, /dev/sva must provide both per- > > > domain and per-device uAPIs similar to what VFIO/VDPA already > > > does. > > Yes, the point here was to move the general APIs out of VFIO and into > a sharable location. So, of course one would expect some duplication > during the transition period. > > > > Moreover, mapping page fault to subdevice requires pre- > > > registering subdevice fault data to IOMMU layer when binding > > > guest page table, while such fault data can be only retrieved from > > > parent driver through VFIO/VDPA. > > Not sure what this means, page fault should be tied to the PASID, any > hookup needed for that should be done in-kernel when the device is > connected to the PASID. you may refer to chapter 7.4.1.1 of VT-d spec. Page request is reported to software together with the requestor id of the device. For the page request injects to guest, it should have the device info. Regards, Yi Liu > > > > space but they may be organized in multiple IOMMU domains based > > > on their bus type. 
How (should we let) the userspace know the > > > domain information and open an sva_FD for each domain is the main > > > problem here. > > Why is one sva_FD per iommu domain required? The HW can attach the > same PASID to multiple iommu domains, right? > > > > In the end we just realized that doing such generalization doesn't > > > really lead to a clear design and instead requires tight coordination > > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > > (especially about synchronization when the domain/device > > > association is changed or when the device/subdevice is being reset/ > > > drained). Finally it may become a usability burden to the userspace > > > on proper use of the two interfaces on the assigned device. > > If you have a list of things that needs to be done to attach a PCI > device to a PASID then of course they should be tidy kernel APIs > already, and not just hard wired into VFIO. > > The worst outcome would be to have VDPA and VFIO have to different > ways to do all of this with a different set of bugs. Bug fixes/new > features in VFIO won't flow over to VDPA. > > Jason ___
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote: > Hi, Alex and Jason (G), > > How about your opinion for this new proposal? For now looks both > Jason (W) and Jean are OK with this direction and more discussions > are possibly required for the new /dev/ioasid interface. Internally > we're doing a quick prototype to see any unforeseen issue with this > separation. Assuming VDPA and VFIO will be the only two users so duplicating everything only twice sounds pretty restricting to me. > > Second, IOMMU nested translation is a per IOMMU domain > > capability. Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is an natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. The subsystem that obtains an IOMMU domain for a device would have to register it with an open FD of the '/dev/sva'. That is the connection between the two subsystems. It would be some simple kernel internal stuff: sva = get_sva_from_file(fd); sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain); Not sure why this is a roadblock? How would this be any different from having some kernel libsva that VDPA and VFIO would both rely on? You don't plan to just open code all this stuff in VFIO, do you? > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva > > path because some operations (e.g. page fault) is triggered/handled > > per device/subdevice. 
Therefore, /dev/sva must provide both per- > > domain and per-device uAPIs similar to what VFIO/VDPA already > > does. Yes, the point here was to move the general APIs out of VFIO and into a sharable location. So, of course one would expect some duplication during the transition period. > > Moreover, mapping page fault to subdevice requires pre- > > registering subdevice fault data to IOMMU layer when binding > > guest page table, while such fault data can be only retrieved from > > parent driver through VFIO/VDPA. Not sure what this means, page fault should be tied to the PASID, any hookup needed for that should be done in-kernel when the device is connected to the PASID. > > space but they may be organized in multiple IOMMU domains based > > on their bus type. How (should we let) the userspace know the > > domain information and open an sva_FD for each domain is the main > > problem here. Why is one sva_FD per iommu domain required? The HW can attach the same PASID to multiple iommu domains, right? > > In the end we just realized that doing such generalization doesn't > > really lead to a clear design and instead requires tight coordination > > between /dev/sva and VFIO/VDPA for almost every new uAPI > > (especially about synchronization when the domain/device > > association is changed or when the device/subdevice is being reset/ > > drained). Finally it may become a usability burden to the userspace > > on proper use of the two interfaces on the assigned device. If you have a list of things that needs to be done to attach a PCI device to a PASID then of course they should be tidy kernel APIs already, and not just hard wired into VFIO. The worst outcome would be to have VDPA and VFIO have two different ways to do all of this with a different set of bugs. Bug fixes/new features in VFIO won't flow over to VDPA. Jason
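The two kernel-internal calls sketched in this mail can be mocked up in userspace to show the intended flow. Only the two function names, get_sva_from_file() and sva_register_device_to_pasid(), come from the mail; every type and field here is invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the sva context that would back a /dev/sva fd: it
 * records (pasid, device, domain) registrations made by VFIO or vDPA.
 * Multiple entries per PASID are allowed, matching the observation that
 * HW can attach one PASID to several iommu domains. */
#define SVA_MAX_BINDINGS 16

struct sva_ctx {
    struct {
        int pasid;
        const char *device;  /* stand-in for a struct pci_dev * */
        const char *domain;  /* stand-in for a struct iommu_domain * */
    } binding[SVA_MAX_BINDINGS];
    int nr;
};

static struct sva_ctx *get_sva_from_file(struct sva_ctx *ctx_backing_fd)
{
    /* Real code would resolve the fd to its private data. */
    return ctx_backing_fd;
}

static int sva_register_device_to_pasid(struct sva_ctx *sva, int pasid,
                                        const char *device, const char *domain)
{
    if (sva->nr >= SVA_MAX_BINDINGS)
        return -1;
    sva->binding[sva->nr].pasid = pasid;
    sva->binding[sva->nr].device = device;
    sva->binding[sva->nr].domain = domain;
    sva->nr++;
    return 0;
}
```

The design point being debated is exactly this shape: the registration state lives in one shared place, while VFIO and vDPA only call into it.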
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 4:41 PM > > > On 2020/10/15 3:58 PM, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Thursday, October 15, 2020 2:52 PM > >> > >> > >> On 2020/10/14 11:08 AM, Tian, Kevin wrote: > From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 4:38 PM, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of > > a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of > >> other subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o > > losing necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through below > > new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations > > can be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good > > thing both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > A question here, is IOASID expected to be the single management > interface for PASID? 
> >>> yes > >>> > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) > >>> That comes before IOASID core was introduced. I think it should be > >>> changed to use the new generic interface. Jacob/Jean can better > >>> comment if other reason exists for this exception. > >> > >> If there's no exception it should be fixed. > >> > >> > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > I think we need a definition of "global" here. It looks to me for > vt-d the PASID table is per device. > >>> PASID table is per device, thus VT-d could support per-device PASIDs > >>> in concept. > >> > >> I think that's the requirement of PCIE spec which said PASID + RID > >> identifies the process address space ID. > >> > >> > >>>However on Intel platform we require PASIDs to be managed in > >>> system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > >>> and ENQCMD together. > >> > >> Any reason for such requirement? (I'm not familiar with ENQCMD, but > >> my understanding is that vSVA, SIOV or SR-IOV doesn't have the > >> requirement for system-wide PASID). > > ENQCMD is a new instruction to allow multiple processes submitting > > workload to one shared workqueue. Each process has an unique PASID > > saved in a MSR, which is included in the ENQCMD payload to indicate > > the address space when the CPU sends to the device. As one process > > might issue ENQCMD to multiple devices, OS-wide PASID allocation is > > required both in host and guest side. > > > > When executing ENQCMD in the guest to a SIOV device, the guest > > programmed value in the PASID_MSR must be translated to a host PASID > > value for proper function/isolation as PASID represents the address > > space. 
The translation is done through a new VMCS PASID translation > > structure (per-VM, and 1:1 mapping). From this angle the host PASIDs > > must be allocated 'globally' cross all assigned devices otherwise it > > may lead to 1:N mapping when a guest process issues ENQCMD to multiple > > assigned devices/subdevices. > > > > There will be a KVM forum session for this topic btw. > > > Thanks for the background. Now I see the restriction comes from ENQCMD. > > > > > >> > >>> Thus the host creates only one 'global' PASID namespace but do use > >>> per-device PASID table to assure isolation between devices on Intel > >>> platforms. But ARM does it differently as Jean explained. > >>> They have a global namespace for host processes on all host-owned > >>> devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 3:58 PM, Tian, Kevin wrote: From: Jason Wang Sent: Thursday, October 15, 2020 2:52 PM On 2020/10/14 11:08 AM, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 4:38 PM, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. 
If there's no exception it should be fixed. From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. Thanks for the background. Now I see the restrict comes from ENQCMD. 
Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, w
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Thursday, October 15, 2020 2:52 PM > > > On 2020/10/14 11:08 AM, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Tuesday, October 13, 2020 2:22 PM > >> > >> > >> On 2020/10/12 4:38 PM, Tian, Kevin wrote: > From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > > >>> [...] > >>>> If it's possible, I would suggest a generic uAPI instead of a VFIO > specific one. > > Jason suggest something like /dev/sva. There will be a lot of other > subsystems that could benefit from this (e.g vDPA). > > Have you ever considered this approach? > > >>> Hi, Jason, > >>> > >>> We did some study on this approach and below is the output. It's a > >>> long writing but I didn't find a way to further abstract w/o losing > >>> necessary context. Sorry about that. > >>> > >>> Overall the real purpose of this series is to enable IOMMU nested > >>> translation capability with vSVA as one major usage, through > >>> below new uAPIs: > >>> 1) Report/enable IOMMU nested translation capability; > >>> 2) Allocate/free PASID; > >>> 3) Bind/unbind guest page table; > >>> 4) Invalidate IOMMU cache; > >>> 5) Handle IOMMU page request/response (not in this series); > >>> 1/3/4) is the minimal set for using IOMMU nested translation, with > >>> the other two optional. For example, the guest may enable vSVA on > >>> a device without using PASID. Or, it may bind its gIOVA page table > >>> which doesn't require page fault support. Finally, all operations can > >>> be applied to either physical device or subdevice. > >>> > >>> Then we evaluated each uAPI whether generalizing it is a good thing > >>> both in concept and regarding to complexity. > >>> > >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID > >>> allocation/free is through the IOASID sub-system. > >> > >> A question here, is IOASID expected to be the single management > >> interface for PASID? 
> > yes > > > >> (I'm asking since there're already vendor specific IDA based PASID > >> allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > > changed to use the new generic interface. Jacob/Jean can better > > comment if other reason exists for this exception. > > > If there's no exception it should be fixed. > > > > > >> > >>>From this angle > >>> we feel generalizing PASID management does make some sense. > >>> First, PASID is just a number and not related to any device before > >>> it's bound to a page table and IOMMU domain. Second, PASID is a > >>> global resource (at least on Intel VT-d), > >> > >> I think we need a definition of "global" here. It looks to me for vt-d > >> the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > > in concept. > > > I think that's the requirement of PCIE spec which said PASID + RID > identifies the process address space ID. > > > > However on Intel platform we require PASIDs to be managed > > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > > and ENQCMD together. > > > Any reason for such requirement? (I'm not familiar with ENQCMD, but my > understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement > for system-wide PASID). ENQCMD is a new instruction to allow multiple processes submitting workload to one shared workqueue. Each process has an unique PASID saved in a MSR, which is included in the ENQCMD payload to indicate the address space when the CPU sends to the device. As one process might issue ENQCMD to multiple devices, OS-wide PASID allocation is required both in host and guest side. When executing ENQCMD in the guest to a SIOV device, the guest programmed value in the PASID_MSR must be translated to a host PASID value for proper function/isolation as PASID represents the address space. 
The translation is done through a new VMCS PASID translation structure (per-VM, and 1:1 mapping). From this angle the host PASIDs must be allocated 'globally' cross all assigned devices otherwise it may lead to 1:N mapping when a guest process issues ENQCMD to multiple assigned devices/subdevices. There will be a KVM forum session for this topic btw. > > > > Thus the host creates only one 'global' PASID > > namespace but do use per-device PASID table to assure isolation between > > devices on Intel platforms. But ARM does it differently as Jean explained. > > They have a global namespace for host processes on all host-owned > > devices (same as Intel), but then per-device namespace when a device > > (and its PASID table) is assigned to userspace. > > > >> Another question, is this possible to have two DMAR hardware unit(at > >> least I can see two even in my laptop). In this case, is PASID still a > >> global resource? > > yes > > > >> > >>>while having separate VFIO/ > >>> VDPA allocation interfaces may easily cause confusion in userspace, > >>> e.g. which interface to be used if both VFIO/VDPA devices exist.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/15 7:10 AM, Alex Williamson wrote: On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 4:38 PM, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. 
From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. 
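The "centralized control over how many PASIDs are allowed per process" mentioned above can be made concrete with a small userspace sketch. This is purely illustrative, not the kernel IOASID code: the `IoasidSet` class and its methods are invented for this example, loosely named after the IOASID 'set' concept brought up later in the thread. All sets draw from one global namespace, while each set enforces a per-process quota.

```python
class IoasidSet:
    """Toy model of a per-process PASID allocation context ('set').

    Every set draws from a single global PASID namespace, but carries
    its own quota, giving centralized per-process control.  Purely
    illustrative; this is not the kernel's IOASID implementation.
    """

    _next_pasid = 1        # single global namespace shared by all sets
    _allocated = set()

    def __init__(self, quota):
        self.quota = quota
        self.owned = set()

    def alloc(self):
        if len(self.owned) >= self.quota:
            raise RuntimeError("PASID quota exhausted for this set")
        pasid = IoasidSet._next_pasid
        IoasidSet._next_pasid += 1
        IoasidSet._allocated.add(pasid)
        self.owned.add(pasid)
        return pasid

    def free(self, pasid):
        self.owned.discard(pasid)
        IoasidSet._allocated.discard(pasid)


# Two processes share the global namespace but have independent quotas.
proc_a = IoasidSet(quota=2)
proc_b = IoasidSet(quota=2)
p1 = proc_a.alloc()
p2 = proc_b.alloc()
assert p1 != p2        # one namespace: no two live PASIDs collide
```

The point of the model: permission and quota questions can be answered at the 'set' level without any per-device namespace, which is the direction the discussion below takes.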
A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. I'm not sure how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style... Second, IOMMU nested translation is a per IOMMU domain capability. Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain, attach/detach device, set/get domain attribute, etc.), reporting/enabling the nesting capability is a natural extension to the domain uAPI of existing passthrough frameworks.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/14 11:08 AM, Tian, Kevin wrote: From: Jason Wang Sent: Tuesday, October 13, 2020 2:22 PM On 2020/10/12 4:38 PM, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long writing but I didn't find a way to further abstract w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable IOMMU nested translation capability with vSVA as one major usage, through below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table which doesn't require page fault support. Finally, all operations can be applied to either physical device or subdevice. Then we evaluated each uAPI whether generalizing it is a good thing both in concept and regarding to complexity. First, unlike other uAPIs which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here, is IOASID expected to be the single management interface for PASID? yes (I'm asking since there're already vendor specific IDA based PASID allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. Jacob/Jean can better comment if other reason exists for this exception. If there's no exception it should be fixed. 
From this angle we feel generalizing PASID management does make some sense. First, PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me for vt-d the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. I think that's the requirement of PCIE spec which said PASID + RID identifies the process address space ID. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Any reason for such requirement? (I'm not familiar with ENQCMD, but my understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement for system-wide PASID). Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. Another question, is this possible to have two DMAR hardware unit(at least I can see two even in my laptop). In this case, is PASID still a global resource? yes while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to be used if both VFIO/VDPA devices exist. Moreover, an unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part with this generalization is about the permission. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and specific passthrough frameworks? 
I'm not sure, but if you just want a permission, you probably can introduce new capability (CAP_XXX) for this. A more tricky case, vSVA support on ARM (Eric/Jean please correct me) plans to do per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow guest fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespace is handled in current IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, 14 Oct 2020 03:08:31 + "Tian, Kevin" wrote: > > From: Jason Wang > > Sent: Tuesday, October 13, 2020 2:22 PM > > > > > > On 2020/10/12 4:38 PM, Tian, Kevin wrote: > > >> From: Jason Wang > > >> Sent: Monday, September 14, 2020 12:20 PM > > >> > > > [...] > > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > >> specific one. > > >> > > >> Jason suggest something like /dev/sva. There will be a lot of other > > >> subsystems that could benefit from this (e.g vDPA). > > >> > > >> Have you ever considered this approach? > > >> > > > Hi, Jason, > > > > > > We did some study on this approach and below is the output. It's a > > > long writing but I didn't find a way to further abstract w/o losing > > > necessary context. Sorry about that. > > > > > > Overall the real purpose of this series is to enable IOMMU nested > > > translation capability with vSVA as one major usage, through > > > below new uAPIs: > > > 1) Report/enable IOMMU nested translation capability; > > > 2) Allocate/free PASID; > > > 3) Bind/unbind guest page table; > > > 4) Invalidate IOMMU cache; > > > 5) Handle IOMMU page request/response (not in this series); > > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > > the other two optional. For example, the guest may enable vSVA on > > > a device without using PASID. Or, it may bind its gIOVA page table > > > which doesn't require page fault support. Finally, all operations can > > > be applied to either physical device or subdevice. > > > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > > both in concept and regarding to complexity. > > > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > > allocation/free is through the IOASID sub-system. > > > > > > A question here, is IOASID expected to be the single management > > interface for PASID? 
> > yes > > > > > (I'm asking since there're already vendor specific IDA based PASID > > allocator e.g amdgpu_pasid_alloc()) > > That comes before IOASID core was introduced. I think it should be > changed to use the new generic interface. Jacob/Jean can better > comment if other reason exists for this exception. > > > > > > > > From this angle > > > we feel generalizing PASID management does make some sense. > > > First, PASID is just a number and not related to any device before > > > it's bound to a page table and IOMMU domain. Second, PASID is a > > > global resource (at least on Intel VT-d), > > > > > > I think we need a definition of "global" here. It looks to me for vt-d > > the PASID table is per device. > > PASID table is per device, thus VT-d could support per-device PASIDs > in concept. However on Intel platform we require PASIDs to be managed > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV > and ENQCMD together. Thus the host creates only one 'global' PASID > namespace but do use per-device PASID table to assure isolation between > devices on Intel platforms. But ARM does it differently as Jean explained. > They have a global namespace for host processes on all host-owned > devices (same as Intel), but then per-device namespace when a device > (and its PASID table) is assigned to userspace. > > > > > Another question, is this possible to have two DMAR hardware unit(at > > least I can see two even in my laptop). In this case, is PASID still a > > global resource? > > yes > > > > > > > > while having separate VFIO/ > > > VDPA allocation interfaces may easily cause confusion in userspace, > > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > > Moreover, an unified interface allows centralized control over how > > > many PASIDs are allowed per process. > > > > > > Yes. > > > > > > > > > > One unclear part with this generalization is about the permission. 
> > > Do we open this interface to any process or only to those which > > > have assigned devices? If the latter, what would be the mechanism > > > to coordinate between this new interface and specific passthrough > > > frameworks? > > > > > > I'm not sure, but if you just want a permission, you probably can > > introduce new capability (CAP_XXX) for this. > > > > > > > A more tricky case, vSVA support on ARM (Eric/Jean > > > please correct me) plans to do per-device PASID namespace which > > > is built on a bind_pasid_table iommu callback to allow guest fully > > > manage its PASIDs on a given passthrough device. > > > > > > I see, so I think the answer is to prepare for the namespace support > > from the start. (btw, I don't see how namespace is handled in current > > IOASID module?) > > The PASID table is based on GPA when nested translation is enabled > on ARM SMMU. This design implies that the guest manages PASID > table thus PASIDs instead of going through host-side API on assigned > device. From this angle we don't need explicit namespace in the host API.
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi, Alex and Jason (G), How about your opinion for this new proposal? For now looks both Jason (W) and Jean are OK with this direction and more discussions are possibly required for the new /dev/ioasid interface. Internally we're doing a quick prototype to see any unforeseen issue with this separation. Please let us know your thoughts. Thanks Kevin > From: Tian, Kevin > Sent: Monday, October 12, 2020 4:39 PM > > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. 
> First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. > Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... > > Second, IOMMU nested translation is a per IOMMU domain > capability. Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is an natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. So it doesn't make sense to generalize this uAPI > out. > > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. 
The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) is triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can be only retrieved from > parent driver through VFIO/VDPA. > > However, we failed to find a good way even at the 1st step about > domain association. The iommu domains are not exposed to the > userspace, and there is no 1:1 mapping between domain and device. > In VFIO, all devices within the same VFIO container share the address > space but they may be organized in multiple IOMMU domains based > on their bus type. How (should we let) the userspace know the > domain information and open an sva_FD for each domain is the main > problem here. > > In the end we just realized that doing such generalization
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Tuesday, October 13, 2020 2:22 PM > > > On 2020/10/12 4:38 PM, Tian, Kevin wrote: > >> From: Jason Wang > >> Sent: Monday, September 14, 2020 12:20 PM > >> > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > >> specific one. > >> > >> Jason suggest something like /dev/sva. There will be a lot of other > >> subsystems that could benefit from this (e.g vDPA). > >> > >> Have you ever considered this approach? > >> > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. > > > A question here, is IOASID expected to be the single management > interface for PASID? yes > > (I'm asking since there're already vendor specific IDA based PASID > allocator e.g amdgpu_pasid_alloc()) That comes before IOASID core was introduced. I think it should be changed to use the new generic interface. 
Jacob/Jean can better comment if other reason exists for this exception. > > > > From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. Second, PASID is a > > global resource (at least on Intel VT-d), > > > I think we need a definition of "global" here. It looks to me for vt-d > the PASID table is per device. PASID table is per device, thus VT-d could support per-device PASIDs in concept. However on Intel platform we require PASIDs to be managed in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV and ENQCMD together. Thus the host creates only one 'global' PASID namespace but do use per-device PASID table to assure isolation between devices on Intel platforms. But ARM does it differently as Jean explained. They have a global namespace for host processes on all host-owned devices (same as Intel), but then per-device namespace when a device (and its PASID table) is assigned to userspace. > > Another question, is this possible to have two DMAR hardware unit(at > least I can see two even in my laptop). In this case, is PASID still a > global resource? yes > > > > while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > Yes. > > > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? > > > I'm not sure, but if you just want a permission, you probably can > introduce new capability (CAP_XXX) for this. 
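The system-wide (cross host and guest) requirement quoted above can be illustrated with a toy model of the per-VM, 1:1 guest-to-host PASID translation described elsewhere in the thread for ENQCMD. Everything here is an illustrative sketch: a plain dict stands in for the real VMCS-referenced translation structure, whose actual layout is not modeled, and the function names are invented.

```python
# Toy model of the per-VM guest->host PASID translation (1:1) that is
# consulted when the CPU forwards an ENQCMD payload to a device.
vm_pasid_table = {}

def bind_guest_pasid(guest_pasid, host_pasid):
    """Install a 1:1 translation entry for this VM.

    Because the table maps each guest PASID to exactly one host PASID,
    the host PASID backing a guest process must be the same across all
    of its assigned devices, i.e. allocated from one global namespace.
    """
    current = vm_pasid_table.get(guest_pasid)
    if current is not None and current != host_pasid:
        # Per-device host PASID namespaces could demand a different
        # host PASID per device: a 1:N mapping this table cannot hold.
        raise ValueError("guest PASID already bound to another host PASID")
    vm_pasid_table[guest_pasid] = host_pasid

def translate(guest_pasid):
    """Look up the host PASID substituted for the guest value."""
    return vm_pasid_table[guest_pasid]

# With global host-side allocation, devices A and B see the same host
# PASID for one guest process, so a single entry serves both devices.
bind_guest_pasid(5, 100)   # binding established via device A
bind_guest_pasid(5, 100)   # device B reuses the same mapping: fine
assert translate(5) == 100
```

The `ValueError` branch is exactly the 1:N situation the thread warns about: per-device host PASID namespaces would ask one guest PASID to translate to several host PASIDs at once.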
> > > > A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > > I see, so I think the answer is to prepare for the namespace support > from the start. (btw, I don't see how namespace is handled in current > IOASID module?) The PASID table is based on GPA when nested translation is enabled on ARM SMMU. This design implies that the guest manages PASID table thus PASIDs instead of going through host-side API on assigned device. From this angle we don't need explicit namespace in the host API. Just need a way to control how many PASIDs a process is allowed to allocate in the global namespace. btw IOASID module already has 'set' concept per-process and PASIDs are managed per-set. Then the quota control can be easily introduced in the 'set' level. > > > > I'm not sure > > how such requirement can be unified w/o involving passthrough frameworks, or whether ARM could also switch to global PASID style...
RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jean-Philippe Brucker > Sent: Tuesday, October 13, 2020 6:28 PM > > On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > > From: Jason Wang > > > Sent: Monday, September 14, 2020 12:20 PM > > > > > [...] > > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > > specific one. > > > > > > Jason suggest something like /dev/sva. There will be a lot of other > > > subsystems that could benefit from this (e.g vDPA). > > > > > > Have you ever considered this approach? > > > > > > > Hi, Jason, > > > > We did some study on this approach and below is the output. It's a > > long writing but I didn't find a way to further abstract w/o losing > > necessary context. Sorry about that. > > > > Overall the real purpose of this series is to enable IOMMU nested > > translation capability with vSVA as one major usage, through > > below new uAPIs: > > 1) Report/enable IOMMU nested translation capability; > > 2) Allocate/free PASID; > > 3) Bind/unbind guest page table; > > 4) Invalidate IOMMU cache; > > 5) Handle IOMMU page request/response (not in this series); > > 1/3/4) is the minimal set for using IOMMU nested translation, with > > the other two optional. For example, the guest may enable vSVA on > > a device without using PASID. Or, it may bind its gIOVA page table > > which doesn't require page fault support. Finally, all operations can > > be applied to either physical device or subdevice. > > > > Then we evaluated each uAPI whether generalizing it is a good thing > > both in concept and regarding to complexity. > > > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > > allocation/free is through the IOASID sub-system. From this angle > > we feel generalizing PASID management does make some sense. > > First, PASID is just a number and not related to any device before > > it's bound to a page table and IOMMU domain. 
Second, PASID is a > > global resource (at least on Intel VT-d), while having separate VFIO/ > > VDPA allocation interfaces may easily cause confusion in userspace, > > e.g. which interface to be used if both VFIO/VDPA devices exist. > > Moreover, an unified interface allows centralized control over how > > many PASIDs are allowed per process. > > > > One unclear part with this generalization is about the permission. > > Do we open this interface to any process or only to those which > > have assigned devices? If the latter, what would be the mechanism > > to coordinate between this new interface and specific passthrough > > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > > please correct me) plans to do per-device PASID namespace which > > is built on a bind_pasid_table iommu callback to allow guest fully > > manage its PASIDs on a given passthrough device. > > Yes we need a bind_pasid_table. The guest needs to allocate the PASID > tables because they are accessed via guest-physical addresses by the HW > SMMU. > > With bind_pasid_table, the invalidation message also requires a scope to > invalidate a whole PASID context, in addition to invalidating a mappings > ranges. > > > I'm not sure > > how such requirement can be unified w/o involving passthrough > > frameworks, or whether ARM could also switch to global PASID > > style... > > Not planned at the moment, sorry. It requires a PV IOMMU to do PASID > allocation, which is possible with virtio-iommu but not with a vSMMU > emulation. The VM will manage its own PASID space. The upside is that we > don't need userspace access to IOASID, so I won't pester you with comments > on that part of the API :) It makes sense. Possibly in the future when you plan to support SIOV-like capability then you may have to convert PASID table to use host physical address then the same API could be reused. :) Thanks Kevin > > > Second, IOMMU nested translation is a per IOMMU domain > > capability. 
Since IOMMU domains are managed by VFIO/VDPA > > (alloc/free domain, attach/detach device, set/get domain attribute, > > etc.), reporting/enabling the nesting capability is an natural > > extension to the domain uAPI of existing passthrough frameworks. > > Actually, VFIO already includes a nesting enable interface even > > before this series. So it doesn't make sense to generalize this uAPI > > out. > > Agree for enabling, but for reporting we did consider adding a sysfs > interface in /sys/class/iommu/ describing an IOMMU's properties. Then > opted for VFIO capabilities to keep the API nice and contained, but if > we're breaking up the API, sysfs might be more convenient to use and > extend. > > > Then the tricky part comes with the remaining operations (3/4/5), > > which are all backed by iommu_ops thus effective only within an > > IOMMU domain. To generalize them, the first thing is to find a way > > to associate the sva_FD (opened through generic /dev/sva) with an > > IOMMU domain that is created by VFIO/VDPA. The second thing is > > to replicate {domain<->device/subdevice} association in /dev/sva path because some operations (e.g. page fault) are triggered/handled per device/subdevice.
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote: > > From: Jason Wang > > Sent: Monday, September 14, 2020 12:20 PM > > > [...] > > If it's possible, I would suggest a generic uAPI instead of a VFIO > > specific one. > > > > Jason suggest something like /dev/sva. There will be a lot of other > > subsystems that could benefit from this (e.g vDPA). > > > > Have you ever considered this approach? > > > > Hi, Jason, > > We did some study on this approach and below is the output. It's a > long writing but I didn't find a way to further abstract w/o losing > necessary context. Sorry about that. > > Overall the real purpose of this series is to enable IOMMU nested > translation capability with vSVA as one major usage, through > below new uAPIs: > 1) Report/enable IOMMU nested translation capability; > 2) Allocate/free PASID; > 3) Bind/unbind guest page table; > 4) Invalidate IOMMU cache; > 5) Handle IOMMU page request/response (not in this series); > 1/3/4) is the minimal set for using IOMMU nested translation, with > the other two optional. For example, the guest may enable vSVA on > a device without using PASID. Or, it may bind its gIOVA page table > which doesn't require page fault support. Finally, all operations can > be applied to either physical device or subdevice. > > Then we evaluated each uAPI whether generalizing it is a good thing > both in concept and regarding to complexity. > > First, unlike other uAPIs which are all backed by iommu_ops, PASID > allocation/free is through the IOASID sub-system. From this angle > we feel generalizing PASID management does make some sense. > First, PASID is just a number and not related to any device before > it's bound to a page table and IOMMU domain. Second, PASID is a > global resource (at least on Intel VT-d), while having separate VFIO/ > VDPA allocation interfaces may easily cause confusion in userspace, > e.g. which interface to be used if both VFIO/VDPA devices exist. 
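The userspace confusion raised above (which interface allocates a PASID when both VFIO and vDPA devices exist) would be worst if the split paths were backed by separate private pools, as the vendor IDA-based allocators mentioned earlier in the thread (e.g. amdgpu_pasid_alloc()) are. A minimal sketch of that failure mode, with all names invented for illustration:

```python
import itertools

def make_allocator():
    """A private IDA-style ID pool (compare a vendor-local allocator)."""
    counter = itertools.count(1)
    return lambda: next(counter)

# Hypothetical split interfaces, each with its own private pool: both
# pools start at 1, so two different address spaces get the same value.
vfio_alloc = make_allocator()
vdpa_alloc = make_allocator()
assert vfio_alloc() == vdpa_alloc() == 1   # collision across pools

# A single shared interface draws every caller from one namespace.
shared_alloc = make_allocator()
a, b = shared_alloc(), shared_alloc()
assert a != b                              # unique system-wide
```

The shared pool at the end is the /dev/sva-style unified interface being argued for: one namespace, so userspace never has to ask which allocator owns a value.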
> Moreover, an unified interface allows centralized control over how > many PASIDs are allowed per process. > > One unclear part with this generalization is about the permission. > Do we open this interface to any process or only to those which > have assigned devices? If the latter, what would be the mechanism > to coordinate between this new interface and specific passthrough > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean > please correct me) plans to do per-device PASID namespace which > is built on a bind_pasid_table iommu callback to allow guest fully > manage its PASIDs on a given passthrough device. Yes we need a bind_pasid_table. The guest needs to allocate the PASID tables because they are accessed via guest-physical addresses by the HW SMMU. With bind_pasid_table, the invalidation message also requires a scope to invalidate a whole PASID context, in addition to invalidating a mappings ranges. > I'm not sure > how such requirement can be unified w/o involving passthrough > frameworks, or whether ARM could also switch to global PASID > style... Not planned at the moment, sorry. It requires a PV IOMMU to do PASID allocation, which is possible with virtio-iommu but not with a vSMMU emulation. The VM will manage its own PASID space. The upside is that we don't need userspace access to IOASID, so I won't pester you with comments on that part of the API :) > Second, IOMMU nested translation is a per IOMMU domain > capability. Since IOMMU domains are managed by VFIO/VDPA > (alloc/free domain, attach/detach device, set/get domain attribute, > etc.), reporting/enabling the nesting capability is an natural > extension to the domain uAPI of existing passthrough frameworks. > Actually, VFIO already includes a nesting enable interface even > before this series. So it doesn't make sense to generalize this uAPI > out. Agree for enabling, but for reporting we did consider adding a sysfs interface in /sys/class/iommu/ describing an IOMMU's properties. 
Then opted for VFIO capabilities to keep the API nice and contained, but if we're breaking up the API, sysfs might be more convenient to use and extend. > Then the tricky part comes with the remaining operations (3/4/5), > which are all backed by iommu_ops thus effective only within an > IOMMU domain. To generalize them, the first thing is to find a way > to associate the sva_FD (opened through generic /dev/sva) with an > IOMMU domain that is created by VFIO/VDPA. The second thing is > to replicate {domain<->device/subdevice} association in /dev/sva > path because some operations (e.g. page fault) is triggered/handled > per device/subdevice. Therefore, /dev/sva must provide both per- > domain and per-device uAPIs similar to what VFIO/VDPA already > does. Moreover, mapping page fault to subdevice requires pre- > registering subdevice fault data to IOMMU layer when binding > guest page table, while such fault data can be only retrieved from > parent
Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/10/12 4:38 PM, Tian, Kevin wrote: From: Jason Wang Sent: Monday, September 14, 2020 12:20 PM [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. Jason suggested something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g. vDPA). Have you ever considered this approach? Hi, Jason, We did some study on this approach and below is the output. It's a long write-up but I didn't find a way to further abstract it w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable the IOMMU nested translation capability, with vSVA as one major usage, through the below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table, which doesn't require page fault support. Finally, all operations can be applied to either a physical device or a subdevice. Then we evaluated, for each uAPI, whether generalizing it is a good thing both in concept and regarding complexity. First, unlike the other uAPIs, which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. A question here: is IOASID expected to be the single management interface for PASIDs? (I'm asking since there are already vendor-specific IDA-based PASID allocators, e.g. amdgpu_pasid_alloc()) From this angle we feel generalizing PASID management does make some sense. First, a PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), I think we need a definition of "global" here. It looks to me that for VT-d the PASID table is per device. 
Another question: is it possible to have two DMAR hardware units (I can see at least two even in my laptop)? In that case, is PASID still a global resource? while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to use if both VFIO/VDPA devices exist. Moreover, a unified interface allows centralized control over how many PASIDs are allowed per process. Yes. One unclear part of this generalization is the permission model. Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and the specific passthrough frameworks? I'm not sure, but if you just want a permission, you probably can introduce a new capability (CAP_XXX) for this. A trickier case: vSVA support on ARM (Eric/Jean please correct me) plans to do a per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow the guest to fully manage its PASIDs on a given passthrough device. I see, so I think the answer is to prepare for the namespace support from the start. (btw, I don't see how namespaces are handled in the current IOASID module?) I'm not sure how such a requirement can be unified w/o involving the passthrough frameworks, or whether ARM could also switch to the global PASID style... Second, IOMMU nested translation is a per-IOMMU-domain capability. Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain, attach/detach device, set/get domain attribute, etc.), reporting/enabling the nesting capability is a natural extension to the domain uAPI of the existing passthrough frameworks. Actually, VFIO already includes a nesting enable interface even before this series. So it doesn't make sense to generalize this uAPI out. 
So my understanding is that VFIO already: 1) uses multiple fds 2) separates IOMMU ops to a dedicated container fd (type1 iommu) 3) provides an API to associate devices/groups with a container And the proposal in this series is to reuse the container fd. It should be possible to replace e.g. the type1 IOMMU with a unified module. Then the tricky part comes with the remaining operations (3/4/5), which are all backed by iommu_ops and thus effective only within an IOMMU domain. To generalize them, the first thing is to find a way to associate the sva_FD (opened through the generic /dev/sva) with an IOMMU domain that is created by VFIO/VDPA. The second thing is to replicate the {domain<->device/subdevice} association in the /dev/sva path because some operations (e.g. page fault) are triggered/handled per device/subdevice. Is there any reason that the #PF cannot be handled via the SVA fd? Therefore, /dev/sva must provide both per- domain and per-device uAPIs similar to what VFIO/VDPA already do. Moreover, mapping a page fault to a subdevice requires pre- registering subdevice fault data to the IOMMU layer when binding the guest page ta
(proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang > Sent: Monday, September 14, 2020 12:20 PM > [...] > If it's possible, I would suggest a generic uAPI instead of a VFIO > specific one. > > Jason suggested something like /dev/sva. There will be a lot of other > subsystems that could benefit from this (e.g. vDPA). > > Have you ever considered this approach? > Hi, Jason, We did some study on this approach and below is the output. It's a long write-up but I didn't find a way to further abstract it w/o losing necessary context. Sorry about that. Overall the real purpose of this series is to enable the IOMMU nested translation capability, with vSVA as one major usage, through the below new uAPIs: 1) Report/enable IOMMU nested translation capability; 2) Allocate/free PASID; 3) Bind/unbind guest page table; 4) Invalidate IOMMU cache; 5) Handle IOMMU page request/response (not in this series); 1/3/4) is the minimal set for using IOMMU nested translation, with the other two optional. For example, the guest may enable vSVA on a device without using PASID. Or, it may bind its gIOVA page table, which doesn't require page fault support. Finally, all operations can be applied to either a physical device or a subdevice. Then we evaluated, for each uAPI, whether generalizing it is a good thing both in concept and regarding complexity. First, unlike the other uAPIs, which are all backed by iommu_ops, PASID allocation/free is through the IOASID sub-system. From this angle we feel generalizing PASID management does make some sense. First, a PASID is just a number and not related to any device before it's bound to a page table and IOMMU domain. Second, PASID is a global resource (at least on Intel VT-d), while having separate VFIO/ VDPA allocation interfaces may easily cause confusion in userspace, e.g. which interface to use if both VFIO/VDPA devices exist. Moreover, a unified interface allows centralized control over how many PASIDs are allowed per process. One unclear part of this generalization is the permission model. 
Do we open this interface to any process or only to those which have assigned devices? If the latter, what would be the mechanism to coordinate between this new interface and the specific passthrough frameworks? A trickier case: vSVA support on ARM (Eric/Jean please correct me) plans to do a per-device PASID namespace which is built on a bind_pasid_table iommu callback to allow the guest to fully manage its PASIDs on a given passthrough device. I'm not sure how such a requirement can be unified w/o involving the passthrough frameworks, or whether ARM could also switch to the global PASID style... Second, IOMMU nested translation is a per-IOMMU-domain capability. Since IOMMU domains are managed by VFIO/VDPA (alloc/free domain, attach/detach device, set/get domain attribute, etc.), reporting/enabling the nesting capability is a natural extension to the domain uAPI of the existing passthrough frameworks. Actually, VFIO already includes a nesting enable interface even before this series. So it doesn't make sense to generalize this uAPI out. Then the tricky part comes with the remaining operations (3/4/5), which are all backed by iommu_ops and thus effective only within an IOMMU domain. To generalize them, the first thing is to find a way to associate the sva_FD (opened through the generic /dev/sva) with an IOMMU domain that is created by VFIO/VDPA. The second thing is to replicate the {domain<->device/subdevice} association in the /dev/sva path because some operations (e.g. page fault) are triggered/handled per device/subdevice. Therefore, /dev/sva must provide both per- domain and per-device uAPIs similar to what VFIO/VDPA already do. Moreover, mapping a page fault to a subdevice requires pre- registering subdevice fault data to the IOMMU layer when binding the guest page table, while such fault data can only be retrieved from the parent driver through VFIO/VDPA. However, we failed to find a good way even at the 1st step about domain association. 
The iommu domains are not exposed to userspace, and there is no 1:1 mapping between domain and device. In VFIO, all devices within the same VFIO container share the address space, but they may be organized in multiple IOMMU domains based on their bus type. How (should we let) the userspace know the domain information and open an sva_FD for each domain is the main problem here. In the end we just realized that doing such a generalization doesn't really lead to a clear design and instead requires tight coordination between /dev/sva and VFIO/VDPA for almost every new uAPI (especially about synchronization when the domain/device association is changed or when the device/subdevice is being reset/ drained). Finally it may become a usability burden for userspace to properly use the two interfaces on the assigned device. Based on the above analysis we feel that just generalizing PASID mgmt. might be a good thing to look at while the remaining operations are better being VFIO/VDPA specific u
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/18 2:17 AM, Jacob Pan (Jun) wrote: Hi Jason, On Thu, 17 Sep 2020 11:53:49 +0800, Jason Wang wrote: On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote: Hi Jason, On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe wrote: On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote: Hi Jason, On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe wrote: On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: If user space wants to bind page tables, create the PASID with /dev/sva, use ioctls there to setup the page table the way it wants, then pass the now configured PASID to a driver that can use it. Are we talking about bare metal SVA? What a weird term. Glad you noticed it at v7 :-) Any suggestions on something less weird than Shared Virtual Addressing? There is a reason why we moved from SVM to SVA. SVA is fine, what is "bare metal" supposed to mean? What I meant here is sharing virtual addresses between DMA and a host process. This requires devices to perform DMA requests with a PASID and use IOMMU first level/stage 1 page tables. This can be further divided into 1) user SVA 2) supervisor SVA (sharing init_mm) My point is that /dev/sva is not useful here since the driver can perform PASID allocation while doing SVA bind. No, you are thinking too small. Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA. Could you point me to the SVA uAPI? I couldn't find it in the mainline. Seems VDPA uses the VHOST interface? It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h. Thanks for the pointer. For complete vSVA functionality we would need 1 TLB flush (IOTLB and PASID cache etc.) 2 PASID alloc/free 3 bind/unbind page tables or PASID tables 4 Page request service Seems vhost_iotlb_msg can be used for #1 partially. And the proposal is to pluck out the rest into /dev/sva? 
Seems awkward, as Alex pointed out earlier for a similar situation in VFIO. Considering it doesn't have any PASID support yet, my understanding is that if we go with /dev/sva: - the vhost uAPI will still keep the uAPI for associating an ASID with a specific virtqueue - except for this, we can use /dev/sva for all the rest of the (P)ASID operations When VDPA is used by DPDK it makes sense that the PASID will be SVA and 1:1 with the mm_struct. I still don't see why bare metal DPDK needs to get a handle of the PASID. My understanding is that it may: - have a unified uAPI with vSVA: alloc, bind, unbind, free Got your point, but vSVA needs more than these Yes, it's just a subset of what vSVA requires. - leave the binding policy to userspace instead of using an implied one in the kernel Only if necessary. Yes, I think it's all about visibility (flexibility) and manageability. Consider a device with queues A, B, and C. We dedicate queues A and B to one PASID (for vSVA) and C to another PASID (for SVA). It looks to me like the current sva_bind() API doesn't support this. We still need an API for allocating a PASID for SVA and assigning it to the (mediated) device. This case is pretty common for implementing a shadow queue for a guest. Perhaps the SVA patch would explain. Or are you talking about the vDPA DPDK process that is used to support virtio-net-pmd in the guest? When VDPA is used by qemu it makes sense that the PASID will be an arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical map. /dev/sva allows a single uAPI to do this kind of setup, and qemu can support it while supporting a range of SVA kernel drivers. VDPA and vfio-mdev are obvious initial targets. *BOTH* are needed. In general any uAPI for PASID should have the option to use either the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually nothing to implement this in the driver as PASID is just a number, and gives so much more flexibility. Not really nothing in terms of PASID life cycles. 
For example, if the user uses the uacce interface to open an accelerator, it gets an FD_acc. Then it opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we pass FD_pasid to the driver to bind page tables, perhaps multiple drivers. Now we have to worry about whether FD_pasid gets closed before the FD_acc(s) are closed, and all these race conditions. I'm not sure I understand this. But this demonstrates the flexibility of a unified uAPI. E.g. it allows a vDPA and a VFIO device to use the same PASID, which can be shared with a process in the guest. This is for user DMA, not for vSVA. I was contending that /dev/sva creates unnecessary steps for such usage. A question here is where the PASID management is expected to be done. I'm not quite sure the silent 1:1 binding done in intel_svm_bind_mm() can satisfy the requirement for a management layer. For vSVA, I think vDPA and VFIO can potentially share, but I am not seeing convincing benefits. If a guest process wants to
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Thu, 17 Sep 2020 11:53:49 +0800, Jason Wang wrote: > On 2020/9/17 上午7:09, Jacob Pan (Jun) wrote: > > Hi Jason, > > On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe > > wrote: > > > >> On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote: > >>> Hi Jason, > >>> On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe > >>> wrote: > >>> > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe > > wrote: > >> On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) > >> wrote: > If user space wants to bind page tables, create the PASID > with /dev/sva, use ioctls there to setup the page table > the way it wants, then pass the now configured PASID to a > driver that can use it. > >>> Are we talking about bare metal SVA? > >> What a weird term. > > Glad you noticed it at v7 :-) > > > > Any suggestions on something less weird than > > Shared Virtual Addressing? There is a reason why we moved from > > SVM to SVA. > SVA is fine, what is "bare metal" supposed to mean? > > >>> What I meant here is sharing virtual address between DMA and host > >>> process. This requires devices perform DMA request with PASID and > >>> use IOMMU first level/stage 1 page tables. > >>> This can be further divided into 1) user SVA 2) supervisor SVA > >>> (sharing init_mm) > >>> > >>> My point is that /dev/sva is not useful here since the driver can > >>> perform PASID allocation while doing SVA bind. > >> No, you are thinking too small. > >> > >> Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the > >> SVA. > > Could you point to me the SVA UAPI? I couldn't find it in the > > mainline. Seems VDPA uses VHOST interface? > > > It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h. > Thanks for the pointer, for complete vSVA functionality we would need 1 TLB flush (IOTLB and PASID cache etc.) 
2 PASID alloc/free 3 bind/unbind page tables or PASID tables 4 Page request service Seems vhost_iotlb_msg can be used for #1 partially. And the proposal is to pluck out the rest into /dev/sva? Seems awkward, as Alex pointed out earlier for a similar situation in VFIO. > > > > >> When VDPA is used by DPDK it makes sense that the PASID will be SVA > >> and 1:1 with the mm_struct. > >> > > I still don't see why bare metal DPDK needs to get a handle of the > > PASID. > > > My understanding is that it may: > > - have a unified uAPI with vSVA: alloc, bind, unbind, free Got your point, but vSVA needs more than these > - leave the binding policy to userspace instead of using an > implied one in the kernel > Only if necessary. > > > Perhaps the SVA patch would explain. Or are you talking about > > the vDPA DPDK process that is used to support virtio-net-pmd in the > > guest? > >> When VDPA is used by qemu it makes sense that the PASID will be an > >> arbitrary IOVA map constructed to be 1:1 with the guest vCPU > >> physical map. /dev/sva allows a single uAPI to do this kind of > >> setup, and qemu can support it while supporting a range of SVA > >> kernel drivers. VDPA and vfio-mdev are obvious initial targets. > >> > >> *BOTH* are needed. > >> > >> In general any uAPI for PASID should have the option to use either > >> the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs > >> virtually nothing to implement this in the driver as PASID is just > >> a number, and gives so much more flexibility. > >> > > Not really nothing in terms of PASID life cycles. For example, if > > the user uses the uacce interface to open an accelerator, it gets an > > FD_acc. Then it opens /dev/sva to allocate a PASID and gets another > > FD_pasid. Then we pass FD_pasid to the driver to bind page tables, > > perhaps multiple drivers. Now we have to worry about whether FD_pasid > > gets closed before the FD_acc(s) are closed, and all these race conditions. > > > I'm not sure I understand this. 
But this demonstrates the flexibility > of a unified uAPI. E.g. it allows a vDPA and a VFIO device to use the > same PASID, which can be shared with a process in the guest. > This is for user DMA, not for vSVA. I was contending that /dev/sva creates unnecessary steps for such usage. For vSVA, I think vDPA and VFIO can potentially share, but I am not seeing convincing benefits. If a guest process wants to do SVA with a VFIO assigned device and a vDPA-backed virtio-net at the same time, it might be a limitation if PASID is not managed via a common interface. But I am not sure what vDPA SVA support will look like; does it support gIOVA? Does it need a virtio IOMMU? > For the race condition, it could probably be solved with refcnt. > Agreed, but the best solution might be not to have the problem in the first place :) > Thanks > > > > > > If we do not expose FD_pasid to the user, the teardown is much > > simpler and streamlined. Following each FD_acc close, PASID unbind >
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Thu, Sep 17, 2020 at 11:53:49AM +0800, Jason Wang wrote: > > > When VDPA is used by qemu it makes sense that the PASID will be an > > > arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical > > > map. /dev/sva allows a single uAPI to do this kind of setup, and qemu > > > can support it while supporting a range of SVA kernel drivers. VDPA > > > and vfio-mdev are obvious initial targets. > > > > > > *BOTH* are needed. > > > > > > In general any uAPI for PASID should have the option to use either the > > > mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually > > > nothing to implement this in the driver as PASID is just a number, and > > > gives so much more flexibility. > > > > > Not really nothing in terms of PASID life cycles. For example, if the user > > uses the uacce interface to open an accelerator, it gets an FD_acc. Then it > > opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we > > pass FD_pasid to the driver to bind page tables, perhaps multiple > > drivers. Now we have to worry about whether FD_pasid gets closed before > > the FD_acc(s) are closed, and all these race conditions. > > > I'm not sure I understand this. But this demonstrates the flexibility of a > unified uAPI. E.g. it allows a vDPA and a VFIO device to use the same PASID, which > can be shared with a process in the guest. > > For the race condition, it could probably be solved with refcnt. Yep Jason ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Wednesday, September 16, 2020 10:45 PM > > On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote: > > > From: Jason Gunthorpe > > > Sent: Tuesday, September 15, 2020 10:29 PM > > > > > > > Do they need a device at all? It's not clear to me why RID based > > > > IOMMU management fits within vfio's scope, but PASID based does not. > > > > > > In RID mode vfio-pci completely owns the PCI function, so it is more > > > natural that VFIO, as the sole device owner, would own the DMA > mapping > > > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO > > > so there is not much reason to try and disaggregate the API. > > > > It is also used by vDPA. > > A driver in VDPA, not VDPA itself. What is the difference? It is still an example of using RID IOMMU mode outside of VFIO (and it just implies that vDPA doesn't even do a good abstraction internally). > > > > PASID on the other hand, is shared. vfio-mdev drivers will share the > > > device with other kernel drivers. PASID and DMA will be concurrent > > > with VFIO and other kernel drivers/etc. > > > > It looks like you are equating PASID with host-side sharing, while ignoring > > another valid usage where a PASID-capable device is passed through > > to the guest through vfio-pci and then PASID is used by the guest > > for guest-side sharing. In that case, it is an exclusive usage on the host > > side, and then what is the problem for VFIO to manage PASID given > > that vfio-pci completely owns the function? > > This is no different than vfio-pci being yet another client to > /dev/sva > My comment was to echo Alex's question about "why RID based IOMMU management fits within vfio's scope, but PASID based does not". And when talking about generalization we should look bigger, beyond SVA. What really matters here is the iommu_domain, which is about everything related to DMA mapping. 
The domain associated with a passthru device is marked as "unmanaged" in the kernel and allows userspace to manage DMA mapping of this device through a set of iommu_ops: - alloc/free domain; - attach/detach device/subdevice; - map/unmap a memory region; - bind/unbind page table and invalidate iommu cache; - ... (and lots of other callbacks) map/unmap or bind/unbind are just different ways of managing DMAs in an iommu domain. The passthrough framework (VFIO or VDPA) has been providing its uAPI to manage every aspect of the iommu_domain so far, and sva is just a natural extension following this design. If we really want to generalize something, it needs to be /dev/iommu as a unified interface for managing every aspect of the iommu_domain. Asking for the SVA abstraction alone just causes an unnecessary mess for both the kernel (syncing the domain/device association between /dev/vfio and /dev/sva) and userspace (talking to two interfaces even for the same vfio-pci device). Then it sounds more like a bandaid for saving development effort in VDPA (which should instead have proposed /dev/iommu when it was invented, rather than reinventing its own bits until that effort became unaffordable and then asking for a partial abstraction to fix the gap). Thanks Kevin
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote: Hi Jason, On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe wrote: On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote: Hi Jason, On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe wrote: On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: If user space wants to bind page tables, create the PASID with /dev/sva, use ioctls there to setup the page table the way it wants, then pass the now configured PASID to a driver that can use it. Are we talking about bare metal SVA? What a weird term. Glad you noticed it at v7 :-) Any suggestions on something less weird than Shared Virtual Addressing? There is a reason why we moved from SVM to SVA. SVA is fine, what is "bare metal" supposed to mean? What I meant here is sharing virtual addresses between DMA and a host process. This requires devices to perform DMA requests with a PASID and use IOMMU first level/stage 1 page tables. This can be further divided into 1) user SVA 2) supervisor SVA (sharing init_mm) My point is that /dev/sva is not useful here since the driver can perform PASID allocation while doing SVA bind. No, you are thinking too small. Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA. Could you point me to the SVA uAPI? I couldn't find it in the mainline. Seems VDPA uses the VHOST interface? It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h. When VDPA is used by DPDK it makes sense that the PASID will be SVA and 1:1 with the mm_struct. I still don't see why bare metal DPDK needs to get a handle of the PASID. My understanding is that it may: - have a unified uAPI with vSVA: alloc, bind, unbind, free - leave the binding policy to userspace instead of using an implied one in the kernel Perhaps the SVA patch would explain. 
Or are you talking about the vDPA DPDK process that is used to support virtio-net-pmd in the guest? When VDPA is used by qemu it makes sense that the PASID will be an arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical map. /dev/sva allows a single uAPI to do this kind of setup, and qemu can support it while supporting a range of SVA kernel drivers. VDPA and vfio-mdev are obvious initial targets. *BOTH* are needed. In general any uAPI for PASID should have the option to use either the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually nothing to implement this in the driver as PASID is just a number, and gives so much more flexibility. Not really nothing in terms of PASID life cycles. For example, if the user uses the uacce interface to open an accelerator, it gets an FD_acc. Then it opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we pass FD_pasid to the driver to bind page tables, perhaps multiple drivers. Now we have to worry about whether FD_pasid gets closed before the FD_acc(s) are closed, and all these race conditions. I'm not sure I understand this. But this demonstrates the flexibility of a unified uAPI. E.g. it allows a vDPA and a VFIO device to use the same PASID, which can be shared with a process in the guest. For the race condition, it could probably be solved with refcnt. Thanks If we do not expose FD_pasid to the user, the teardown is much simpler and streamlined. Following each FD_acc close, PASID unbind is performed. Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will be introduced later. Last patch is: vfio/type1: Add vSVA support for IOMMU-backed mdevs So pretty hard to see how this is not about vfio-mdev, at least a little.. Jason Thanks, Jacob
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe wrote: > On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote: > > Hi Jason, > > On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe > > wrote: > > > > > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: > > > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe > > > > wrote: > > > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) > > > > > wrote: > > > > > > > If user space wants to bind page tables, create the PASID > > > > > > > with /dev/sva, use ioctls there to setup the page table > > > > > > > the way it wants, then pass the now configured PASID to a > > > > > > > driver that can use it. > > > > > > > > > > > > Are we talking about bare metal SVA? > > > > > > > > > > What a weird term. > > > > > > > > Glad you noticed it at v7 :-) > > > > > > > > Any suggestions on something less weird than > > > > Shared Virtual Addressing? There is a reason why we moved from > > > > SVM to SVA. > > > > > > SVA is fine, what is "bare metal" supposed to mean? > > > > > What I meant here is sharing virtual address between DMA and host > > process. This requires devices perform DMA request with PASID and > > use IOMMU first level/stage 1 page tables. > > This can be further divided into 1) user SVA 2) supervisor SVA > > (sharing init_mm) > > > > My point is that /dev/sva is not useful here since the driver can > > perform PASID allocation while doing SVA bind. > > No, you are thinking too small. > > Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA. > Could you point to me the SVA UAPI? I couldn't find it in the mainline. Seems VDPA uses VHOST interface? > When VDPA is used by DPDK it makes sense that the PASID will be SVA > and 1:1 with the mm_struct. > I still don't see why bare metal DPDK needs to get a handle of the PASID. Perhaps the SVA patch would explain. Or are you talking about vDPA DPDK process that is used to support virtio-net-pmd in the guest? 
> When VDPA is used by qemu it makes sense that the PASID will be an > arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical > map. /dev/sva allows a single uAPI to do this kind of setup, and qemu > can support it while supporting a range of SVA kernel drivers. VDPA > and vfio-mdev are obvious initial targets. > > *BOTH* are needed. > > In general any uAPI for PASID should have the option to use either the > mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually > nothing to implement this in the driver as PASID is just a number, and > gives so much more flexibility. > Not really nothing in terms of PASID life cycles. For example, if a user uses the uacce interface to open an accelerator, it gets an FD_acc. Then it opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we pass FD_pasid to the driver to bind page tables, perhaps multiple drivers. Now we have to worry about whether FD_pasid gets closed before the FD_acc(s) are closed, and all these race conditions. If we do not expose FD_pasid to the user, the teardown is much simpler and streamlined. Following each FD_acc close, PASID unbind is performed. > > Yi can correct me but this set is about VFIO-PCI, VFIO-mdev will > be introduced later. > > Last patch is: > > vfio/type1: Add vSVA support for IOMMU-backed mdevs > > So pretty hard to see how this is not about vfio-mdev, at least a > little.. > > Jason Thanks, Jacob ___ iommu mailing list iommu@lists.linux-foundation.org https://lists.linuxfoundation.org/mailman/listinfo/iommu
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote: > Hi Jason, > On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe > wrote: > > > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: > > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: > > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: > > > > > > If user space wants to bind page tables, create the PASID with > > > > > > /dev/sva, use ioctls there to setup the page table the way it > > > > > > wants, then pass the now configured PASID to a driver that > > > > > > can use it. > > > > > > > > > > Are we talking about bare metal SVA? > > > > > > > > What a weird term. > > > > > > Glad you noticed it at v7 :-) > > > > > > Any suggestions on something less weird than > > > Shared Virtual Addressing? There is a reason why we moved from SVM > > > to SVA. > > > > SVA is fine, what is "bare metal" supposed to mean? > > > What I meant here is sharing virtual address between DMA and host > process. This requires devices perform DMA request with PASID and use > IOMMU first level/stage 1 page tables. > This can be further divided into 1) user SVA 2) supervisor SVA (sharing > init_mm) > > My point is that /dev/sva is not useful here since the driver can > perform PASID allocation while doing SVA bind. No, you are thinking too small. Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA. When VDPA is used by DPDK it makes sense that the PASID will be SVA and 1:1 with the mm_struct. When VDPA is used by qemu it makes sense that the PASID will be an arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical map. /dev/sva allows a single uAPI to do this kind of setup, and qemu can support it while supporting a range of SVA kernel drivers. VDPA and vfio-mdev are obvious initial targets. *BOTH* are needed. In general any uAPI for PASID should have the option to use either the mm_struct SVA PASID *OR* a PASID from /dev/sva. 
It costs virtually nothing to implement this in the driver as PASID is just a number, and gives so much more flexibility. > Yi can correct me but this set is about VFIO-PCI, VFIO-mdev will be > introduced later. Last patch is: vfio/type1: Add vSVA support for IOMMU-backed mdevs So pretty hard to see how this is not about vfio-mdev, at least a little.. Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe wrote: > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: > > > > > If user space wants to bind page tables, create the PASID with > > > > > /dev/sva, use ioctls there to setup the page table the way it > > > > > wants, then pass the now configured PASID to a driver that > > > > > can use it. > > > > > > > > Are we talking about bare metal SVA? > > > > > > What a weird term. > > > > Glad you noticed it at v7 :-) > > > > Any suggestions on something less weird than > > Shared Virtual Addressing? There is a reason why we moved from SVM > > to SVA. > > SVA is fine, what is "bare metal" supposed to mean? > What I meant here is sharing virtual address between DMA and host process. This requires devices perform DMA request with PASID and use IOMMU first level/stage 1 page tables. This can be further divided into 1) user SVA 2) supervisor SVA (sharing init_mm) My point is that /dev/sva is not useful here since the driver can perform PASID allocation while doing SVA bind. > PASID is about constructing an arbitrary DMA IOVA map for PCI-E > devices, being able to intercept device DMA faults, etc. > An arbitrary IOVA map does not need PASID. In IOVA, you do map/unmap explicitly, why do you need to handle IO page faults? To me, PASID identifies an address space that is associated with a mm_struct. > SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults > trigger the same thing as CPU page faults. If it is not 1:1 then there > is no "shared". When SVA is done using PCI-E PASID it is "PASID for > SVA". Lots of existing devices already have SVA without PASID or > IOMMU, so let's not muddy the terminology. > I agree. 
This conversation is about "PASID for SVA" not "SVA without PASID" > vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and > manipulate the PASIDs. > > vSVA is when a guest uses a vPASID to provide SVA, not sure this is > an informative term. > I agree. > This particular patch series seems to be about vPASID/vIOMMU for > vfio-mdev vs the other vPASID/vIOMMU patch which was about vPASID for > vfio-pci. > Yi can correct me but this set is about VFIO-PCI, VFIO-mdev will be introduced later. > > > > If so, I don't see the need for userspace to know there is a > > > > PASID. All user space need is that my current mm is bound to a > > > > device by the driver. So it can be a one-step process for user > > > > instead of two. > > > > > > You've missed the entire point of the conversation, VDPA already > > > needs more than "my current mm is bound to a device" > > > > You mean current version of vDPA? or a potential future version of > > vDPA? > > Future VDPA drivers, it was made clear this was important to Intel > during the argument about VDPA as a mdev. > > Jason Thanks, Jacob
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote: > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: > > > > If user space wants to bind page tables, create the PASID with > > > > /dev/sva, use ioctls there to setup the page table the way it wants, > > > > then pass the now configured PASID to a driver that can use it. > > > > > > Are we talking about bare metal SVA? > > > > What a weird term. > > Glad you noticed it at v7 :-) > > Any suggestions on something less weird than > Shared Virtual Addressing? There is a reason why we moved from SVM > to SVA. SVA is fine, what is "bare metal" supposed to mean? PASID is about constructing an arbitrary DMA IOVA map for PCI-E devices, being able to intercept device DMA faults, etc. SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults trigger the same thing as CPU page faults. If it is not 1:1 then there is no "shared". When SVA is done using PCI-E PASID it is "PASID for SVA". Lots of existing devices already have SVA without PASID or IOMMU, so let's not muddy the terminology. vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and manipulate the PASIDs. vSVA is when a guest uses a vPASID to provide SVA, not sure this is an informative term. This particular patch series seems to be about vPASID/vIOMMU for vfio-mdev vs the other vPASID/vIOMMU patch which was about vPASID for vfio-pci. > > > If so, I don't see the need for userspace to know there is a > > > PASID. All user space need is that my current mm is bound to a > > > device by the driver. So it can be a one-step process for user > > > instead of two. > > > > You've missed the entire point of the conversation, VDPA already needs > > more than "my current mm is bound to a device" > > You mean current version of vDPA? or a potential future version of vDPA? 
Future VDPA drivers, it was made clear this was important to Intel during the argument about VDPA as a mdev. Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi, On 9/16/20 6:32 PM, Jason Gunthorpe wrote: > On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote: >> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote: >>> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote: And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe): the PASID space of a PCI function cannot be shared between host and guest, so we assign the whole PASID table along with the RID. Since we need the BIND, INVALIDATE, and report APIs introduced here to support nested translation, a /dev/sva interface would need to support this mode as well. >>> >>> Well, that means this HW cannot support PASID capable 'SIOV' style >>> devices in guests. >> >> It does not yet support Intel SIOV, no. It does support the standards, >> though: PCI SR-IOV to partition a device and PASIDs in a guest. > > SIOV is basically standards based, it is better thought of as a > cookbook on how to use PASID and IOMMU together. > >>> I admit whole function PASID delegation might be something vfio-pci >>> should handle - but only if it really doesn't fit in some /dev/sva >>> after we cover the other PASID cases. >> >> Wouldn't that be the duplication you're trying to avoid? A second >> channel for bind, invalidate, capability and fault reporting >> mechanisms? > > Yes, which is why it seems like it would be nicer to avoid it. Why I > said "might" :) > >> If we extract SVA parts of vfio_iommu_type1 into a separate chardev, >> PASID table pass-through [1] will have to use that. > > Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI > entry point for controlling the vIOMMU related to PASID. > > Does anything in the [1] series have tight coupling to VFIO other than > needing to know a bus/device/function? It looks like it is mostly > exposing iommu_* functions as uAPI? this series does not use any PASID so it fits quite nicely into the VFIO framework I think. 
Besides cache invalidation, which takes the struct device, the other operations (MSI binding and PASID table passing) operate on the iommu domain. Also we use the VFIO memory region and interrupt/eventfd registration mechanism to return faults. Thanks Eric
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote: > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: > > > If user space wants to bind page tables, create the PASID with > > > /dev/sva, use ioctls there to setup the page table the way it wants, > > > then pass the now configured PASID to a driver that can use it. > > > > Are we talking about bare metal SVA? > > What a weird term. Glad you noticed it at v7 :-) Any suggestions on something less weird than Shared Virtual Addressing? There is a reason why we moved from SVM to SVA. > > > If so, I don't see the need for userspace to know there is a > > PASID. All user space need is that my current mm is bound to a > > device by the driver. So it can be a one-step process for user > > instead of two. > > You've missed the entire point of the conversation, VDPA already needs > more than "my current mm is bound to a device" You mean current version of vDPA? or a potential future version of vDPA? Cheers, Ashok
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote: > On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote: > > On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote: > > > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe): > > > the PASID space of a PCI function cannot be shared between host and guest, > > > so we assign the whole PASID table along with the RID. Since we need the > > > BIND, INVALIDATE, and report APIs introduced here to support nested > > > translation, a /dev/sva interface would need to support this mode as well. > > > > Well, that means this HW cannot support PASID capable 'SIOV' style > > devices in guests. > > It does not yet support Intel SIOV, no. It does support the standards, > though: PCI SR-IOV to partition a device and PASIDs in a guest. SIOV is basically standards based, it is better thought of as a cookbook on how to use PASID and IOMMU together. > > I admit whole function PASID delegation might be something vfio-pci > > should handle - but only if it really doesn't fit in some /dev/sva > > after we cover the other PASID cases. > > Wouldn't that be the duplication you're trying to avoid? A second > channel for bind, invalidate, capability and fault reporting > mechanisms? Yes, which is why it seems like it would be nicer to avoid it. Why I said "might" :) > If we extract SVA parts of vfio_iommu_type1 into a separate chardev, > PASID table pass-through [1] will have to use that. Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI entry point for controlling the vIOMMU related to PASID. Does anything in the [1] series have tight coupling to VFIO other than needing to know a bus/device/function? It looks like it is mostly exposing iommu_* functions as uAPI? Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote: > On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote: > > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe): > > the PASID space of a PCI function cannot be shared between host and guest, > > so we assign the whole PASID table along with the RID. Since we need the > > BIND, INVALIDATE, and report APIs introduced here to support nested > > translation, a /dev/sva interface would need to support this mode as well. > > Well, that means this HW cannot support PASID capable 'SIOV' style > devices in guests. It does not yet support Intel SIOV, no. It does support the standards, though: PCI SR-IOV to partition a device and PASIDs in a guest. > I admit whole function PASID delegation might be something vfio-pci > should handle - but only if it really doesn't fit in some /dev/sva > after we cover the other PASID cases. Wouldn't that be the duplication you're trying to avoid? A second channel for bind, invalidate, capability and fault reporting mechanisms? If we extract SVA parts of vfio_iommu_type1 into a separate chardev, PASID table pass-through [1] will have to use that. Thanks, Jean [1] https://lore.kernel.org/linux-iommu/20200320161911.27494-1-eric.au...@redhat.com/
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote: > > If user space wants to bind page tables, create the PASID with > > /dev/sva, use ioctls there to setup the page table the way it wants, > > then pass the now configured PASID to a driver that can use it. > > Are we talking about bare metal SVA? What a weird term. > If so, I don't see the need for userspace to know there is a > PASID. All user space need is that my current mm is bound to a > device by the driver. So it can be a one-step process for user > instead of two. You've missed the entire point of the conversation, VDPA already needs more than "my current mm is bound to a device" > > PASID management and binding is separated from the driver(s) that are > > using the PASID. > > Why separate? Drivers need to be involved in PASID life cycle > management. For example, when tearing down a PASID, the driver needs to > stop DMA, IOMMU driver needs to unbind, etc. If driver is the control > point, then things are just in order. I am referring to bare metal SVA. Drivers can be involved and still have the uAPIs separate. It isn't hard. Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote: > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe): > the PASID space of a PCI function cannot be shared between host and guest, > so we assign the whole PASID table along with the RID. Since we need the > BIND, INVALIDATE, and report APIs introduced here to support nested > translation, a /dev/sva interface would need to support this mode as well. Well, that means this HW cannot support PASID capable 'SIOV' style devices in guests. I admit whole function PASID delegation might be something vfio-pci should handle - but only if it really doesn't fit in some /dev/sva after we cover the other PASID cases. Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, September 15, 2020 10:29 PM > > > > > Do they need a device at all? It's not clear to me why RID based > > > IOMMU management fits within vfio's scope, but PASID based does not. > > > > In RID mode vfio-pci completely owns the PCI function, so it is more > > natural that VFIO, as the sole device owner, would own the DMA mapping > > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO > > so there is not much reason to try and disaggregate the API. > > It is also used by vDPA. A driver in VDPA, not VDPA itself. > > PASID on the other hand, is shared. vfio-mdev drivers will share the > > device with other kernel drivers. PASID and DMA will be concurrent > > with VFIO and other kernel drivers/etc. > > Looks you are equating PASID to host-side sharing, while ignoring > another valid usage that a PASID-capable device is passed through > to the guest through vfio-pci and then PASID is used by the guest > for guest-side sharing. In such case, it is an exclusive usage in host > side and then what is the problem for VFIO to manage PASID given > that vfio-pci completely owns the function? This is no different than vfio-pci being yet another client to /dev/sva Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote: > > From: Jason Gunthorpe > > Sent: Tuesday, September 15, 2020 10:29 PM > > > > > Do they need a device at all? It's not clear to me why RID based > > > IOMMU management fits within vfio's scope, but PASID based does not. > > > > In RID mode vfio-pci completely owns the PCI function, so it is more > > natural that VFIO, as the sole device owner, would own the DMA mapping > > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO > > so there is not much reason to try and disaggregate the API. > > It is also used by vDPA. > > > > > PASID on the other hand, is shared. vfio-mdev drivers will share the > > device with other kernel drivers. PASID and DMA will be concurrent > > with VFIO and other kernel drivers/etc. > > > > Looks you are equating PASID to host-side sharing, while ignoring > another valid usage that a PASID-capable device is passed through > to the guest through vfio-pci and then PASID is used by the guest > for guest-side sharing. In such case, it is an exclusive usage in host > side and then what is the problem for VFIO to manage PASID given > that vfio-pci completely owns the function? And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe): the PASID space of a PCI function cannot be shared between host and guest, so we assign the whole PASID table along with the RID. Since we need the BIND, INVALIDATE, and report APIs introduced here to support nested translation, a /dev/sva interface would need to support this mode as well. Thanks, Jean
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/16 3:26 AM, Raj, Ashok wrote: IIUC, you are asking that part of the interface to move to an API interface that potentially the new /dev/sva and VFIO could share? I think the API's for PASID management themselves are generic (Jean's patchset + Jacob's ioasid set management). Yes, the in kernel APIs are pretty generic now, and can be used by many types of drivers. Good, so there is no new requirements here I suppose. The requirement is not for in-kernel APIs but a generic uAPIs. As JasonW kicked this off, VDPA will need all this identical stuff too. We already know this, and I think Intel VDPA HW will need it, so it should concern you too:) This is one of those things that I would disagree and commit :-).. A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a reasonable starting point for discussion. Looks like now we are getting closer to what we need.:-) Given that PASID api's are general purpose today and any driver can use it to take advantage. VFIO fortunately or unfortunately has the IOMMU things abstracted. I suppose that support is also mostly built on top of the generic iommu* api abstractions in a vendor neutral way? I'm still lost on what is missing that vDPA can't build on top of what is available? For sure it can, but we may end up with duplicated (or similar) uAPIs, which is bad. Thanks Cheers, Ashok
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/14 9:31 PM, Jean-Philippe Brucker wrote: If it's possible, I would suggest a generic uAPI instead of a VFIO specific one. A large part of this work is already generic uAPI, in include/uapi/linux/iommu.h. This is not what I read from this series, all the following uAPI is VFIO specific: struct vfio_iommu_type1_nesting_op; struct vfio_iommu_type1_pasid_request; And include/uapi/linux/iommu.h is not included in include/uapi/linux/vfio.h at all. This patchset connects that generic interface to the pre-existing VFIO uAPI that deals with IOMMU mappings of an assigned device. But the bulk of the work is done by the IOMMU subsystem, and is available to all device drivers. So is there any reason not to introduce the uAPI to IOMMU drivers directly? Jason suggest something like /dev/sva. There will be a lot of other subsystems that could benefit from this (e.g vDPA). Do you have a more precise idea of the interface /dev/sva would provide, how it would interact with VFIO and others? Can we replace the container fd with sva fd like: sva = open("/dev/sva", O_RDWR); group = open("/dev/vfio/26", O_RDWR); ioctl(group, VFIO_GROUP_SET_SVA, &sva); Then we can do all SVA stuff through the sva fd, and for other subsystems (like vDPA) it only need to implement the function that is equivalent to VFIO_GROUP_SET_SVA. vDPA could transport the generic iommu.h structures via its own uAPI, and call the IOMMU API directly without going through an intermediate /dev/sva handle. Any value for those transporting? I think we have agreed that VFIO is not the only user for vSVA ... It's not hard to forecast that there would be more subsystems that want to benefit from vSVA, we don't want to duplicate the similar uAPIs in all of those subsystems. Thanks Thanks, Jean
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 9/16/20 8:22 AM, Jacob Pan (Jun) wrote: If user space wants to bind page tables, create the PASID with /dev/sva, use ioctls there to setup the page table the way it wants, then pass the now configured PASID to a driver that can use it. Are we talking about bare metal SVA? If so, I don't see the need for userspace to know there is a PASID. All user space need is that my current mm is bound to a device by the driver. So it can be a one-step process for user instead of two. Driver does not do page table binding. Do not duplicate all the control plane uAPI in every driver. PASID management and binding is separated from the driver(s) that are using the PASID. Why separate? Drivers need to be involved in PASID life cycle management. For example, when tearing down a PASID, the driver needs to stop DMA, IOMMU driver needs to unbind, etc. If driver is the control point, then things are just in order. I am referring to bare metal SVA. For guest SVA, I agree that binding is separate from PASID allocation. Could you review this doc. in terms of life cycle? https://lkml.org/lkml/2020/8/22/13 My point is that /dev/sva has no value for bare metal SVA, we are just talking about if guest SVA UAPIs can be consolidated. Or am I missing something? Not only bare metal SVA, but also subdevice passthrough (Intel Scalable IOV and ARM SubStream ID) also consumes PASID which has nothing to do with user space, hence the /dev/sva is unsuited. Best regards, baolu
RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Gunthorpe > Sent: Tuesday, September 15, 2020 10:29 PM > > > Do they need a device at all? It's not clear to me why RID based > > IOMMU management fits within vfio's scope, but PASID based does not. > > In RID mode vfio-pci completely owns the PCI function, so it is more > natural that VFIO, as the sole device owner, would own the DMA mapping > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO > so there is not much reason to try and disaggregate the API. It is also used by vDPA. > > PASID on the other hand, is shared. vfio-mdev drivers will share the > device with other kernel drivers. PASID and DMA will be concurrent > with VFIO and other kernel drivers/etc. > Looks you are equating PASID to host-side sharing, while ignoring another valid usage that a PASID-capable device is passed through to the guest through vfio-pci and then PASID is used by the guest for guest-side sharing. In such case, it is an exclusive usage in host side and then what is the problem for VFIO to manage PASID given that vfio-pci completely owns the function? Thanks Kevin
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason, On Tue, 15 Sep 2020 20:51:26 -0300, Jason Gunthorpe wrote: > On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote: > > > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a > > > PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems > > > like a reasonable starting point for discussion. > > > > I am not sure what can really be consolidated in /dev/sva. > > More or less, everything in this patch. All the manipulations of PASID > that are required for vIOMMU use case/etc. Basically all PASID control > that is not just a 1:1 mapping of the mm_struct. > > > will have their own kernel-user interfaces anyway for their usage > > models. They are just providing the specific transport while > > sharing generic IOMMU UAPIs and IOASID management. > > > As I mentioned PASID management is already consolidated in the > > IOASID layer, so for VDPA or other users, it is just a matter of > > creating its own ioasid_set and doing allocation. > > Creating the PASID is not the problem, managing what the PASID maps to > is the issue. That is all uAPI that we don't really have today. > > > IOASID is also available to the in-kernel users which does not > > need /dev/sva AFAICT. For bare metal SVA, I don't see a need to > > create this 'floating' state of the PASID when created by /dev/sva. > > PASID allocation could happen behind the scene when users need to > > bind page tables to a device DMA stream. > > My point is I would like to see one set of uAPI ioctls to bind page > tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the > exact same things only slightly differently. > Got your point. I am not familiar with VDPA but for VFIO UAPI, it is very thin, mostly passing through IOMMU UAPI structs as opaque data. > If user space wants to bind page tables, create the PASID with > /dev/sva, use ioctls there to setup the page table the way it wants, > then pass the now configured PASID to a driver that can use it. > Are we talking about bare metal SVA? 
If so, I don't see the need for userspace to know there is a PASID. All user space need is that my current mm is bound to a device by the driver. So it can be a one-step process for user instead of two. > Driver does not do page table binding. Do not duplicate all the > control plane uAPI in every driver. > > PASID management and binding is separated from the driver(s) that are > using the PASID. > Why separate? Drivers need to be involved in PASID life cycle management. For example, when tearing down a PASID, the driver needs to stop DMA, IOMMU driver needs to unbind, etc. If driver is the control point, then things are just in order. I am referring to bare metal SVA. For guest SVA, I agree that binding is separate from PASID allocation. Could you review this doc. in terms of life cycle? https://lkml.org/lkml/2020/8/22/13 My point is that /dev/sva has no value for bare metal SVA, we are just talking about if guest SVA UAPIs can be consolidated. Or am I missing something? > Jason Thanks, Jacob
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote: > > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID > > control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a > > reasonable starting point for discussion. > > I am not sure what can really be consolidated in /dev/sva. More or less, everything in this patch. All the manipulations of PASID that are required for vIOMMU use case/etc. Basically all PASID control that is not just a 1:1 mapping of the mm_struct. > will have their own kernel-user interfaces anyway for their usage models. > They are just providing the specific transport while sharing generic IOMMU > UAPIs and IOASID management. > As I mentioned PASID management is already consolidated in the IOASID layer, > so for VDPA or other users, it is just a matter of creating its own ioasid_set > and doing allocation. Creating the PASID is not the problem, managing what the PASID maps to is the issue. That is all uAPI that we don't really have today. > IOASID is also available to the in-kernel users which does not > need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this > 'floating' state of the PASID when created by /dev/sva. PASID allocation > could happen behind the scene when users need to bind page tables to a > device DMA stream. My point is I would like to see one set of uAPI ioctls to bind page tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the exact same things only slightly differently. If user space wants to bind page tables, create the PASID with /dev/sva, use ioctls there to setup the page table the way it wants, then pass the now configured PASID to a driver that can use it. Driver does not do page table binding. Do not duplicate all the control plane uAPI in every driver. PASID management and binding is separated from the driver(s) that are using the PASID. Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 12:26:32PM -0700, Raj, Ashok wrote:
> > Yes, there is. There is a limited pool of HW PASIDs. If one user fork bombs it can easily claim an unreasonable number from that pool as each process will claim a PASID. That can DOS the rest of the system.
> Not sure how you had this played out.. For PASID used in ENQCMD today for our SVM usages, we do *not* automatically propagate or allocate new PASIDs.
> The new process needs to bind to get a PASID for its own use. For threads of the same process the PASID is inherited. For forks(), we do not auto-allocate them.

Auto-allocate doesn't matter, the PASID is tied to the mm_struct; after fork the program will get a new mm_struct, and it can manually re-trigger PASID allocation for that mm_struct from any SVA kernel driver.

64k processes, each with their own mm_struct, all triggering SVA, will allocate 64k PASIDs and use up the whole 16 bit space.

> Given that the PASID APIs are general purpose today, any driver can use them to take advantage. VFIO fortunately or unfortunately has the IOMMU things abstracted. I suppose that support is also mostly built on top of the generic iommu* API abstractions in a vendor neutral way?
> I'm still lost on what is missing that vDPA can't build on top of what is available?

I think it is basically everything in this patch.. Why duplicate all this uAPI?

Jason
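The exhaustion arithmetic in Jason's reply can be simulated directly: the PASID namespace he describes is a single 16-bit, system-wide pool, so roughly 64k mm_structs each triggering SVA drain it completely. The allocator below is purely illustrative (the real one is the kernel's IOASID layer); only the 2^16 figure comes from the thread.

```python
# Toy simulation of the fork-bomb DoS: PASIDs come from one 16-bit hardware
# namespace shared by the whole system, so 64k processes (one mm_struct
# each) triggering SVA exhaust it for everyone else.

PASID_BITS = 16
POOL_SIZE = 2 ** PASID_BITS          # 65536 hardware PASIDs, system-wide

def fork_bomb(n_processes):
    allocated = 0
    for _ in range(n_processes):
        if allocated >= POOL_SIZE:
            return allocated, "ENOSPC: pool exhausted, rest of system is DoS'd"
        allocated += 1               # each mm_struct claims one PASID
    return allocated, "ok"

print(fork_bomb(1000))               # well under the limit
print(fork_bomb(70_000))             # claims all 65536, then allocation fails
```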
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason,

On Tue, 15 Sep 2020 15:45:10 -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many devices and needs to be introduced with a wide community agreement so all scenarios will be supportable.
> > True, reading some of the earlier replies I was clearly confused as I thought you were talking about mdev again. But now that you say it, you have moved past mdev and it's the PASID interfaces, correct?
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> > For the native case, user applications have just 1 PASID per process. There is no need for quota management.
> Yes, there is. There is a limited pool of HW PASIDs. If one user fork bombs it can easily claim an unreasonable number from that pool as each process will claim a PASID. That can DOS the rest of the system.
> If PASID DOS is a worry then it must be solved at the IOMMU level for all user applications that might trigger a PASID allocation. VFIO is not special.
> > IIUC, you are asking that part of the interface to move to an API interface that potentially the new /dev/sva and VFIO could share? I think the APIs for PASID management themselves are generic (Jean's patchset + Jacob's ioasid set management).
> Yes, the in kernel APIs are pretty generic now, and can be used by many types of drivers.

Right, the IOMMU UAPIs are not VFIO specific; we pass a user pointer to the IOMMU layer to process. Similarly for PASID management, the IOASID extensions we are proposing will handle ioasid_set (groups/pools), quota, permissions, and notifications in the IOASID core. There is nothing VFIO specific. https://lkml.org/lkml/2020/8/22/12

> As JasonW kicked this off, VDPA will need all this identical stuff too. We already know this, and I think Intel VDPA HW will need it, so it should concern you too :)
> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a reasonable starting point for discussion.

I am not sure what can really be consolidated in /dev/sva. VFIO and VDPA will have their own kernel-user interfaces anyway for their usage models. They are just providing the specific transport while sharing generic IOMMU UAPIs and IOASID management.

As I mentioned, PASID management is already consolidated in the IOASID layer, so for VDPA or other users it is just a matter of creating its own ioasid_set and doing allocation. IOASID is also available to the in-kernel users which do not need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this 'floating' state of the PASID when created by /dev/sva. PASID allocation could happen behind the scenes when users need to bind page tables to a device DMA stream. Security authorization of the PASID is natively enforced when users try to bind a page table; there is no need to pass the FD handle of the PASID back to the kernel as you suggested earlier.

Thanks, Jacob
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 03:45:10PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many devices and needs to be introduced with a wide community agreement so all scenarios will be supportable.
> > True, reading some of the earlier replies I was clearly confused as I thought you were talking about mdev again. But now that you say it, you have moved past mdev and it's the PASID interfaces, correct?
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> > For the native case, user applications have just 1 PASID per process. There is no need for quota management.
> Yes, there is. There is a limited pool of HW PASIDs. If one user fork bombs it can easily claim an unreasonable number from that pool as each process will claim a PASID. That can DOS the rest of the system.

Not sure how you had this played out.. For PASID used in ENQCMD today for our SVM usages, we do *not* automatically propagate or allocate new PASIDs.

The new process needs to bind to get a PASID for its own use. For threads of the same process the PASID is inherited. For forks(), we do not auto-allocate them. Since PASID isn't a shareable resource, much like how you would not pass MMIO mmaps that cannot be shared to forked processes, correct? Such as your doorbell space, for example.

> If PASID DOS is a worry then it must be solved at the IOMMU level for all user applications that might trigger a PASID allocation. VFIO is not special.

Feels like you can simply avoid the PASID DOS rather than permit it to happen.

> > IIUC, you are asking that part of the interface to move to an API interface that potentially the new /dev/sva and VFIO could share? I think the APIs for PASID management themselves are generic (Jean's patchset + Jacob's ioasid set management).
> Yes, the in kernel APIs are pretty generic now, and can be used by many types of drivers.

Good, so there are no new requirements here I suppose.

> As JasonW kicked this off, VDPA will need all this identical stuff too. We already know this, and I think Intel VDPA HW will need it, so it should concern you too :)

This is one of those things that I would disagree and commit :-)..

> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a reasonable starting point for discussion.

Looks like now we are getting closer to what we need. :-)

Given that the PASID APIs are general purpose today, any driver can use them to take advantage. VFIO fortunately or unfortunately has the IOMMU things abstracted. I suppose that support is also mostly built on top of the generic iommu* API abstractions in a vendor neutral way?

I'm still lost on what is missing that vDPA can't build on top of what is available?

Cheers, Ashok
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > PASID applies widely to many devices and needs to be introduced with a wide community agreement so all scenarios will be supportable.
> True, reading some of the earlier replies I was clearly confused as I thought you were talking about mdev again. But now that you say it, you have moved past mdev and it's the PASID interfaces, correct?

Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.

> For the native case, user applications have just 1 PASID per process. There is no need for quota management.

Yes, there is. There is a limited pool of HW PASIDs. If one user fork bombs it can easily claim an unreasonable number from that pool as each process will claim a PASID. That can DOS the rest of the system.

If PASID DOS is a worry then it must be solved at the IOMMU level for all user applications that might trigger a PASID allocation. VFIO is not special.

> IIUC, you are asking that part of the interface to move to an API interface that potentially the new /dev/sva and VFIO could share? I think the APIs for PASID management themselves are generic (Jean's patchset + Jacob's ioasid set management).

Yes, the in kernel APIs are pretty generic now, and can be used by many types of drivers.

As JasonW kicked this off, VDPA will need all this identical stuff too. We already know this, and I think Intel VDPA HW will need it, so it should concern you too :)

A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a reasonable starting point for discussion.

Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Tue, Sep 15, 2020 at 08:33:41AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> > Hi Jason,
> > I thought we discussed this at LPC, but still seems to be going in circles :-(.
> We discussed mdev at LPC, not PASID.
> PASID applies widely to many devices and needs to be introduced with a wide community agreement so all scenarios will be supportable.

True, reading some of the earlier replies I was clearly confused as I thought you were talking about mdev again. But now that you say it, you have moved past mdev and it's the PASID interfaces, correct?

> > As you had suggested earlier in the mail thread could Jason Wang maybe build out what it takes to have a full fledged /dev/sva interface for vDPA and figure out how the interfaces should emerge? otherwise it appears everyone is talking very high level and with that limited understanding of how things work at the moment.
> You want Jason Wang to do the work to get Intel PASID support merged? Seems a bit of a strange request.

I was reading mdev in my head. Not PASID, sorry.

For the native case, user applications have just 1 PASID per process. There is no need for quota management. VFIO, being the one used for guests where there are more PASIDs per guest, is where this is enforced today.

IIUC, you are asking that part of the interface to move to an API interface that potentially the new /dev/sva and VFIO could share? I think the APIs for PASID management themselves are generic (Jean's patchset + Jacob's ioasid set management).

Possibly what you need is already available, but not in the specific way that you expect, maybe? Let me check with Jacob and let him/Jean pick that up.

Cheers, Ashok
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 04:33:10PM -0600, Alex Williamson wrote:
> Can you explain that further, or spit-ball what you think this /dev/sva interface looks like and how a user might interact between vfio and this new interface?

When you open it you get some container, inside the container the user can create PASIDs. PASIDs outside that container cannot be reached. Creating a PASID, or the guest PASID range, would be the entry point for doing all the operations against a PASID or range that this patch series imagines:
- Map process VA mappings to the PASID's DMA virtual address space
- Catch faults
- Setup any special HW stuff like Intel's two level thing, ARM stuff, etc
- Expose resource controls, cgroup, whatever
- Migration special stuff (allocate fixed PASIDs)

A PASID is a handle for an IOMMU page table, and the tools to manipulate it. Within /dev/sva the page table is just 'floating' and not linked to any PCI functions.

The open /dev/sva FD holding the allocated PASIDs would be passed to a kernel driver. This is a security authorization that the specified PASID can be assigned to a PCI device by the kernel. At this point the kernel driver would have the IOMMU permit its bus/device/function to use the PASID. The PASID can be passed to multiple drivers of any driver flavour so table re-use is possible. Now the IOMMU page table is linked to a device.

The kernel device driver would also do the device specific programming to setup the PASID in the device, attach it to some device object and expose the device for user DMA. For instance IDXD's char dev would map the queue memory, associate the PASID with that queue and setup the HW to be ready for the new enqueue instruction. The IDXD mdev would link to its emulated PCI BAR and ensure the guest can only use PASIDs included in the /dev/sva container.

The qemu control plane for vIOMMU related to PASID would run over /dev/sva.

I think the design could go further, where a 'PASID' is just an abstract idea of a page table; then vfio-pci could consume it too as an IOMMU page table handle even though there is no actual PASID. So qemu could end up with one API to universally control the vIOMMU, an API that can be shared between subsystems and is not tied to VFIO.

> allocating pasids and associating them with page tables for that two-stage IOMMU setup, performing cache invalidations based on page table updates, etc. How does it make more sense for a vIOMMU to setup some aspects of the IOMMU through vfio and others through a TBD interface?

vfio's IOMMU interface is about RID based full device ownership, and fixed mappings. PASID is about mediation, shared ownership and page faulting. Does PASID overlap with the existing IOMMU RID interface beyond the fact that both use the IOMMU?

> The IOMMU needs to allocate PASIDs, so in that sense it enforces a quota via the architectural limits, but is the IOMMU layer going to distinguish in-kernel versus user limits? A cgroup limit seems like a good idea, but that's not really at the IOMMU layer either and I don't see that a /dev/sva and vfio interface couldn't both support a cgroup type quota.

These are all good questions. PASID is new, this stuff needs to be sketched out more.

A lot of in-kernel users of IOMMU PASID are probably going to be triggered by userspace actions. I think a cgroup quota would end up near the IOMMU layer, so vfio, sva, and any other driver char devs would all be restricted by the cgroup as peers.

> And it's not clear that they'll have compatible requirements. A userspace idxd driver might have limited needs versus a vIOMMU backend. Does a single quota model adequately support both or are we back to the differences between access to a device and ownership of a device?

At the end of the day a PASID is just a number and the drivers' only use of it is to program it into HW.

All these other differences deal with the IOMMU side of the PASID: how pages are mapped into it, how page fault works, etc, etc. Keeping the two concerns separated seems very clean. A device driver shouldn't care how the PASID is setup.

> > > This series is a blueprint within the context of the ownership and permission model that VFIO already provides. It doesn't seem like we can pluck that out on its own, nor is it necessarily the case that VFIO wouldn't want to provide PASID services within its own API even if we did have this undefined /dev/sva interface.
> > I don't see what you do - VFIO does not own PASID, and in this vfio-mdev mode it does not own the PCI device/IOMMU either. So why would this need to be part of the VFIO ownership and permission model?
> Doesn't the PASID model essentially just augment the requester ID IOMMU model so as to manage the IOVAs for a subdevice of a RID?

I'd say not really.. PASID is very different from RID because PASID must always be mediated by the kernel. vfio-pci doesn't know how to use PASID because it doesn't k
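The container semantics Jason sketches — a PASID created in a /dev/sva-style container is a floating IOMMU page-table handle, the container FD authorizes attaching it to a device, and several drivers may share one table — can be modeled in a few lines. Everything below is a hypothetical stand-in for the sketched uAPI, not a real kernel interface.

```python
# Toy model of the /dev/sva container: PASIDs are floating page-table
# handles until a driver, authorized by receiving the container FD, asks
# the IOMMU to link its bus/device/function to one. Illustrative only.

class SvaContainer:
    def __init__(self):
        self._next = 1
        self._tables = {}            # pasid -> set of attached BDFs

    def create_pasid(self):
        pasid = self._next
        self._next += 1
        self._tables[pasid] = set()  # floating: no device linked yet
        return pasid

    def authorize_attach(self, pasid, bdf):
        # Models the IOMMU permitting this bus/device/function to use the
        # PASID, after the driver received the container FD from userspace.
        if pasid not in self._tables:
            raise PermissionError("PASID not in this container")
        self._tables[pasid].add(bdf)

    def devices_sharing(self, pasid):
        return sorted(self._tables[pasid])


c = SvaContainer()
p = c.create_pasid()                    # page table exists, linked to nothing
c.authorize_attach(p, "0000:6b:01.0")   # e.g. an IDXD queue (made-up BDF)
c.authorize_attach(p, "0000:6b:01.1")   # a second driver reuses the table
print(c.devices_sharing(p))

# PASIDs outside this container cannot be reached through it:
try:
    c.authorize_attach(999, "0000:00:02.0")
except PermissionError as e:
    print("denied:", e)
```

The two prints correspond to the two properties claimed in the email: table re-use across drivers, and containment ("PASIDs outside that container cannot be reached").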
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> Hi Jason,
> I thought we discussed this at LPC, but still seems to be going in circles :-(.

We discussed mdev at LPC, not PASID.

PASID applies widely to many devices and needs to be introduced with a wide community agreement so all scenarios will be supportable.

> As you had suggested earlier in the mail thread could Jason Wang maybe build out what it takes to have a full fledged /dev/sva interface for vDPA and figure out how the interfaces should emerge? otherwise it appears everyone is talking very high level and with that limited understanding of how things work at the moment.

You want Jason Wang to do the work to get Intel PASID support merged? Seems a bit of a strange request.

> This has to move ahead of these email discussions, hoping someone with the right ideas would help move this forward.

Why not try yourself to come up with a proposal?

Jason
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason,

I thought we discussed this at LPC, but still seems to be going in circles :-(.

On Mon, Sep 14, 2020 at 04:00:57PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> > On Mon, 14 Sep 2020 14:41:21 -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > > > "its own special way" is arguable, VFIO is just making use of what's being proposed as the uapi via its existing IOMMU interface.
> > > I mean, if we have a /dev/sva then it makes no sense to extend the VFIO interfaces with the same stuff. VFIO should simply accept a PASID created from /dev/sva and use it just like any other user-DMA driver would.
> > I don't think that's absolutely true. By the same logic, we could say that pci-sysfs provides access to PCI BAR and config space resources,
> No, it is the reverse, VFIO is a better version of pci-sysfs, so pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva would be the superset interface for PASID, so whatever VFIO has would be obsoleted.

As you had suggested earlier in the mail thread, could Jason Wang maybe build out what it takes to have a full fledged /dev/sva interface for vDPA and figure out how the interfaces should emerge? Otherwise it appears everyone is talking very high level and with that limited understanding of how things work at the moment.

As Kevin pointed out there are several aspects, and a real prototype from interested people would be the best way to understand the easy/hard aspects of moving between the proposals.

- PASID allocation and life cycle management. Managing both 1-1 (as it's done today) and also supporting a guest PASID space. (Supporting a guest PASID range is required for migration I suppose)
- Page request processing.
- Interaction with vIOMMU; vSVA requires vIOMMU for supporting invalidations, forwarding prq and such.
- Supporting ENQCMD in guest. (Today it's just in Intel products, but it's also submitted to the PCIe SIG, and if you are a member you should be able to see that. FWIW, it might already be open for public review; if not now, maybe pretty soon.) For Intel we have some KVM interaction setting up the guest pasid->host pasid interfaces.

This has to move ahead of these email discussions, hoping someone with the right ideas would help move this forward.

Cheers, Ashok
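The guest PASID space in the last bullet — a per-VM table translating guest (virtual) PASIDs to host PASIDs, which migration must be able to re-create on the destination — amounts to simple bookkeeping that can be sketched as follows. This is an illustrative model only, not the KVM/IOMMU implementation; all names are invented.

```python
# Illustrative model of a guest PASID space: the VMM keeps a per-VM table
# mapping guest PASIDs to host PASIDs (the hardware ones). On migration the
# guest-visible numbers must stay stable while host PASIDs are reallocated
# on the destination machine.

class HostPasidPool:
    def __init__(self, start=100):   # arbitrary starting point
        self._next = start

    def alloc(self):
        p = self._next
        self._next += 1
        return p


class GuestPasidTable:
    def __init__(self, host_pool):
        self.pool = host_pool
        self.g2h = {}                # guest pasid -> host pasid

    def bind(self, gpasid):
        self.g2h[gpasid] = self.pool.alloc()
        return self.g2h[gpasid]

    def translate(self, gpasid):
        # What ENQCMD-style submission needs: guest PASID -> host PASID.
        return self.g2h[gpasid]


src = GuestPasidTable(HostPasidPool())
src.bind(1)                          # guest chose PASID 1
print("source host PASID:", src.translate(1))

# After migration, rebuild the table against a different host's pool; the
# guest still uses PASID 1 even though the backing host PASID differs.
dst = GuestPasidTable(HostPasidPool(start=500))
for g in src.g2h:
    dst.bind(g)
print("dest host PASID:", dst.translate(1))
```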
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, 14 Sep 2020 16:00:57 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> > On Mon, 14 Sep 2020 14:41:21 -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > > > "its own special way" is arguable, VFIO is just making use of what's being proposed as the uapi via its existing IOMMU interface.
> > > I mean, if we have a /dev/sva then it makes no sense to extend the VFIO interfaces with the same stuff. VFIO should simply accept a PASID created from /dev/sva and use it just like any other user-DMA driver would.
> > I don't think that's absolutely true. By the same logic, we could say that pci-sysfs provides access to PCI BAR and config space resources,
> No, it is the reverse, VFIO is a better version of pci-sysfs, so pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva would be the superset interface for PASID, so whatever VFIO has would be obsoleted.
> It would be very unusual for the kernel to have two 'preferred' interfaces for the same thing, IMHO. The review process for uAPI should really prevent that by allowing all interests to be served while the uAPI is designed.
> > the VFIO device interface duplicates part of that interface therefore it should be abandoned. But in reality, VFIO providing access to those resources puts those accesses within the scope and control of the VFIO interface.
> Not clear to me why VFIO needs that. PASID seems quite orthogonal from VFIO to me.

Can you explain that further, or spit-ball what you think this /dev/sva interface looks like and how a user might interact between vfio and this new interface? The interface proposed here definitely does not seem orthogonal to the vfio IOMMU interface, ie. selecting a specific IOMMU domain mode during vfio setup, allocating pasids and associating them with page tables for that two-stage IOMMU setup, performing cache invalidations based on page table updates, etc. How does it make more sense for a vIOMMU to setup some aspects of the IOMMU through vfio and others through a TBD interface?

> > > This has already happened, the SVA patches generally allow unpriv user space to allocate a PASID for their process.
> > > If a device implements a mdev shared with a kernel driver (like IDXD) then it will be sharing that PASID pool across both drivers. In this case it makes no sense that VFIO has PASID quota logic because it has an incomplete view. It could only make sense if VFIO is the exclusive owner of the bus/device/function.
> > > The tracking logic needs to be global.. Most probably in some kind of PASID cgroup controller?
> > AIUI, that doesn't exist yet, so it makes sense that VFIO, as the mechanism through which a user would allocate a PASID,
> VFIO is not the exclusive user interface for PASID. Other SVA drivers will allocate PASIDs. Any quota has to be implemented by the IOMMU layer, and shared across all drivers.

The IOMMU needs to allocate PASIDs, so in that sense it enforces a quota via the architectural limits, but is the IOMMU layer going to distinguish in-kernel versus user limits? A cgroup limit seems like a good idea, but that's not really at the IOMMU layer either and I don't see that a /dev/sva and vfio interface couldn't both support a cgroup type quota.

> > space. Also, "unprivileged user" is a bit of a misnomer in this context as the VFIO user must be privileged with ownership of a device before they can even participate in PASID allocation. Is truly unprivileged access reasonable for a limited resource?
> I'm not talking about VFIO, I'm talking about the other SVA drivers. I expect some of them will be unpriv safe, like IDXD, for instance.
> Some way to manage the limited PASID resource will be necessary beyond just VFIO.

And it's not clear that they'll have compatible requirements. A userspace idxd driver might have limited needs versus a vIOMMU backend. Does a single quota model adequately support both or are we back to the differences between access to a device and ownership of a device? Maybe a single pasid per user makes sense in the former. If we could bring this discussion to some sort of more concrete proposal it might be easier to weigh the choices.

> > QEMU typically runs in a sandbox with limited access, when a device or mdev is assigned to a VM, file permissions are configured to allow that access. QEMU doesn't get to poke at any random dev file it likes, that's part of how userspace reduces the potential attack surface.
> Plumbing the exact same APIs through VFIO's uAPI vs /dev/sva doesn't reduce the attack surface. qemu can simply include /dev/sva in the sandbox when using VFIO with no increase in attack surface from this proposed series.
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> On Mon, 14 Sep 2020 14:41:21 -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > > "its own special way" is arguable, VFIO is just making use of what's being proposed as the uapi via its existing IOMMU interface.
> > I mean, if we have a /dev/sva then it makes no sense to extend the VFIO interfaces with the same stuff. VFIO should simply accept a PASID created from /dev/sva and use it just like any other user-DMA driver would.
> I don't think that's absolutely true. By the same logic, we could say that pci-sysfs provides access to PCI BAR and config space resources,

No, it is the reverse, VFIO is a better version of pci-sysfs, so pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva would be the superset interface for PASID, so whatever VFIO has would be obsoleted.

It would be very unusual for the kernel to have two 'preferred' interfaces for the same thing, IMHO. The review process for uAPI should really prevent that by allowing all interests to be served while the uAPI is designed.

> the VFIO device interface duplicates part of that interface therefore it should be abandoned. But in reality, VFIO providing access to those resources puts those accesses within the scope and control of the VFIO interface.

Not clear to me why VFIO needs that. PASID seems quite orthogonal from VFIO to me.

> > This has already happened, the SVA patches generally allow unpriv user space to allocate a PASID for their process.
> > If a device implements a mdev shared with a kernel driver (like IDXD) then it will be sharing that PASID pool across both drivers. In this case it makes no sense that VFIO has PASID quota logic because it has an incomplete view. It could only make sense if VFIO is the exclusive owner of the bus/device/function.
> > The tracking logic needs to be global.. Most probably in some kind of PASID cgroup controller?
> AIUI, that doesn't exist yet, so it makes sense that VFIO, as the mechanism through which a user would allocate a PASID,

VFIO is not the exclusive user interface for PASID. Other SVA drivers will allocate PASIDs. Any quota has to be implemented by the IOMMU layer, and shared across all drivers.

> space. Also, "unprivileged user" is a bit of a misnomer in this context as the VFIO user must be privileged with ownership of a device before they can even participate in PASID allocation. Is truly unprivileged access reasonable for a limited resource?

I'm not talking about VFIO, I'm talking about the other SVA drivers. I expect some of them will be unpriv safe, like IDXD, for instance.

Some way to manage the limited PASID resource will be necessary beyond just VFIO.

> QEMU typically runs in a sandbox with limited access, when a device or mdev is assigned to a VM, file permissions are configured to allow that access. QEMU doesn't get to poke at any random dev file it likes, that's part of how userspace reduces the potential attack surface.

Plumbing the exact same APIs through VFIO's uAPI vs /dev/sva doesn't reduce the attack surface. qemu can simply include /dev/sva in the sandbox when using VFIO with no increase in attack surface from this proposed series.

> This series is a blueprint within the context of the ownership and permission model that VFIO already provides. It doesn't seem like we can pluck that out on its own, nor is it necessarily the case that VFIO wouldn't want to provide PASID services within its own API even if we did have this undefined /dev/sva interface.

I don't see what you do - VFIO does not own PASID, and in this vfio-mdev mode it does not own the PCI device/IOMMU either. So why would this need to be part of the VFIO ownership and permission model?

Jason
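Jason's point that "any quota has to be implemented by the IOMMU layer, and shared across all drivers" is easy to model: a single global budget that vfio-like and sva-like consumers draw down as peers, instead of per-subsystem counters with incomplete views. The sketch below is a toy, cgroup-flavored accounting context; no such controller existed at the time of this thread, and all names are invented.

```python
# Toy model of a global PASID quota at the IOMMU layer: every consumer
# (vfio, vdpa, an SVA char dev, ...) allocates from the same accounting
# context, so no subsystem has an incomplete view of the pool.

class PasidAccounting:
    """One per 'cgroup': a shared budget all drivers draw from as peers."""
    def __init__(self, limit):
        self.limit = limit
        self.used = 0

    def charge(self, consumer):
        if self.used >= self.limit:
            raise OSError(f"{consumer}: PASID quota exhausted")
        self.used += 1
        return self.used            # stand-in for the allocated PASID


acct = PasidAccounting(limit=3)
acct.charge("vfio-mdev")
acct.charge("idxd-sva")
acct.charge("vdpa")
try:
    acct.charge("vfio-mdev")        # fourth allocation is over budget,
except OSError as e:                # regardless of which driver asks
    print(e)
```

The contrast with per-subsystem quotas is that the limit here applies to the sum across all consumers, which is exactly why a VFIO-only quota "has an incomplete view" when an mdev shares its PASID pool with a kernel driver.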
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, 14 Sep 2020 14:41:21 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > "its own special way" is arguable, VFIO is just making use of what's being proposed as the uapi via its existing IOMMU interface.
> I mean, if we have a /dev/sva then it makes no sense to extend the VFIO interfaces with the same stuff. VFIO should simply accept a PASID created from /dev/sva and use it just like any other user-DMA driver would.

I don't think that's absolutely true. By the same logic, we could say that pci-sysfs provides access to PCI BAR and config space resources, the VFIO device interface duplicates part of that interface, therefore it should be abandoned. But in reality, VFIO providing access to those resources puts those accesses within the scope and control of the VFIO interface. Ownership of a device through vfio is provided by allowing the user access to the vfio group dev file alone, not by the group file plus some number of resource files and the config file, and running with admin permissions to see the full extent of config space. Reserved ranges for the IOMMU are also provided via sysfs, but VFIO includes a capability on the IOMMU get_info ioctl for the user to learn about available IOVA ranges w/o scraping through sysfs.

> > are also a system resource, so we require some degree of access control and quotas for management of PASIDs.
> This has already happened, the SVA patches generally allow unpriv user space to allocate a PASID for their process.
> If a device implements a mdev shared with a kernel driver (like IDXD) then it will be sharing that PASID pool across both drivers. In this case it makes no sense that VFIO has PASID quota logic because it has an incomplete view. It could only make sense if VFIO is the exclusive owner of the bus/device/function.
> The tracking logic needs to be global.. Most probably in some kind of PASID cgroup controller?

AIUI, that doesn't exist yet, so it makes sense that VFIO, as the mechanism through which a user would allocate a PASID, implements a reasonable quota to avoid an unprivileged user exhausting the address space. Also, "unprivileged user" is a bit of a misnomer in this context as the VFIO user must be privileged with ownership of a device before they can even participate in PASID allocation. Is truly unprivileged access reasonable for a limited resource?

> > know whether an assigned device requires PASIDs such that access to this dev file is provided to QEMU?
> Wouldn't QEMU just open /dev/sva if it needs it? Like other dev files? Why would it need something special?

QEMU typically runs in a sandbox with limited access; when a device or mdev is assigned to a VM, file permissions are configured to allow that access. QEMU doesn't get to poke at any random dev file it likes, that's part of how userspace reduces the potential attack surface.

> > would be an obvious DoS path if any user can create arbitrary allocations. If we can move code out of VFIO, I'm all for it, but I think it needs to be better defined than "implement magic universal sva uapi interface" before we can really consider it. Thanks,
> Jason began by saying VDPA will need this too, I agree with him.
> I'm not sure why it would be "magic"? This series already gives a pretty solid blueprint for what the interface would need to have. Interested folks need to sit down and talk about it, not just default everything to being built inside VFIO.

This series is a blueprint within the context of the ownership and permission model that VFIO already provides. It doesn't seem like we can pluck that out on its own, nor is it necessarily the case that VFIO wouldn't want to provide PASID services within its own API even if we did have this undefined /dev/sva interface. Thanks,

Alex
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> "its own special way" is arguable, VFIO is just making use of what's
> being proposed as the uapi via its existing IOMMU interface.

I mean, if we have a /dev/sva then it makes no sense to extend the VFIO
interfaces with the same stuff. VFIO should simply accept a PASID created
from /dev/sva and use it just like any other user-DMA driver would.

> are also a system resource, so we require some degree of access control
> and quotas for management of PASIDs.

This has already happened: the SVA patches generally allow unprivileged
userspace to allocate a PASID for its process. If a device implements an
mdev shared with a kernel driver (like IDXD) then it will be sharing that
PASID pool across both drivers. In this case it makes no sense for VFIO
to have PASID quota logic because it has an incomplete view. It could
only make sense if VFIO were the exclusive owner of the
bus/device/function.

The tracking logic needs to be global, most probably in some kind of
PASID cgroup controller?

> know whether an assigned device requires PASIDs such that access to
> this dev file is provided to QEMU?

Wouldn't QEMU just open /dev/sva if it needs it? Like other dev files?
Why would it need something special?

> would be an obvious DoS path if any user can create arbitrary
> allocations. If we can move code out of VFIO, I'm all for it, but I
> think it needs to be better defined than "implement magic universal sva
> uapi interface" before we can really consider it. Thanks,

Jason began by saying VDPA will need this too, I agree with him.

I'm not sure why it would be "magic"? This series already gives a pretty
solid blueprint for what the interface would need to have. Interested
folks need to sit down and talk about it, not just default everything to
being built inside VFIO.

Jason
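Jason's "incomplete view" objection can be made concrete with a toy model: if the quota bookkeeping lives in VFIO but an IDXD-style device also hands out PASIDs through a kernel SVA path, VFIO's counter understates real pool usage. The function names below are illustrative only, not kernel APIs.

```c
#include <assert.h>

/* One shared PASID pool drawn on by two independent paths, as with an
 * IDXD-style device whose mdev users and kernel-SVA users share PASIDs. */
static int pool_used;   /* ground truth: every allocation, any path     */
static int vfio_seen;   /* what a VFIO-local quota counter would see    */

/* Allocation through the VFIO uAPI: VFIO can count these. */
static int alloc_via_vfio(void)
{
    vfio_seen++;
    return ++pool_used;
}

/* Allocation through the in-kernel SVA path: bypasses VFIO entirely,
 * so a VFIO-local quota never learns about it. */
static int alloc_via_sva(void)
{
    return ++pool_used;
}
```

Any quota enforced against `vfio_seen` is enforced against stale data, which is why Jason argues the tracking must live somewhere global (his suggestion: a cgroup-style controller).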
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, 14 Sep 2020 13:33:54 -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 09:22:47AM -0700, Raj, Ashok wrote:
> > Hi Jason,
> >
> > On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:
> > > On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > > > > Jason suggest something like /dev/sva. There will be a lot of other
> > > > > subsystems that could benefit from this (e.g vDPA).
> > > >
> > > > Do you have a more precise idea of the interface /dev/sva would provide,
> > > > how it would interact with VFIO and others? vDPA could transport the
> > > > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > > > directly without going through an intermediate /dev/sva handle.
> > >
> > > Prior to PASID, IOMMU really only makes sense as part of vfio-pci
> > > because the iommu can only key on the BDF. That can't work unless the
> > > whole PCI function can be assigned. It is hard to see how a shared PCI
> > > device can work with IOMMU like this, so may as well use vfio.
> > >
> > > SVA and various vIOMMU models change this: a shared PCI driver can
> > > absolutely work with a PASID that is assigned to a VM safely, and
> > > actually doesn't need to know if its PASID maps a mm_struct or
> > > something else.
> >
> > Well, the IOMMU does care if it's a native mm_struct or something that
> > belongs to a guest, because you need the ability to forward page
> > requests and pick up page responses from the guest. Since there is just
> > one PRQ on the IOMMU and responses can't be sent directly, you have to
> > depend on a vIOMMU-type interface in the guest to make all of this
> > magic work, right?
>
> Yes, the IOMMU cares, but not the PCI driver. It just knows it has a
> PASID. Details of how page faulting is handled or how the mapping is
> set up are abstracted by the PASID.
>
> > > This new PASID allocator would match the guest memory layout and
> >
> > Not sure what you mean by "match guest memory layout"?
> > Probably, meaning first level is gVA or gIOVA?
>
> It means whatever the qemu/viommu/guest/etc needs across all the
> IOMMU/arch implementations.
>
> Basically, there should only be two ways to get a PASID:
> - From a mm_struct that mirrors the creating process
> - Via '/dev/sva', which has a complete interface to create and
>   control a PASID suitable for virtualization and more
>
> VFIO should not have its own special way to get a PASID.

"its own special way" is arguable, VFIO is just making use of what's
being proposed as the uapi via its existing IOMMU interface. PASIDs are
also a system resource, so we require some degree of access control and
quotas for management of PASIDs. Does libvirt now get involved to know
whether an assigned device requires PASIDs such that access to this dev
file is provided to QEMU? How does the kernel validate usage or
implement quotas when disconnected from device ownership? PASIDs would
be an obvious DoS path if any user can create arbitrary allocations. If
we can move code out of VFIO, I'm all for it, but I think it needs to be
better defined than "implement magic universal sva uapi interface"
before we can really consider it. Thanks,

Alex
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 09:22:47AM -0700, Raj, Ashok wrote:
> Hi Jason,
>
> On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > > > Jason suggest something like /dev/sva. There will be a lot of other
> > > > subsystems that could benefit from this (e.g vDPA).
> > >
> > > Do you have a more precise idea of the interface /dev/sva would provide,
> > > how it would interact with VFIO and others? vDPA could transport the
> > > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > > directly without going through an intermediate /dev/sva handle.
> >
> > Prior to PASID, IOMMU really only makes sense as part of vfio-pci
> > because the iommu can only key on the BDF. That can't work unless the
> > whole PCI function can be assigned. It is hard to see how a shared PCI
> > device can work with IOMMU like this, so may as well use vfio.
> >
> > SVA and various vIOMMU models change this: a shared PCI driver can
> > absolutely work with a PASID that is assigned to a VM safely, and
> > actually doesn't need to know if its PASID maps a mm_struct or
> > something else.
>
> Well, the IOMMU does care if it's a native mm_struct or something that
> belongs to a guest, because you need the ability to forward page
> requests and pick up page responses from the guest. Since there is just
> one PRQ on the IOMMU and responses can't be sent directly, you have to
> depend on a vIOMMU-type interface in the guest to make all of this magic
> work, right?

Yes, the IOMMU cares, but not the PCI driver. It just knows it has a
PASID. Details of how page faulting is handled or how the mapping is set
up are abstracted by the PASID.

> > This new PASID allocator would match the guest memory layout and
>
> Not sure what you mean by "match guest memory layout"?
> Probably, meaning first level is gVA or gIOVA?

It means whatever the qemu/viommu/guest/etc needs across all the
IOMMU/arch implementations.

Basically, there should only be two ways to get a PASID:
- From a mm_struct that mirrors the creating process
- Via '/dev/sva', which has a complete interface to create and
  control a PASID suitable for virtualization and more

VFIO should not have its own special way to get a PASID.

Jason
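Jason's two allocation paths can be phrased as a tiny data model. This is purely illustrative — `pasid_from_mm` and `pasid_from_sva_handle` are made-up names, not kernel functions: one kind of PASID mirrors the creating process's address space, the other is a handle whose DMA address map is programmed explicitly (the virtualization case).

```c
#include <assert.h>

/* The two legitimate origins of a PASID in Jason's model. */
enum pasid_backing {
    PASID_MM,          /* mirrors the creating process's mm_struct */
    PASID_SVA_HANDLE,  /* programmed through a /dev/sva-style handle */
};

struct pasid {
    int id;
    enum pasid_backing backing;
};

static int next_id = 1;

/* Path 1: PASID bound to the caller's own address space (SVA proper). */
static struct pasid pasid_from_mm(void)
{
    return (struct pasid){ .id = next_id++, .backing = PASID_MM };
}

/* Path 2: PASID whose mappings are set up explicitly by userspace,
 * suitable for exposing to a VM. */
static struct pasid pasid_from_sva_handle(void)
{
    return (struct pasid){ .id = next_id++, .backing = PASID_SVA_HANDLE };
}
```

The point of the model is that every consumer — VFIO included — goes through one of these two constructors rather than inventing a third, subsystem-private path.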
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
Hi Jason,

On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > > Jason suggest something like /dev/sva. There will be a lot of other
> > > subsystems that could benefit from this (e.g vDPA).
> >
> > Do you have a more precise idea of the interface /dev/sva would provide,
> > how it would interact with VFIO and others? vDPA could transport the
> > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > directly without going through an intermediate /dev/sva handle.
>
> Prior to PASID, IOMMU really only makes sense as part of vfio-pci
> because the iommu can only key on the BDF. That can't work unless the
> whole PCI function can be assigned. It is hard to see how a shared PCI
> device can work with IOMMU like this, so may as well use vfio.
>
> SVA and various vIOMMU models change this: a shared PCI driver can
> absolutely work with a PASID that is assigned to a VM safely, and
> actually doesn't need to know if its PASID maps a mm_struct or
> something else.

Well, the IOMMU does care if it's a native mm_struct or something that
belongs to a guest, because you need the ability to forward page requests
and pick up page responses from the guest. Since there is just one PRQ on
the IOMMU and responses can't be sent directly, you have to depend on a
vIOMMU-type interface in the guest to make all of this magic work, right?

> So, some /dev/sva is another way to allocate a PASID that is not 1:1
> with mm_struct, as the existing SVA stuff enforces. ie it is a way to
> program the DMA address map of the PASID.
>
> This new PASID allocator would match the guest memory layout and

Not sure what you mean by "match guest memory layout"?
Probably, meaning first level is gVA or gIOVA?

Cheers,
Ashok
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).
>
> Do you have a more precise idea of the interface /dev/sva would provide,
> how it would interact with VFIO and others? vDPA could transport the
> generic iommu.h structures via its own uAPI, and call the IOMMU API
> directly without going through an intermediate /dev/sva handle.

Prior to PASID, IOMMU really only makes sense as part of vfio-pci because
the iommu can only key on the BDF. That can't work unless the whole PCI
function can be assigned. It is hard to see how a shared PCI device can
work with IOMMU like this, so may as well use vfio.

SVA and various vIOMMU models change this: a shared PCI driver can
absolutely work with a PASID that is assigned to a VM safely, and
actually doesn't need to know if its PASID maps a mm_struct or something
else.

So, some /dev/sva is another way to allocate a PASID that is not 1:1 with
mm_struct, as the existing SVA stuff enforces. ie it is a way to program
the DMA address map of the PASID.

This new PASID allocator would match the guest memory layout and support
the IOMMU nesting stuff needed for vPASID. This is the common code for
the complex cases of virtualization with PASID, shared by all user DMA
drivers, including VFIO.

It doesn't make a lot of sense to build a uAPI exclusive to VFIO just for
PASID and vPASID. We already know everything doing user DMA will
eventually need this stuff.

Jason
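The nesting structure underlying this whole thread — a guest-controlled first level stacked on a host-controlled second level — reduces to function composition. A toy page-granular walk makes that visible (real hardware walks multi-level radix page tables, not flat arrays):

```c
#include <assert.h>

#define NPAGES 8

/* First level (FL): guest-managed, GVA page -> GPA page, per PASID.
 * Second level (SL): host-managed, GPA page -> HPA page, per domain. */
static int fl[NPAGES];  /* gva -> gpa */
static int sl[NPAGES];  /* gpa -> hpa */

/* Nested walk: hardware resolves the GVA through FL, then resolves
 * FL's output (a GPA) through SL, yielding the HPA used on the bus. */
static int nested_xlate(int gva_page)
{
    int gpa_page = fl[gva_page];
    return sl[gpa_page];
}
```

The division of trust falls out of the composition: the guest may freely rewrite `fl` (its own page tables, bound through the vIOMMU), but every output it can produce is still filtered through the host-owned `sl`, so a malicious guest cannot reach host pages outside its GPA space.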
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 12:20:10PM +0800, Jason Wang wrote:
> On 2020/9/10 6:45 PM, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance
> > security.
> >
> > This VFIO series is intended to expose SVA usage to VMs, i.e. sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series
> > (listed in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below; the
> > key design of vSVA support is to utilize the dual-stage IOMMU
> > translation (also known as IOMMU nesting translation) capability in
> > host IOMMU.
> >
> > [ASCII diagram elided in quote: guest-owned first-level (FL) tables
> > map GVA->GPA under the vIOMMU; host-owned second-level (SL) tables
> > map GPA->HPA under the pIOMMU, combined by nested translation.]
> >
> > Patch Overview:
> >  1. reports IOMMU nesting info to userspace (patch 0001, 0002, 0003, 0015, 0016)
> >  2. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0007)
> >  3. a fix to a revisit in intel iommu driver (patch 0006)
> >  4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
> >  5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
> >  6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
> >  7. expose PASID capability to VM (patch 0013)
> >  8. add doc for VFIO dual stage control (patch 0014)
>
> If it's possible, I would suggest a generic uAPI instead of a VFIO
> specific one.

A large part of this work is already generic uAPI, in
include/uapi/linux/iommu.h. This patchset connects that generic interface
to the pre-existing VFIO uAPI that deals with IOMMU mappings of an
assigned device. But the bulk of the work is done by the IOMMU subsystem,
and is available to all device drivers.

> Jason suggest something like /dev/sva. There will be a lot of other
> subsystems that could benefit from this (e.g vDPA).

Do you have a more precise idea of the interface /dev/sva would provide,
how it would interact with VFIO and others? vDPA could transport the
generic iommu.h structures via its own uAPI, and call the IOMMU API
directly without going through an intermediate /dev/sva handle.

Thanks,
Jean

> Have you ever considered this approach?
>
> Thanks
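Jean-Philippe's alternative — each subsystem transports the generic iommu.h structures over its own uAPI and calls a common IOMMU core, with no intermediate /dev/sva handle — can be sketched as follows. The struct fields and function names here are hypothetical stand-ins; the real structures live in include/uapi/linux/iommu.h.

```c
#include <assert.h>

/* Generic bind request, standing in for the generic iommu.h uAPI
 * structures (fields invented for this sketch). */
struct gpasid_bind {
    int pasid;
    unsigned long gpgd;   /* guest page-table root (GPA) */
};

static int core_binds;    /* how many times the shared IOMMU core ran */

/* The single shared IOMMU-core entry point every subsystem calls. */
static int iommu_core_bind(const struct gpasid_bind *b)
{
    (void)b;              /* a real core would program the PASID entry */
    return ++core_binds;
}

/* Each subsystem merely transports the same structure over its own
 * uAPI and hands it to the common core. */
static int vfio_bind_ioctl(const struct gpasid_bind *b)
{
    return iommu_core_bind(b);
}

static int vdpa_bind_ioctl(const struct gpasid_bind *b)
{
    return iommu_core_bind(b);
}
```

The design point being argued: the transport layer stays per-subsystem (each front end keeps its own ownership and permission model), while the semantics live once, in the IOMMU core.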
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On Mon, Sep 14, 2020 at 10:38:10AM, Tian, Kevin wrote:
> is widely used thus can better help verify the core logic with
> many existing devices. For vSVA, vDPA support has not been started
> while VFIO support is close to being accepted. It doesn't make much
> sense to block the VFIO part until vDPA is ready for wide

You keep saying that, but if we keep ignoring the right architecture we
end up with a mess inside VFIO just to save some development time. That
is usually not how the kernel process works.

Jason
RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang
> Sent: Monday, September 14, 2020 4:57 PM
>
> On 2020/9/14 4:01 PM, Tian, Kevin wrote:
> >> From: Jason Wang
> >> Sent: Monday, September 14, 2020 12:20 PM
> >>
> >> [quoted cover letter, architecture diagram, and patch overview trimmed]
> >>
> >> If it's possible, I would suggest a generic uAPI instead of a VFIO
> >> specific one.
> >>
> >> Jason suggest something like /dev/sva. There will be a lot of other
> >> subsystems that could benefit from this (e.g vDPA).
> >>
> > Just be curious. When does vDPA subsystem plan to support vSVA and
> > when could one expect a SVA-capable vDPA device in market?
> >
> > Thanks
> > Kevin
>
> vSVA is in the plan but there's no ETA. I think we might start the work
> after control vq support. It will probably start from SVA first and
> then vSVA (since it might require platform support).
>
> For the device part, it really depends on the chipset and other device
> vendors. We plan to do the prototype in virtio by introducing PASID
> support in the spec.

Thanks for the info. Then here is my thought.

First, I don't think /dev/sva is the right interface. Once we start
considering such a generic uAPI, it should behave as the one interface
for all kinds of DMA requirements on device/subdevice passthrough.
Nested page tables through vSVA are one way; manual map/unmap is another.
It doesn't make sense to have one through a generic uAPI and the other
through a subsystem-specific uAPI. In the end the interface might become
/dev/iommu, for delegating certain IOMMU operations to userspace.

In addition, delegated IOMMU operations have different scopes. PASID
allocation is per process/VM. pgtbl-bind/unbind, map/unmap and cache
invalidation are per IOMMU domain. Page request/response are per
device/subdevice. This requires the uAPI to also understand and manage
the association between domain/group/device/subdevice (such as group
attach/detach), instead of doing it separately in VFIO or vDPA as today.

Based on the above, I feel a more reasonable way is to first make a
/dev/iommu uAPI supporting DMA map/unmap usages and then introduce vSVA
to it. Doing it in this order is because DMA map/unmap is widely used and
thus can better help verify the core logic with many existing devices.
For vSVA, vDPA support has not been started while VFIO support is close
to being accepted. It doesn't make much sense to block the VFIO part
until vDPA is ready for wide usage.
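Kevin's three scopes line up naturally with three object types a hypothetical /dev/iommu would have to manage and associate. The stub below only models which operation hangs off which object — nothing in it reflects a real interface.

```c
#include <assert.h>

/* Kevin's three delegation scopes, as the objects they attach to. */
struct vm     { int pasids;   };  /* PASID allocation: per process/VM     */
struct domain { int bindings; };  /* pgtbl bind/unbind, map/unmap,
                                     cache invalidation: per IOMMU domain */
struct device { int faults;   };  /* page request/response: per (sub)device */

/* Each operation is only meaningful against its own scope's object. */
static int pasid_alloc(struct vm *v)        { return ++v->pasids;   }
static int domain_bind(struct domain *d)    { return ++d->bindings; }
static int device_fault(struct device *dev) { return ++dev->faults; }
```

The uAPI design problem Kevin describes is precisely the association layer this stub omits: wiring a device into a domain, and a domain into a VM's PASID space, so that an operation issued at one scope is validated against the others.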
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/14 4:01 PM, Tian, Kevin wrote:
> >> From: Jason Wang
> >> Sent: Monday, September 14, 2020 12:20 PM
> >>
> >> [quoted cover letter, architecture diagram, and patch overview trimmed]
> >>
> >> If it's possible, I would suggest a generic uAPI instead of a VFIO
> >> specific one.
> >>
> >> Jason suggest something like /dev/sva. There will be a lot of other
> >> subsystems that could benefit from this (e.g vDPA).
> >>
> > Just be curious. When does vDPA subsystem plan to support vSVA and
> > when could one expect a SVA-capable vDPA device in market?
> >
> > Thanks
> > Kevin

vSVA is in the plan but there's no ETA. I think we might start the work
after control vq support. It will probably start from SVA first and then
vSVA (since it might require platform support).

For the device part, it really depends on the chipset and other device
vendors. We plan to do the prototype in virtio by introducing PASID
support in the spec.

Thanks
RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
> From: Jason Wang
> Sent: Monday, September 14, 2020 12:20 PM
>
> [quoted cover letter, architecture diagram, and patch overview trimmed]
>
> If it's possible, I would suggest a generic uAPI instead of a VFIO
> specific one.
>
> Jason suggest something like /dev/sva. There will be a lot of other
> subsystems that could benefit from this (e.g vDPA).

Just be curious. When does vDPA subsystem plan to support vSVA and when
could one expect a SVA-capable vDPA device in market?

Thanks
Kevin
Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs
On 2020/9/10 6:45 PM, Liu Yi L wrote:
> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> Intel platforms allows address space sharing between device DMA and
> applications. SVA can reduce programming complexity and enhance
> security.
>
> This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> guest application address space with passthru devices. This is called
> vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> changes. For IOMMU and QEMU changes, they are in separate series
> (listed in the "Related series").
>
> The high-level architecture for SVA virtualization is as below, the key
> design of vSVA support is to utilize the dual-stage IOMMU translation (
> also known as IOMMU nesting translation) capability in host IOMMU.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest process CR3, FL only|
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush ---+
>     '-------------'                         |
>     |             |                         V
>     |             |              CR3 in GPA
>     '-------------'
> Guest
> ------| Shadow |---------------------------|------
>       v        v                           v
> Host
>     .-------------.  .----------------------------.
>     |   pIOMMU    |  | Bind FL for GVA-GPA        |
>     |             |  '----------------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.------------------------------.
>     |             |   |SL for GPA-HPA, default domain|
>     |             |   '------------------------------'
>     '-------------'
> Where:
>  - FL = First level/stage one page tables
>  - SL = Second level/stage two page tables
>
> Patch Overview:
>  1. reports IOMMU nesting info to userspace (patch 0001, 0002, 0003, 0015, 0016)
>  2. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0007)
>  3. a fix to a revisit in intel iommu driver (patch 0006)
>  4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
>  5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
>  6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
>  7. expose PASID capability to VM (patch 0013)
>  8. add doc for VFIO dual stage control (patch 0014)

If it's possible, I would suggest a generic uAPI instead of a VFIO
specific one.

Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Have you ever considered this approach?
Thanks