Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-04 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 08:14:29PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote:
> > I think the same PCI driver with a small flag to support the PF or
> > VF is not the same as two completely different drivers in different
> > subsystems
> 
> There are counter-examples: ixgbe vs. ixgbevf.
>
> Note that also a single driver can support both, an SVA device and an
> mdev device, sharing code for accessing parts of the device like queues
> and handling interrupts.

Needing an mdev device at all is the larger issue; mdev means the
kernel must carry a lot of emulation code, depending on how the SVA
device is designed. E.g. creating queues may require an emulated BAR.

Shifting that code to userspace and having a single clean 'SVA'
interface from the kernel for the device makes a lot more sense,
especially from a security perspective.

Forcing all vIOMMU stuff to only use VFIO permanently closes this as
an option.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Tue, Nov 03, 2020 at 01:48:51PM -0400, Jason Gunthorpe wrote:
> I think the same PCI driver with a small flag to support the PF or
> VF is not the same as two completely different drivers in different
> subsystems

There are counter-examples: ixgbe vs. ixgbevf.

Note also that a single driver can support both an SVA device and an
mdev device, sharing code for accessing parts of the device, like queues,
and for handling interrupts.

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 05:55:40PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote:
> > This whole thread was brought up by IDXD which has a SVA driver and
> > now wants to add a vfio-mdev driver too. SVA devices that want to be
> > plugged into VMs are going to be common - this architecture that a SVA
> > driver cannot cover the kvm case seems problematic.
> 
> Isn't that the same pattern as having separate drivers for VFs and the
> parent device in SR-IOV?

I think the same PCI driver with a small flag to support the PF or
VF is not the same as two completely different drivers in different
subsystems

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Tue, Nov 03, 2020 at 11:22:23AM -0400, Jason Gunthorpe wrote:
> This whole thread was brought up by IDXD which has a SVA driver and
> now wants to add a vfio-mdev driver too. SVA devices that want to be
> plugged into VMs are going to be common - this architecture that a SVA
> driver cannot cover the kvm case seems problematic.

Isn't that the same pattern as having separate drivers for VFs and the
parent device in SR-IOV?

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 03:35:32PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote:
> > The point is that other places beyond VFIO need this
> 
> Which and why?
>
> > Sure, but sometimes it is necessary, and in those cases the answer
> > can't be "rewrite a SVA driver to use vfio"
> 
> This is getting to abstract. Can you come up with an example where
> handling this in VFIO or an endpoint device kernel driver does not work?

This whole thread was brought up by IDXD, which has an SVA driver and
now wants to add a vfio-mdev driver too. SVA devices that want to be
plugged into VMs are going to be common - an architecture where an SVA
driver cannot cover the KVM case seems problematic.

Yes, everything can have an SVA driver and a vfio-mdev, and it works just
fine, but it is not very clean or simple.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Tue, Nov 03, 2020 at 10:06:42AM -0400, Jason Gunthorpe wrote:
> The point is that other places beyond VFIO need this

Which and why?

> Sure, but sometimes it is necessary, and in those cases the answer
> can't be "rewrite a SVA driver to use vfio"

This is getting too abstract. Can you come up with an example where
handling this in VFIO or an endpoint device kernel driver does not work?

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 03:03:18PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote:
> > Userspace needs fine grained control over the composition of the page
> > table behind the PASID, 1:1 with the mm_struct is only one use case.
> 
> VFIO already offers an interface for that. It shouldn't be too
> complicated to expand that for PASID-bound page-tables.
> 
> > Userspace needs to be able to handle IOMMU faults, apparently
> 
> Could be implemented by a fault-fd handed out by VFIO.

The point is that other places beyond VFIO need this

> I really don't think that user-space should have to deal with details
> like PASIDs or other IOMMU internals, unless absolutly necessary. This
> is an OS we work on, and the idea behind an OS is to abstract the
> hardware away.

Sure, but sometimes it is necessary, and in those cases the answer
can't be "rewrite a SVA driver to use vfio"

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Tue, Nov 03, 2020 at 09:23:35AM -0400, Jason Gunthorpe wrote:
> Userspace needs fine grained control over the composition of the page
> table behind the PASID, 1:1 with the mm_struct is only one use case.

VFIO already offers an interface for that. It shouldn't be too
complicated to expand that for PASID-bound page-tables.

> Userspace needs to be able to handle IOMMU faults, apparently

Could be implemented by a fault-fd handed out by VFIO.
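
(A minimal, purely hypothetical sketch of how userspace might consume such a
fault-fd; the record layout and names below are invented for illustration and
are not an existing VFIO or IOMMU uAPI.)

    #include <stdint.h>
    #include <unistd.h>

    /* Invented record layout, for the sketch only. */
    struct example_iommu_fault {
            uint32_t pasid;      /* PASID that took the fault */
            uint32_t flags;      /* read/write/priv bits, etc. */
            uint64_t address;    /* faulting IOVA */
    };

    static void example_handle_faults(int fault_fd)
    {
            struct example_iommu_fault evt;

            /* Block until the kernel reports a recoverable IOMMU fault. */
            while (read(fault_fd, &evt, sizeof(evt)) == sizeof(evt)) {
                    /*
                     * Resolve the fault (e.g. relay it to the guest vIOMMU),
                     * then send a page response back through whatever
                     * completion interface the fd would pair with.
                     */
            }
    }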

> The Intel guys had a bunch of other stuff too, looking through the new
> API they are proposing for vfio gives some flavour what they think is
> needed..

I really don't think that user-space should have to deal with details
like PASIDs or other IOMMU internals, unless absolutely necessary. This
is an OS we work on, and the idea behind an OS is to abstract the
hardware away.

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 02:18:52PM +0100, j...@8bytes.org wrote:
> On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote:
> > On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> > > So having said this, what is the benefit of exposing those SVA internals
> > > to user-space?
> > 
> > Only the device use of the PASID is device specific, the actual PASID
> > and everything on the IOMMU side is generic.
> > 
> > There is enough API there it doesn't make sense to duplicate it into
> > every single SVA driver.
> 
> What generic things have to be done by the drivers besides
> allocating/deallocating PASIDs and binding an address space to it?
> 
> Is there anything which isn't better handled in a kernel-internal
> library which drivers just use?

Userspace needs fine-grained control over the composition of the page
table behind the PASID; 1:1 with the mm_struct is only one use case.

Userspace needs to be able to handle IOMMU faults, apparently.

The Intel guys had a bunch of other stuff too; looking through the new
API they are proposing for vfio gives some flavour of what they think is
needed.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Tue, Nov 03, 2020 at 08:56:43AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> > So having said this, what is the benefit of exposing those SVA internals
> > to user-space?
> 
> Only the device use of the PASID is device specific, the actual PASID
> and everything on the IOMMU side is generic.
> 
> There is enough API there it doesn't make sense to duplicate it into
> every single SVA driver.

What generic things have to be done by the drivers besides
allocating/deallocating PASIDs and binding an address space to it?

Is there anything which isn't better handled in a kernel-internal
library which drivers just use?
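
(For reference, a kernel-internal library along these lines already exists in
the iommu core; below is a minimal sketch of a driver using it, assuming the
iommu_sva_* helpers roughly as they looked around v5.9. It is an illustration
only, not code taken from any driver.)

    #include <linux/device.h>
    #include <linux/err.h>
    #include <linux/iommu.h>

    static int example_enable_sva(struct device *dev, struct mm_struct *mm)
    {
            struct iommu_sva *handle;
            u32 pasid;

            /* Bind the process address space; the IOMMU core picks a PASID. */
            handle = iommu_sva_bind_device(dev, mm, NULL);
            if (IS_ERR(handle))
                    return PTR_ERR(handle);

            pasid = iommu_sva_get_pasid(handle);
            dev_info(dev, "bound PASID %u\n", pasid);
            /* Program 'pasid' into the device's queues/descriptors here. */

            /* Later, on teardown: iommu_sva_unbind_device(handle); */
            return 0;
    }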

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread Jason Gunthorpe
On Tue, Nov 03, 2020 at 10:52:09AM +0100, j...@8bytes.org wrote:
> So having said this, what is the benefit of exposing those SVA internals
> to user-space?

Only the device's use of the PASID is device-specific; the actual PASID
and everything on the IOMMU side is generic.

There is enough API there that it doesn't make sense to duplicate it into
every single SVA driver.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-11-03 Thread j...@8bytes.org
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote:
> > From: Jason Wang 

> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).

Honestly, I fail to see the benefit of offloading these IOMMU-specific
setup tasks to user-space.

The ways PASID, and the device partitioning it allows, are used are very
device-specific. A GPU will be partitioned completely differently from a
network card. So the device drivers should use the (v)SVA APIs to set up
the partitioning in a way which makes sense for the device.

And VFIO is of course a user by itself, as it allows assigning device
partitions to guests. Or it can even allow assigning complete devices and
let the guests partition them themselves.

So having said this, what is the benefit of exposing those SVA internals
to user-space?

Regards,

Joerg


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Jason Wang


On 2020/10/22 11:54 AM, Liu, Yi L wrote:

Hi Jason,


From: Jason Wang 
Sent: Thursday, October 22, 2020 10:56 AM


[...]

If you(Intel) don't have plan to do vDPA, you should not prevent other vendors
from implementing PASID capable hardware through non-VFIO subsystem/uAPI
on top of your SIOV architecture. Isn't it?

yes, that's true.


So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g 
it's not
hard to have a PASID capable virtio device through qemu, and we can start from
there.

actually, I'm already doing a poc to move the PASID allocation/free interface
out of VFIO. So that other users could use it as well. I think this is also
what you replied previously. :-) I'll send out when it's ready and seek for
your help on mature it. does it sound good to you?



Yes, fine with me.

Thanks




Regards,
Yi Liu


Thanks





RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Liu, Yi L
Hi Jason,

> From: Jason Wang 
> Sent: Thursday, October 22, 2020 10:56 AM
> 
[...]
> If you(Intel) don't have plan to do vDPA, you should not prevent other vendors
> from implementing PASID capable hardware through non-VFIO subsystem/uAPI
> on top of your SIOV architecture. Isn't it?

yes, that's true.

> So if Intel has the willing to collaborate on the POC, I'd happy to help. E.g 
> it's not
> hard to have a PASID capable virtio device through qemu, and we can start from
> there.

Actually, I'm already doing a PoC to move the PASID allocation/free interface
out of VFIO, so that other users could use it as well. I think this is also
what you suggested previously. :-) I'll send it out when it's ready and seek
your help to mature it. Does that sound good to you?

Regards,
Yi Liu

> 
> Thanks
> 
> 
> >



Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Jason Wang


On 2020/10/22 1:51 AM, Raj, Ashok wrote:

On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:

On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:

On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:

On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:

On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:

On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:

I think we agreed (or agree to disagree and commit) for device types that
we have for SIOV, VFIO based approach works well without having to re-invent
another way to do the same things. Not looking for a shortcut by any means,
but we need to plan around existing hardware though. Looks like vDPA took
some shortcuts then to not abstract iommu uAPI instead :-)? When all
necessary hardware was available.. This would be a solved puzzle.

I think it is the opposite, vIOMMU and related has outgrown VFIO as
the "home" and needs to stand alone.

Apparently the HW that will need PASID for vDPA is Intel HW, so if

So just to make this clear, I did check internally if there are any plans
for vDPA + SVM. There are none at the moment.

Not SVM, SIOV.

... And that included.. I should have said vDPA + PASID, No current plans.
I have no idea who set expectations with you. Can you please put me in touch
with that person, privately is fine.

It was the team that aruged VDPA had to be done through VFIO - SIOV
and PASID was one of their reasons it had to be VFIO, check the list
archives

Humm... I could search the arhives, but the point is I'm confirming that
there is no forward looking plan!

And who ever did was it was based on probably strawman hypothetical argument 
that wasn't
grounded in reality.


If they didn't plan to use it, bit of a strawman argument, right?

This doesn't need to continue like the debates :-) Pun intended :-)

I don't think it makes any sense to have an abstract strawman argument
design discussion. Yi is looking into for pasid management alone. Rest
of the IOMMU related topics should wait until we have another
*real* use requiring consolidation.

Contrary to your argument, vDPA went with a half blown device only
iommu user without considering existing abstractions like containers
and such in VFIO is part of the reason the gap is big at the moment.
And you might not agree, but that's beside the point.



Can you explain why it must care about VFIO abstractions? vDPA is trying to
hide device details, which is fundamentally different from what VFIO
wants to do. vDPA allows the parent to deal with the IOMMU stuff, and if
necessary, the parent can talk with the IOMMU drivers directly via the IOMMU APIs.



  


Rather than pivot ourselves around hypothetical, strawman,
non-intersecting, suggesting architecture without having done a proof of
concept to validate the proposal should stop. We have to ground ourselves
in reality.



The reality is VFIO should not be the only user of (v)SVA/SIOV/PASID.
The kernel already has users like GPU or uacce.





The use cases we have so far for SIOV, VFIO and mdev seem to be the right
candidates and addresses them well. Now you might disagree, but as noted we
all agreed to move past this.



The mdev is not perfect for sure, but it's another topic.

If you (Intel) don't have plans to do vDPA, you should not prevent other
vendors from implementing PASID-capable hardware through a non-VFIO
subsystem/uAPI on top of your SIOV architecture, should you?

So if Intel is willing to collaborate on the PoC, I'd be happy to
help. E.g. it's not hard to have a PASID-capable virtio device through
qemu, and we can start from there.


Thanks







Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Raj, Ashok
On Wed, Oct 21, 2020 at 08:32:18PM -0300, Jason Gunthorpe wrote:
> On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:
> 
> > I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> > SVM is orthogonal to how we achieve mdev passthrough to guest and
> > vSVM.
> 
> Everyone assumes that vIOMMU and SIOV aka PASID is going to be needed
> on the VDPA side as well, I think that is why JasonW brought this up
> in the first place.

True. To that effect we are working on moving PASID allocation
outside of VFIO, so that both agents, VFIO and vDPA with PASID, when that
becomes available, can support one way to allocate and manage PASIDs from
user space.

Since the IOASID is almost standalone, this is possible.
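
(For context, the standalone piece referred to here is the kernel's ioasid
allocator; a minimal sketch of its use follows, assuming the
ioasid_alloc()/ioasid_free() helpers roughly as they existed in the v5.9 tree,
with set management and custom allocators glossed over.)

    #include <linux/ioasid.h>

    /*
     * Allocate a PASID in [1, 1023] and associate 'priv' with it.  Nothing
     * here depends on VFIO or vDPA, which is the point: both could share
     * one allocation path.  Returns INVALID_IOASID on failure.
     */
    static ioasid_t example_alloc_pasid(struct ioasid_set *set, void *priv)
    {
            return ioasid_alloc(set, 1, 1023, priv);
    }

    static void example_free_pasid(ioasid_t pasid)
    {
            ioasid_free(pasid);
    }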


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2020 at 01:03:15PM -0700, Raj, Ashok wrote:

> I'm not sure why you tie in IDXD and VDPA here. How IDXD uses native
> SVM is orthogonal to how we achieve mdev passthrough to guest and
> vSVM.

Everyone assumes that vIOMMU and SIOV, aka PASID, are going to be needed
on the VDPA side as well; I think that is why JasonW brought this up
in the first place.

We may not see vSVA for VDPA, but that seems like some special sub
mode of all the other vIOMMU and PASID stuff, and not a completely
distinct thing.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Raj, Ashok
On Wed, Oct 21, 2020 at 03:24:42PM -0300, Jason Gunthorpe wrote:
> 
> > Contrary to your argument, vDPA went with a half blown device only 
> > iommu user without considering existing abstractions like containers
> 
> VDPA IOMMU was done *for Intel*, as the kind of half-architected thing
> you are advocating should be allowed for IDXD here. Not sure why you
> think bashing that work is going to help your case here.

I'm not bashing that work, sorry if it comes across that way,
but it just feels like double standards.

I'm not sure why you tie IDXD and VDPA together here. How IDXD uses native
SVM is orthogonal to how we achieve mdev passthrough to the guest and vSVM.
We visited that exact thing multiple times. Doing SVM is quite simple and
doesn't carry the weight of the long list of other things (Kevin explained
this in detail not too long ago) we need to accomplish for mdev
passthrough.

For SVM, it is just access to hw, mmio and bind_mm to get a PASID bound with
the IOMMU.

For IDXD, creating passthrough devices for guest access and vSVM is
through the VFIO path.

For guest SVM, we expose mdevs to the guest OS, and idxd in the guest provides
vSVM services. vSVM is *not* built around the native SVM interfaces.


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Raj, Ashok
On Wed, Oct 21, 2020 at 08:48:29AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> > On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > > I think we agreed (or agree to disagree and commit) for device 
> > > > > > types that 
> > > > > > we have for SIOV, VFIO based approach works well without having to 
> > > > > > re-invent 
> > > > > > another way to do the same things. Not looking for a shortcut by 
> > > > > > any means, 
> > > > > > but we need to plan around existing hardware though. Looks like 
> > > > > > vDPA took 
> > > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > > > > necessary hardware was available.. This would be a solved puzzle. 
> > > > > 
> > > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > > > > the "home" and needs to stand alone.
> > > > > 
> > > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > > > 
> > > > So just to make this clear, I did check internally if there are any 
> > > > plans
> > > > for vDPA + SVM. There are none at the moment. 
> > > 
> > > Not SVM, SIOV.
> > 
> > ... And that included.. I should have said vDPA + PASID, No current plans. 
> > I have no idea who set expectations with you. Can you please put me in 
> > touch 
> > with that person, privately is fine.
> 
> It was the team that aruged VDPA had to be done through VFIO - SIOV
> and PASID was one of their reasons it had to be VFIO, check the list
> archives

Humm... I could search the archives, but the point is I'm confirming that
there is no forward-looking plan!

And whoever did, it was probably based on a hypothetical strawman argument
that wasn't grounded in reality.

> 
> If they didn't plan to use it, bit of a strawman argument, right?

This doesn't need to continue like the debates :-) Pun intended :-)

I don't think it makes any sense to have an abstract, strawman-argument
design discussion. Yi is looking into PASID management alone. The rest
of the IOMMU-related topics should wait until we have another
*real* use requiring consolidation.

Contrary to your argument, the fact that vDPA went with a half-blown,
device-only iommu user without considering existing abstractions like
containers and such in VFIO is part of the reason the gap is big at the
moment. And you might not agree, but that's beside the point.

Rather than pivoting ourselves around hypothetical, strawman,
non-intersecting suggestions, proposing architecture without having done a
proof of concept to validate the proposal should stop. We have to ground
ourselves in reality.

For the use cases we have so far for SIOV, VFIO and mdev seem to be the right
candidates and address them well. Now you might disagree, but as noted we
all agreed to move past this.



Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Jason Gunthorpe
On Wed, Oct 21, 2020 at 10:51:46AM -0700, Raj, Ashok wrote:

> > If they didn't plan to use it, bit of a strawman argument, right?
> 
> This doesn't need to continue like the debates :-) Pun intended :-)
> 
> I don't think it makes any sense to have an abstract strawman argument
> design discussion. Yi is looking into for pasid management alone. Rest 
> of the IOMMU related topics should wait until we have another 
> *real* use requiring consolidation. 

Actually I'm really annoyed right now that the other Intel team wasted
quite a lot of the rest of our time on arguing about vDPA and vfio
with no actual interest in this technology.

So you'll excuse me if I'm not particularly enamored with this
discussion right now.

> Contrary to your argument, vDPA went with a half blown device only 
> iommu user without considering existing abstractions like containers

VDPA IOMMU was done *for Intel*, as the kind of half-architected thing
you are advocating should be allowed for IDXD here. Not sure why you
think bashing that work is going to help your case here.

I'm saying Intel needs to get its architecture together and stop
creating this mess across the kernel to support Intel devices.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-21 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 01:27:13PM -0700, Raj, Ashok wrote:
> On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > > I think we agreed (or agree to disagree and commit) for device types 
> > > > > that 
> > > > > we have for SIOV, VFIO based approach works well without having to 
> > > > > re-invent 
> > > > > another way to do the same things. Not looking for a shortcut by any 
> > > > > means, 
> > > > > but we need to plan around existing hardware though. Looks like vDPA 
> > > > > took 
> > > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > > > necessary hardware was available.. This would be a solved puzzle. 
> > > > 
> > > > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > > > the "home" and needs to stand alone.
> > > > 
> > > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > > 
> > > So just to make this clear, I did check internally if there are any plans
> > > for vDPA + SVM. There are none at the moment. 
> > 
> > Not SVM, SIOV.
> 
> ... And that included.. I should have said vDPA + PASID, No current plans. 
> I have no idea who set expectations with you. Can you please put me in touch 
> with that person, privately is fine.

It was the team that argued VDPA had to be done through VFIO - SIOV
and PASID was one of their reasons it had to be VFIO; check the list
archives.

If they didn't plan to use it, bit of a strawman argument, right?

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Wang


On 2020/10/20 10:19 PM, Liu, Yi L wrote:

From: Jason Gunthorpe 
Sent: Tuesday, October 20, 2020 10:02 PM

[...]

Whoever provides the vIOMMU emulation and relays the page fault to the

guest

has to translate the RID -

that's the point. But the device info (especially the sub-device info) is
within the passthru framework (e.g. VFIO). So page fault reporting needs
to go through passthru framework.


what does that have to do with VFIO?

How will VPDA provide the vIOMMU emulation?

a pardon here. I believe vIOMMU emulation should be based on IOMMU

vendor

specification, right? you may correct me if I'm missing anything.

I'm asking how will VDPA translate the RID when VDPA triggers a page
fault that has to be relayed to the guest. VDPA also has virtual PCI
devices it creates.

I've got a question. Does vDPA work with vIOMMU so far? e.g. Intel vIOMMU
or other type vIOMMU.



The kernel code is ready. Note that vhost support for vIOMMU landed even
earlier than VFIO's.

The API is designed to be generic and is not limited to any specific type of
vIOMMU.

For qemu, it just needs a patch to implement a map/unmap notifier as
VFIO did.
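
(A rough sketch of the kind of notifier hookup meant here, assuming QEMU's
IOMMUNotifier API from the 5.x timeframe; the function names are invented and
the signatures are approximate, so treat this as an illustration rather than
an excerpt from any existing vhost/vDPA patch.)

    #include "qemu/osdep.h"
    #include "exec/memory.h"
    #include "qapi/error.h"

    static void example_iommu_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
    {
        if (iotlb->perm == IOMMU_NONE) {
            /* UNMAP: tear down the mapping for iotlb->iova/addr_mask. */
        } else {
            /* MAP: install iotlb->translated_addr in the backend (host
             * IOMMU for VFIO, device DMA mapping for a vDPA parent). */
        }
    }

    static void example_register(IOMMUMemoryRegion *iommu_mr, IOMMUNotifier *n)
    {
        iommu_notifier_init(n, example_iommu_notify,
                            IOMMU_NOTIFIER_MAP | IOMMU_NOTIFIER_UNMAP,
                            0, HWADDR_MAX, 0 /* iommu_idx */);
        memory_region_register_iommu_notifier(MEMORY_REGION(iommu_mr), n,
                                              &error_abort);
    }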


Thanks





Regards,
Yi Liu




Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Raj, Ashok
On Tue, Oct 20, 2020 at 05:14:03PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> > On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > > I think we agreed (or agree to disagree and commit) for device types 
> > > > that 
> > > > we have for SIOV, VFIO based approach works well without having to 
> > > > re-invent 
> > > > another way to do the same things. Not looking for a shortcut by any 
> > > > means, 
> > > > but we need to plan around existing hardware though. Looks like vDPA 
> > > > took 
> > > > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > > necessary hardware was available.. This would be a solved puzzle. 
> > > 
> > > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > > the "home" and needs to stand alone.
> > > 
> > > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> > 
> > So just to make this clear, I did check internally if there are any plans
> > for vDPA + SVM. There are none at the moment. 
> 
> Not SVM, SIOV.

... And that included. I should have said vDPA + PASID; no current plans.
I have no idea who set expectations with you. Can you please put me in touch
with that person? Privately is fine.


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 01:08:44PM -0700, Raj, Ashok wrote:
> On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > > I think we agreed (or agree to disagree and commit) for device types that 
> > > we have for SIOV, VFIO based approach works well without having to 
> > > re-invent 
> > > another way to do the same things. Not looking for a shortcut by any 
> > > means, 
> > > but we need to plan around existing hardware though. Looks like vDPA took 
> > > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > > necessary hardware was available.. This would be a solved puzzle. 
> > 
> > I think it is the opposite, vIOMMU and related has outgrown VFIO as
> > the "home" and needs to stand alone.
> > 
> > Apparently the HW that will need PASID for vDPA is Intel HW, so if
> 
> So just to make this clear, I did check internally if there are any plans
> for vDPA + SVM. There are none at the moment. 

Not SVM, SIOV.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Raj, Ashok
On Tue, Oct 20, 2020 at 04:55:57PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> > I think we agreed (or agree to disagree and commit) for device types that 
> > we have for SIOV, VFIO based approach works well without having to 
> > re-invent 
> > another way to do the same things. Not looking for a shortcut by any means, 
> > but we need to plan around existing hardware though. Looks like vDPA took 
> > some shortcuts then to not abstract iommu uAPI instead :-)? When all
> > necessary hardware was available.. This would be a solved puzzle. 
> 
> I think it is the opposite, vIOMMU and related has outgrown VFIO as
> the "home" and needs to stand alone.
> 
> Apparently the HW that will need PASID for vDPA is Intel HW, so if

So just to make this clear, I did check internally if there are any plans
for vDPA + SVM. There are none at the moment. It seems like you have
better insight into our plans ;-). Please do let me know who confirmed the
vDPA roadmap with you, and I would love to talk to them to clear the air.


Cheers,
Ashok


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 12:51:46PM -0700, Raj, Ashok wrote:
> I think we agreed (or agree to disagree and commit) for device types that 
> we have for SIOV, VFIO based approach works well without having to re-invent 
> another way to do the same things. Not looking for a shortcut by any means, 
> but we need to plan around existing hardware though. Looks like vDPA took 
> some shortcuts then to not abstract iommu uAPI instead :-)? When all
> necessary hardware was available.. This would be a solved puzzle. 

I think it is the opposite: vIOMMU and related functionality have outgrown
VFIO as the "home" and need to stand alone.

Apparently the HW that will need PASID for vDPA is Intel HW, so if
more is needed to do a good design you are probably the only one that
can get it/do it.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Raj, Ashok
On Tue, Oct 20, 2020 at 02:03:36PM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote:
> > Hi Jason,
> > 
> > 
> > On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote:
> > > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote:
> > > 
> > > > > I'm sure there will be some
> > > > > weird overlaps because we can't delete any of the existing VFIO APIs, 
> > > > > but
> > > > > that
> > > > > should not be a blocker.
> > > > 
> > > > but the weird thing is what we should consider. And it perhaps not just
> > > > overlap, it may be a re-definition of VFIO container. As I mentioned, 
> > > > VFIO
> > > > container is IOMMU context from the day it was defined. It could be the
> > > > blocker. :-(
> > > 
> > > So maybe you have to broaden the VFIO container to be usable by other
> > > subsystems. The discussion here is about what the uAPI should look
> > > like in a fairly abstract way. When we say 'dev/sva' it just some
> > > placeholder for a shared cdev that provides the necessary
> > > dis-aggregated functionality 
> > > 
> > > It could be an existing cdev with broader functionaltiy, it could
> > > really be /dev/iommu, etc. This is up to the folks building it to
> > > decide.
> > > 
> > > > I'm not expert on vDPA for now, but I saw you three open source
> > > > veterans have a similar idea for a place to cover IOMMU handling,
> > > > I think it may be a valuable thing to do. I said "may be" as I'm not
> > > > sure about Alex's opinion on such idea. But the sure thing is this
> > > > idea may introduce weird overlap even re-definition of existing
> > > > thing as I replied above. We need to evaluate the impact and mature
> > > > the idea step by step. 
> > > 
> > > This has happened before, uAPIs do get obsoleted and replaced with
> > > more general/better versions. It is often too hard to create a uAPI
> > > that lasts for decades when the HW landscape is constantly changing
> > > and sometime a reset is needed. 
> > 
> > I'm throwing this out with a lot of hesitation, but I'm going to :-)
> > 
> > So we have been disussing this for months now, with some high level vision
> > trying to get the uAPI's solidified with a vDPA hardware that might
> > potentially have SIOV/SVM like extensions in hardware which actualy doesn't
> > exist today. Understood people have plans. 
> 
> > Given that vDPA today has diverged already with duplicating use of IOMMU
> > api's without making an effort to gravitate to /dev/iommu as how you are
> > proposing.
> 
> I see it more like, given that we already know we have multiple users
> of IOMMU, adding new IOMMU focused features has to gravitate toward
> some kind of convergance.
> 
> Currently things are not so bad, VDPA is just getting started and the
> current IOMMU feature set is not so big.
> 
> PASID/vIOMMU/etc/et are all stressing this more, I think the
> responsibility falls to the people proposing these features to do the
> architecture work.
> 
> > The question is should we hold hostage the current vSVM/vIOMMU efforts
> > without even having made an effort for current vDPA/VFIO convergence. 
> 
> I don't think it is "held hostage" it is a "no shortcuts" approach,
> there was always a recognition that future VDPA drivers will need some
> work to integrated with vIOMMU realted stuff.

I think we agreed (or agree to disagree and commit) that for the device types
we have for SIOV, a VFIO-based approach works well without having to re-invent
another way to do the same things. Not looking for a shortcut by any means,
but we need to plan around existing hardware though. Looks like vDPA took
some shortcuts then to not abstract the iommu uAPI instead :-)? If all the
necessary hardware had been available, this would be a solved puzzle.


> 
> This is no different than the IMS discussion. The first proposed patch
> was really simple, but a layering violation.
> 
> The correct solution was some wild 20 patch series modernizing how x86

That was more like 48 patches, not 20 :-). But we had a real device with
IMS to model and create these new abstractions and test them against. 

For vDPA+SVM we have non-intersecting conversations at the moment with no
real hardware to model our discussion around. 

> interrupts works because it had outgrown itself. This general approach
> to use the shared MSI infrastructure was pointed out at the very
> beginning of IMS, BTW.

Agreed, and thankfully Thomas worked hard and made it a lot easier :-).
Today IMS only deals with on-device store, although IMS could simply mean
having system memory hold the interrupt attributes. This is how some of
the graphics devices would be, with the context holding the interrupt
attributes.

But certainly not rushing this since we need a REAL user to be there before we
support DEV_MSI that uses msg_addr/msg_data held in system memory. 


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 09:24:30AM -0700, Raj, Ashok wrote:
> Hi Jason,
> 
> 
> On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote:
> > On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote:
> > 
> > > > I'm sure there will be some
> > > > weird overlaps because we can't delete any of the existing VFIO APIs, 
> > > > but
> > > > that
> > > > should not be a blocker.
> > > 
> > > but the weird thing is what we should consider. And it perhaps not just
> > > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO
> > > container is IOMMU context from the day it was defined. It could be the
> > > blocker. :-(
> > 
> > So maybe you have to broaden the VFIO container to be usable by other
> > subsystems. The discussion here is about what the uAPI should look
> > like in a fairly abstract way. When we say 'dev/sva' it just some
> > placeholder for a shared cdev that provides the necessary
> > dis-aggregated functionality 
> > 
> > It could be an existing cdev with broader functionaltiy, it could
> > really be /dev/iommu, etc. This is up to the folks building it to
> > decide.
> > 
> > > I'm not expert on vDPA for now, but I saw you three open source
> > > veterans have a similar idea for a place to cover IOMMU handling,
> > > I think it may be a valuable thing to do. I said "may be" as I'm not
> > > sure about Alex's opinion on such idea. But the sure thing is this
> > > idea may introduce weird overlap even re-definition of existing
> > > thing as I replied above. We need to evaluate the impact and mature
> > > the idea step by step. 
> > 
> > This has happened before, uAPIs do get obsoleted and replaced with
> > more general/better versions. It is often too hard to create a uAPI
> > that lasts for decades when the HW landscape is constantly changing
> > and sometime a reset is needed. 
> 
> I'm throwing this out with a lot of hesitation, but I'm going to :-)
> 
> So we have been disussing this for months now, with some high level vision
> trying to get the uAPI's solidified with a vDPA hardware that might
> potentially have SIOV/SVM like extensions in hardware which actualy doesn't
> exist today. Understood people have plans. 

> Given that vDPA today has diverged already with duplicating use of IOMMU
> api's without making an effort to gravitate to /dev/iommu as how you are
> proposing.

I see it more like: given that we already know we have multiple users
of the IOMMU, adding new IOMMU-focused features has to gravitate toward
some kind of convergence.

Currently things are not so bad, VDPA is just getting started and the
current IOMMU feature set is not so big.

PASID/vIOMMU/etc. are all stressing this more; I think the
responsibility falls to the people proposing these features to do the
architecture work.

> The question is should we hold hostage the current vSVM/vIOMMU efforts
> without even having made an effort for current vDPA/VFIO convergence. 

I don't think it is "held hostage" it is a "no shortcuts" approach,
there was always a recognition that future VDPA drivers will need some
work to integrated with vIOMMU realted stuff.

This is no different than the IMS discussion. The first proposed patch
was really simple, but a layering violation.

The correct solution was some wild 20 patch series modernizing how x86
interrupts works because it had outgrown itself. This general approach
to use the shared MSI infrastructure was pointed out at the very
beginning of IMS, BTW.

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Raj, Ashok
Hi Jason,


On Tue, Oct 20, 2020 at 11:02:17AM -0300, Jason Gunthorpe wrote:
> On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote:
> 
> > > I'm sure there will be some
> > > weird overlaps because we can't delete any of the existing VFIO APIs, but
> > > that
> > > should not be a blocker.
> > 
> > but the weird thing is what we should consider. And it perhaps not just
> > overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO
> > container is IOMMU context from the day it was defined. It could be the
> > blocker. :-(
> 
> So maybe you have to broaden the VFIO container to be usable by other
> subsystems. The discussion here is about what the uAPI should look
> like in a fairly abstract way. When we say 'dev/sva' it just some
> placeholder for a shared cdev that provides the necessary
> dis-aggregated functionality 
> 
> It could be an existing cdev with broader functionaltiy, it could
> really be /dev/iommu, etc. This is up to the folks building it to
> decide.
> 
> > I'm not expert on vDPA for now, but I saw you three open source
> > veterans have a similar idea for a place to cover IOMMU handling,
> > I think it may be a valuable thing to do. I said "may be" as I'm not
> > sure about Alex's opinion on such idea. But the sure thing is this
> > idea may introduce weird overlap even re-definition of existing
> > thing as I replied above. We need to evaluate the impact and mature
> > the idea step by step. 
> 
> This has happened before, uAPIs do get obsoleted and replaced with
> more general/better versions. It is often too hard to create a uAPI
> that lasts for decades when the HW landscape is constantly changing
> and sometime a reset is needed. 

I'm throwing this out with a lot of hesitation, but I'm going to :-)

So we have been discussing this for months now, with some high-level vision
trying to get the uAPIs solidified against vDPA hardware that might
potentially have SIOV/SVM-like extensions, hardware which actually doesn't
exist today. Understood, people have plans.

Given that vDPA today has already diverged, duplicating use of the IOMMU
APIs without making an effort to gravitate to /dev/iommu as you are
proposing.

I think we all understand creating a permanent uAPI is hard, and it can
evolve in the future.

Maybe we should start work on how to converge on generalizing the IOMMU
story first, with the (vDPA + VFIO) convergence we can do today, and let it
evolve with real hardware and new features like SVM/SIOV in mind. This is
going to take time, and we can start with what we have today by pulling the
vDPA and VFIO pieces together first.

The question is: should we hold the current vSVM/vIOMMU efforts hostage
without even having made an effort at current vDPA/VFIO convergence?

> 
> The jump to shared PASID based IOMMU feels like one of those moments here.

As we have all noted, even without PASID we have divergence today?


> 
> > > Whoever provides the vIOMMU emulation and relays the page fault to the 
> > > guest
> > > has to translate the RID -
> > 
> > that's the point. But the device info (especially the sub-device info) is
> > within the passthru framework (e.g. VFIO). So page fault reporting needs
> > to go through passthru framework.
> >
> > > what does that have to do with VFIO?
> > > 
> > > How will VPDA provide the vIOMMU emulation?
> > 
> > a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor
> > specification, right? you may correct me if I'm missing anything.
> 
> I'm asking how will VDPA translate the RID when VDPA triggers a page
> fault that has to be relayed to the guest. VDPA also has virtual PCI
> devices it creates.
> 
> We can't rely on VFIO to be the place that the vIOMMU lives because it
> excludes/complicates everything that is not VFIO from using that
> stuff.
> 
> Jason


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, October 20, 2020 10:02 PM
[...]
> > > Whoever provides the vIOMMU emulation and relays the page fault to the
> guest
> > > has to translate the RID -
> >
> > that's the point. But the device info (especially the sub-device info) is
> > within the passthru framework (e.g. VFIO). So page fault reporting needs
> > to go through passthru framework.
> >
> > > what does that have to do with VFIO?
> > >
> > > How will VPDA provide the vIOMMU emulation?
> >
> > a pardon here. I believe vIOMMU emulation should be based on IOMMU
> vendor
> > specification, right? you may correct me if I'm missing anything.
> 
> I'm asking how will VDPA translate the RID when VDPA triggers a page
> fault that has to be relayed to the guest. VDPA also has virtual PCI
> devices it creates.

I've got a question. Does vDPA work with vIOMMU so far? E.g. Intel vIOMMU
or other types of vIOMMU.

Regards,
Yi Liu



RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, October 20, 2020 10:05 PM
> To: Liu, Yi L 
> 
> On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, October 20, 2020 9:55 PM
> > >
> > > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote:
> > >
> > > > > See previous discussion with Kevin. If I understand correctly,
> > > > > you expect a
> > > shared
> > > > > L2 table if vDPA and VFIO device are using the same PASID.
> > > >
> > > > L2 table sharing is not mandatory. The mapping is the same, but no
> > > > need to assume L2 tables are shared. Especially for VFIO/vDPA
> > > > devices. Even within a passthru framework, like VFIO, if the
> > > > attributes of backend IOMMU are not the same, the L2 page table are not
> shared, but the mapping is the same.
> > >
> > > I think not being able to share the PASID shows exactly why this
> > > VFIO centric approach is bad.
> >
> > no, I didn't say PASID is not sharable. My point is sharing L2 page
> > table is not mandatory.
> 
> IMHO a PASID should be 1:1 with a page table, what does it even mean to share
> a PASID but have different page tables?

A PASID is actually 1:1 with an address space; it doesn't really need to be
1:1 with a page table. :-)

Regards,
Yi Liu

> Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 02:00:31PM +, Liu, Yi L wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, October 20, 2020 9:55 PM
> > 
> > On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote:
> > 
> > > > See previous discussion with Kevin. If I understand correctly, you 
> > > > expect a
> > shared
> > > > L2 table if vDPA and VFIO device are using the same PASID.
> > >
> > > L2 table sharing is not mandatory. The mapping is the same, but no need to
> > > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within
> > > a passthru framework, like VFIO, if the attributes of backend IOMMU are 
> > > not
> > > the same, the L2 page table are not shared, but the mapping is the same.
> > 
> > I think not being able to share the PASID shows exactly why this VFIO
> > centric approach is bad.
> 
> no, I didn't say PASID is not sharable. My point is sharing L2 page table is
> not mandatory.

IMHO a PASID should be 1:1 with a page table; what does it even mean
to share a PASID but have different page tables?

Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 10:21:41AM +, Liu, Yi L wrote:

> > I'm sure there will be some
> > weird overlaps because we can't delete any of the existing VFIO APIs, but
> > that
> > should not be a blocker.
> 
> but the weird thing is what we should consider. And it perhaps not just
> overlap, it may be a re-definition of VFIO container. As I mentioned, VFIO
> container is IOMMU context from the day it was defined. It could be the
> blocker. :-(

So maybe you have to broaden the VFIO container to be usable by other
subsystems. The discussion here is about what the uAPI should look
like in a fairly abstract way. When we say '/dev/sva' it is just a
placeholder for a shared cdev that provides the necessary
dis-aggregated functionality.

It could be an existing cdev with broader functionality, it could
really be /dev/iommu, etc. This is up to the folks building it to
decide.
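
(Purely as an illustration of what such a shared cdev could look like from
userspace; every name, ioctl number and struct below is invented for this
sketch and is not a proposed ABI.)

    #include <stdint.h>
    #include <fcntl.h>
    #include <linux/ioctl.h>
    #include <sys/ioctl.h>

    struct example_sva_bind_gpasid {       /* invented for the sketch */
            uint32_t pasid;                /* PASID to bind */
            uint32_t addr_width;           /* guest address width */
            uint64_t gpgd;                 /* guest page-table pointer */
    };

    #define EXAMPLE_SVA_ALLOC_PASID _IOR('S', 0, uint32_t)
    #define EXAMPLE_SVA_BIND_GPASID _IOW('S', 1, struct example_sva_bind_gpasid)

    static int example_setup(uint64_t guest_pgd)
    {
            int sva_fd = open("/dev/sva", O_RDWR);   /* the shared cdev */
            uint32_t pasid;
            struct example_sva_bind_gpasid bind;

            if (sva_fd < 0)
                    return -1;

            /* 1. Allocate a PASID from the system-wide space. */
            if (ioctl(sva_fd, EXAMPLE_SVA_ALLOC_PASID, &pasid))
                    return -1;

            /* 2. Attach a guest page table (nested translation) to it. */
            bind.pasid = pasid;
            bind.addr_width = 48;
            bind.gpgd = guest_pgd;
            if (ioctl(sva_fd, EXAMPLE_SVA_BIND_GPASID, &bind))
                    return -1;

            /* 3. VFIO or vDPA would then associate its device's IOMMU
             *    domain with (sva_fd, pasid) via an in-kernel call. */
            return sva_fd;
    }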

> I'm not expert on vDPA for now, but I saw you three open source
> veterans have a similar idea for a place to cover IOMMU handling,
> I think it may be a valuable thing to do. I said "may be" as I'm not
> sure about Alex's opinion on such idea. But the sure thing is this
> idea may introduce weird overlap even re-definition of existing
> thing as I replied above. We need to evaluate the impact and mature
> the idea step by step. 

This has happened before; uAPIs do get obsoleted and replaced with
more general/better versions. It is often too hard to create a uAPI
that lasts for decades when the HW landscape is constantly changing,
and sometimes a reset is needed.

The jump to shared PASID based IOMMU feels like one of those moments here.

> > Whoever provides the vIOMMU emulation and relays the page fault to the guest
> > has to translate the RID -
> 
> that's the point. But the device info (especially the sub-device info) is
> within the passthru framework (e.g. VFIO). So page fault reporting needs
> to go through passthru framework.
>
> > what does that have to do with VFIO?
> > 
> > How will VPDA provide the vIOMMU emulation?
> 
> a pardon here. I believe vIOMMU emulation should be based on IOMMU vendor
> specification, right? you may correct me if I'm missing anything.

I'm asking how will VDPA translate the RID when VDPA triggers a page
fault that has to be relayed to the guest. VDPA also has virtual PCI
devices it creates.

We can't rely on VFIO to be the place that the vIOMMU lives because it
excludes/complicates everything that is not VFIO from using that
stuff.

Jason


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Tuesday, October 20, 2020 9:55 PM
> 
> On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote:
> 
> > > See previous discussion with Kevin. If I understand correctly, you expect 
> > > a
> shared
> > > L2 table if vDPA and VFIO device are using the same PASID.
> >
> > L2 table sharing is not mandatory. The mapping is the same, but no need to
> > assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within
> > a passthru framework, like VFIO, if the attributes of backend IOMMU are not
> > the same, the L2 page table are not shared, but the mapping is the same.
> 
> I think not being able to share the PASID shows exactly why this VFIO
> centric approach is bad.

No, I didn't say the PASID is not shareable. My point is that sharing the L2
page table is not mandatory.

Regards,
Yi Liu

> Jason


Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Gunthorpe
On Tue, Oct 20, 2020 at 09:40:14AM +, Liu, Yi L wrote:

> > See previous discussion with Kevin. If I understand correctly, you expect a 
> > shared
> > L2 table if vDPA and VFIO device are using the same PASID.
> 
> L2 table sharing is not mandatory. The mapping is the same, but no need to
> assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within
> a passthru framework, like VFIO, if the attributes of backend IOMMU are not
> the same, the L2 page table are not shared, but the mapping is the same.

I think not being able to share the PASID shows exactly why this VFIO
centric approach is bad.

Jason


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
> From: Jason Gunthorpe 
> Sent: Monday, October 19, 2020 10:25 PM
> 
> On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote:
> > Hi Jason,
> >
> > Good to see your response.
> 
> Ah, I was away

got it. :-)

> > > > > Second, IOMMU nested translation is a per IOMMU domain
> > > > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > > > > (alloc/free domain, attach/detach device, set/get domain
> > > > > attribute, etc.), reporting/enabling the nesting capability is
> > > > > an natural extension to the domain uAPI of existing passthrough
> frameworks.
> > > > > Actually, VFIO already includes a nesting enable interface even
> > > > > before this series. So it doesn't make sense to generalize this
> > > > > uAPI out.
> > >
> > > The subsystem that obtains an IOMMU domain for a device would have
> > > to register it with an open FD of the '/dev/sva'. That is the
> > > connection between the two subsystems. It would be some simple
> > > kernel internal
> > > stuff:
> > >
> > >   sva = get_sva_from_file(fd);
> >
> > Is this fd provided by userspace? I suppose the /dev/sva has a set of
> > uAPIs which will finally program page table to host iommu driver. As
> > far as I know, it's weird for VFIO user. Why should VFIO user connect
> > to a /dev/sva fd after it sets a proper iommu type to the opened
> > container. VFIO container already stands for an iommu context with
> > which userspace could program page mapping to host iommu.
> 
> Again the point is to dis-aggregate the vIOMMU related stuff from VFIO so it
> can
> be shared between more subsystems that need it.

I understand you here. :-)

> I'm sure there will be some
> weird overlaps because we can't delete any of the existing VFIO APIs, but
> that
> should not be a blocker.

but the weird thing is exactly what we should consider. And it is perhaps not
just overlap, it may be a re-definition of the VFIO container. As I mentioned,
the VFIO container has stood for an IOMMU context from the day it was defined.
That could be the blocker. :-(

> Having VFIO run in a mode where '/dev/sva' provides all the IOMMU handling is
> a possible path.

This looks similar to the proposal from Jason Wang and Kevin Tian. The idea is
to add "/dev/iommu" and delegate the IOMMU domain alloc and device
attach/detach, which today live in the passthrough frameworks, to an
independent kernel driver. Just as Jason Wang said, replace the vfio iommu
type1 driver.

Jason Wang:
 "And all the proposal in this series is to reuse the container fd. It 
 should be possible to replace e.g type1 IOMMU with a unified module."
link: 
https://lore.kernel.org/kvm/20201019142526.gj6...@nvidia.com/T/#md49fe9ac9d9eff6ddf5b8c2ee2f27eb2766f66f3

Kevin Tian:
 "Based on above, I feel a more reasonable way is to first make a 
 /dev/iommu uAPI supporting DMA map/unmap usages and then 
 introduce vSVA to it. Doing this order is because DMA map/unmap 
 is widely used thus can better help verify the core logic with 
 many existing devices."
link: 
https://lore.kernel.org/kvm/mwhpr11mb1645c702d148a2852b41fca08c...@mwhpr11mb1645.namprd11.prod.outlook.com/

> 
> If your plan is to just opencode everything into VFIO then I don't
> see how VDPA will work well, and if proper in-kernel abstractions are built I
> fail to see how
> routing some of it through userspace is a fundamental problem.

I'm not an expert on vDPA for now, but I saw you three open source
veterans have a similar idea for a place to cover IOMMU handling, so
I think it may be a valuable thing to do. I said "may be" as I'm not
sure about Alex's opinion on such an idea. But the sure thing is that
this idea may introduce weird overlap or even re-definition of existing
things, as I replied above. We need to evaluate the impact and mature
the idea step by step. That means it would take time, so perhaps we
could do it in stages. First have a "/dev/iommu" ready to handle page
MAP/UNMAP which can be used by both VFIO and vDPA, meanwhile let VFIO
grow up (adding features) by itself and consider adopting the new
/dev/iommu later once /dev/iommu is competent. Of course this needs
Alex's approval. And then add new features to /dev/iommu, like SVA.
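
For illustration, a minimal sketch of what such a first-stage "/dev/iommu"
MAP/UNMAP uAPI could look like; every name, number and field below is
hypothetical and only meant to make the staging idea concrete:

    /* hypothetical uapi header for a first-stage /dev/iommu */
    #include <linux/types.h>
    #include <linux/ioctl.h>

    #define IOMMU_DEV_TYPE  'i'             /* hypothetical ioctl type */

    struct iommu_dev_map {
            __u64   iova;                   /* IO virtual address */
            __u64   vaddr;                  /* backing process virtual address */
            __u64   size;                   /* length in bytes */
            __u32   flags;                  /* e.g. read/write permission bits */
            __u32   pad;
    };

    struct iommu_dev_unmap {
            __u64   iova;
            __u64   size;
    };

    /* both VFIO and vDPA could attach devices to this context, then
     * userspace programs the second-level mapping once through it */
    #define IOMMU_MAP_DMA   _IOW(IOMMU_DEV_TYPE, 0x00, struct iommu_dev_map)
    #define IOMMU_UNMAP_DMA _IOW(IOMMU_DEV_TYPE, 0x01, struct iommu_dev_unmap)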

> 
> > >   sva_register_device_to_pasid(sva, pasid, pci_device,
> > > iommu_domain);
> >
> > So this is supposed to be called by VFIO/VDPA to register the info to
> > /dev/sva.
> > right? And in dev/sva, it will also maintain the device/iommu_domain
> > and pasid info? will it be duplicated with VFIO/VDPA?
> 
> Each part needs to have the information it needs?

Yeah, but it's the duplication that I'm not very fond of. Perhaps the idea
from Jason Wang and Kevin would avoid such duplication.

> > > > > Moreover, mapping page fault to subdevice requires pre-
> > > > > registering subdevice fault data to IOMMU layer when binding
> > > > > guest page table, while such fault data can be only retrieved
> > > > > from parent driver through VFIO/VDPA.
> > >
> > > Not sure what this means, page fault should be tied to the PASID,
> > > any hookup needed for that

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
> From: Jason Wang 
> Sent: Tuesday, October 20, 2020 5:20 PM
> 
> Hi Yi:
> 
> On 2020/10/20 下午4:19, Liu, Yi L wrote:
> >> Yes, but since PASID is a global identifier now, I think kernel
> >> should track a device list per PASID?
> > We have such tracking. It's done in the iommu driver. You can refer to the
> > struct intel_svm. PASID is a global identifier, but it doesn’t affect
> > that the PASID table is per-device.
> >
> >> So for such binding, PASID should be
> >> sufficient for uAPI.
> > I don't quite get it. PASID may be bound to multiple devices; how do you
> > figure out the target device if you don’t provide such info?
> 
> 
> I may miss something but is there any reason that userspace needs to figure out
> the target device? PASID is about an address space, not a specific device, I think.

If you have multiple devices assigned to a VM, you won't expect to bind all
of them to a PASID in a single bind operation, right? You may want to bind
only the devices you really mean. This manner should be more flexible and
reasonable. :-)
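
(For illustration, a per-device bind could be pictured as a request that names
both the target device and the PASID, roughly like the sketch below; the
struct and field names are made up here and are not the uAPI of this series.)

    #include <linux/types.h>

    /* hypothetical: bind a guest page table to ONE device + PASID pair */
    struct gpasid_bind_req {
            __u32   argsz;          /* size of this structure */
            __u32   flags;
            __u64   dev_cookie;     /* which assigned device/subdevice to bind */
            __u32   pasid;          /* the PASID the guest is programming */
            __u32   pad;
            __u64   gpgd;           /* GPA of the guest first-level page table */
    };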

> 
> >
> > The binding request is initiated by the virtual IOMMU, when
> > capturing guest attempt of binding page table to a virtual PASID
> > entry for a given device.
>  And for L2 page table programming, if PASID is use by both e.g VFIO
>  and vDPA, user need to choose one of uAPI to build l2 mappings?
> >>> for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I
> >>> guess it is tlb flush. so you are right. Keeping L1/L2 page table
> >>> management in a single uAPI set is also a reason for my current
> >>> series which extends VFIO for L1 management.
> >> I'm afraid that would introduce confusion to userspace. E.g:
> >>
> >> 1) when having only vDPA device, it uses vDPA uAPI to do l2
> >> management
> >> 2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the
> >> l2 management?
> > I think vDPA will still use its own l2 for the l2 mappings. Not sure
> > why you need vDPA to use VFIO's l2 management. I don't think it is the case.
> 
> 
> See previous discussion with Kevin. If I understand correctly, you expect a 
> shared
> L2 table if vDPA and VFIO device are using the same PASID.

L2 table sharing is not mandatory. The mapping is the same, but no need to
assume L2 tables are shared. Especially for VFIO/vDPA devices. Even within
a passthru framework, like VFIO, if the attributes of backend IOMMU are not
the same, the L2 page tables are not shared, but the mapping is the same.

> In this case, if l2 is still managed separately, there will be duplicated 
> request of
> map and unmap.

Yes, but this is not a functional issue, right? If we want to solve it, we
should have a single uAPI set which can handle both L1 and L2 management.
That's also why you proposed to replace the type1 driver, right?

Regards,
Yi Liu

> 
> Thanks
> 
> 
> >
> > Regards,
> > Yi Liu
> >

___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Jason Wang

Hi Yi:

On 2020/10/20 下午4:19, Liu, Yi L wrote:

Yes, but since PASID is a global identifier now, I think kernel should
track a device list per PASID?

We have such tracking. It's done in the iommu driver. You can refer to the
struct intel_svm. PASID is a global identifier, but it doesn’t affect that
the PASID table is per-device.


So for such binding, PASID should be
sufficient for uAPI.

I don't quite get it. PASID may be bound to multiple devices; how do
you figure out the target device if you don’t provide such info?



I may miss something but is there any reason that userspace needs to
figure out the target device? PASID is about an address space, not a
specific device, I think.






The binding request is initiated by the virtual IOMMU, when capturing
guest attempt of binding page table to a virtual PASID entry for a
given device.

And for L2 page table programming, if PASID is use by both e.g VFIO and
vDPA, user need to choose one of uAPI to build l2 mappings?

for L2 page table mappings, it's done by VFIO MAP/UNMAP. for vdpa, I guess
it is tlb flush. so you are right. Keeping L1/L2 page table management in
a single uAPI set is also a reason for my current series which extends VFIO
for L1 management.

I'm afraid that would introduce confusion to userspace. E.g:

1) when having only vDPA device, it uses vDPA uAPI to do l2 management
2) when vDPA shares PASID with VFIO, it will use VFIO uAPI to do the l2
management?

I think vDPA will still use its own l2 for the l2 mappings. Not sure why you
need vDPA to use VFIO's l2 management. I don't think it is the case.



See previous discussion with Kevin. If I understand correctly, you 
expect a shared L2 table if vDPA and VFIO device are using the same PASID.


In this case, if l2 is still managed separately, there will be 
duplicated request of map and unmap.


Thanks




Regards,
Yi Liu



___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-20 Thread Liu, Yi L
Hey Jason,

> From: Jason Wang 
> Sent: Tuesday, October 20, 2020 2:18 PM
> 
> On 2020/10/15 下午6:14, Liu, Yi L wrote:
> >> From: Jason Wang 
> >> Sent: Thursday, October 15, 2020 4:41 PM
> >>
> >>
> >> On 2020/10/15 下午3:58, Tian, Kevin wrote:
>  From: Jason Wang 
>  Sent: Thursday, October 15, 2020 2:52 PM
> 
> 
>  On 2020/10/14 上午11:08, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Tuesday, October 13, 2020 2:22 PM
> >>
> >>
> >> On 2020/10/12 下午4:38, Tian, Kevin wrote:
>  From: Jason Wang 
>  Sent: Monday, September 14, 2020 12:20 PM
> 
> >>> [...]
> >>>  > If it's possible, I would suggest a generic uAPI instead of
> >>> a VFIO
>  specific one.
> 
>  Jason suggest something like /dev/sva. There will be a lot of
>  other subsystems that could benefit from this (e.g vDPA).
> 
>  Have you ever considered this approach?
> 
> >>> Hi, Jason,
> >>>
> >>> We did some study on this approach and below is the output. It's a
> >>> long writing but I didn't find a way to further abstract w/o
> >>> losing necessary context. Sorry about that.
> >>>
> >>> Overall the real purpose of this series is to enable IOMMU nested
> >>> translation capability with vSVA as one major usage, through below
> >>> new uAPIs:
> >>>   1) Report/enable IOMMU nested translation capability;
> >>>   2) Allocate/free PASID;
> >>>   3) Bind/unbind guest page table;
> >>>   4) Invalidate IOMMU cache;
> >>>   5) Handle IOMMU page request/response (not in this series);
> >>> 1/3/4) is the minimal set for using IOMMU nested translation, with
> >>> the other two optional. For example, the guest may enable vSVA on
> >>> a device without using PASID. Or, it may bind its gIOVA page table
> >>> which doesn't require page fault support. Finally, all operations
> >>> can be applied to either physical device or subdevice.
> >>>
> >>> Then we evaluated each uAPI whether generalizing it is a good
> >>> thing both in concept and regarding to complexity.
> >>>
> >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID
> >>> allocation/free is through the IOASID sub-system.
> >> A question here, is IOASID expected to be the single management
> >> interface for PASID?
> > yes
> >
> >> (I'm asking since there're already vendor specific IDA based PASID
> >> allocator e.g amdgpu_pasid_alloc())
> > That comes before IOASID core was introduced. I think it should be
> > changed to use the new generic interface. Jacob/Jean can better
> > comment if other reason exists for this exception.
>  If there's no exception it should be fixed.
> 
> 
> >>>  From this angle
> >>> we feel generalizing PASID management does make some sense.
> >>> First, PASID is just a number and not related to any device before
> >>> it's bound to a page table and IOMMU domain. Second, PASID is a
> >>> global resource (at least on Intel VT-d),
> >> I think we need a definition of "global" here. It looks to me for
> >> vt-d the PASID table is per device.
> > PASID table is per device, thus VT-d could support per-device PASIDs
> > in concept.
>  I think that's the requirement of PCIE spec which said PASID + RID
>  identifies the process address space ID.
> 
> 
> > However on Intel platform we require PASIDs to be managed in
> > system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
> > and ENQCMD together.
>  Any reason for such requirement? (I'm not familiar with ENQCMD, but
>  my understanding is that vSVA, SIOV or SR-IOV doesn't have the
>  requirement for system-wide PASID).
> >>> ENQCMD is a new instruction to allow multiple processes submitting
> >>> workload to one shared workqueue. Each process has an unique PASID
> >>> saved in a MSR, which is included in the ENQCMD payload to indicate
> >>> the address space when the CPU sends to the device. As one process
> >>> might issue ENQCMD to multiple devices, OS-wide PASID allocation is
> >>> required both in host and guest side.
> >>>
> >>> When executing ENQCMD in the guest to a SIOV device, the guest
> >>> programmed value in the PASID_MSR must be translated to a host PASID
> >>> value for proper function/isolation as PASID represents the address
> >>> space. The translation is done through a new VMCS PASID translation
> >>> structure (per-VM, and 1:1 mapping). From this angle the host PASIDs
> >>> must be allocated 'globally' cross all assigned devices otherwise it
> >>> may lead to 1:N mapping when a guest process issues ENQCMD to multiple
> >>> assigned devices/subdevices.
> >>>
> >>> There will be a KVM forum session for this topic btw.
> >>
> >> Thanks for the background. Now I see the restrict comes from ENQCMD.
> >>
> >>
>

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-19 Thread Jason Wang


On 2020/10/15 下午6:14, Liu, Yi L wrote:

From: Jason Wang 
Sent: Thursday, October 15, 2020 4:41 PM


On 2020/10/15 ??3:58, Tian, Kevin wrote:

From: Jason Wang 
Sent: Thursday, October 15, 2020 2:52 PM


On 2020/10/14 ??11:08, Tian, Kevin wrote:

From: Jason Wang 
Sent: Tuesday, October 13, 2020 2:22 PM


On 2020/10/12 ??4:38, Tian, Kevin wrote:

From: Jason Wang 
Sent: Monday, September 14, 2020 12:20 PM


[...]
 > If it's possible, I would suggest a generic uAPI instead of
a VFIO

specific one.

Jason suggest something like /dev/sva. There will be a lot of
other subsystems that could benefit from this (e.g vDPA).

Have you ever considered this approach?


Hi, Jason,

We did some study on this approach and below is the output. It's a
long writing but I didn't find a way to further abstract w/o
losing necessary context. Sorry about that.

Overall the real purpose of this series is to enable IOMMU nested
translation capability with vSVA as one major usage, through below
new uAPIs:
1) Report/enable IOMMU nested translation capability;
2) Allocate/free PASID;
3) Bind/unbind guest page table;
4) Invalidate IOMMU cache;
5) Handle IOMMU page request/response (not in this series);
1/3/4) is the minimal set for using IOMMU nested translation, with
the other two optional. For example, the guest may enable vSVA on
a device without using PASID. Or, it may bind its gIOVA page table
which doesn't require page fault support. Finally, all operations
can be applied to either physical device or subdevice.
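
(As a thought experiment only, the five operations could be pictured as a
generic character-device interface along these lines; the ioctl names below
are invented for illustration, the series itself keeps them as VFIO ioctls.)

    /* hypothetical /dev/sva surface mirroring operations 1)-5) above */
    #include <linux/ioctl.h>

    #define SVA_GET_NESTING_INFO    _IO('S', 0x00)  /* 1) report/enable nesting      */
    #define SVA_ALLOC_PASID         _IO('S', 0x01)  /* 2) allocate a PASID           */
    #define SVA_FREE_PASID          _IO('S', 0x02)  /*    ...and free it again       */
    #define SVA_BIND_GPASID         _IO('S', 0x03)  /* 3) bind a guest page table    */
    #define SVA_UNBIND_GPASID       _IO('S', 0x04)  /*    ...and unbind it           */
    #define SVA_CACHE_INVALIDATE    _IO('S', 0x05)  /* 4) invalidate IOMMU cache     */
    #define SVA_PAGE_RESPONSE       _IO('S', 0x06)  /* 5) page request/response path */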

Then we evaluated each uAPI whether generalizing it is a good
thing both in concept and regarding to complexity.

First, unlike other uAPIs which are all backed by iommu_ops, PASID
allocation/free is through the IOASID sub-system.

A question here, is IOASID expected to be the single management
interface for PASID?

yes


(I'm asking since there're already vendor specific IDA based PASID
allocator e.g amdgpu_pasid_alloc())

That comes before IOASID core was introduced. I think it should be
changed to use the new generic interface. Jacob/Jean can better
comment if other reason exists for this exception.

If there's no exception it should be fixed.
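
(For reference, converting such a driver to the IOASID core would roughly mean
replacing the driver-private IDA with ioasid_alloc()/ioasid_free(); the wrapper
below is only a sketch under that assumption, with error handling trimmed and
the names on the driver side invented.)

    #include <linux/ioasid.h>
    #include <linux/errno.h>

    /* illustrative: route a vendor PASID allocator through the IOASID core */
    static struct ioasid_set my_pasid_set;          /* hypothetical per-driver set */

    int my_pasid_alloc(unsigned int pasid_bits)
    {
            ioasid_t pasid;

            /* allocate from the shared system-wide namespace instead of a
             * driver-private IDA, limited to what the device supports */
            pasid = ioasid_alloc(&my_pasid_set, 1, (1U << pasid_bits) - 1, NULL);
            return (pasid == INVALID_IOASID) ? -ENOSPC : (int)pasid;
    }

    void my_pasid_free(int pasid)
    {
            ioasid_free(pasid);
    }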



 From this angle
we feel generalizing PASID management does make some sense.
First, PASID is just a number and not related to any device before
it's bound to a page table and IOMMU domain. Second, PASID is a
global resource (at least on Intel VT-d),

I think we need a definition of "global" here. It looks to me for
vt-d the PASID table is per device.

PASID table is per device, thus VT-d could support per-device PASIDs
in concept.

I think that's the requirement of PCIE spec which said PASID + RID
identifies the process address space ID.



However on Intel platform we require PASIDs to be managed in
system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
and ENQCMD together.

Any reason for such requirement? (I'm not familiar with ENQCMD, but
my understanding is that vSVA, SIOV or SR-IOV doesn't have the
requirement for system-wide PASID).

ENQCMD is a new instruction to allow multiple processes submitting
workload to one shared workqueue. Each process has an unique PASID
saved in a MSR, which is included in the ENQCMD payload to indicate
the address space when the CPU sends to the device. As one process
might issue ENQCMD to multiple devices, OS-wide PASID allocation is
required both in host and guest side.

When executing ENQCMD in the guest to a SIOV device, the guest
programmed value in the PASID_MSR must be translated to a host PASID
value for proper function/isolation as PASID represents the address
space. The translation is done through a new VMCS PASID translation
structure (per-VM, and 1:1 mapping). From this angle the host PASIDs
must be allocated 'globally' cross all assigned devices otherwise it
may lead to 1:N mapping when a guest process issues ENQCMD to multiple
assigned devices/subdevices.

There will be a KVM forum session for this topic btw.


Thanks for the background. Now I see the restrict comes from ENQCMD.



Thus the host creates only one 'global' PASID namespace but do use
per-device PASID table to assure isolation between devices on Intel
platforms. But ARM does it differently as Jean explained.
They have a global namespace for host processes on all host-owned
devices (same as Intel), but then per-device namespace when a device
(and its PASID table) is assigned to userspace.


Another question, is this possible to have two DMAR hardware
unit(at least I can see two even in my laptop). In this case, is
PASID still a global resource?

yes


 while having separate VFIO/
VDPA allocation interfaces may easily cause confusion in
userspace, e.g. which interface to be used if both VFIO/VDPA devices exist.
Moreover, an unified interface allows centralized control over how
many PASIDs are allowed per process.

Yes.



One unclear part with this generalization is about the permission.
Do we 

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-19 Thread Jason Gunthorpe
On Mon, Oct 19, 2020 at 08:39:03AM +, Liu, Yi L wrote:
> Hi Jason,
> 
> Good to see your response.

Ah, I was away

> > > > Second, IOMMU nested translation is a per IOMMU domain
> > > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > > >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > > > etc.), reporting/enabling the nesting capability is an natural
> > > > extension to the domain uAPI of existing passthrough frameworks.
> > > > Actually, VFIO already includes a nesting enable interface even
> > > > before this series. So it doesn't make sense to generalize this uAPI
> > > > out.
> > 
> > The subsystem that obtains an IOMMU domain for a device would have to
> > register it with an open FD of the '/dev/sva'. That is the connection
> > between the two subsystems. It would be some simple kernel internal
> > stuff:
> > 
> >   sva = get_sva_from_file(fd);
> 
> Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs
> which will finally program page table to host iommu driver. As far as I know,
> it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after
> it sets a proper iommu type to the opened container. VFIO container already
> stands for an iommu context with which userspace could program page mapping
> to host iommu.

Again the point is to dis-aggregate the vIOMMU related stuff from VFIO
so it can be shared between more subsystems that need it. I'm sure
there will be some weird overlaps because we can't delete any of the
existing VFIO APIs, but that should not be a blocker.

Having VFIO run in a mode where '/dev/sva' provides all the IOMMU
handling is a possible path.

If your plan is to just opencode everything into VFIO then I don't see
how VDPA will work well, and if proper in-kernel abstractions are
built I fail to see how routing some of it through userspace is a
fundamental problem.

> >   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);
> 
> So this is supposed to be called by VFIO/VDPA to register the info to 
> /dev/sva.
> right? And in dev/sva, it will also maintain the device/iommu_domain and pasid
> info? will it be duplicated with VFIO/VDPA?

Each part needs to have the information it needs? 

> > > > Moreover, mapping page fault to subdevice requires pre-
> > > > registering subdevice fault data to IOMMU layer when binding
> > > > guest page table, while such fault data can be only retrieved from
> > > > parent driver through VFIO/VDPA.
> > 
> > Not sure what this means, page fault should be tied to the PASID, any
> > hookup needed for that should be done in-kernel when the device is
> > connected to the PASID.
> 
> You may refer to chapter 7.4.1.1 of the VT-d spec. A page request is reported to
> software together with the requestor ID of the device. To inject the page request
> into the guest, the device info is needed.

Whoever provides the vIOMMU emulation and relays the page fault to the
guest has to translate the RID - what does that have to do with VFIO?

How will VPDA provide the vIOMMU emulation?

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-19 Thread Liu, Yi L
Hi Jason,

Good to see your response.

> From: Jason Gunthorpe 
> Sent: Friday, October 16, 2020 11:37 PM
> 
> On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote:
> > Hi, Alex and Jason (G),
> >
> > How about your opinion for this new proposal? For now looks both
> > Jason (W) and Jean are OK with this direction and more discussions
> > are possibly required for the new /dev/ioasid interface. Internally
> > we're doing a quick prototype to see any unforeseen issue with this
> > separation.
> 
> Assuming VDPA and VFIO will be the only two users so duplicating
> everything only twice sounds pretty restricting to me.
> 
> > > Second, IOMMU nested translation is a per IOMMU domain
> > > capability. Since IOMMU domains are managed by VFIO/VDPA
> > >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > > etc.), reporting/enabling the nesting capability is an natural
> > > extension to the domain uAPI of existing passthrough frameworks.
> > > Actually, VFIO already includes a nesting enable interface even
> > > before this series. So it doesn't make sense to generalize this uAPI
> > > out.
> 
> The subsystem that obtains an IOMMU domain for a device would have to
> register it with an open FD of the '/dev/sva'. That is the connection
> between the two subsystems. It would be some simple kernel internal
> stuff:
> 
>   sva = get_sva_from_file(fd);

Is this fd provided by userspace? I suppose the /dev/sva has a set of uAPIs
which will finally program page table to host iommu driver. As far as I know,
it's weird for VFIO user. Why should VFIO user connect to a /dev/sva fd after
it sets a proper iommu type to the opened container. VFIO container already
stands for an iommu context with which userspace could program page mapping
to host iommu.
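
(For readers less familiar with VFIO, the existing container flow referred to
here looks roughly like the snippet below, using today's type1 uAPI with the
group already opened and error checks omitted; VFIO_TYPE1_NESTING_IOMMU is the
already-existing "nesting enable" variant mentioned earlier in the thread.)

    /* userspace sketch of today's VFIO container / type1 flow */
    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    void map_one_region(int group_fd, void *buf, unsigned long iova, unsigned long size)
    {
            int container = open("/dev/vfio/vfio", O_RDWR);

            /* the container *is* the userspace handle for an IOMMU context */
            ioctl(group_fd, VFIO_GROUP_SET_CONTAINER, &container);
            ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
            /* VFIO_TYPE1_NESTING_IOMMU could be selected here instead */

            struct vfio_iommu_type1_dma_map map = {
                    .argsz = sizeof(map),
                    .flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
                    .vaddr = (unsigned long)buf,
                    .iova  = iova,
                    .size  = size,
            };
            ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
    }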

>   sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);

So this is supposed to be called by VFIO/VDPA to register the info to /dev/sva.
right? And in dev/sva, it will also maintain the device/iommu_domain and pasid
info? will it be duplicated with VFIO/VDPA?

> Not sure why this is a roadblock?
> 
> How would this be any different from having some kernel libsva that
> VDPA and VFIO would both rely on?
> 
> You don't plan to just open code all this stuff in VFIO, do you?
> 
> > > Then the tricky part comes with the remaining operations (3/4/5),
> > > which are all backed by iommu_ops thus effective only within an
> > > IOMMU domain. To generalize them, the first thing is to find a way
> > > to associate the sva_FD (opened through generic /dev/sva) with an
> > > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > > to replicate {domain<->device/subdevice} association in /dev/sva
> > > path because some operations (e.g. page fault) is triggered/handled
> > > per device/subdevice. Therefore, /dev/sva must provide both per-
> > > domain and per-device uAPIs similar to what VFIO/VDPA already
> > > does.
> 
> Yes, the point here was to move the general APIs out of VFIO and into
> a sharable location. So, of course one would expect some duplication
> during the transition period.
> 
> > > Moreover, mapping page fault to subdevice requires pre-
> > > registering subdevice fault data to IOMMU layer when binding
> > > guest page table, while such fault data can be only retrieved from
> > > parent driver through VFIO/VDPA.
> 
> Not sure what this means, page fault should be tied to the PASID, any
> hookup needed for that should be done in-kernel when the device is
> connected to the PASID.

You may refer to chapter 7.4.1.1 of the VT-d spec. A page request is reported to
software together with the requestor ID of the device. To inject the page request
into the guest, the device info is needed.
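
(Schematically, the information a recorded page request carries is something
like the sketch below; the field names are illustrative, and the exact page
request descriptor layout is the one in the spec chapter cited above.)

    #include <stdbool.h>

    /* simplified view of one recorded page request (not the real descriptor layout) */
    struct page_req_info {
            unsigned short  rid;            /* requestor ID (bus:dev.fn) of the device */
            unsigned int    pasid;          /* PASID of the faulting address space */
            unsigned long   address;        /* page-aligned faulting address */
            unsigned short  prg_index;      /* page request group index, used in the response */
            bool            last_in_group;  /* whether a response is expected now */
    };

    /* the vIOMMU needs the RID to know which *virtual* device the fault
     * belongs to before it can inject the request into the guest */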

Regards,
Yi Liu

> 
> > > space but they may be organized in multiple IOMMU domains based
> > > on their bus type. How (should we let) the userspace know the
> > > domain information and open an sva_FD for each domain is the main
> > > problem here.
> 
> Why is one sva_FD per iommu domain required? The HW can attach the
> same PASID to multiple iommu domains, right?
> 
> > > In the end we just realized that doing such generalization doesn't
> > > really lead to a clear design and instead requires tight coordination
> > > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > > (especially about synchronization when the domain/device
> > > association is changed or when the device/subdevice is being reset/
> > > drained). Finally it may become a usability burden to the userspace
> > > on proper use of the two interfaces on the assigned device.
> 
> If you have a list of things that needs to be done to attach a PCI
> device to a PASID then of course they should be tidy kernel APIs
> already, and not just hard wired into VFIO.
> 
> The worst outcome would be to have VDPA and VFIO have two different
> ways to do all of this with a different set of bugs. Bug fixes/new
> features in VFIO won't flow over to VDPA.
> 
> Jason
___

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-16 Thread Jason Gunthorpe
On Wed, Oct 14, 2020 at 03:16:22AM +, Tian, Kevin wrote:
> Hi, Alex and Jason (G),
> 
> How about your opinion for this new proposal? For now looks both
> Jason (W) and Jean are OK with this direction and more discussions
> are possibly required for the new /dev/ioasid interface. Internally 
> we're doing a quick prototype to see any unforeseen issue with this
> separation. 

Assuming VDPA and VFIO will be the only two users so duplicating
everything only twice sounds pretty restricting to me.

> > Second, IOMMU nested translation is a per IOMMU domain
> > capability. Since IOMMU domains are managed by VFIO/VDPA
> >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > etc.), reporting/enabling the nesting capability is an natural
> > extension to the domain uAPI of existing passthrough frameworks.
> > Actually, VFIO already includes a nesting enable interface even
> > before this series. So it doesn't make sense to generalize this uAPI
> > out.

The subsystem that obtains an IOMMU domain for a device would have to
register it with an open FD of the '/dev/sva'. That is the connection
between the two subsystems. It would be some simple kernel internal
stuff:

  sva = get_sva_from_file(fd);
  sva_register_device_to_pasid(sva, pasid, pci_device, iommu_domain);
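
(Fleshing those two lines out a little, the kernel-internal glue could look
roughly like the sketch below; struct sva_context, sva_put() and the wrapper
name are hypothetical, only the two calls above come from the sketch itself.)

    /* sketch: called from VFIO/VDPA when userspace hands over an open /dev/sva fd */
    int subsystem_attach_sva(int sva_fd, u32 pasid, struct pci_dev *pdev,
                             struct iommu_domain *domain)
    {
            struct sva_context *sva;        /* hypothetical object behind the fd */
            int ret;

            sva = get_sva_from_file(sva_fd);                /* takes a reference */
            if (IS_ERR(sva))
                    return PTR_ERR(sva);

            /* record the (device, domain) pair under this PASID so the common
             * code can route binds, cache invalidations and page faults */
            ret = sva_register_device_to_pasid(sva, pasid, pdev, domain);
            if (ret)
                    sva_put(sva);                           /* hypothetical release */
            return ret;
    }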

Not sure why this is a roadblock?

How would this be any different from having some kernel libsva that
VDPA and VFIO would both rely on?

You don't plan to just open code all this stuff in VFIO, do you?

> > Then the tricky part comes with the remaining operations (3/4/5),
> > which are all backed by iommu_ops thus effective only within an
> > IOMMU domain. To generalize them, the first thing is to find a way
> > to associate the sva_FD (opened through generic /dev/sva) with an
> > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > to replicate {domain<->device/subdevice} association in /dev/sva
> > path because some operations (e.g. page fault) is triggered/handled
> > per device/subdevice. Therefore, /dev/sva must provide both per-
> > domain and per-device uAPIs similar to what VFIO/VDPA already
> > does. 

Yes, the point here was to move the general APIs out of VFIO and into
a sharable location. So, of course one would expect some duplication
during the transition period.

> > Moreover, mapping page fault to subdevice requires pre-
> > registering subdevice fault data to IOMMU layer when binding
> > guest page table, while such fault data can be only retrieved from
> > parent driver through VFIO/VDPA.

Not sure what this means, page fault should be tied to the PASID, any
hookup needed for that should be done in-kernel when the device is
connected to the PASID.

> > space but they may be organized in multiple IOMMU domains based
> > on their bus type. How (should we let) the userspace know the
> > domain information and open an sva_FD for each domain is the main
> > problem here.

Why is one sva_FD per iommu domain required? The HW can attach the
same PASID to multiple iommu domains, right?

> > In the end we just realized that doing such generalization doesn't
> > really lead to a clear design and instead requires tight coordination
> > between /dev/sva and VFIO/VDPA for almost every new uAPI
> > (especially about synchronization when the domain/device
> > association is changed or when the device/subdevice is being reset/
> > drained). Finally it may become a usability burden to the userspace
> > on proper use of the two interfaces on the assigned device.

If you have a list of things that needs to be done to attach a PCI
device to a PASID then of course they should be tidy kernel APIs
already, and not just hard wired into VFIO.

The worst outcome would be to have VDPA and VFIO have two different
ways to do all of this with a different set of bugs. Bug fixes/new
features in VFIO won't flow over to VDPA.

Jason
___
iommu mailing list
iommu@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/iommu


RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-15 Thread Liu, Yi L
> From: Jason Wang 
> Sent: Thursday, October 15, 2020 4:41 PM
> 
> 
> On 2020/10/15 下午3:58, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Thursday, October 15, 2020 2:52 PM
> >>
> >>
> >> On 2020/10/14 上午11:08, Tian, Kevin wrote:
>  From: Jason Wang 
>  Sent: Tuesday, October 13, 2020 2:22 PM
> 
> 
>  On 2020/10/12 下午4:38, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Monday, September 14, 2020 12:20 PM
> >>
> > [...]
> > > If it's possible, I would suggest a generic uAPI instead of
> > a VFIO
> >> specific one.
> >>
> >> Jason suggest something like /dev/sva. There will be a lot of
> >> other subsystems that could benefit from this (e.g vDPA).
> >>
> >> Have you ever considered this approach?
> >>
> > Hi, Jason,
> >
> > We did some study on this approach and below is the output. It's a
> > long writing but I didn't find a way to further abstract w/o
> > losing necessary context. Sorry about that.
> >
> > Overall the real purpose of this series is to enable IOMMU nested
> > translation capability with vSVA as one major usage, through below
> > new uAPIs:
> > 1) Report/enable IOMMU nested translation capability;
> > 2) Allocate/free PASID;
> > 3) Bind/unbind guest page table;
> > 4) Invalidate IOMMU cache;
> > 5) Handle IOMMU page request/response (not in this series);
> > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > the other two optional. For example, the guest may enable vSVA on
> > a device without using PASID. Or, it may bind its gIOVA page table
> > which doesn't require page fault support. Finally, all operations
> > can be applied to either physical device or subdevice.
> >
> > Then we evaluated each uAPI whether generalizing it is a good
> > thing both in concept and regarding to complexity.
> >
> > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > allocation/free is through the IOASID sub-system.
>  A question here, is IOASID expected to be the single management
>  interface for PASID?
> >>> yes
> >>>
>  (I'm asking since there're already vendor specific IDA based PASID
>  allocator e.g amdgpu_pasid_alloc())
> >>> That comes before IOASID core was introduced. I think it should be
> >>> changed to use the new generic interface. Jacob/Jean can better
> >>> comment if other reason exists for this exception.
> >>
> >> If there's no exception it should be fixed.
> >>
> >>
> > From this angle
> > we feel generalizing PASID management does make some sense.
> > First, PASID is just a number and not related to any device before
> > it's bound to a page table and IOMMU domain. Second, PASID is a
> > global resource (at least on Intel VT-d),
>  I think we need a definition of "global" here. It looks to me for
>  vt-d the PASID table is per device.
> >>> PASID table is per device, thus VT-d could support per-device PASIDs
> >>> in concept.
> >>
> >> I think that's the requirement of PCIE spec which said PASID + RID
> >> identifies the process address space ID.
> >>
> >>
> >>>However on Intel platform we require PASIDs to be managed in
> >>> system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
> >>> and ENQCMD together.
> >>
> >> Any reason for such requirement? (I'm not familiar with ENQCMD, but
> >> my understanding is that vSVA, SIOV or SR-IOV doesn't have the
> >> requirement for system-wide PASID).
> > ENQCMD is a new instruction to allow multiple processes submitting
> > workload to one shared workqueue. Each process has an unique PASID
> > saved in a MSR, which is included in the ENQCMD payload to indicate
> > the address space when the CPU sends to the device. As one process
> > might issue ENQCMD to multiple devices, OS-wide PASID allocation is
> > required both in host and guest side.
> >
> > When executing ENQCMD in the guest to a SIOV device, the guest
> > programmed value in the PASID_MSR must be translated to a host PASID
> > value for proper function/isolation as PASID represents the address
> > space. The translation is done through a new VMCS PASID translation
> > structure (per-VM, and 1:1 mapping). From this angle the host PASIDs
> > must be allocated 'globally' cross all assigned devices otherwise it
> > may lead to 1:N mapping when a guest process issues ENQCMD to multiple
> > assigned devices/subdevices.
> >
> > There will be a KVM forum session for this topic btw.
> 
> 
> Thanks for the background. Now I see the restrict comes from ENQCMD.
> 
> 
> >
> >>
> >>> Thus the host creates only one 'global' PASID namespace but do use
> >>> per-device PASID table to assure isolation between devices on Intel
> >>> platforms. But ARM does it differently as Jean explained.
> >>> They have a global namespace for host processes on all host-owned
> >>> devices (s

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-15 Thread Jason Wang


On 2020/10/15 下午3:58, Tian, Kevin wrote:

From: Jason Wang 
Sent: Thursday, October 15, 2020 2:52 PM


On 2020/10/14 上午11:08, Tian, Kevin wrote:

From: Jason Wang 
Sent: Tuesday, October 13, 2020 2:22 PM


On 2020/10/12 下午4:38, Tian, Kevin wrote:

From: Jason Wang 
Sent: Monday, September 14, 2020 12:20 PM


[...]
> If it's possible, I would suggest a generic uAPI instead of a VFIO

specific one.

Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Have you ever considered this approach?


Hi, Jason,

We did some study on this approach and below is the output. It's a
long writing but I didn't find a way to further abstract w/o losing
necessary context. Sorry about that.

Overall the real purpose of this series is to enable IOMMU nested
translation capability with vSVA as one major usage, through
below new uAPIs:
1) Report/enable IOMMU nested translation capability;
2) Allocate/free PASID;
3) Bind/unbind guest page table;
4) Invalidate IOMMU cache;
5) Handle IOMMU page request/response (not in this series);
1/3/4) is the minimal set for using IOMMU nested translation, with
the other two optional. For example, the guest may enable vSVA on
a device without using PASID. Or, it may bind its gIOVA page table
which doesn't require page fault support. Finally, all operations can
be applied to either physical device or subdevice.

Then we evaluated each uAPI whether generalizing it is a good thing
both in concept and regarding to complexity.

First, unlike other uAPIs which are all backed by iommu_ops, PASID
allocation/free is through the IOASID sub-system.

A question here, is IOASID expected to be the single management
interface for PASID?

yes


(I'm asking since there're already vendor specific IDA based PASID
allocator e.g amdgpu_pasid_alloc())

That comes before IOASID core was introduced. I think it should be
changed to use the new generic interface. Jacob/Jean can better
comment if other reason exists for this exception.


If there's no exception it should be fixed.



From this angle
we feel generalizing PASID management does make some sense.
First, PASID is just a number and not related to any device before
it's bound to a page table and IOMMU domain. Second, PASID is a
global resource (at least on Intel VT-d),

I think we need a definition of "global" here. It looks to me for vt-d
the PASID table is per device.

PASID table is per device, thus VT-d could support per-device PASIDs
in concept.


I think that's the requirement of PCIE spec which said PASID + RID
identifies the process address space ID.



   However on Intel platform we require PASIDs to be managed
in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
and ENQCMD together.


Any reason for such requirement? (I'm not familiar with ENQCMD, but my
understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement
for system-wide PASID).

ENQCMD is a new instruction to allow multiple processes submitting
workload to one shared workqueue. Each process has an unique PASID
saved in a MSR, which is included in the ENQCMD payload to indicate
the address space when the CPU sends to the device. As one process
might issue ENQCMD to multiple devices, OS-wide PASID allocation is
required both in host and guest side.

When executing ENQCMD in the guest to a SIOV device, the guest
programmed value in the PASID_MSR must be translated to a host PASID
value for proper function/isolation as PASID represents the address
space. The translation is done through a new VMCS PASID translation
structure (per-VM, and 1:1 mapping). From this angle the host PASIDs
must be allocated 'globally' cross all assigned devices otherwise it may
lead to 1:N mapping when a guest process issues ENQCMD to multiple
assigned devices/subdevices.

There will be a KVM forum session for this topic btw.



Thanks for the background. Now I see the restrict comes from ENQCMD.







Thus the host creates only one 'global' PASID
namespace but do use per-device PASID table to assure isolation between
devices on Intel platforms. But ARM does it differently as Jean explained.
They have a global namespace for host processes on all host-owned
devices (same as Intel), but then per-device namespace when a device
(and its PASID table) is assigned to userspace.


Another question, is this possible to have two DMAR hardware unit(at
least I can see two even in my laptop). In this case, is PASID still a
global resource?

yes


while having separate VFIO/
VDPA allocation interfaces may easily cause confusion in userspace,
e.g. which interface to be used if both VFIO/VDPA devices exist.
Moreover, an unified interface allows centralized control over how
many PASIDs are allowed per process.

Yes.



One unclear part with this generalization is about the permission.
Do we open this interface to any process or only to those which
have assigned devices? If the latter, w

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-15 Thread Tian, Kevin
> From: Jason Wang 
> Sent: Thursday, October 15, 2020 2:52 PM
> 
> 
> On 2020/10/14 上午11:08, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Tuesday, October 13, 2020 2:22 PM
> >>
> >>
> >> On 2020/10/12 下午4:38, Tian, Kevin wrote:
>  From: Jason Wang 
>  Sent: Monday, September 14, 2020 12:20 PM
> 
> >>> [...]
> >>>> If it's possible, I would suggest a generic uAPI instead of a VFIO
>  specific one.
> 
>  Jason suggest something like /dev/sva. There will be a lot of other
>  subsystems that could benefit from this (e.g vDPA).
> 
>  Have you ever considered this approach?
> 
> >>> Hi, Jason,
> >>>
> >>> We did some study on this approach and below is the output. It's a
> >>> long writing but I didn't find a way to further abstract w/o losing
> >>> necessary context. Sorry about that.
> >>>
> >>> Overall the real purpose of this series is to enable IOMMU nested
> >>> translation capability with vSVA as one major usage, through
> >>> below new uAPIs:
> >>>   1) Report/enable IOMMU nested translation capability;
> >>>   2) Allocate/free PASID;
> >>>   3) Bind/unbind guest page table;
> >>>   4) Invalidate IOMMU cache;
> >>>   5) Handle IOMMU page request/response (not in this series);
> >>> 1/3/4) is the minimal set for using IOMMU nested translation, with
> >>> the other two optional. For example, the guest may enable vSVA on
> >>> a device without using PASID. Or, it may bind its gIOVA page table
> >>> which doesn't require page fault support. Finally, all operations can
> >>> be applied to either physical device or subdevice.
> >>>
> >>> Then we evaluated each uAPI whether generalizing it is a good thing
> >>> both in concept and regarding to complexity.
> >>>
> >>> First, unlike other uAPIs which are all backed by iommu_ops, PASID
> >>> allocation/free is through the IOASID sub-system.
> >>
> >> A question here, is IOASID expected to be the single management
> >> interface for PASID?
> > yes
> >
> >> (I'm asking since there're already vendor specific IDA based PASID
> >> allocator e.g amdgpu_pasid_alloc())
> > That comes before IOASID core was introduced. I think it should be
> > changed to use the new generic interface. Jacob/Jean can better
> > comment if other reason exists for this exception.
> 
> 
> If there's no exception it should be fixed.
> 
> 
> >
> >>
> >>>From this angle
> >>> we feel generalizing PASID management does make some sense.
> >>> First, PASID is just a number and not related to any device before
> >>> it's bound to a page table and IOMMU domain. Second, PASID is a
> >>> global resource (at least on Intel VT-d),
> >>
> >> I think we need a definition of "global" here. It looks to me for vt-d
> >> the PASID table is per device.
> > PASID table is per device, thus VT-d could support per-device PASIDs
> > in concept.
> 
> 
> I think that's the requirement of PCIE spec which said PASID + RID
> identifies the process address space ID.
> 
> 
> >   However on Intel platform we require PASIDs to be managed
> > in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
> > and ENQCMD together.
> 
> 
> Any reason for such requirement? (I'm not familiar with ENQCMD, but my
> understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement
> for system-wide PASID).

ENQCMD is a new instruction to allow multiple processes submitting
workload to one shared workqueue. Each process has an unique PASID
saved in a MSR, which is included in the ENQCMD payload to indicate
the address space when the CPU sends to the device. As one process 
might issue ENQCMD to multiple devices, OS-wide PASID allocation is 
required both in host and guest side.

When executing ENQCMD in the guest to a SIOV device, the guest
programmed value in the PASID_MSR must be translated to a host PASID
value for proper function/isolation as PASID represents the address
space. The translation is done through a new VMCS PASID translation 
structure (per-VM, and 1:1 mapping). From this angle the host PASIDs 
must be allocated 'globally' cross all assigned devices otherwise it may 
lead to 1:N mapping when a guest process issues ENQCMD to multiple 
assigned devices/subdevices. 

There will be a KVM forum session for this topic btw.
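
(Conceptually, the VMCS PASID translation structure acts as a per-VM, 1:1
guest-PASID to host-PASID lookup consulted on ENQCMD; the sketch below only
illustrates that idea and is not the architectural layout.)

    #include <stdbool.h>

    #define PASID_MAX       (1 << 20)       /* PASIDs are 20 bits wide */

    /* illustrative per-VM table: guest PASID -> host PASID, 1:1 */
    struct vm_pasid_xlate {
            struct {
                    unsigned int    host_pasid;
                    bool            valid;
            } entries[PASID_MAX];           /* indexed by guest PASID */
    };

    /* on ENQCMD in the guest, the guest PASID carried in the payload is
     * replaced by the host PASID before the command reaches the device,
     * so one host PASID represents the guest process on all assigned
     * devices/subdevices (hence the system-wide host allocation) */
    static int xlate_guest_pasid(const struct vm_pasid_xlate *t, unsigned int gpasid)
    {
            return t->entries[gpasid].valid ? (int)t->entries[gpasid].host_pasid : -1;
    }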

> 
> 
> > Thus the host creates only one 'global' PASID
> > namespace but do use per-device PASID table to assure isolation between
> > devices on Intel platforms. But ARM does it differently as Jean explained.
> > They have a global namespace for host processes on all host-owned
> > devices (same as Intel), but then per-device namespace when a device
> > (and its PASID table) is assigned to userspace.
> >
> >> Another question, is this possible to have two DMAR hardware unit(at
> >> least I can see two even in my laptop). In this case, is PASID still a
> >> global resource?
> > yes
> >
> >>
> >>>while having separate VFIO/
> >>> VDPA allocation interfaces may easily cause confusion in userspace,
> >>> e.g. which interface to be u

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-15 Thread Jason Wang


On 2020/10/15 上午7:10, Alex Williamson wrote:

On Wed, 14 Oct 2020 03:08:31 +
"Tian, Kevin"  wrote:


From: Jason Wang 
Sent: Tuesday, October 13, 2020 2:22 PM


On 2020/10/12 下午4:38, Tian, Kevin wrote:

From: Jason Wang 
Sent: Monday, September 14, 2020 12:20 PM
  

[...]
   > If it's possible, I would suggest a generic uAPI instead of a VFIO

specific one.

Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Have you ever considered this approach?
  

Hi, Jason,

We did some study on this approach and below is the output. It's a
long writing but I didn't find a way to further abstract w/o losing
necessary context. Sorry about that.

Overall the real purpose of this series is to enable IOMMU nested
translation capability with vSVA as one major usage, through
below new uAPIs:
1) Report/enable IOMMU nested translation capability;
2) Allocate/free PASID;
3) Bind/unbind guest page table;
4) Invalidate IOMMU cache;
5) Handle IOMMU page request/response (not in this series);
1/3/4) is the minimal set for using IOMMU nested translation, with
the other two optional. For example, the guest may enable vSVA on
a device without using PASID. Or, it may bind its gIOVA page table
which doesn't require page fault support. Finally, all operations can
be applied to either physical device or subdevice.

Then we evaluated each uAPI whether generalizing it is a good thing
both in concept and regarding to complexity.

First, unlike other uAPIs which are all backed by iommu_ops, PASID
allocation/free is through the IOASID sub-system.


A question here, is IOASID expected to be the single management
interface for PASID?

yes


(I'm asking since there're already vendor specific IDA based PASID
allocator e.g amdgpu_pasid_alloc())

That comes before IOASID core was introduced. I think it should be
changed to use the new generic interface. Jacob/Jean can better
comment if other reason exists for this exception.

   

   From this angle
we feel generalizing PASID management does make some sense.
First, PASID is just a number and not related to any device before
it's bound to a page table and IOMMU domain. Second, PASID is a
global resource (at least on Intel VT-d),


I think we need a definition of "global" here. It looks to me for vt-d
the PASID table is per device.

PASID table is per device, thus VT-d could support per-device PASIDs
in concept. However on Intel platform we require PASIDs to be managed
in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
and ENQCMD together. Thus the host creates only one 'global' PASID
namespace but do use per-device PASID table to assure isolation between
devices on Intel platforms. But ARM does it differently as Jean explained.
They have a global namespace for host processes on all host-owned
devices (same as Intel), but then per-device namespace when a device
(and its PASID table) is assigned to userspace.
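
(Put differently, allocation is shared while enforcement stays per device; a
rough data-structure view of that split, with names invented for illustration:)

    /* one system-wide namespace deciding which PASID values are in use ... */
    struct pasid_namespace {
            unsigned long   used[(1 << 20) / (8 * sizeof(unsigned long))];
    };

    /* ... while each device keeps its own PASID table, so a PASID only works
     * on a device whose table actually has an entry installed for it */
    struct device_pasid_table {
            struct pasid_entry      *entries;       /* indexed by PASID value */
            unsigned int            max_pasid;      /* per-device PASID width */
    };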


Another question, is this possible to have two DMAR hardware unit(at
least I can see two even in my laptop). In this case, is PASID still a
global resource?

yes

   

   while having separate VFIO/
VDPA allocation interfaces may easily cause confusion in userspace,
e.g. which interface to be used if both VFIO/VDPA devices exist.
Moreover, an unified interface allows centralized control over how
many PASIDs are allowed per process.


Yes.

   

One unclear part with this generalization is about the permission.
Do we open this interface to any process or only to those which
have assigned devices? If the latter, what would be the mechanism
to coordinate between this new interface and specific passthrough
frameworks?


I'm not sure, but if you just want a permission, you probably can
introduce new capability (CAP_XXX) for this.

   

   A more tricky case, vSVA support on ARM (Eric/Jean
please correct me) plans to do per-device PASID namespace which
is built on a bind_pasid_table iommu callback to allow guest fully
manage its PASIDs on a given passthrough device.


I see, so I think the answer is to prepare for the namespace support
from the start. (btw, I don't see how namespace is handled in current
IOASID module?)

The PASID table is based on GPA when nested translation is enabled
on ARM SMMU. This design implies that the guest manages PASID
table thus PASIDs instead of going through host-side API on assigned
device. From this angle we don't need explicit namespace in the host
API. Just need a way to control how many PASIDs a process is allowed
to allocate in the global namespace. btw IOASID module already has
'set' concept per-process and PASIDs are managed per-set. Then the
quota control can be easily introduced in the 'set' level.
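
(A per-'set' quota could then be a small check on the allocation path, along
the lines of the sketch below; the quota structure and helper are hypothetical,
not something the IOASID code has at this point.)

    #include <stdbool.h>

    /* hypothetical: cap how many PASIDs one ioasid_set (one process/VM) may hold */
    struct ioasid_set_quota {
            unsigned int    used;           /* PASIDs currently allocated in this set */
            unsigned int    quota;          /* limit configured for this set */
    };

    /* consulted by the allocator before handing out another PASID */
    static bool ioasid_quota_ok(const struct ioasid_set_quota *q)
    {
            return q->used < q->quota;
    }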

   

   I'm not sure
how such requirement can be unified w/o involving passthrough
frameworks, or whether ARM could also switch to global PASID
style...

Second, IOMMU nested translation is a per IOMMU domain
capability. Since IO

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-14 Thread Jason Wang


On 2020/10/14 上午11:08, Tian, Kevin wrote:

From: Jason Wang 
Sent: Tuesday, October 13, 2020 2:22 PM


On 2020/10/12 下午4:38, Tian, Kevin wrote:

From: Jason Wang 
Sent: Monday, September 14, 2020 12:20 PM


[...]
   > If it's possible, I would suggest a generic uAPI instead of a VFIO

specific one.

Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Have you ever considered this approach?


Hi, Jason,

We did some study on this approach and below is the output. It's a
long writing but I didn't find a way to further abstract w/o losing
necessary context. Sorry about that.

Overall the real purpose of this series is to enable IOMMU nested
translation capability with vSVA as one major usage, through
below new uAPIs:
1) Report/enable IOMMU nested translation capability;
2) Allocate/free PASID;
3) Bind/unbind guest page table;
4) Invalidate IOMMU cache;
5) Handle IOMMU page request/response (not in this series);
1/3/4) is the minimal set for using IOMMU nested translation, with
the other two optional. For example, the guest may enable vSVA on
a device without using PASID. Or, it may bind its gIOVA page table
which doesn't require page fault support. Finally, all operations can
be applied to either physical device or subdevice.

Then we evaluated each uAPI whether generalizing it is a good thing
both in concept and regarding to complexity.

First, unlike other uAPIs which are all backed by iommu_ops, PASID
allocation/free is through the IOASID sub-system.


A question here, is IOASID expected to be the single management
interface for PASID?

yes


(I'm asking since there're already vendor specific IDA based PASID
allocator e.g amdgpu_pasid_alloc())

That comes before IOASID core was introduced. I think it should be
changed to use the new generic interface. Jacob/Jean can better
comment if other reason exists for this exception.



If there's no exception it should be fixed.







   From this angle
we feel generalizing PASID management does make some sense.
First, PASID is just a number and not related to any device before
it's bound to a page table and IOMMU domain. Second, PASID is a
global resource (at least on Intel VT-d),


I think we need a definition of "global" here. It looks to me for vt-d
the PASID table is per device.

PASID table is per device, thus VT-d could support per-device PASIDs
in concept.



I think that's the requirement of PCIE spec which said PASID + RID 
identifies the process address space ID.




  However on Intel platform we require PASIDs to be managed
in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV
and ENQCMD together.



Any reason for such requirement? (I'm not familiar with ENQCMD, but my 
understanding is that vSVA, SIOV or SR-IOV doesn't have the requirement 
for system-wide PASID).




Thus the host creates only one 'global' PASID
namespace but do use per-device PASID table to assure isolation between
devices on Intel platforms. But ARM does it differently as Jean explained.
They have a global namespace for host processes on all host-owned
devices (same as Intel), but then per-device namespace when a device
(and its PASID table) is assigned to userspace.


Another question, is this possible to have two DMAR hardware unit(at
least I can see two even in my laptop). In this case, is PASID still a
global resource?

yes




   while having separate VFIO/
VDPA allocation interfaces may easily cause confusion in userspace,
e.g. which interface to be used if both VFIO/VDPA devices exist.
Moreover, an unified interface allows centralized control over how
many PASIDs are allowed per process.


Yes.



One unclear part with this generalization is about the permission.
Do we open this interface to any process or only to those which
have assigned devices? If the latter, what would be the mechanism
to coordinate between this new interface and specific passthrough
frameworks?


I'm not sure, but if you just want a permission, you probably can
introduce new capability (CAP_XXX) for this.



   A more tricky case, vSVA support on ARM (Eric/Jean
please correct me) plans to do per-device PASID namespace which
is built on a bind_pasid_table iommu callback to allow guest fully
manage its PASIDs on a given passthrough device.


I see, so I think the answer is to prepare for the namespace support
from the start. (btw, I don't see how namespace is handled in current
IOASID module?)

The PASID table is based on GPA when nested translation is enabled
on ARM SMMU. This design implies that the guest manages PASID
table thus PASIDs instead of going through host-side API on assigned
device. From this angle we don't need explicit namespace in the host
API. Just need a way to control how many PASIDs a process is allowed
to allocate in the global namespace. btw IOASID module already has
'set' concept per-process and PASIDs are managed per-set. Then the
quota control can be 

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-14 Thread Alex Williamson
On Wed, 14 Oct 2020 03:08:31 +
"Tian, Kevin"  wrote:

> > From: Jason Wang 
> > Sent: Tuesday, October 13, 2020 2:22 PM
> > 
> > 
> > On 2020/10/12 下午4:38, Tian, Kevin wrote:  
> > >> From: Jason Wang 
> > >> Sent: Monday, September 14, 2020 12:20 PM
> > >>  
> > > [...]  
> > >   > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > >> specific one.
> > >>
> > >> Jason suggest something like /dev/sva. There will be a lot of other
> > >> subsystems that could benefit from this (e.g vDPA).
> > >>
> > >> Have you ever considered this approach?
> > >>  
> > > Hi, Jason,
> > >
> > > We did some study on this approach and below is the output. It's a
> > > long writing but I didn't find a way to further abstract w/o losing
> > > necessary context. Sorry about that.
> > >
> > > Overall the real purpose of this series is to enable IOMMU nested
> > > translation capability with vSVA as one major usage, through
> > > below new uAPIs:
> > >   1) Report/enable IOMMU nested translation capability;
> > >   2) Allocate/free PASID;
> > >   3) Bind/unbind guest page table;
> > >   4) Invalidate IOMMU cache;
> > >   5) Handle IOMMU page request/response (not in this series);
> > > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > > the other two optional. For example, the guest may enable vSVA on
> > > a device without using PASID. Or, it may bind its gIOVA page table
> > > which doesn't require page fault support. Finally, all operations can
> > > be applied to either physical device or subdevice.
> > >
> > > Then we evaluated each uAPI whether generalizing it is a good thing
> > > both in concept and regarding to complexity.
> > >
> > > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > > allocation/free is through the IOASID sub-system.  
> > 
> > 
> > A question here, is IOASID expected to be the single management
> > interface for PASID?  
> 
> yes
> 
> > 
> > (I'm asking since there're already vendor specific IDA based PASID
> > allocator e.g amdgpu_pasid_alloc())  
> 
> That comes before IOASID core was introduced. I think it should be
> changed to use the new generic interface. Jacob/Jean can better
> comment if other reason exists for this exception.
> 
> > 
> >   
> > >   From this angle
> > > we feel generalizing PASID management does make some sense.
> > > First, PASID is just a number and not related to any device before
> > > it's bound to a page table and IOMMU domain. Second, PASID is a
> > > global resource (at least on Intel VT-d),  
> > 
> > 
> > I think we need a definition of "global" here. It looks to me for vt-d
> > the PASID table is per device.  
> 
> PASID table is per device, thus VT-d could support per-device PASIDs
> in concept. However on Intel platform we require PASIDs to be managed 
> in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV 
> and ENQCMD together. Thus the host creates only one 'global' PASID 
> namespace but do use per-device PASID table to assure isolation between 
> devices on Intel platforms. But ARM does it differently as Jean explained. 
> They have a global namespace for host processes on all host-owned 
> devices (same as Intel), but then per-device namespace when a device 
> (and its PASID table) is assigned to userspace.
> 
> > 
> > Another question, is this possible to have two DMAR hardware unit(at
> > least I can see two even in my laptop). In this case, is PASID still a
> > global resource?  
> 
> yes
> 
> > 
> >   
> > >   while having separate VFIO/
> > > VDPA allocation interfaces may easily cause confusion in userspace,
> > > e.g. which interface to be used if both VFIO/VDPA devices exist.
> > > Moreover, an unified interface allows centralized control over how
> > > many PASIDs are allowed per process.  
> > 
> > 
> > Yes.
> > 
> >   
> > >
> > > One unclear part with this generalization is about the permission.
> > > Do we open this interface to any process or only to those which
> > > have assigned devices? If the latter, what would be the mechanism
> > > to coordinate between this new interface and specific passthrough
> > > frameworks?  
> > 
> > 
> > I'm not sure, but if you just want a permission, you probably can
> > introduce new capability (CAP_XXX) for this.
> > 
> >   
> > >   A more tricky case, vSVA support on ARM (Eric/Jean
> > > please correct me) plans to do per-device PASID namespace which
> > > is built on a bind_pasid_table iommu callback to allow guest fully
> > > manage its PASIDs on a given passthrough device.  
> > 
> > 
> > I see, so I think the answer is to prepare for the namespace support
> > from the start. (btw, I don't see how namespace is handled in current
> > IOASID module?)  
> 
> The PASID table is based on GPA when nested translation is enabled 
> on ARM SMMU. This design implies that the guest manages PASID
> table thus PASIDs instead of going through host-side API on assigned 
> device. From this angle we don't need explicit namespace in the

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-13 Thread Tian, Kevin
Hi, Alex and Jason (G),

How about your opinion for this new proposal? For now looks both
Jason (W) and Jean are OK with this direction and more discussions
are possibly required for the new /dev/ioasid interface. Internally 
we're doing a quick prototype to see any unforeseen issue with this
separation. 

Please let us know your thoughts.

Thanks
Kevin

> From: Tian, Kevin 
> Sent: Monday, October 12, 2020 4:39 PM
> 
> > From: Jason Wang 
> > Sent: Monday, September 14, 2020 12:20 PM
> >
> [...]
>  > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > specific one.
> >
> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).
> >
> > Have you ever considered this approach?
> >
> 
> Hi, Jason,
> 
> We did some study on this approach and below is the output. It's a
> long writing but I didn't find a way to further abstract w/o losing
> necessary context. Sorry about that.
> 
> Overall the real purpose of this series is to enable IOMMU nested
> translation capability with vSVA as one major usage, through
> below new uAPIs:
>   1) Report/enable IOMMU nested translation capability;
>   2) Allocate/free PASID;
>   3) Bind/unbind guest page table;
>   4) Invalidate IOMMU cache;
>   5) Handle IOMMU page request/response (not in this series);
> 1/3/4) is the minimal set for using IOMMU nested translation, with
> the other two optional. For example, the guest may enable vSVA on
> a device without using PASID. Or, it may bind its gIOVA page table
> which doesn't require page fault support. Finally, all operations can
> be applied to either physical device or subdevice.
> 
> Then we evaluated each uAPI whether generalizing it is a good thing
> both in concept and regarding to complexity.
> 
> First, unlike other uAPIs which are all backed by iommu_ops, PASID
> allocation/free is through the IOASID sub-system. From this angle
> we feel generalizing PASID management does make some sense.
> First, PASID is just a number and not related to any device before
> it's bound to a page table and IOMMU domain. Second, PASID is a
> global resource (at least on Intel VT-d), while having separate VFIO/
> VDPA allocation interfaces may easily cause confusion in userspace,
> e.g. which interface to be used if both VFIO/VDPA devices exist.
> Moreover, an unified interface allows centralized control over how
> many PASIDs are allowed per process.
> 
> One unclear part with this generalization is about the permission.
> Do we open this interface to any process or only to those which
> have assigned devices? If the latter, what would be the mechanism
> to coordinate between this new interface and specific passthrough
> frameworks? A more tricky case, vSVA support on ARM (Eric/Jean
> please correct me) plans to do per-device PASID namespace which
> is built on a bind_pasid_table iommu callback to allow guest fully
> manage its PASIDs on a given passthrough device. I'm not sure
> how such requirement can be unified w/o involving passthrough
> frameworks, or whether ARM could also switch to global PASID
> style...
> 
> Second, IOMMU nested translation is a per IOMMU domain
> capability. Since IOMMU domains are managed by VFIO/VDPA
>  (alloc/free domain, attach/detach device, set/get domain attribute,
> etc.), reporting/enabling the nesting capability is an natural
> extension to the domain uAPI of existing passthrough frameworks.
> Actually, VFIO already includes a nesting enable interface even
> before this series. So it doesn't make sense to generalize this uAPI
> out.
> 
> Then the tricky part comes with the remaining operations (3/4/5),
> which are all backed by iommu_ops thus effective only within an
> IOMMU domain. To generalize them, the first thing is to find a way
> to associate the sva_FD (opened through generic /dev/sva) with an
> IOMMU domain that is created by VFIO/VDPA. The second thing is
> to replicate {domain<->device/subdevice} association in /dev/sva
> path because some operations (e.g. page fault) is triggered/handled
> per device/subdevice. Therefore, /dev/sva must provide both per-
> domain and per-device uAPIs similar to what VFIO/VDPA already
> does. Moreover, mapping page fault to subdevice requires pre-
> registering subdevice fault data to IOMMU layer when binding
> guest page table, while such fault data can be only retrieved from
> parent driver through VFIO/VDPA.
> 
> However, we failed to find a good way even at the 1st step about
> domain association. The iommu domains are not exposed to the
> userspace, and there is no 1:1 mapping between domain and device.
> In VFIO, all devices within the same VFIO container share the address
> space but they may be organized in multiple IOMMU domains based
> on their bus type. How (should we let) the userspace know the
> domain information and open an sva_FD for each domain is the main
> problem here.
> 
> In the end we just realized that doing such generalization

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-13 Thread Tian, Kevin
> From: Jason Wang 
> Sent: Tuesday, October 13, 2020 2:22 PM
> 
> 
> On 2020/10/12 4:38 PM, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Monday, September 14, 2020 12:20 PM
> >>
> > [...]
> >   > If it's possible, I would suggest a generic uAPI instead of a VFIO
> >> specific one.
> >>
> >> Jason suggest something like /dev/sva. There will be a lot of other
> >> subsystems that could benefit from this (e.g vDPA).
> >>
> >> Have you ever considered this approach?
> >>
> > Hi, Jason,
> >
> > We did some study on this approach and below is the output. It's a
> > long writing but I didn't find a way to further abstract w/o losing
> > necessary context. Sorry about that.
> >
> > Overall the real purpose of this series is to enable IOMMU nested
> > translation capability with vSVA as one major usage, through
> > below new uAPIs:
> > 1) Report/enable IOMMU nested translation capability;
> > 2) Allocate/free PASID;
> > 3) Bind/unbind guest page table;
> > 4) Invalidate IOMMU cache;
> > 5) Handle IOMMU page request/response (not in this series);
> > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > the other two optional. For example, the guest may enable vSVA on
> > a device without using PASID. Or, it may bind its gIOVA page table
> > which doesn't require page fault support. Finally, all operations can
> > be applied to either physical device or subdevice.
> >
> > Then we evaluated each uAPI whether generalizing it is a good thing
> > both in concept and regarding to complexity.
> >
> > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > allocation/free is through the IOASID sub-system.
> 
> 
> A question here, is IOASID expected to be the single management
> interface for PASID?

yes

> 
> (I'm asking since there're already vendor specific IDA based PASID
> allocator e.g amdgpu_pasid_alloc())

That comes before IOASID core was introduced. I think it should be
changed to use the new generic interface. Jacob/Jean can better
comment if other reason exists for this exception.
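
(To make the contrast concrete -- a rough sketch only, the drv_*()
wrappers below are made up -- this is roughly the difference between a
driver-private IDA pool and drawing from the common IOASID pool:)

#include <linux/idr.h>
#include <linux/ioasid.h>

static DEFINE_IDA(drv_pasid_ida);

/* driver-private allocation, roughly what the IDA-based allocators do */
static int drv_alloc_pasid_ida(unsigned int bits)
{
	return ida_alloc_range(&drv_pasid_ida, 1, (1U << bits) - 1, GFP_KERNEL);
}

/* the same allocation going through the shared IOASID pool instead */
static ioasid_t drv_alloc_pasid_ioasid(struct ioasid_set *set,
				       unsigned int bits, void *priv)
{
	return ioasid_alloc(set, 1, (1U << bits) - 1, priv);
}

Once everything goes through the IOASID interface, ownership and quota
of PASIDs can be enforced in one place instead of per driver.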

> 
> 
> >   From this angle
> > we feel generalizing PASID management does make some sense.
> > First, PASID is just a number and not related to any device before
> > it's bound to a page table and IOMMU domain. Second, PASID is a
> > global resource (at least on Intel VT-d),
> 
> 
> I think we need a definition of "global" here. It looks to me for vt-d
> the PASID table is per device.

PASID table is per device, thus VT-d could support per-device PASIDs
in concept. However on Intel platform we require PASIDs to be managed 
in system-wide (cross host and guest) when combining vSVA, SIOV, SR-IOV 
and ENQCMD together. Thus the host creates only one 'global' PASID 
namespace but do use per-device PASID table to assure isolation between 
devices on Intel platforms. But ARM does it differently as Jean explained. 
They have a global namespace for host processes on all host-owned 
devices (same as Intel), but then per-device namespace when a device 
(and its PASID table) is assigned to userspace.

> 
> Another question, is this possible to have two DMAR hardware unit(at
> least I can see two even in my laptop). In this case, is PASID still a
> global resource?

yes

> 
> 
> >   while having separate VFIO/
> > VDPA allocation interfaces may easily cause confusion in userspace,
> > e.g. which interface to be used if both VFIO/VDPA devices exist.
> > Moreover, an unified interface allows centralized control over how
> > many PASIDs are allowed per process.
> 
> 
> Yes.
> 
> 
> >
> > One unclear part with this generalization is about the permission.
> > Do we open this interface to any process or only to those which
> > have assigned devices? If the latter, what would be the mechanism
> > to coordinate between this new interface and specific passthrough
> > frameworks?
> 
> 
> I'm not sure, but if you just want a permission, you probably can
> introduce new capability (CAP_XXX) for this.
> 
> 
> >   A more tricky case, vSVA support on ARM (Eric/Jean
> > please correct me) plans to do per-device PASID namespace which
> > is built on a bind_pasid_table iommu callback to allow guest fully
> > manage its PASIDs on a given passthrough device.
> 
> 
> I see, so I think the answer is to prepare for the namespace support
> from the start. (btw, I don't see how namespace is handled in current
> IOASID module?)

The PASID table is based on GPA when nested translation is enabled 
on ARM SMMU. This design implies that the guest manages PASID
table thus PASIDs instead of going through host-side API on assigned 
device. From this angle we don't need explicit namespace in the host 
API. Just need a way to control how many PASIDs a process is allowed 
to allocate in the global namespace. btw IOASID module already has 
'set' concept per-process and PASIDs are managed per-set. Then the 
quota control can be easily introduced in the 'set' level.
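
(A rough sketch of where that control would sit -- note the quota
fields below do NOT exist in the upstream ioasid code today, they are
only here to illustrate the idea:)

#include <linux/atomic.h>
#include <linux/ioasid.h>

struct ioasid_set_quota {		/* one per process/VM ("set") */
	struct ioasid_set *set;
	unsigned int quota;		/* max PASIDs this set may hold */
	atomic_t nr_ioasids;		/* currently allocated */
};

static ioasid_t ioasid_alloc_quota(struct ioasid_set_quota *sq,
				   ioasid_t min, ioasid_t max, void *priv)
{
	ioasid_t id;

	if (atomic_inc_return(&sq->nr_ioasids) > sq->quota) {
		atomic_dec(&sq->nr_ioasids);
		return INVALID_IOASID;
	}

	id = ioasid_alloc(sq->set, min, max, priv);
	if (id == INVALID_IOASID)
		atomic_dec(&sq->nr_ioasids);
	return id;
}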

> 
> 
> >   I'm not sure
> > how such requirement can be un

RE: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-13 Thread Tian, Kevin
> From: Jean-Philippe Brucker 
> Sent: Tuesday, October 13, 2020 6:28 PM
> 
> On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote:
> > > From: Jason Wang 
> > > Sent: Monday, September 14, 2020 12:20 PM
> > >
> > [...]
> >  > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > > specific one.
> > >
> > > Jason suggest something like /dev/sva. There will be a lot of other
> > > subsystems that could benefit from this (e.g vDPA).
> > >
> > > Have you ever considered this approach?
> > >
> >
> > Hi, Jason,
> >
> > We did some study on this approach and below is the output. It's a
> > long writing but I didn't find a way to further abstract w/o losing
> > necessary context. Sorry about that.
> >
> > Overall the real purpose of this series is to enable IOMMU nested
> > translation capability with vSVA as one major usage, through
> > below new uAPIs:
> > 1) Report/enable IOMMU nested translation capability;
> > 2) Allocate/free PASID;
> > 3) Bind/unbind guest page table;
> > 4) Invalidate IOMMU cache;
> > 5) Handle IOMMU page request/response (not in this series);
> > 1/3/4) is the minimal set for using IOMMU nested translation, with
> > the other two optional. For example, the guest may enable vSVA on
> > a device without using PASID. Or, it may bind its gIOVA page table
> > which doesn't require page fault support. Finally, all operations can
> > be applied to either physical device or subdevice.
> >
> > Then we evaluated each uAPI whether generalizing it is a good thing
> > both in concept and regarding to complexity.
> >
> > First, unlike other uAPIs which are all backed by iommu_ops, PASID
> > allocation/free is through the IOASID sub-system. From this angle
> > we feel generalizing PASID management does make some sense.
> > First, PASID is just a number and not related to any device before
> > it's bound to a page table and IOMMU domain. Second, PASID is a
> > global resource (at least on Intel VT-d), while having separate VFIO/
> > VDPA allocation interfaces may easily cause confusion in userspace,
> > e.g. which interface to be used if both VFIO/VDPA devices exist.
> > Moreover, an unified interface allows centralized control over how
> > many PASIDs are allowed per process.
> >
> > One unclear part with this generalization is about the permission.
> > Do we open this interface to any process or only to those which
> > have assigned devices? If the latter, what would be the mechanism
> > to coordinate between this new interface and specific passthrough
> > frameworks? A more tricky case, vSVA support on ARM (Eric/Jean
> > please correct me) plans to do per-device PASID namespace which
> > is built on a bind_pasid_table iommu callback to allow guest fully
> > manage its PASIDs on a given passthrough device.
> 
> Yes we need a bind_pasid_table. The guest needs to allocate the PASID
> tables because they are accessed via guest-physical addresses by the HW
> SMMU.
> 
> With bind_pasid_table, the invalidation message also requires a scope to
> invalidate a whole PASID context, in addition to invalidating a mappings
> ranges.
> 
> > I'm not sure
> > how such requirement can be unified w/o involving passthrough
> > frameworks, or whether ARM could also switch to global PASID
> > style...
> 
> Not planned at the moment, sorry. It requires a PV IOMMU to do PASID
> allocation, which is possible with virtio-iommu but not with a vSMMU
> emulation. The VM will manage its own PASID space. The upside is that we
> don't need userspace access to IOASID, so I won't pester you with comments
> on that part of the API :)

It makes sense. Possibly in the future when you plan to support 
SIOV-like capability then you may have to convert PASID table
to use host physical address then the same API could be reused. :)

Thanks
Kevin

> 
> > Second, IOMMU nested translation is a per IOMMU domain
> > capability. Since IOMMU domains are managed by VFIO/VDPA
> >  (alloc/free domain, attach/detach device, set/get domain attribute,
> > etc.), reporting/enabling the nesting capability is an natural
> > extension to the domain uAPI of existing passthrough frameworks.
> > Actually, VFIO already includes a nesting enable interface even
> > before this series. So it doesn't make sense to generalize this uAPI
> > out.
> 
> Agree for enabling, but for reporting we did consider adding a sysfs
> interface in /sys/class/iommu/ describing an IOMMU's properties. Then
> opted for VFIO capabilities to keep the API nice and contained, but if
> we're breaking up the API, sysfs might be more convenient to use and
> extend.
> 
> > Then the tricky part comes with the remaining operations (3/4/5),
> > which are all backed by iommu_ops thus effective only within an
> > IOMMU domain. To generalize them, the first thing is to find a way
> > to associate the sva_FD (opened through generic /dev/sva) with an
> > IOMMU domain that is created by VFIO/VDPA. The second thing is
> > to replicate {domain<->device/subdevice

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-13 Thread Jean-Philippe Brucker
On Mon, Oct 12, 2020 at 08:38:54AM +, Tian, Kevin wrote:
> > From: Jason Wang 
> > Sent: Monday, September 14, 2020 12:20 PM
> >
> [...]
>  > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > specific one.
> > 
> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).
> > 
> > Have you ever considered this approach?
> > 
> 
> Hi, Jason,
> 
> We did some study on this approach and below is the output. It's a
> long writing but I didn't find a way to further abstract w/o losing 
> necessary context. Sorry about that.
> 
> Overall the real purpose of this series is to enable IOMMU nested
> translation capability with vSVA as one major usage, through
> below new uAPIs:
>   1) Report/enable IOMMU nested translation capability;
>   2) Allocate/free PASID;
>   3) Bind/unbind guest page table;
>   4) Invalidate IOMMU cache;
>   5) Handle IOMMU page request/response (not in this series);
> 1/3/4) is the minimal set for using IOMMU nested translation, with 
> the other two optional. For example, the guest may enable vSVA on 
> a device without using PASID. Or, it may bind its gIOVA page table 
> which doesn't require page fault support. Finally, all operations can 
> be applied to either physical device or subdevice.
> 
> Then we evaluated each uAPI whether generalizing it is a good thing 
> both in concept and regarding to complexity.
> 
> First, unlike other uAPIs which are all backed by iommu_ops, PASID 
> allocation/free is through the IOASID sub-system. From this angle
> we feel generalizing PASID management does make some sense. 
> First, PASID is just a number and not related to any device before 
> it's bound to a page table and IOMMU domain. Second, PASID is a 
> global resource (at least on Intel VT-d), while having separate VFIO/
> VDPA allocation interfaces may easily cause confusion in userspace,
> e.g. which interface to be used if both VFIO/VDPA devices exist. 
> Moreover, an unified interface allows centralized control over how 
> many PASIDs are allowed per process.
> 
> One unclear part with this generalization is about the permission.
> Do we open this interface to any process or only to those which
> have assigned devices? If the latter, what would be the mechanism
> to coordinate between this new interface and specific passthrough 
> frameworks? A more tricky case, vSVA support on ARM (Eric/Jean
> please correct me) plans to do per-device PASID namespace which
> is built on a bind_pasid_table iommu callback to allow guest fully 
> manage its PASIDs on a given passthrough device.

Yes we need a bind_pasid_table. The guest needs to allocate the PASID
tables because they are accessed via guest-physical addresses by the HW
SMMU.

With bind_pasid_table, the invalidation message also requires a scope to
invalidate a whole PASID context, in addition to invalidating a mappings
ranges.
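
(To sketch the shape of the information involved -- the struct and enum
below are invented for illustration only, they are not the actual uAPI
proposal:)

#include <linux/types.h>

/* what a bind_pasid_table operation fundamentally has to convey */
struct sketch_bind_pasid_table {
	__u64 pasid_table_gpa;	/* guest-physical base of the PASID (CD) table */
	__u32 pasid_bits;	/* table covers 2^pasid_bits entries */
	__u32 format;		/* e.g. the SMMUv3 CD table layout */
};

/* the invalidation scopes that fall out of it */
enum sketch_inv_scope {
	INV_SCOPE_ADDR_RANGE,	/* existing: a range of mappings in one PASID */
	INV_SCOPE_PASID,	/* new: the whole context of one PASID */
	INV_SCOPE_DOMAIN,	/* everything behind the domain/PASID table */
};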

> I'm not sure 
> how such requirement can be unified w/o involving passthrough
> frameworks, or whether ARM could also switch to global PASID 
> style...

Not planned at the moment, sorry. It requires a PV IOMMU to do PASID
allocation, which is possible with virtio-iommu but not with a vSMMU
emulation. The VM will manage its own PASID space. The upside is that we
don't need userspace access to IOASID, so I won't pester you with comments
on that part of the API :)

> Second, IOMMU nested translation is a per IOMMU domain
> capability. Since IOMMU domains are managed by VFIO/VDPA
>  (alloc/free domain, attach/detach device, set/get domain attribute,
> etc.), reporting/enabling the nesting capability is an natural 
> extension to the domain uAPI of existing passthrough frameworks. 
> Actually, VFIO already includes a nesting enable interface even 
> before this series. So it doesn't make sense to generalize this uAPI 
> out.

Agree for enabling, but for reporting we did consider adding a sysfs
interface in /sys/class/iommu/ describing an IOMMU's properties. Then
opted for VFIO capabilities to keep the API nice and contained, but if
we're breaking up the API, sysfs might be more convenient to use and
extend.

> Then the tricky part comes with the remaining operations (3/4/5),
> which are all backed by iommu_ops thus effective only within an 
> IOMMU domain. To generalize them, the first thing is to find a way 
> to associate the sva_FD (opened through generic /dev/sva) with an 
> IOMMU domain that is created by VFIO/VDPA. The second thing is 
> to replicate {domain<->device/subdevice} association in /dev/sva 
> path because some operations (e.g. page fault) is triggered/handled 
> per device/subdevice. Therefore, /dev/sva must provide both per-
> domain and per-device uAPIs similar to what VFIO/VDPA already 
> does. Moreover, mapping page fault to subdevice requires pre-
> registering subdevice fault data to IOMMU layer when binding 
> guest page table, while such fault data can be only retrieved from 
> parent 

Re: (proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-12 Thread Jason Wang


On 2020/10/12 4:38 PM, Tian, Kevin wrote:
> > From: Jason Wang 
> > Sent: Monday, September 14, 2020 12:20 PM
> >
> [...]
>  > If it's possible, I would suggest a generic uAPI instead of a VFIO
> > specific one.
> >
> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).
> >
> > Have you ever considered this approach?
> >
> Hi, Jason,
>
> We did some study on this approach and below is the output. It's a
> long writing but I didn't find a way to further abstract w/o losing
> necessary context. Sorry about that.
>
> Overall the real purpose of this series is to enable IOMMU nested
> translation capability with vSVA as one major usage, through
> below new uAPIs:
> 1) Report/enable IOMMU nested translation capability;
> 2) Allocate/free PASID;
> 3) Bind/unbind guest page table;
> 4) Invalidate IOMMU cache;
> 5) Handle IOMMU page request/response (not in this series);
> 1/3/4) is the minimal set for using IOMMU nested translation, with
> the other two optional. For example, the guest may enable vSVA on
> a device without using PASID. Or, it may bind its gIOVA page table
> which doesn't require page fault support. Finally, all operations can
> be applied to either physical device or subdevice.
>
> Then we evaluated each uAPI whether generalizing it is a good thing
> both in concept and regarding to complexity.
>
> First, unlike other uAPIs which are all backed by iommu_ops, PASID
> allocation/free is through the IOASID sub-system.

A question here, is IOASID expected to be the single management
interface for PASID?

(I'm asking since there're already vendor specific IDA based PASID
allocator e.g amdgpu_pasid_alloc())

>   From this angle
> we feel generalizing PASID management does make some sense.
> First, PASID is just a number and not related to any device before
> it's bound to a page table and IOMMU domain. Second, PASID is a
> global resource (at least on Intel VT-d),

I think we need a definition of "global" here. It looks to me for vt-d
the PASID table is per device.

Another question, is this possible to have two DMAR hardware unit(at
least I can see two even in my laptop). In this case, is PASID still a
global resource?

>   while having separate VFIO/
> VDPA allocation interfaces may easily cause confusion in userspace,
> e.g. which interface to be used if both VFIO/VDPA devices exist.
> Moreover, an unified interface allows centralized control over how
> many PASIDs are allowed per process.

Yes.

> One unclear part with this generalization is about the permission.
> Do we open this interface to any process or only to those which
> have assigned devices? If the latter, what would be the mechanism
> to coordinate between this new interface and specific passthrough
> frameworks?

I'm not sure, but if you just want a permission, you probably can
introduce new capability (CAP_XXX) for this.
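
(Purely as an illustration -- there is no such capability today and the
open() handler below is hypothetical -- the check itself would be tiny:)

#include <linux/capability.h>
#include <linux/errno.h>
#include <linux/fs.h>

static int sva_dev_open(struct inode *inode, struct file *filp)
{
	/* or a new, dedicated capability instead of CAP_SYS_ADMIN */
	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;
	return 0;
}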




> A more tricky case, vSVA support on ARM (Eric/Jean
> please correct me) plans to do per-device PASID namespace which
> is built on a bind_pasid_table iommu callback to allow guest fully
> manage its PASIDs on a given passthrough device.

I see, so I think the answer is to prepare for the namespace support
from the start. (btw, I don't see how namespace is handled in current
IOASID module?)

> I'm not sure
> how such requirement can be unified w/o involving passthrough
> frameworks, or whether ARM could also switch to global PASID
> style...
>
> Second, IOMMU nested translation is a per IOMMU domain
> capability. Since IOMMU domains are managed by VFIO/VDPA
> (alloc/free domain, attach/detach device, set/get domain attribute,
> etc.), reporting/enabling the nesting capability is an natural
> extension to the domain uAPI of existing passthrough frameworks.
> Actually, VFIO already includes a nesting enable interface even
> before this series. So it doesn't make sense to generalize this uAPI
> out.

So my understanding is that VFIO already:

1) use multiple fds
2) separate IOMMU ops to a dedicated container fd (type1 iommu)
3) provides API to associated devices/group with a container

And all the proposal in this series is to reuse the container fd. It
should be possible to replace e.g type1 IOMMU with a unified module.
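
(For reference, the userspace side of that existing flow -- error
handling, group-viability checks and the device fd are omitted, and the
group number 26 is just an example:)

#include <fcntl.h>
#include <sys/ioctl.h>
#include <linux/vfio.h>

static int setup_type1_container(void *buf, size_t size)
{
	int container = open("/dev/vfio/vfio", O_RDWR);
	int group = open("/dev/vfio/26", O_RDWR);	/* the device's IOMMU group */
	struct vfio_iommu_type1_dma_map map = {
		.argsz = sizeof(map),
		.flags = VFIO_DMA_MAP_FLAG_READ | VFIO_DMA_MAP_FLAG_WRITE,
		.vaddr = (unsigned long)buf,
		.iova  = 0,
		.size  = size,
	};

	/* 3) associate the group with the container */
	ioctl(group, VFIO_GROUP_SET_CONTAINER, &container);
	/* 2) pick the IOMMU backend behind the container fd */
	ioctl(container, VFIO_SET_IOMMU, VFIO_TYPE1v2_IOMMU);
	/* the DMA address space is then managed through the container */
	return ioctl(container, VFIO_IOMMU_MAP_DMA, &map);
}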





> Then the tricky part comes with the remaining operations (3/4/5),
> which are all backed by iommu_ops thus effective only within an
> IOMMU domain. To generalize them, the first thing is to find a way
> to associate the sva_FD (opened through generic /dev/sva) with an
> IOMMU domain that is created by VFIO/VDPA. The second thing is
> to replicate {domain<->device/subdevice} association in /dev/sva
> path because some operations (e.g. page fault) is triggered/handled
> per device/subdevice.

Is there any reason that the #PF can not be handled via SVA fd?

> Therefore, /dev/sva must provide both per-
> domain and per-device uAPIs similar to what VFIO/VDPA already
> does. Moreover, mapping page fault to subdevice requires pre-
> registering subdevice fault data to IOMMU layer when binding
> guest page table, while such fault data can be only retrieved from
> parent driver through VFIO/VDPA.

(proposal) RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-10-12 Thread Tian, Kevin
> From: Jason Wang 
> Sent: Monday, September 14, 2020 12:20 PM
>
[...]
 > If it's possible, I would suggest a generic uAPI instead of a VFIO
> specific one.
> 
> Jason suggest something like /dev/sva. There will be a lot of other
> subsystems that could benefit from this (e.g vDPA).
> 
> Have you ever considered this approach?
> 

Hi, Jason,

We did some study on this approach and below is the output. It's a
long writing but I didn't find a way to further abstract w/o losing 
necessary context. Sorry about that.

Overall the real purpose of this series is to enable IOMMU nested
translation capability with vSVA as one major usage, through
below new uAPIs:
1) Report/enable IOMMU nested translation capability;
2) Allocate/free PASID;
3) Bind/unbind guest page table;
4) Invalidate IOMMU cache;
5) Handle IOMMU page request/response (not in this series);
1/3/4) is the minimal set for using IOMMU nested translation, with 
the other two optional. For example, the guest may enable vSVA on 
a device without using PASID. Or, it may bind its gIOVA page table 
which doesn't require page fault support. Finally, all operations can 
be applied to either physical device or subdevice.
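
(Illustration only -- none of the names or numbers below exist; they
just visualize what the five operations could look like as ioctls,
regardless of whether they end up on the VFIO container fd or on a
generic char device:)

#include <linux/ioctl.h>
#include <linux/types.h>

struct sva_pasid_alloc { __u32 min; __u32 max; __u32 pasid; };			/* 2) */
struct sva_bind_gpasid { __u32 pasid; __u64 gpgd; __u64 flags; };		/* 3) */
struct sva_cache_inv   { __u32 pasid; __u64 addr; __u64 size; __u32 scope; };	/* 4) */

#define SVA_TYPE			'$'
#define SVA_CHECK_NESTING		_IO(SVA_TYPE, 0)	/* 1) report/enable */
#define SVA_ALLOC_PASID			_IOWR(SVA_TYPE, 1, struct sva_pasid_alloc)
#define SVA_FREE_PASID			_IOW(SVA_TYPE, 2, __u32)
#define SVA_BIND_GUEST_PGTBL		_IOW(SVA_TYPE, 3, struct sva_bind_gpasid)
#define SVA_UNBIND_GUEST_PGTBL		_IOW(SVA_TYPE, 4, __u32)
#define SVA_CACHE_INVALIDATE		_IOW(SVA_TYPE, 5, struct sva_cache_inv)
/* 5) page request/response would rather be an eventfd/read-write channel */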

Then we evaluated each uAPI whether generalizing it is a good thing 
both in concept and regarding to complexity.

First, unlike other uAPIs which are all backed by iommu_ops, PASID 
allocation/free is through the IOASID sub-system. From this angle
we feel generalizing PASID management does make some sense. 
First, PASID is just a number and not related to any device before 
it's bound to a page table and IOMMU domain. Second, PASID is a 
global resource (at least on Intel VT-d), while having separate VFIO/
VDPA allocation interfaces may easily cause confusion in userspace,
e.g. which interface to be used if both VFIO/VDPA devices exist. 
Moreover, an unified interface allows centralized control over how 
many PASIDs are allowed per process.

One unclear part with this generalization is about the permission.
Do we open this interface to any process or only to those which
have assigned devices? If the latter, what would be the mechanism
to coordinate between this new interface and specific passthrough 
frameworks? A more tricky case, vSVA support on ARM (Eric/Jean
please correct me) plans to do per-device PASID namespace which
is built on a bind_pasid_table iommu callback to allow guest fully 
manage its PASIDs on a given passthrough device. I'm not sure 
how such requirement can be unified w/o involving passthrough
frameworks, or whether ARM could also switch to global PASID 
style...

Second, IOMMU nested translation is a per IOMMU domain
capability. Since IOMMU domains are managed by VFIO/VDPA
 (alloc/free domain, attach/detach device, set/get domain attribute,
etc.), reporting/enabling the nesting capability is an natural 
extension to the domain uAPI of existing passthrough frameworks. 
Actually, VFIO already includes a nesting enable interface even 
before this series. So it doesn't make sense to generalize this uAPI 
out.

Then the tricky part comes with the remaining operations (3/4/5),
which are all backed by iommu_ops thus effective only within an 
IOMMU domain. To generalize them, the first thing is to find a way 
to associate the sva_FD (opened through generic /dev/sva) with an 
IOMMU domain that is created by VFIO/VDPA. The second thing is 
to replicate {domain<->device/subdevice} association in /dev/sva 
path because some operations (e.g. page fault) is triggered/handled 
per device/subdevice. Therefore, /dev/sva must provide both per-
domain and per-device uAPIs similar to what VFIO/VDPA already 
does. Moreover, mapping page fault to subdevice requires pre-
registering subdevice fault data to IOMMU layer when binding 
guest page table, while such fault data can be only retrieved from 
parent driver through VFIO/VDPA. 

However, we failed to find a good way even at the 1st step about
domain association. The iommu domains are not exposed to the
userspace, and there is no 1:1 mapping between domain and device.
In VFIO, all devices within the same VFIO container share the address
space but they may be organized in multiple IOMMU domains based
on their bus type. How (should we let) the userspace know the
domain information and open an sva_FD for each domain is the main
problem here.

In the end we just realized that doing such generalization doesn't
really lead to a clear design and instead requires tight coordination 
between /dev/sva and VFIO/VDPA for almost every new uAPI 
(especially about synchronization when the domain/device 
association is changed or when the device/subdevice is being reset/
drained). Finally it may become a usability burden to the userspace
on proper use of the two interfaces on the assigned device.
 
Based on above analysis we feel that just generalizing PASID mgmt.
might be a good thing to look at while the remaining operations are 
better being VFIO/VDPA specific u

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-17 Thread Jason Wang


On 2020/9/18 2:17 AM, Jacob Pan (Jun) wrote:

Hi Jason,
On Thu, 17 Sep 2020 11:53:49 +0800, Jason Wang 
wrote:


On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote:

Hi Jason,
On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
wrote:
  

On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:

Hi Jason,
On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe
 wrote:
  

On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:

On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
wrote:

On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
wrote:

If user space wants to bind page tables, create the PASID
with /dev/sva, use ioctls there to setup the page table
the way it wants, then pass the now configured PASID to a
driver that can use it.

Are we talking about bare metal SVA?

What a weird term.

Glad you noticed it at v7 :-)

Any suggestions on something less weird than
Shared Virtual Addressing? There is a reason why we moved from
SVM to SVA.

SVA is fine, what is "bare metal" supposed to mean?
  

What I meant here is sharing virtual address between DMA and host
process. This requires devices perform DMA request with PASID and
use IOMMU first level/stage 1 page tables.
This can be further divided into 1) user SVA 2) supervisor SVA
(sharing init_mm)

My point is that /dev/sva is not useful here since the driver can
perform PASID allocation while doing SVA bind.

No, you are thinking too small.

Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the
SVA.

Could you point to me the SVA UAPI? I couldn't find it in the
mainline. Seems VDPA uses VHOST interface?


It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h.


Thanks for the pointer, for complete vSVA functionality we would need
1 TLB flush (IOTLB and PASID cache etc.)
2 PASID alloc/free
3 bind/unbind page tables or PASID tables
4 Page request service

Seems vhost_iotlb_msg can be used for #1 partially. And the
proposal is to pluck out the rest into /dev/sva? Seems awkward as Alex
pointed out earlier for similar situation in VFIO.



Consider it doesn't have any PASID support yet, my understanding is that 
if we go with /dev/sva:


- vhost uAPI will still keep the uAPI for associating an ASID to a 
specific virtqueue

- except for this, we can use /dev/sva for all the rest (P)ASID operations




  

When VDPA is used by DPDK it makes sense that the PASID will be SVA
and 1:1 with the mm_struct.
  

I still don't see why bare metal DPDK needs to get a handle of the
PASID.


My understanding is that it may:

- have a unified uAPI with vSVA: alloc, bind, unbind, free

Got your point, but vSVA needs more than these



Yes it's just a subset of what vSVA required.





- leave the binding policy to userspace instead of the using a
implied one in the kenrel


Only if necessary.



Yes, I think it's all about visibility (flexibility) and manageability.

Consider a device with queues A, B and C. We may dedicate queues A and B
to one PASID (for vSVA) and C to another PASID (for SVA). It looks to me
the current sva_bind() API doesn't support this. We still need an API
for allocating a PASID for SVA and assigning it to the (mediated) device.
This case is pretty common for implementing a shadow queue for a guest.






Perhaps the SVA patch would explain. Or are you talking about
vDPA DPDK process that is used to support virtio-net-pmd in the
guest?

When VDPA is used by qemu it makes sense that the PASID will be an
arbitary IOVA map constructed to be 1:1 with the guest vCPU
physical map. /dev/sva allows a single uAPI to do this kind of
setup, and qemu can support it while supporting a range of SVA
kernel drivers. VDPA and vfio-mdev are obvious initial targets.

*BOTH* are needed.

In general any uAPI for PASID should have the option to use either
the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs
virtually nothing to implement this in the driver as PASID is just
a number, and gives so much more flexability.
  

Not really nothing in terms of PASID life cycles. For example, if
user uses uacce interface to open an accelerator, it gets an
FD_acc. Then it opens /dev/sva to allocate PASID then get another
FD_pasid. Then we pass FD_pasid to the driver to bind page tables,
perhaps multiple drivers. Now we have to worry about If FD_pasid
gets closed before FD_acc(s) closed and all these race conditions.


I'm not sure I understand this. But this demonstrates the flexibility
of an unified uAPI. E.g it allows vDPA and VFIO device to use the
same PASID which can be shared with a process in the guest.


This is for user DMA not for vSVA. I was contending that /dev/sva
creates unnecessary steps for such usage.



A question here is where the PASID management is expected to be done. 
I'm not quite sure the silent 1:1 binding done in intel_svm_bind_mm() 
can satisfy the requirement for management layer.





For vSVA, I think vDPA and VFIO can potentially share but I am not
seeing convincing benefits.

If a guest process wants to

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-17 Thread Jacob Pan (Jun)
Hi Jason,
On Thu, 17 Sep 2020 11:53:49 +0800, Jason Wang 
wrote:

> On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote:
> > Hi Jason,
> > On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
> > wrote:
> >  
> >> On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:  
> >>> Hi Jason,
> >>> On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe
> >>>  wrote:
> >>>  
>  On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:  
> > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
> > wrote:  
> >> On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
> >> wrote:  
>  If user space wants to bind page tables, create the PASID
>  with /dev/sva, use ioctls there to setup the page table
>  the way it wants, then pass the now configured PASID to a
>  driver that can use it.  
> >>> Are we talking about bare metal SVA?  
> >> What a weird term.  
> > Glad you noticed it at v7 :-)
> >
> > Any suggestions on something less weird than
> > Shared Virtual Addressing? There is a reason why we moved from
> > SVM to SVA.  
>  SVA is fine, what is "bare metal" supposed to mean?
>   
> >>> What I meant here is sharing virtual address between DMA and host
> >>> process. This requires devices perform DMA request with PASID and
> >>> use IOMMU first level/stage 1 page tables.
> >>> This can be further divided into 1) user SVA 2) supervisor SVA
> >>> (sharing init_mm)
> >>>
> >>> My point is that /dev/sva is not useful here since the driver can
> >>> perform PASID allocation while doing SVA bind.  
> >> No, you are thinking too small.
> >>
> >> Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the
> >> SVA. 
> > Could you point to me the SVA UAPI? I couldn't find it in the
> > mainline. Seems VDPA uses VHOST interface?  
> 
> 
> It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h.
> 
Thanks for the pointer, for complete vSVA functionality we would need
1 TLB flush (IOTLB and PASID cache etc.)
2 PASID alloc/free
3 bind/unbind page tables or PASID tables
4 Page request service

Seems vhost_iotlb_msg can be used for #1 partially. And the
proposal is to pluck out the rest into /dev/sva? Seems awkward as Alex
pointed out earlier for similar situation in VFIO.
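
(For reference, the message layout, quoted from memory -- please check
uapi/linux/vhost_types.h for the authoritative definition. It only
carries per-range update/invalidate information, i.e. a part of #1, and
nothing for #2-#4:)

#include <linux/types.h>

struct vhost_iotlb_msg {
	__u64 iova;
	__u64 size;
	__u64 uaddr;
#define VHOST_ACCESS_RO      0x1
#define VHOST_ACCESS_WO      0x2
#define VHOST_ACCESS_RW      0x3
	__u8 perm;
#define VHOST_IOTLB_MISS           1
#define VHOST_IOTLB_UPDATE         2
#define VHOST_IOTLB_INVALIDATE     3
#define VHOST_IOTLB_ACCESS_FAIL    4
	__u8 type;
};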

> 
> >  
> >> When VDPA is used by DPDK it makes sense that the PASID will be SVA
> >> and 1:1 with the mm_struct.
> >>  
> > I still don't see why bare metal DPDK needs to get a handle of the
> > PASID.  
> 
> 
> My understanding is that it may:
> 
> - have a unified uAPI with vSVA: alloc, bind, unbind, free
Got your point, but vSVA needs more than these

> - leave the binding policy to userspace instead of the using a
> implied one in the kenrel
> 
Only if necessary.

> 
> > Perhaps the SVA patch would explain. Or are you talking about
> > vDPA DPDK process that is used to support virtio-net-pmd in the
> > guest? 
> >> When VDPA is used by qemu it makes sense that the PASID will be an
> >> arbitary IOVA map constructed to be 1:1 with the guest vCPU
> >> physical map. /dev/sva allows a single uAPI to do this kind of
> >> setup, and qemu can support it while supporting a range of SVA
> >> kernel drivers. VDPA and vfio-mdev are obvious initial targets.
> >>
> >> *BOTH* are needed.
> >>
> >> In general any uAPI for PASID should have the option to use either
> >> the mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs
> >> virtually nothing to implement this in the driver as PASID is just
> >> a number, and gives so much more flexability.
> >>  
> > Not really nothing in terms of PASID life cycles. For example, if
> > user uses uacce interface to open an accelerator, it gets an
> > FD_acc. Then it opens /dev/sva to allocate PASID then get another
> > FD_pasid. Then we pass FD_pasid to the driver to bind page tables,
> > perhaps multiple drivers. Now we have to worry about If FD_pasid
> > gets closed before FD_acc(s) closed and all these race conditions.  
> 
> 
> I'm not sure I understand this. But this demonstrates the flexibility
> of an unified uAPI. E.g it allows vDPA and VFIO device to use the
> same PASID which can be shared with a process in the guest.
> 
This is for user DMA not for vSVA. I was contending that /dev/sva
creates unnecessary steps for such usage.

For vSVA, I think vDPA and VFIO can potentially share but I am not
seeing convincing benefits.

If a guest process wants to do SVA with a VFIO assigned device and a
vDPA-backed virtio-net at the same time, it might be a limitation if
PASID is not managed via a common interface. But I am not sure how vDPA
SVA support will look like, does it support gIOVA? need virtio IOMMU?

> For the race condition, it could be probably solved with refcnt.
> 
Agreed but the best solution might be not to have the problem in the
first place :)
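
(To spell out the refcnt idea -- the object below is invented for
illustration, it is not an existing kernel structure: every user of the
PASID, i.e. the FD_pasid itself and each FD_acc that bound it, would
hold a reference, so the order in which the fds are closed stops
mattering.)

#include <linux/kref.h>
#include <linux/slab.h>
#include <linux/ioasid.h>

struct pasid_obj {
	struct kref kref;
	ioasid_t pasid;
};

static void pasid_obj_release(struct kref *kref)
{
	struct pasid_obj *p = container_of(kref, struct pasid_obj, kref);

	ioasid_free(p->pasid);	/* only once the last fd dropped its reference */
	kfree(p);
}

static inline void pasid_obj_get(struct pasid_obj *p)
{
	kref_get(&p->kref);
}

static inline void pasid_obj_put(struct pasid_obj *p)
{
	kref_put(&p->kref, pasid_obj_release);
}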

> Thanks
> 
> 
> >
> > If we do not expose FD_pasid to the user, the teardown is much
> > simpler and streamlined. Following each FD_acc close, PASID unbind
>

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-17 Thread Jason Gunthorpe
On Thu, Sep 17, 2020 at 11:53:49AM +0800, Jason Wang wrote:
> > > When VDPA is used by qemu it makes sense that the PASID will be an
> > > arbitary IOVA map constructed to be 1:1 with the guest vCPU physical
> > > map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
> > > can support it while supporting a range of SVA kernel drivers. VDPA
> > > and vfio-mdev are obvious initial targets.
> > > 
> > > *BOTH* are needed.
> > > 
> > > In general any uAPI for PASID should have the option to use either the
> > > mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
> > > nothing to implement this in the driver as PASID is just a number, and
> > > gives so much more flexability.
> > > 
> > Not really nothing in terms of PASID life cycles. For example, if user
> > uses uacce interface to open an accelerator, it gets an FD_acc. Then it
> > opens /dev/sva to allocate PASID then get another FD_pasid. Then we
> > pass FD_pasid to the driver to bind page tables, perhaps multiple
> > drivers. Now we have to worry about If FD_pasid gets closed before
> > FD_acc(s) closed and all these race conditions.
> 
> 
> I'm not sure I understand this. But this demonstrates the flexibility of an
> unified uAPI. E.g it allows vDPA and VFIO device to use the same PASID which
> can be shared with a process in the guest.
> 
> For the race condition, it could be probably solved with refcnt.

Yep

Jason


RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, September 16, 2020 10:45 PM
> 
> On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > > From: Jason Gunthorpe 
> > > Sent: Tuesday, September 15, 2020 10:29 PM
> > >
> > > > Do they need a device at all?  It's not clear to me why RID based
> > > > IOMMU management fits within vfio's scope, but PASID based does not.
> > >
> > > In RID mode vfio-pci completely owns the PCI function, so it is more
> > > natural that VFIO, as the sole device owner, would own the DMA
> mapping
> > > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > > so there is not much reason to try and disaggregate the API.
> >
> > It is also used by vDPA.
> 
> A driver in VDPA, not VDPA itself.

What is the difference? It is still an example of using the RID IOMMU mode
outside of VFIO (and it just implies that vDPA doesn't even do a good
abstraction internally).

> 
> > > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > > device with other kernel drivers. PASID and DMA will be concurrent
> > > with VFIO and other kernel drivers/etc.
> >
> > Looks you are equating PASID to host-side sharing, while ignoring
> > another valid usage that a PASID-capable device is passed through
> > to the guest through vfio-pci and then PASID is used by the guest
> > for guest-side sharing. In such case, it is an exclusive usage in host
> > side and then what is the problem for VFIO to manage PASID given
> > that vfio-pci completely owns the function?
> 
> This is no different than vfio-pci being yet another client to
> /dev/sva
> 

My comment was to echo Alex's question about "why RID based
IOMMU management fits within vfio's scope, but PASID based 
does not". and when talking about generalization we should look
bigger beyond sva. What really matters here is the iommu_domain
which is about everything related to DMA mapping. The domain
associated with a passthru device is marked as "unmanaged" in 
kernel and allows userspace to manage DMA mapping of this 
device through a set of iommu_ops:

- alloc/free domain;
- attach/detach device/subdevice;
- map/unmap a memory region;
- bind/unbind page table and invalidate iommu cache;
- ... (and lots of other callbacks)

map/unmap or bind/unbind are just different ways of managing
DMAs in an iommu domain. The passthrough framework (VFIO 
or VDPA) has been providing its uAPI to manage every aspect of 
iommu_domain so far, and sva is just a natural extension following 
this design. If we really want to generalize something, it needs to 
be /dev/iommu as an unified interface for managing every aspect 
of iommu_domain. Asking SVA abstraction alone just causes 
unnecessary mess to both kernel (sync domain/device association 
between /dev/vfio and /dev/sva) and userspace (talk to two 
interfaces even for same vfio-pci device). Then it sounds like more 
like a bandaid for saving development effort in VDPA (which 
instead should go proposing /dev/iommu when it was invented 
instead of reinventing its own bits until such effort is unaffordable 
and then ask for partial abstraction to fix its gap).
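
(Roughly how that list maps onto today's struct iommu_ops callbacks --
member names quoted from memory and abbreviated, see
include/linux/iommu.h for the real thing:

    alloc/free domain                    -> .domain_alloc / .domain_free
    attach/detach device/subdevice       -> .attach_dev / .detach_dev and
                                            .aux_attach_dev / .aux_detach_dev
    map/unmap a memory region            -> .map / .unmap
    bind/unbind page table + invalidate  -> .sva_bind_gpasid / .sva_unbind_gpasid
                                            and .cache_invalidate

so a /dev/iommu interface would be the uAPI mirror of this whole set,
not just of the sva pieces.)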

Thanks
Kevin


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Wang


On 2020/9/17 7:09 AM, Jacob Pan (Jun) wrote:

Hi Jason,
On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
wrote:


On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:

Hi Jason,
On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
wrote:
   

On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:

On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
wrote:

On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
wrote:

If user space wants to bind page tables, create the PASID
with /dev/sva, use ioctls there to setup the page table
the way it wants, then pass the now configured PASID to a
driver that can use it.

Are we talking about bare metal SVA?

What a weird term.

Glad you noticed it at v7 :-)

Any suggestions on something less weird than
Shared Virtual Addressing? There is a reason why we moved from
SVM to SVA.

SVA is fine, what is "bare metal" supposed to mean?
   

What I meant here is sharing virtual address between DMA and host
process. This requires devices perform DMA request with PASID and
use IOMMU first level/stage 1 page tables.
This can be further divided into 1) user SVA 2) supervisor SVA
(sharing init_mm)

My point is that /dev/sva is not useful here since the driver can
perform PASID allocation while doing SVA bind.

No, you are thinking too small.

Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA.


Could you point to me the SVA UAPI? I couldn't find it in the mainline.
Seems VDPA uses VHOST interface?



It's the vhost_iotlb_msg defined in uapi/linux/vhost_types.h.





When VDPA is used by DPDK it makes sense that the PASID will be SVA
and 1:1 with the mm_struct.


I still don't see why bare metal DPDK needs to get a handle of the
PASID.



My understanding is that it may:

- have a unified uAPI with vSVA: alloc, bind, unbind, free
- leave the binding policy to userspace instead of the using a implied 
one in the kenrel




Perhaps the SVA patch would explain. Or are you talking about
vDPA DPDK process that is used to support virtio-net-pmd in the guest?


When VDPA is used by qemu it makes sense that the PASID will be an
arbitary IOVA map constructed to be 1:1 with the guest vCPU physical
map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
can support it while supporting a range of SVA kernel drivers. VDPA
and vfio-mdev are obvious initial targets.

*BOTH* are needed.

In general any uAPI for PASID should have the option to use either the
mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
nothing to implement this in the driver as PASID is just a number, and
gives so much more flexability.


Not really nothing in terms of PASID life cycles. For example, if user
uses uacce interface to open an accelerator, it gets an FD_acc. Then it
opens /dev/sva to allocate PASID then get another FD_pasid. Then we
pass FD_pasid to the driver to bind page tables, perhaps multiple
drivers. Now we have to worry about If FD_pasid gets closed before
FD_acc(s) closed and all these race conditions.



I'm not sure I understand this. But this demonstrates the flexibility of 
an unified uAPI. E.g it allows vDPA and VFIO device to use the same 
PASID which can be shared with a process in the guest.


For the race condition, it could be probably solved with refcnt.

Thanks




If we do not expose FD_pasid to the user, the teardown is much simpler
and streamlined. Following each FD_acc close, PASID unbind is performed.


Yi can correct me but this set is is about VFIO-PCI, VFIO-mdev will
be introduced later.

Last patch is:

   vfio/type1: Add vSVA support for IOMMU-backed mdevs

So pretty hard to see how this is not about vfio-mdev, at least a
little..

Jason


Thanks,

Jacob




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jacob Pan (Jun)
Hi Jason,
On Wed, 16 Sep 2020 15:38:41 -0300, Jason Gunthorpe 
wrote:

> On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
> > Hi Jason,
> > On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
> > wrote:
> >   
> > > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:  
> > > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe
> > > > wrote:
> > > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun)
> > > > > wrote:
> > > > > > > If user space wants to bind page tables, create the PASID
> > > > > > > with /dev/sva, use ioctls there to setup the page table
> > > > > > > the way it wants, then pass the now configured PASID to a
> > > > > > > driver that can use it. 
> > > > > > 
> > > > > > Are we talking about bare metal SVA? 
> > > > > 
> > > > > What a weird term.
> > > > 
> > > > Glad you noticed it at v7 :-) 
> > > > 
> > > > Any suggestions on something less weird than 
> > > > Shared Virtual Addressing? There is a reason why we moved from
> > > > SVM to SVA.
> > > 
> > > SVA is fine, what is "bare metal" supposed to mean?
> > >   
> > What I meant here is sharing virtual address between DMA and host
> > process. This requires devices perform DMA request with PASID and
> > use IOMMU first level/stage 1 page tables.
> > This can be further divided into 1) user SVA 2) supervisor SVA
> > (sharing init_mm)
> > 
> > My point is that /dev/sva is not useful here since the driver can
> > perform PASID allocation while doing SVA bind.  
> 
> No, you are thinking too small.
> 
> Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA.
> 
Could you point to me the SVA UAPI? I couldn't find it in the mainline.
Seems VDPA uses VHOST interface?

> When VDPA is used by DPDK it makes sense that the PASID will be SVA
> and 1:1 with the mm_struct.
> 
I still don't see why bare metal DPDK needs to get a handle of the
PASID. Perhaps the SVA patch would explain. Or are you talking about
vDPA DPDK process that is used to support virtio-net-pmd in the guest?

> When VDPA is used by qemu it makes sense that the PASID will be an
> arbitary IOVA map constructed to be 1:1 with the guest vCPU physical
> map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
> can support it while supporting a range of SVA kernel drivers. VDPA
> and vfio-mdev are obvious initial targets.
> 
> *BOTH* are needed.
> 
> In general any uAPI for PASID should have the option to use either the
> mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
> nothing to implement this in the driver as PASID is just a number, and
> gives so much more flexability.
> 
Not really nothing in terms of PASID life cycle. For example, if the user
uses the uacce interface to open an accelerator, it gets an FD_acc. Then it
opens /dev/sva to allocate a PASID and gets another FD_pasid. Then we
pass FD_pasid to the driver to bind page tables, perhaps to multiple
drivers. Now we have to worry about what happens if FD_pasid gets closed
before the FD_acc(s) are closed, and all these race conditions.

If we do not expose FD_pasid to the user, the teardown is much simpler
and streamlined. Following each FD_acc close, PASID unbind is performed.
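
To make the lifecycle concern concrete, here is a rough userspace sketch of
the two models being compared. All of the ioctl and device-node names below
are hypothetical placeholders, not existing uAPI; the point is only the
ordering of the file descriptors involved.

    /* Two-FD model: the PASID has its own FD, so teardown ordering matters */
    int acc = open("/dev/acc0", O_RDWR);             /* FD_acc (hypothetical node) */
    int sva = open("/dev/sva", O_RDWR);
    int fd_pasid = ioctl(sva, SVA_ALLOC_PASID, 0);   /* hypothetical: FD_pasid */
    ioctl(acc, ACC_BIND_PASID_FD, &fd_pasid);        /* driver binds page tables */
    /* ... what if fd_pasid is closed while acc is still doing DMA? ... */

    /* One-FD model: the driver allocates and frees the PASID internally */
    int acc2 = open("/dev/acc0", O_RDWR);
    ioctl(acc2, ACC_SVA_BIND_CURRENT_MM, 0);         /* hypothetical one-step bind */
    close(acc2);                                     /* close => unbind => PASID freed */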

> > Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will
> > be introduced later.  
> 
> Last patch is:
> 
>   vfio/type1: Add vSVA support for IOMMU-backed mdevs
> 
> So pretty hard to see how this is not about vfio-mdev, at least a
> little..
> 
> Jason


Thanks,

Jacob


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 11:21:10AM -0700, Jacob Pan (Jun) wrote:
> Hi Jason,
> On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
> wrote:
> 
> > On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> > > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:  
> > > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:  
> > > > > > If user space wants to bind page tables, create the PASID with
> > > > > > /dev/sva, use ioctls there to setup the page table the way it
> > > > > > wants, then pass the now configured PASID to a driver that
> > > > > > can use it.   
> > > > > 
> > > > > Are we talking about bare metal SVA?   
> > > > 
> > > > What a weird term.  
> > > 
> > > Glad you noticed it at v7 :-) 
> > > 
> > > Any suggestions on something less weird than 
> > > Shared Virtual Addressing? There is a reason why we moved from SVM
> > > to SVA.  
> > 
> > SVA is fine, what is "bare metal" supposed to mean?
> > 
> What I meant here is sharing virtual address between DMA and host
> process. This requires devices perform DMA request with PASID and use
> IOMMU first level/stage 1 page tables.
> This can be further divided into 1) user SVA 2) supervisor SVA (sharing
> init_mm)
> 
> My point is that /dev/sva is not useful here since the driver can
> perform PASID allocation while doing SVA bind.

No, you are thinking too small.

Look at VDPA, it has a SVA uAPI. Some HW might use PASID for the SVA.

When VDPA is used by DPDK it makes sense that the PASID will be SVA and
1:1 with the mm_struct.

When VDPA is used by qemu it makes sense that the PASID will be an
arbitrary IOVA map constructed to be 1:1 with the guest vCPU physical
map. /dev/sva allows a single uAPI to do this kind of setup, and qemu
can support it while supporting a range of SVA kernel drivers. VDPA
and vfio-mdev are obvious initial targets.

*BOTH* are needed.

In general any uAPI for PASID should have the option to use either the
mm_struct SVA PASID *OR* a PASID from /dev/sva. It costs virtually
nothing to implement this in the driver as PASID is just a number, and
gives so much more flexibility.
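
As a hedged illustration of "PASID is just a number" from the driver's side,
a bind request in a device uAPI could accept either source of PASID with a
single flag. The struct and flag names below are hypothetical, not an
existing interface:

    struct acc_bind_pagetable {
            __u32 flags;
    #define ACC_PASID_FROM_MM       (1U << 0)  /* use the mm_struct SVA PASID */
    #define ACC_PASID_FROM_SVA_FD   (1U << 1)  /* PASID was set up via /dev/sva */
            __s32 sva_fd;                      /* only valid with _FROM_SVA_FD */
            __u32 pasid;                       /* only valid with _FROM_SVA_FD */
    };
    /* Either way, the driver ends up programming 'pasid' into its queue HW. */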

> Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will be
> introduced later.

Last patch is:

  vfio/type1: Add vSVA support for IOMMU-backed mdevs

So pretty hard to see how this is not about vfio-mdev, at least a
little..

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jacob Pan (Jun)
Hi Jason,
On Wed, 16 Sep 2020 14:01:13 -0300, Jason Gunthorpe 
wrote:

> On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> > On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:  
> > > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:  
> > > > > If user space wants to bind page tables, create the PASID with
> > > > > /dev/sva, use ioctls there to setup the page table the way it
> > > > > wants, then pass the now configured PASID to a driver that
> > > > > can use it.   
> > > > 
> > > > Are we talking about bare metal SVA?   
> > > 
> > > What a weird term.  
> > 
> > Glad you noticed it at v7 :-) 
> > 
> > Any suggestions on something less weird than 
> > Shared Virtual Addressing? There is a reason why we moved from SVM
> > to SVA.  
> 
> SVA is fine, what is "bare metal" supposed to mean?
> 
What I meant here is sharing virtual address between DMA and host
process. This requires devices perform DMA request with PASID and use
IOMMU first level/stage 1 page tables.
This can be further divided into 1) user SVA 2) supervisor SVA (sharing
init_mm)

My point is that /dev/sva is not useful here since the driver can
perform PASID allocation while doing SVA bind.

> PASID is about constructing an arbitrary DMA IOVA map for PCI-E
> devices, being able to intercept device DMA faults, etc.
> 
An arbitrary IOVA map does not need PASID. With IOVA, you do map/unmap
explicitly, so why would you need to handle IO page faults?

To me, PASID identifies an address space that is associated with a
mm_struct.

> SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults
> trigger the same thing as CPU page faults. If it is not 1:1 then there
> is no "shared". When SVA is done using PCI-E PASID it is "PASID for
> SVA". Lots of existing devices already have SVA without PASID or
> IOMMU, so let's not muddy the terminology.
> 
I agree. This conversation is about "PASID for SVA" not "SVA without
PASID"


> vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and
> manipulate the PASIDs.
> 
> vSVA is when a guest uses a vPASID to provide SVA, not sure this is
> an informative term.
> 
I agree.

> This particular patch series seems to be about vPASID/vIOMMU for
> vfio-mdev vs the other vPASID/vIOMMU patch which was about vPASID for
> vfio-pci.
> 
Yi can correct me but this set is about VFIO-PCI; VFIO-mdev will be
introduced later.

> > > > If so, I don't see the need for userspace to know there is a
> > > > PASID. All user space needs is that my current mm is bound to a
> > > > device by the driver. So it can be a one-step process for user
> > > > instead of two.  
> > > 
> > > You've missed the entire point of the conversation, VDPA already
> > > needs more than "my current mm is bound to a device"  
> > 
> > You mean current version of vDPA? or a potential future version of
> > vDPA?  
> 
> Future VDPA drivers, it was made clear this was important to Intel
> during the argument about VDPA as a mdev.
> 
> Jason


Thanks,

Jacob


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 09:33:43AM -0700, Raj, Ashok wrote:
> On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:
> > On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > > > If user space wants to bind page tables, create the PASID with
> > > > /dev/sva, use ioctls there to setup the page table the way it wants,
> > > > then pass the now configured PASID to a driver that can use it. 
> > > 
> > > Are we talking about bare metal SVA? 
> > 
> > What a weird term.
> 
> Glad you noticed it at v7 :-) 
> 
> Any suggestions on something less weird than 
> Shared Virtual Addressing? There is a reason why we moved from SVM
> to SVA.

SVA is fine, what is "bare metal" supposed to mean?

PASID is about constructing an arbitrary DMA IOVA map for PCI-E
devices, being able to intercept device DMA faults, etc.

SVA is doing DMA IOVA 1:1 with the mm_struct CPU VA. DMA faults
trigger the same thing as CPU page faults. If it is not 1:1 then there
is no "shared". When SVA is done using PCI-E PASID it is "PASID for
SVA". Lots of existing devices already have SVA without PASID or
IOMMU, so let's not muddy the terminology.

vPASID/vIOMMU is allowing a guest to control the DMA IOVA map and
manipulate the PASIDs.

vSVA is when a guest uses a vPASID to provide SVA, not sure this is
an informative term.

This particular patch series seems to be about vPASID/vIOMMU for vfio-mdev
vs the other vPASID/vIOMMU patch which was about vPASID for vfio-pci.

> > > If so, I don't see the need for userspace to know there is a
> > > PASID. All user space needs is that my current mm is bound to a
> > > device by the driver. So it can be a one-step process for user
> > > instead of two.
> > 
> > You've missed the entire point of the conversation, VDPA already needs
> > more than "my current mm is bound to a device"
> 
> You mean current version of vDPA? or a potential future version of vDPA?

Future VDPA drivers, it was made clear this was important to Intel
during the argument about VDPA as a mdev.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Auger Eric
Hi,
On 9/16/20 6:32 PM, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote:
>> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
>>> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
 And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
 the PASID space of a PCI function cannot be shared between host and guest,
 so we assign the whole PASID table along with the RID. Since we need the
 BIND, INVALIDATE, and report APIs introduced here to support nested
 translation, a /dev/sva interface would need to support this mode as well.
>>>
>>> Well, that means this HW cannot support PASID capable 'SIOV' style
>>> devices in guests.
>>
>> It does not yet support Intel SIOV, no. It does support the standards,
>> though: PCI SR-IOV to partition a device and PASIDs in a guest.
> 
> SIOV is basically standards based, it is better thought of as a
> cookbook on how to use PASID and IOMMU together.
> 
>>> I admit whole function PASID delegation might be something vfio-pci
>>> should handle - but only if it really doesn't fit in some /dev/sva
>>> after we cover the other PASID cases.
>>
>> Wouldn't that be the duplication you're trying to avoid?  A second
>> channel for bind, invalidate, capability and fault reporting
>> mechanisms?
> 
> Yes, which is why it seems like it would be nicer to avoid it. Why I
> said "might" :)
> 
>> If we extract SVA parts of vfio_iommu_type1 into a separate chardev,
>> PASID table pass-through [1] will have to use that.
> 
> Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI
> entry point for controlling the vIOMMU related to PASID.
> 
> Does anything in the [1] series have tight coupling to VFIO other than
> needing to know a bus/device/function? It looks like it is mostly
> exposing iommu_* functions as uAPI?

this series does not use any PASID so it fits quite nicely into the VFIO
framework I think. Besides cache invalidation that takes the struct
device, other operations (MSI binding and PASID table passing operate on
the iommu domain). Also we use the VFIO memory region and
interrupt/eventfd registration mechanism to return faults.

Thanks

Eric
> 
> Jason
> 



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Raj, Ashok
On Wed, Sep 16, 2020 at 12:07:54PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > > If user space wants to bind page tables, create the PASID with
> > > /dev/sva, use ioctls there to setup the page table the way it wants,
> > > then pass the now configured PASID to a driver that can use it. 
> > 
> > Are we talking about bare metal SVA? 
> 
> What a weird term.

Glad you noticed it at v7 :-) 

Any suggestions on something less weird than 
Shared Virtual Addressing? There is a reason why we moved from SVM to SVA.
> 
> > If so, I don't see the need for userspace to know there is a
> > PASID. All user space need is that my current mm is bound to a
> > device by the driver. So it can be a one-step process for user
> > instead of two.
> 
> You've missed the entire point of the conversation, VDPA already needs
> more than "my current mm is bound to a device"

You mean current version of vDPA? or a potential future version of vDPA?

Cheers,
Ashok


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 06:20:52PM +0200, Jean-Philippe Brucker wrote:
> On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
> > On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> > > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> > > the PASID space of a PCI function cannot be shared between host and guest,
> > > so we assign the whole PASID table along with the RID. Since we need the
> > > BIND, INVALIDATE, and report APIs introduced here to support nested
> > > translation, a /dev/sva interface would need to support this mode as well.
> > 
> > Well, that means this HW cannot support PASID capable 'SIOV' style
> > devices in guests.
> 
> It does not yet support Intel SIOV, no. It does support the standards,
> though: PCI SR-IOV to partition a device and PASIDs in a guest.

SIOV is basically standards based, it is better thought of as a
cookbook on how to use PASID and IOMMU together.

> > I admit whole function PASID delegation might be something vfio-pci
> > should handle - but only if it really doesn't fit in some /dev/sva
> > after we cover the other PASID cases.
> 
> Wouldn't that be the duplication you're trying to avoid?  A second
> channel for bind, invalidate, capability and fault reporting
> mechanisms?

Yes, which is why it seems like it would be nicer to avoid it. Why I
said "might" :)

> If we extract SVA parts of vfio_iommu_type1 into a separate chardev,
> PASID table pass-through [1] will have to use that.

Yes, '/dev/sva' (which is a terrible name) would want to be the uAPI
entry point for controlling the vIOMMU related to PASID.

Does anything in the [1] series have tight coupling to VFIO other than
needing to know a bus/device/function? It looks like it is mostly
exposing iommu_* functions as uAPI?

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jean-Philippe Brucker
On Wed, Sep 16, 2020 at 11:51:48AM -0300, Jason Gunthorpe wrote:
> On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> > And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> > the PASID space of a PCI function cannot be shared between host and guest,
> > so we assign the whole PASID table along with the RID. Since we need the
> > BIND, INVALIDATE, and report APIs introduced here to support nested
> > translation, a /dev/sva interface would need to support this mode as well.
> 
> Well, that means this HW cannot support PASID capable 'SIOV' style
> devices in guests.

It does not yet support Intel SIOV, no. It does support the standards,
though: PCI SR-IOV to partition a device and PASIDs in a guest.

> I admit whole function PASID delegation might be something vfio-pci
> should handle - but only if it really doesn't fit in some /dev/sva
> after we cover the other PASID cases.

Wouldn't that be the duplication you're trying to avoid?  A second channel
for bind, invalidate, capability and fault reporting mechanisms?  If we
extract SVA parts of vfio_iommu_type1 into a separate chardev, PASID table
pass-through [1] will have to use that.

Thanks,
Jean

[1] 
https://lore.kernel.org/linux-iommu/20200320161911.27494-1-eric.au...@redhat.com/


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 05:22:26PM -0700, Jacob Pan (Jun) wrote:
> > If user space wants to bind page tables, create the PASID with
> > /dev/sva, use ioctls there to setup the page table the way it wants,
> > then pass the now configured PASID to a driver that can use it. 
> 
> Are we talking about bare metal SVA? 

What a weird term.

> If so, I don't see the need for userspace to know there is a
> PASID. All user space needs is that my current mm is bound to a
> device by the driver. So it can be a one-step process for user
> instead of two.

You've missed the entire point of the conversation, VDPA already needs
more than "my current mm is bound to a device"

> > PASID management and binding is separated from the driver(s) that are
> > using the PASID.
> 
> Why separate? Drivers need to be involved in PASID life cycle
> management. For example, when tearing down a PASID, the driver needs to
> stop DMA, IOMMU driver needs to unbind, etc. If driver is the control
> point, then things are just in order. I am referring to bare metal SVA.

Drivers can be involved and still have the uAPIs separate. It isn't hard.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 10:32:17AM +0200, Jean-Philippe Brucker wrote:
> And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
> the PASID space of a PCI function cannot be shared between host and guest,
> so we assign the whole PASID table along with the RID. Since we need the
> BIND, INVALIDATE, and report APIs introduced here to support nested
> translation, a /dev/sva interface would need to support this mode as well.

Well, that means this HW cannot support PASID capable 'SIOV' style
devices in guests.

I admit whole function PASID delegation might be something vfio-pci
should handle - but only if it really doesn't fit in some /dev/sva
after we cover the other PASID cases.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jason Gunthorpe
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, September 15, 2020 10:29 PM
> >
> > > Do they need a device at all?  It's not clear to me why RID based
> > > IOMMU management fits within vfio's scope, but PASID based does not.
> > 
> > In RID mode vfio-pci completely owns the PCI function, so it is more
> > natural that VFIO, as the sole device owner, would own the DMA mapping
> > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > so there is not much reason to try and disaggregate the API.
> 
> It is also used by vDPA.

A driver in VDPA, not VDPA itself.

> > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > device with other kernel drivers. PASID and DMA will be concurrent
> > with VFIO and other kernel drivers/etc.
> 
> Looks you are equating PASID to host-side sharing, while ignoring 
> another valid usage that a PASID-capable device is passed through
> to the guest through vfio-pci and then PASID is used by the guest
> for guest-side sharing. In such case, it is an exclusive usage in host
> side and then what is the problem for VFIO to manage PASID given
> that vfio-pci completely owns the function?

This is no different than vfio-pci being yet another client to
/dev/sva

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-16 Thread Jean-Philippe Brucker
On Wed, Sep 16, 2020 at 01:19:18AM +, Tian, Kevin wrote:
> > From: Jason Gunthorpe 
> > Sent: Tuesday, September 15, 2020 10:29 PM
> >
> > > Do they need a device at all?  It's not clear to me why RID based
> > > IOMMU management fits within vfio's scope, but PASID based does not.
> > 
> > In RID mode vfio-pci completely owns the PCI function, so it is more
> > natural that VFIO, as the sole device owner, would own the DMA mapping
> > machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> > so there is not much reason to try and disaggregate the API.
> 
> It is also used by vDPA.
> 
> > 
> > PASID on the other hand, is shared. vfio-mdev drivers will share the
> > device with other kernel drivers. PASID and DMA will be concurrent
> > with VFIO and other kernel drivers/etc.
> > 
> 
> Looks you are equating PASID to host-side sharing, while ignoring 
> another valid usage that a PASID-capable device is passed through
> to the guest through vfio-pci and then PASID is used by the guest 
> for guest-side sharing. In such case, it is an exclusive usage in host
> side and then what is the problem for VFIO to manage PASID given
> that vfio-pci completely owns the function?

And this is the only PASID model for Arm SMMU (and AMD IOMMU, I believe):
the PASID space of a PCI function cannot be shared between host and guest,
so we assign the whole PASID table along with the RID. Since we need the
BIND, INVALIDATE, and report APIs introduced here to support nested
translation, a /dev/sva interface would need to support this mode as well.

Thanks,
Jean


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Wang


On 2020/9/16 3:26 AM, Raj, Ashok wrote:

IIUC, you are asking that part of the interface to move to a API interface
that potentially the new /dev/sva and VFIO could share? I think the API's
for PASID management themselves are generic (Jean's patchset + Jacob's
ioasid set management).

Yes, the in kernel APIs are pretty generic now, and can be used by
many types of drivers.

Good, so there is no new requirements here I suppose.



The requirement is not for in-kernel APIs but a generic uAPIs.



As JasonW kicked this off, VDPA will need all this identical stuff
too. We already know this, and I think Intel VDPA HW will need it, so
it should concern you too:)

This is one of those things that I would disagree and commit :-)..


A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
reasonable starting point for discussion.

Looks like now we are getting closer to what we need.:-)

Given that PASID api's are general purpose today and any driver can use it
to take advantage. VFIO fortunately or unfortunately has the IOMMU things
abstracted. I suppose that support is also mostly built on top of the
generic iommu* api abstractions in a vendor neutral way?

I'm still lost on what is missing that vDPA can't build on top of what is
available?



For sure it can, but we may end up duplicated (or similar) uAPIs which 
is bad.


Thanks




Cheers,
Ashok




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Wang


On 2020/9/14 9:31 PM, Jean-Philippe Brucker wrote:

If it's possible, I would suggest a generic uAPI instead of a VFIO specific
one.

A large part of this work is already generic uAPI, in
include/uapi/linux/iommu.h.



This is not what I read from this series, all the following uAPI is VFIO 
specific:


struct vfio_iommu_type1_nesting_op;
struct vfio_iommu_type1_pasid_request;

And include/uapi/linux/iommu.h is not included in 
include/uapi/linux/vfio.h at all.





This patchset connects that generic interface
to the pre-existing VFIO uAPI that deals with IOMMU mappings of an
assigned device. But the bulk of the work is done by the IOMMU subsystem,
and is available to all device drivers.



So any reason not introducing the uAPI to IOMMU drivers directly?





Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).

Do you have a more precise idea of the interface /dev/sva would provide,
how it would interact with VFIO and others?



Can we replace the container fd with sva fd like:

sva = open("/dev/sva", O_RDWR);
group = open("/dev/vfio/26", O_RDWR);
ioctl(group, VFIO_GROUP_SET_SVA, &sva);

Then we can do all the SVA stuff through the sva fd, and other subsystems
(like vDPA) only need to implement the function that is equivalent to
VFIO_GROUP_SET_SVA.
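
As a hedged follow-up, the equivalent for another subsystem could be as small
as the sketch below; VHOST_VDPA_SET_SVA is hypothetical, mirroring the
VFIO_GROUP_SET_SVA above, and /dev/vhost-vdpa-0 is only an example node:

    int sva  = open("/dev/sva", O_RDWR);
    int vdpa = open("/dev/vhost-vdpa-0", O_RDWR);
    ioctl(vdpa, VHOST_VDPA_SET_SVA, &sva);
    /* From here, PASID allocation, page table bind and cache invalidation
     * would all go through the sva fd, shared with the VFIO flow above. */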




   vDPA could transport the
generic iommu.h structures via its own uAPI, and call the IOMMU API
directly without going through an intermediate /dev/sva handle.



Is there any value in that transporting? I think we have agreed that VFIO
is not the only user of vSVA ...


It's not hard to forecast that there would be more subsystems that want 
to benefit from vSVA, we don't want to duplicate the similar uAPIs in 
all of those subsystems.


Thanks




Thanks,
Jean




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Lu Baolu

On 9/16/20 8:22 AM, Jacob Pan (Jun) wrote:

If user space wants to bind page tables, create the PASID with
/dev/sva, use ioctls there to setup the page table the way it wants,
then pass the now configured PASID to a driver that can use it.


Are we talking about bare metal SVA? If so, I don't see the need for
userspace to know there is a PASID. All user space needs is that my
current mm is bound to a device by the driver. So it can be a one-step
process for user instead of two.


Driver does not do page table binding. Do not duplicate all the
control plane uAPI in every driver.

PASID management and binding is separated from the driver(s) that are
using the PASID.


Why separate? Drivers need to be involved in PASID life cycle
management. For example, when tearing down a PASID, the driver needs to
stop DMA, IOMMU driver needs to unbind, etc. If driver is the control
point, then things are just in order. I am referring to bare metal SVA.

For guest SVA, I agree that binding is separate from PASID allocation.
Could you review this doc. in terms of life cycle?
https://lkml.org/lkml/2020/8/22/13

My point is that /dev/sva has no value for bare metal SVA, we are just
talking about if guest SVA UAPIs can be consolidated. Or am I missing
something?



Not only bare metal SVA: subdevice passthrough (Intel Scalable
IOV and ARM SubStream ID) also consumes PASIDs, which has nothing to do
with user space, hence /dev/sva is unsuited.

Best regards,
baolu


RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Tuesday, September 15, 2020 10:29 PM
>
> > Do they need a device at all?  It's not clear to me why RID based
> > IOMMU management fits within vfio's scope, but PASID based does not.
> 
> In RID mode vfio-pci completely owns the PCI function, so it is more
> natural that VFIO, as the sole device owner, would own the DMA mapping
> machinery. Further, the RID IOMMU mode is rarely used outside of VFIO
> so there is not much reason to try and disaggregate the API.

It is also used by vDPA.

> 
> PASID on the other hand, is shared. vfio-mdev drivers will share the
> device with other kernel drivers. PASID and DMA will be concurrent
> with VFIO and other kernel drivers/etc.
> 

Looks you are equating PASID to host-side sharing, while ignoring 
another valid usage that a PASID-capable device is passed through
to the guest through vfio-pci and then PASID is used by the guest 
for guest-side sharing. In such case, it is an exclusive usage in host
side and then what is the problem for VFIO to manage PASID given
that vfio-pci completely owns the function?

Thanks
Kevin 


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jacob Pan (Jun)
Hi Jason,
On Tue, 15 Sep 2020 20:51:26 -0300, Jason Gunthorpe 
wrote:

> On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote:
> > > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a
> > > PASID control char dev (eg /dev/sva, or maybe /dev/iommu) seems
> > > like a reasonable starting point for discussion.  
> > 
> > I am not sure what can really be consolidated in /dev/sva.   
> 
> More or less, everything in this patch. All the manipulations of PASID
> that are required for vIOMMU use case/etc. Basically all PASID control
> that is not just a 1:1 mapping of the mm_struct.
> 
> > will have their own kernel-user interfaces anyway for their usage
> > models. They are just providing the specific transport while
> > sharing generic IOMMU UAPIs and IOASID management.  
> 
> > As I mentioned PASID management is already consolidated in the
> > IOASID layer, so for VDPA or other users, it is just a matter of creating
> > its own ioasid_set and doing allocation.
> 
> Creating the PASID is not the problem, managing what the PASID maps to
> is the issue. That is all uAPI that we don't really have today.
> 
> > IOASID is also available to the in-kernel users which does not
> > need /dev/sva AFAICT. For bare metal SVA, I don't see a need to
> > create this 'floating' state of the PASID when created by /dev/sva.
> > PASID allocation could happen behind the scene when users need to
> > bind page tables to a device DMA stream.  
> 
> My point is I would like to see one set of uAPI ioctls to bind page
> tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the
> exact same things only slightly differently.
> 
Got your point. I am not familiar with VDPA, but the VFIO UAPI is
very thin, mostly passing the IOMMU UAPI structs through as opaque data.

> If user space wants to bind page tables, create the PASID with
> /dev/sva, use ioctls there to setup the page table the way it wants,
> then pass the now configured PASID to a driver that can use it. 
> 
Are we talking about bare metal SVA? If so, I don't see the need for
userspace to know there is a PASID. All user space needs is that my
current mm is bound to a device by the driver. So it can be a one-step
process for user instead of two.

> Driver does not do page table binding. Do not duplicate all the
> control plane uAPI in every driver.
> 
> PASID management and binding is separated from the driver(s) that are
> using the PASID.
> 
Why separate? Drivers need to be involved in PASID life cycle
management. For example, when tearing down a PASID, the driver needs to
stop DMA, IOMMU driver needs to unbind, etc. If driver is the control
point, then things are just in order. I am referring to bare metal SVA.
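
As a rough sketch of that driver-as-control-point flow for bare metal SVA
(the in-kernel SVA API signatures have shifted across kernel versions, and
acc_stop_queues() is a hypothetical device-specific helper):

    #include <linux/iommu.h>

    static struct iommu_sva *sva_handle;

    static int acc_enable_sva(struct device *dev)
    {
            sva_handle = iommu_sva_bind_device(dev, current->mm, NULL);
            if (IS_ERR(sva_handle))
                    return PTR_ERR(sva_handle);
            /* program iommu_sva_get_pasid(sva_handle) into the device queue */
            return 0;
    }

    static void acc_disable_sva(struct device *dev)
    {
            acc_stop_queues(dev);                /* driver stops DMA first */
            iommu_sva_unbind_device(sva_handle); /* then the IOMMU unbind */
    }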

For guest SVA, I agree that binding is separate from PASID allocation.
Could you review this doc. in terms of life cycle?
https://lkml.org/lkml/2020/8/22/13

My point is that /dev/sva has no value for bare metal SVA, we are just
talking about if guest SVA UAPIs can be consolidated. Or am I missing
something?

> Jason


Thanks,

Jacob


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 03:08:51PM -0700, Jacob Pan wrote:
> > A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> > control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> > reasonable starting point for discussion.
> 
> I am not sure what can really be consolidated in /dev/sva. 

More or less, everything in this patch. All the manipulations of PASID
that are required for vIOMMU use case/etc. Basically all PASID control
that is not just a 1:1 mapping of the mm_struct.

> will have their own kernel-user interfaces anyway for their usage models.
> They are just providing the specific transport while sharing generic IOMMU
> UAPIs and IOASID management.

> As I mentioned PASID management is already consolidated in the IOASID layer,
> so for VDPA or other users, it is just a matter of creating its own
> ioasid_set and doing allocation.

Creating the PASID is not the problem, managing what the PASID maps to
is the issue. That is all uAPI that we don't really have today.

> IOASID is also available to the in-kernel users which does not
> need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this
> 'floating' state of the PASID when created by /dev/sva. PASID allocation
> could happen behind the scene when users need to bind page tables to a
> device DMA stream.

My point is I would like to see one set of uAPI ioctls to bind page
tables. I don't want to have VFIO, VDPA, etc, etc uAPIs to do the exact
same things only slightly differently.

If user space wants to bind page tables, create the PASID with
/dev/sva, use ioctls there to setup the page table the way it wants,
then pass the now configured PASID to a driver that can use it. 

Driver does not do page table binding. Do not duplicate all the
control plane uAPI in every driver.

PASID management and binding is separated from the driver(s) that are
using the PASID.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 12:26:32PM -0700, Raj, Ashok wrote:

> > Yes, there is. There is a limited pool of HW PASID's. If one user fork
> > bombs it can easily claim an unreasonable number from that pool as
> > each process will claim a PASID. That can DOS the rest of the system.
> 
> Not sure how you had this played out.. For PASID used in ENQCMD today for
> our SVM usages, we *DO* not automatically propagate or allocate new PASIDs. 
> 
> The new process needs to bind to get a PASID for its own use. For threads
> of same process the PASID is inherited. For forks(), we do not
> auto-allocate them.

Auto-allocate doesn't matter, the PASID is tied to the mm_struct,
after fork the program will get a new mm_struct, and it can manually
re-trigger PASID allocation for that mm_struct from any SVA kernel
driver.

64k processes, each with their own mm_struct, all triggering SVA, will
allocate 64k PASID's and use up the whole 16 bit space.

> Given that PASID api's are general purpose today and any driver can use it
> to take advantage. VFIO fortunately or unfortunately has the IOMMU things
> abstracted. I suppose that support is also mostly built on top of the
> generic iommu* api abstractions in a vendor neutral way? 
> 
> I'm still lost on what is missing that vDPA can't build on top of what is
> available?

I think it is basically everything in this patch.. Why duplicate all
this uAPI?

Jason 


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jacob Pan
Hi Jason,

On Tue, 15 Sep 2020 15:45:10 -0300, Jason Gunthorpe  wrote:

> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many devices and needs to be introduced with a
> > > wide community agreement so all scenarios will be supportable.  
> > 
> > True, reading some of the earlier replies I was clearly confused as I
> > thought you were talking about mdev again. But now that you say it, you
> > have moved past mdev and it's the PASID interfaces, correct?
> 
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> 
> > For the native case, user applications have just 1 PASID per
> > process. There is no need for quota management.
> 
> Yes, there is. There is a limited pool of HW PASID's. If one user fork
> bombs it can easily claim an unreasonable number from that pool as
> each process will claim a PASID. That can DOS the rest of the system.
> 
> If PASID DOS is a worry then it must be solved at the IOMMU level for
> all user applications that might trigger a PASID allocation. VFIO is
> not special.
> 
> > IIUC, you are asking that part of the interface to move to a API
> > interface that potentially the new /dev/sva and VFIO could share? I
> > think the API's for PASID management themselves are generic (Jean's
> > patchset + Jacob's ioasid set management).  
> 
> Yes, the in kernel APIs are pretty generic now, and can be used by
> many types of drivers.
> 
Right, IOMMU UAPIs are not VFIO specific, we pass user pointer to the IOMMU
layer to process.

Similarly for PASID management, the IOASID extensions we are proposing
will handle ioasid_set (groups/pools), quota, permissions, and notifications
in the IOASID core. There is nothing VFIO specific.
https://lkml.org/lkml/2020/8/22/12
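
For reference, "create its own ioasid_set and allocate from it" is roughly the
following for an in-kernel user, based on the ioasid API as it existed around
this time (the quota/permission/notification pieces are the proposed
extensions on top of this, not shown):

    #include <linux/ioasid.h>

    static struct ioasid_set my_ioasid_set;   /* per-subsystem PASID namespace */

    static ioasid_t my_alloc_pasid(void *private)
    {
            /* the real min/max comes from the IOMMU; PASID 0 is commonly reserved */
            return ioasid_alloc(&my_ioasid_set, 1, (1U << 20) - 1, private);
    }

    static void my_free_pasid(ioasid_t pasid)
    {
            ioasid_free(pasid);
    }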

> As JasonW kicked this off, VDPA will need all this identical stuff
> too. We already know this, and I think Intel VDPA HW will need it, so
> it should concern you too :)
> 
> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> reasonable starting point for discussion.
> 
I am not sure what can really be consolidated in /dev/sva. VFIO and VDPA
will have their own kernel-user interfaces anyway for their usage models.
They are just providing the specific transport while sharing generic IOMMU
UAPIs and IOASID management.

As I mentioned PASID management is already consolidated in the IOASID layer,
so for VDPA or other users, it is just a matter of creating its own
ioasid_set and doing allocation.

IOASID is also available to the in-kernel users which does not
need /dev/sva AFAICT. For bare metal SVA, I don't see a need to create this
'floating' state of the PASID when created by /dev/sva. PASID allocation
could happen behind the scene when users need to bind page tables to a
device DMA stream. Security authorization of the PASID is natively enforced
when the user tries to bind a page table; there is no need to pass the FD handle of
the PASID back to the kernel as you suggested earlier.

Thanks,

Jacob


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Raj, Ashok
On Tue, Sep 15, 2020 at 03:45:10PM -0300, Jason Gunthorpe wrote:
> On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > > PASID applies widely to many devices and needs to be introduced with a
> > > wide community agreement so all scenarios will be supportable.
> > 
> > True, reading some of the earlier replies I was clearly confused as I
> > thought you were talking about mdev again. But now that you say it, you
> > have moved past mdev and it's the PASID interfaces, correct?
> 
> Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.
> 
> > For the native case, user applications have just 1 PASID per
> > process. There is no need for quota management.
> 
> Yes, there is. There is a limited pool of HW PASID's. If one user fork
> bombs it can easily claim an unreasonable number from that pool as
> each process will claim a PASID. That can DOS the rest of the system.

Not sure how you had this played out.. For PASID used in ENQCMD today for
our SVM usages, we *DO* not automatically propagate or allocate new PASIDs. 

The new process needs to bind to get a PASID for its own use. For threads
of same process the PASID is inherited. For forks(), we do not
auto-allocate them. Since PASID isn't a sharable resource, much like how you
would not pass mmio mmaps that cannot be shared to forked processes, correct?
Such as your doorbell space, for example.

> 
> If PASID DOS is a worry then it must be solved at the IOMMU level for
> all user applications that might trigger a PASID allocation. VFIO is
> not special.

Feels like you can simply avoid the PASID DOS rather than permit it to
happen. 
> 
> > IIUC, you are asking that part of the interface to move to a API interface
> > that potentially the new /dev/sva and VFIO could share? I think the API's
> > for PASID management themselves are generic (Jean's patchset + Jacob's
> > ioasid set management).
> 
> Yes, the in kernel APIs are pretty generic now, and can be used by
> many types of drivers.

Good, so there is no new requirements here I suppose.
> 
> As JasonW kicked this off, VDPA will need all this identical stuff
> too. We already know this, and I think Intel VDPA HW will need it, so
> it should concern you too :)

This is one of those things that I would disagree and commit :-).. 

> 
> A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
> control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
> reasonable starting point for discussion.

Looks like now we are getting closer to what we need. :-)

Given that PASID api's are general purpose today and any driver can use it
to take advantage. VFIO fortunately or unfortunately has the IOMMU things
abstracted. I suppose that support is also mostly built on top of the
generic iommu* api abstractions in a vendor neutral way? 

I'm still lost on what is missing that vDPA can't build on top of what is
available?

Cheers,
Ashok


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Tue, Sep 15, 2020 at 11:11:54AM -0700, Raj, Ashok wrote:
> > PASID applies widely to many devices and needs to be introduced with a
> > wide community agreement so all scenarios will be supportable.
> 
> True, reading some of the earlier replies I was clearly confused as I
> thought you were talking about mdev again. But now that you say it, you
> have moved past mdev and it's the PASID interfaces, correct?

Yes, we agreed mdev for IDXD at LPC, didn't talk about PASID.

> For the native case, user applications have just 1 PASID per
> process. There is no need for quota management.

Yes, there is. There is a limited pool of HW PASID's. If one user fork
bombs it can easily claim an unreasonable number from that pool as
each process will claim a PASID. That can DOS the rest of the system.

If PASID DOS is a worry then it must be solved at the IOMMU level for
all user applications that might trigger a PASID allocation. VFIO is
not special.

> IIUC, you are asking that part of the interface to move to a API interface
> that potentially the new /dev/sva and VFIO could share? I think the API's
> for PASID management themselves are generic (Jean's patchset + Jacob's
> ioasid set management).

Yes, the in kernel APIs are pretty generic now, and can be used by
many types of drivers.

As JasonW kicked this off, VDPA will need all this identical stuff
too. We already know this, and I think Intel VDPA HW will need it, so
it should concern you too :)

A PASID vIOMMU solution sharable with VDPA and VFIO, based on a PASID
control char dev (eg /dev/sva, or maybe /dev/iommu) seems like a
reasonable starting point for discussion.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Raj, Ashok
On Tue, Sep 15, 2020 at 08:33:41AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> > Hi Jason,
> > 
> > I thought we discussed this at LPC, but still seems to be going in
> > circles :-(.
> 
> We discussed mdev at LPC, not PASID.
> 
PASID applies widely to many devices and needs to be introduced with a
> wide community agreement so all scenarios will be supportable.

True, reading some of the earlier replies I was clearly confused as I
thought you were talking about mdev again. But now that you say it, you
have moved past mdev and it's the PASID interfaces, correct?

> 
> > As you had suggested earlier in the mail thread could Jason Wang maybe
> > build out what it takes to have a full fledged /dev/sva interface for vDPA
> > and figure out how the interfaces should emerge? otherwise it appears
> > everyone is talking very high level and with that limited understanding of
> > how things work at the moment. 
> 
> You want Jason Wang to do the work to get Intel PASID support merged?
> Seems a bit of strange request.

I was reading mdev in my head. Not PASID, sorry.

For the native case, user applications have just 1 PASID per process. There
is no need for quota management. VFIO, being the one used for guests where
there are more PASIDs per guest, is where this is enforced today.

IIUC, you are asking that part of the interface to move to a API interface
that potentially the new /dev/sva and VFIO could share? I think the API's
for PASID management themselves are generic (Jean's patchset + Jacob's
ioasid set management). 

Possibly what you need is already available, but not in a specific way that
you expect maybe? 

Let me check with Jacob and let him/Jean pick that up.

Cheers,
Ashok



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 04:33:10PM -0600, Alex Williamson wrote:

> Can you explain that further, or spit-ball what you think this /dev/sva
> interface looks like and how a user might interact between vfio and
> this new interface? 

When you open it you get some container, inside the container the
user can create PASIDs. PASIDs outside that container cannot be
reached.

Creating a PASID, or the guest PASID range would be the entry point
for doing all the operations against a PASID or range that this patch
series imagines:
 - Map process VA mappings to the PASID's DMA virtual address space
 - Catch faults
 - Setup any special HW stuff like Intel's two level thing, ARM stuff, etc
 - Expose resource controls, cgroup, whatever
 - Migration special stuff (allocate fixed PASIDs)

A PASID is a handle for an IOMMU page table, and the tools to
manipulate it. Within /dev/sva the page table is just 'floating' and
not linked to any PCI functions

The open /dev/sva FD holding the allocated PASIDs would be passed to a
kernel driver. This is a security authorization that the specified
PASID can be assigned to a PCI device by the kernel.

At this point the kernel driver would have the IOMMU permit its
bus/device/function to use the PASID. The PASID can be passed to
multiple drivers of any driver flavour so table re-use is
possible. Now the IOMMU page table is linked to a device.

The kernel device driver would also do the device specific programming
to setup the PASID in the device, attach it to some device object and
expose the device for user DMA.

For instance IDXD's char dev would map the queue memory and associate
the PASID with that queue and setup the HW to be ready for the new
enque instruction. The IDXD mdev would link to its emulated PCI BAR
and ensure the guest can only use PASID's included in the /dev/sva
container.

The qemu control plane for vIOMMU related to PASID would run over
/dev/sva.

I think the design could go further where a 'PASID' is just an
abstract idea of a page table, then vfio-pci could consume it too as a
IOMMU page table handle even though there is no actual PASID. So qemu
could end up with one API to universally control the vIOMMU, an API
that can be shared between subsystems and is not tied to VFIO.
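
A hedged sketch of what that user/qemu flow might look like; every ioctl name
and device-node path here is a placeholder for uAPI that does not exist yet:

    int sva = open("/dev/sva", O_RDWR);        /* the PASID container */
    __u32 pasid = 0;
    ioctl(sva, SVA_ALLOC_PASID, &pasid);       /* hypothetical: PASID + 'floating' page table */
    /* further hypothetical SVA_* ioctls: map process VA, catch faults, ... */

    /* Hand the sva fd to a kernel driver as the security authorization: */
    int dev = open("/dev/idxd-wq0", O_RDWR);   /* hypothetical device node */
    ioctl(dev, DEV_ATTACH_SVA, &sva);          /* IOMMU then permits this BDF to use the PASIDs */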

> allocating pasids and associating them with page tables for that
> two-stage IOMMU setup, performing cache invalidations based on page
> table updates, etc.  How does it make more sense for a vIOMMU to
> setup some aspects of the IOMMU through vfio and others through a
> TBD interface?

vfio's IOMMU interface is about RID based full device ownership,
and fixed mappings.

PASID is about mediation, shared ownership and page faulting.

Does PASID overlap with the existing IOMMU RID interface beyond both
are using the IOMMU?

> The IOMMU needs to allocate PASIDs, so in that sense it enforces a
> quota via the architectural limits, but is the IOMMU layer going to
> distinguish in-kernel versus user limits?  A cgroup limit seems like a
> good idea, but that's not really at the IOMMU layer either and I don't
> see that a /dev/sva and vfio interface couldn't both support a cgroup
> type quota.

It is all good questions. PASID is new, this stuff needs to be
sketched out more. A lot of in-kernel users of IOMMU PASID are
probably going to be triggered by userspace actions.

I think a cgroup quota would end up near the IOMMU layer, so vfio,
sva, and any other driver char devs would all be restricted by the
cgroup as peers.

> And it's not clear that they'll have compatible requirements.  A
> userspace idxd driver might have limited needs versus a vIOMMU backend.
> Does a single quota model adequately support both or are we back to the
> differences between access to a device and ownership of a device?

At the end of the day a PASID is just a number and the driver's only
use of it is to program it into HW.

All these other differences deal with the IOMMU side of the PASID, how
pages are mapped into it, how page fault works, etc, etc. Keeping the
two concerns seperated seems very clean. A device driver shouldn't
care how the PASID is setup.

> > > This series is a blueprint within the context of the ownership and
> > > permission model that VFIO already provides.  It doesn't seem like we
> > > can pluck that out on its own, nor is it necessarily the case that VFIO
> > > wouldn't want to provide PASID services within its own API even if we
> > > did have this undefined /dev/sva interface.  
> > 
> > I don't see what you do - VFIO does not own PASID, and in this
> > vfio-mdev mode it does not own the PCI device/IOMMU either. So why
> > would this need to be part of the VFIO ownership and permission model?
> 
> Doesn't the PASID model essentially just augment the requester ID IOMMU
> model so as to manage the IOVAs for a subdevice of a RID?  

I'd say not really.. PASID is very different from RID because PASID
must always be mediated by the kernel. vfio-pci doesn't know how to
use PASID because it doesn't k

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-15 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 03:44:38PM -0700, Raj, Ashok wrote:
> Hi Jason,
> 
> I thought we discussed this at LPC, but still seems to be going in
> circles :-(.

We discussed mdev at LPC, not PASID.

PASID applies widely to many devices and needs to be introduced with a
wide community agreement so all scenarios will be supportable.

> As you had suggested earlier in the mail thread could Jason Wang maybe
> build out what it takes to have a full fledged /dev/sva interface for vDPA
> and figure out how the interfaces should emerge? otherwise it appears
> everyone is talking very high level and with that limited understanding of
> how things work at the moment. 

You want Jason Wang to do the work to get Intel PASID support merged?
Seems a bit of strange request.

> This has to move ahead of these email discussions, hoping somone with the
> right ideas would help move this forward.

Why not try yourself to come up with a proposal?

Jason 


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Raj, Ashok
Hi Jason,

I thought we discussed this at LPC, but still seems to be going in
circles :-(.


On Mon, Sep 14, 2020 at 04:00:57PM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> > On Mon, 14 Sep 2020 14:41:21 -0300
> > Jason Gunthorpe  wrote:
> > 
> > > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > >  
> > > > "its own special way" is arguable, VFIO is just making use of what's
> > > > being proposed as the uapi via its existing IOMMU interface.  
> > > 
> > > I mean, if we have a /dev/sva then it makes no sense to extend the
> > > VFIO interfaces with the same stuff. VFIO should simply accept a PASID
> > > created from /dev/sva and use it just like any other user-DMA driver
> > > would.
> > 
> > I don't think that's absolutely true.  By the same logic, we could say
> > that pci-sysfs provides access to PCI BAR and config space
> > resources,
> 
> No, it is the reverse, VFIO is a better version of pci-sysfs, so
> pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva
> would be the superset interface for PASID, so whatever VFIO has would
> be obsoleted.

As you had suggested earlier in the mail thread could Jason Wang maybe
build out what it takes to have a full fledged /dev/sva interface for vDPA
and figure out how the interfaces should emerge? otherwise it appears
everyone is talking very high level and with that limited understanding of
how things work at the moment. 

As Kevin pointed out there are several aspects, and a real prototype from
interested people would be the best way to understand the easy/hard aspects
of moving between the proposals.

- PASID allocation and life cycle management
  Managing both 1:1 (as it's done today) and also supporting a guest PASID
  space. (Supporting guest PASID range is required for migration I suppose)
- Page request processing.
- Interaction with vIOMMU, vSVA requires vIOMMU for supporting
  invalidations, forwarding prq and such.
- Supporting ENQCMD in guest. (Today it's just in Intel products, but it's
  also submitted to PCIe SIG) and if you are a member should be able to see
  that. FWIW, it might already be open for public review, if not now maybe
  pretty soon.
  
  For Intel we have some KVM interaction setting up the guest pasid->host
  pasid interfaces.

This has to move ahead of these email discussions, hoping somone with the
right ideas would help move this forward.

Cheers,
Ashok




Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Alex Williamson
On Mon, 14 Sep 2020 16:00:57 -0300
Jason Gunthorpe  wrote:

> On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> > On Mon, 14 Sep 2020 14:41:21 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> > >
> > > > "its own special way" is arguable, VFIO is just making use of what's
> > > > being proposed as the uapi via its existing IOMMU interface.
> > > 
> > > I mean, if we have a /dev/sva then it makes no sense to extend the
> > > VFIO interfaces with the same stuff. VFIO should simply accept a PASID
> > > created from /dev/sva and use it just like any other user-DMA driver
> > > would.  
> > 
> > I don't think that's absolutely true.  By the same logic, we could say
> > that pci-sysfs provides access to PCI BAR and config space
> > resources,  
> 
> No, it is the reverse, VFIO is a better version of pci-sysfs, so
> pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva
> would be the superset interface for PASID, so whatever VFIO has would
> be obsoleted.
> 
> It would be very unusual for the kernel to have two 'preferred'
> interfaces for the same thing, IMHO. The review process for uAPI
> should really prevent that by allowing all interests to be served
> while the uAPI is designed.
> 
> > the VFIO device interface duplicates part of that interface therefore it
> > should be abandoned.  But in reality, VFIO providing access to those
> > resources puts those accesses within the scope and control of the VFIO
> > interface.  
> 
> Not clear to my why VFIO needs that. PASID seems quite orthogonal from
> VFIO to me.


Can you explain that further, or spit-ball what you think this /dev/sva
interface looks like and how a user might interact between vfio and
this new interface?  The interface proposed here definitely does not
seem orthogonal to the vfio IOMMU interface, ie. selecting a specific
IOMMU domain mode during vfio setup, allocating pasids and associating
them with page tables for that two-stage IOMMU setup, performing cache
invalidations based on page table updates, etc.  How does it make more
sense for a vIOMMU to setup some aspects of the IOMMU through vfio and
others through a TBD interface?


> > > This has already happened, the SVA patches generally allow unpriv user
> > > space to allocate a PASID for their process.
> > > 
> > > If a device implements a mdev shared with a kernel driver (like IDXD)
> > > then it will be sharing that PASID pool across both drivers. In this
> > > case it makes no sense that VFIO has PASID quota logic because it has
> > > an incomplete view. It could only make sense if VFIO is the exclusive
> > > owner of the bus/device/function.
> > > 
> > > The tracking logic needs to be global.. Most probably in some kind of
> > > PASID cgroup controller?  
> > 
> > AIUI, that doesn't exist yet, so it makes sense that VFIO, as the
> > mechanism through which a user would allocate a PASID,   
> 
> VFIO is not the exclusive user interface for PASID. Other SVA drivers
> will allocate PASIDs. Any quota has to be implemented by the IOMMU
> layer, and shared across all drivers.


The IOMMU needs to allocate PASIDs, so in that sense it enforces a
quota via the architectural limits, but is the IOMMU layer going to
distinguish in-kernel versus user limits?  A cgroup limit seems like a
good idea, but that's not really at the IOMMU layer either and I don't
see that a /dev/sva and vfio interface couldn't both support a cgroup
type quota.
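
For illustration, a global accounting helper shared by all PASID allocators
might look like the sketch below. This is hypothetical code, not something in
the kernel; a real version would presumably end up as a cgroup controller as
suggested above, but it shows the check living outside any one subsystem.

/*
 * Hypothetical, self-contained sketch of global PASID accounting -- not
 * existing kernel code.  The point is only that the quota check can live
 * in the IOMMU layer and be shared by VFIO, vDPA and native SVA users
 * alike, instead of each subsystem keeping its own quota.
 */
#include <linux/atomic.h>
#include <linux/errno.h>

#define PASID_QUOTA_PER_OWNER	1000	/* arbitrary illustrative limit */

struct pasid_owner {			/* invented: one per user/cgroup/VM */
	atomic_t nr_pasids;
};

static int pasid_charge(struct pasid_owner *owner)
{
	/* Charge one PASID against the owner; fail if over the limit. */
	if (atomic_inc_return(&owner->nr_pasids) > PASID_QUOTA_PER_OWNER) {
		atomic_dec(&owner->nr_pasids);
		return -ENOSPC;
	}
	return 0;
}

static void pasid_uncharge(struct pasid_owner *owner)
{
	atomic_dec(&owner->nr_pasids);
}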

 
> > space.  Also, "unprivileged user" is a bit of a misnomer in this
> > context as the VFIO user must be privileged with ownership of a device
> > before they can even participate in PASID allocation.  Is truly
> > unprivileged access reasonable for a limited resource?  
> 
> I'm not talking about VFIO, I'm talking about the other SVA drivers. I
> expect some of them will be unpriv safe, like IDXD, for
> instance.
> 
> Some way to manage the limited PASID resource will be necessary beyond
> just VFIO.

And it's not clear that they'll have compatible requirements.  A
userspace idxd driver might have limited needs versus a vIOMMU backend.
Does a single quota model adequately support both or are we back to the
differences between access to a device and ownership of a device?
Maybe a single PASID per user makes sense in the former.  If we could
bring this discussion to some sort of more concrete proposal it might
be easier to weigh the choices.
 
> > QEMU typically runs in a sandbox with limited access, when a device or
> > mdev is assigned to a VM, file permissions are configured to allow that
> > access.  QEMU doesn't get to poke at any random dev file it likes,
> > that's part of how userspace reduces the potential attack surface.  
> 
> Plumbing the exact same APIs through VFIO's uAPI vs /dev/sva doesn't
> reduce the attack surface. qemu can simply include /dev/sva in the
> sandbox when using VFIO with no increase in attack surface from this
> proposed series.

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 12:23:28PM -0600, Alex Williamson wrote:
> On Mon, 14 Sep 2020 14:41:21 -0300
> Jason Gunthorpe  wrote:
> 
> > On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
> >  
> > > "its own special way" is arguable, VFIO is just making use of what's
> > > being proposed as the uapi via its existing IOMMU interface.  
> > 
> > I mean, if we have a /dev/sva then it makes no sense to extend the
> > VFIO interfaces with the same stuff. VFIO should simply accept a PASID
> > created from /dev/sva and use it just like any other user-DMA driver
> > would.
> 
> I don't think that's absolutely true.  By the same logic, we could say
> that pci-sysfs provides access to PCI BAR and config space
> resources,

No, it is the reverse, VFIO is a better version of pci-sysfs, so
pci-sysfs is the one that is obsoleted by VFIO. Similarly a /dev/sva
would be the superset interface for PASID, so whatever VFIO has would
be obsoleted.

It would be very unusual for the kernel to have two 'preferred'
interfaces for the same thing, IMHO. The review process for uAPI
should really prevent that by allowing all interests to be served
while the uAPI is designed.

> the VFIO device interface duplicates part of that interface therefore it
> should be abandoned.  But in reality, VFIO providing access to those
> resources puts those accesses within the scope and control of the VFIO
> interface.

Not clear to me why VFIO needs that. PASID seems quite orthogonal from
VFIO to me.

> > This has already happened, the SVA patches generally allow unpriv user
> > space to allocate a PASID for their process.
> > 
> > If a device implements a mdev shared with a kernel driver (like IDXD)
> > then it will be sharing that PASID pool across both drivers. In this
> > case it makes no sense that VFIO has PASID quota logic because it has
> > an incomplete view. It could only make sense if VFIO is the exclusive
> > owner of the bus/device/function.
> > 
> > The tracking logic needs to be global.. Most probably in some kind of
> > PASID cgroup controller?
> 
> AIUI, that doesn't exist yet, so it makes sense that VFIO, as the
> mechanism through which a user would allocate a PASID, 

VFIO is not the exclusive user interface for PASID. Other SVA drivers
will allocate PASIDs. Any quota has to be implemented by the IOMMU
layer, and shared across all drivers.

> space.  Also, "unprivileged user" is a bit of a misnomer in this
> context as the VFIO user must be privileged with ownership of a device
> before they can even participate in PASID allocation.  Is truly
> unprivileged access reasonable for a limited resource?

I'm not talking about VFIO, I'm talking about the other SVA drivers. I
expect some of them will be unpriv safe, like IDXD, for
instance.

Some way to manage the limited PASID resource will be necessary beyond
just VFIO.

> QEMU typically runs in a sandbox with limited access, when a device or
> mdev is assigned to a VM, file permissions are configured to allow that
> access.  QEMU doesn't get to poke at any random dev file it likes,
> that's part of how userspace reduces the potential attack surface.

Plumbing the exact same APIs through VFIO's uAPI vs /dev/sva doesn't
reduce the attack surface. qemu can simply include /dev/sva in the
sandbox when using VFIO with no increase in attack surface from this
proposed series.

> This series is a blueprint within the context of the ownership and
> permission model that VFIO already provides.  It doesn't seem like we
> can pluck that out on its own, nor is it necessarily the case that VFIO
> wouldn't want to provide PASID services within its own API even if we
> did have this undefined /dev/sva interface.

I don't see it the way you do - VFIO does not own PASID, and in this
vfio-mdev mode it does not own the PCI device/IOMMU either. So why
would this need to be part of the VFIO ownership and permission model?

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Alex Williamson
On Mon, 14 Sep 2020 14:41:21 -0300
Jason Gunthorpe  wrote:

> On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
>  
> > "its own special way" is arguable, VFIO is just making use of what's
> > being proposed as the uapi via its existing IOMMU interface.  
> 
> I mean, if we have a /dev/sva then it makes no sense to extend the
> VFIO interfaces with the same stuff. VFIO should simply accept a PASID
> created from /dev/sva and use it just like any other user-DMA driver
> would.

I don't think that's absolutely true.  By the same logic, we could say
that pci-sysfs provides access to PCI BAR and config space resources,
the VFIO device interface duplicates part of that interface, and therefore it
should be abandoned.  But in reality, VFIO providing access to those
resources puts those accesses within the scope and control of the VFIO
interface.  Ownership of a device through vfio is provided by allowing
the user access to the vfio group dev file, rather than to a sysfs device
node plus some number of resource files and the config file, and running
with admin permissions to see the full extent of config space.  Reserved
ranges for the IOMMU are also provided via sysfs, but VFIO includes a
capability on the IOMMU get_info ioctl for the user to learn about
available IOVA ranges without scraping through sysfs.
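
As a side note, that capability is already consumable from userspace today; a
rough sketch (struct and macro names taken from my reading of
include/uapi/linux/vfio.h, so verify against your headers; error handling
trimmed) looks like this:

/*
 * Userspace sketch: query usable IOVA ranges from a VFIO type1 container
 * via the VFIO_IOMMU_GET_INFO capability chain.
 */
#include <linux/vfio.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>

static void dump_iova_ranges(int container_fd)
{
	struct vfio_iommu_type1_info *info;
	struct vfio_info_cap_header *hdr;
	__u32 argsz = sizeof(*info);

	/* First call tells us how big the buffer really needs to be. */
	info = calloc(1, argsz);
	info->argsz = argsz;
	ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);

	argsz = info->argsz;
	info = realloc(info, argsz);
	memset(info, 0, argsz);
	info->argsz = argsz;
	ioctl(container_fd, VFIO_IOMMU_GET_INFO, info);

	if (!(info->flags & VFIO_IOMMU_INFO_CAPS))
		goto out;

	/* Walk the capability chain looking for the IOVA range capability. */
	hdr = (struct vfio_info_cap_header *)((char *)info + info->cap_offset);
	while (1) {
		if (hdr->id == VFIO_IOMMU_TYPE1_INFO_CAP_IOVA_RANGE) {
			struct vfio_iommu_type1_info_cap_iova_range *cap =
				(struct vfio_iommu_type1_info_cap_iova_range *)hdr;
			unsigned int i;

			for (i = 0; i < cap->nr_iovas; i++)
				printf("iova range: 0x%llx - 0x%llx\n",
				       (unsigned long long)cap->iova_ranges[i].start,
				       (unsigned long long)cap->iova_ranges[i].end);
		}
		if (!hdr->next)
			break;
		hdr = (struct vfio_info_cap_header *)((char *)info + hdr->next);
	}
out:
	free(info);
}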

> > are also a system resource, so we require some degree of access control
> > and quotas for management of PASIDs.
> 
> This has already happened, the SVA patches generally allow unpriv user
> space to allocate a PASID for their process.
> 
> If a device implements a mdev shared with a kernel driver (like IDXD)
> then it will be sharing that PASID pool across both drivers. In this
> case it makes no sense that VFIO has PASID quota logic because it has
> an incomplete view. It could only make sense if VFIO is the exclusive
> owner of the bus/device/function.
> 
> The tracking logic needs to be global.. Most probably in some kind of
> PASID cgroup controller?

AIUI, that doesn't exist yet, so it makes sense that VFIO, as the
mechanism through which a user would allocate a PASID, implements a
reasonable quota to avoid an unprivileged user exhausting the address
space.  Also, "unprivileged user" is a bit of a misnomer in this
context as the VFIO user must be privileged with ownership of a device
before they can even participate in PASID allocation.  Is truly
unprivileged access reasonable for a limited resource?
 
> > know whether an assigned device requires PASIDs such that access to
> > this dev file is provided to QEMU?  
> 
> Wouldn't QEMU just open /dev/sva if it needs it? Like other dev files?
> Why would it need something special?

QEMU typically runs in a sandbox with limited access, when a device or
mdev is assigned to a VM, file permissions are configured to allow that
access.  QEMU doesn't get to poke at any random dev file it likes,
that's part of how userspace reduces the potential attack surface.
 
> > would be an obvious DoS path if any user can create arbitrary
> > allocations.  If we can move code out of VFIO, I'm all for it, but I
> > think it needs to be better defined than "implement magic universal sva
> > uapi interface" before we can really consider it.  Thanks,  
> 
> Jason began by saying VDPA will need this too, I agree with him.
> 
> I'm not sure why it would be "magic"? This series already gives a
> pretty solid blueprint for what the interface would need to
> have. Interested folks need to sit down and talk about it not just
> default everything to being built inside VFIO.

This series is a blueprint within the context of the ownership and
permission model that VFIO already provides.  It doesn't seem like we
can pluck that out on its own, nor is it necessarily the case that VFIO
wouldn't want to provide PASID services within its own API even if we
did have this undefined /dev/sva interface.  Thanks,

Alex



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 10:58:57AM -0600, Alex Williamson wrote:
 
> "its own special way" is arguable, VFIO is just making use of what's
> being proposed as the uapi via its existing IOMMU interface.

I mean, if we have a /dev/sva then it makes no sense to extend the
VFIO interfaces with the same stuff. VFIO should simply accept a PASID
created from /dev/sva and use it just like any other user-DMA driver
would.
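
In other words, the flow being argued for is presumably something like the
sketch below. Every helper in it is hypothetical - the names only stand in for
whatever the eventual /dev/sva and VFIO uAPIs would be - the point is the
ordering:

/*
 * Hypothetical flow only -- none of these helpers exist; they stand in
 * for whatever the eventual /dev/sva and VFIO uAPIs would be.  The PASID
 * is created and configured through /dev/sva, and VFIO (or vDPA, or any
 * other user-DMA driver) merely consumes it.
 */
int sva_open(void);				/* open("/dev/sva")        */
int sva_pasid_alloc(int sva_fd);		/* returns a PASID         */
int sva_bind_guest_pgtbl(int sva_fd, int pasid, unsigned long long gpgd);
int vfio_device_set_pasid(int device_fd, int pasid);	/* hypothetical    */

static int assign_pasid_to_device(int vfio_device_fd,
				  unsigned long long guest_pgd_gpa)
{
	int sva_fd = sva_open();
	int pasid = sva_pasid_alloc(sva_fd);

	/* /dev/sva owns allocation, quota and the DMA address map ... */
	sva_bind_guest_pgtbl(sva_fd, pasid, guest_pgd_gpa);

	/* ... while VFIO only associates the already-configured PASID
	 * with the assigned device, like any other user-DMA driver. */
	vfio_device_set_pasid(vfio_device_fd, pasid);

	return pasid;
}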

> are also a system resource, so we require some degree of access control
> and quotas for management of PASIDs.  

This has already happened, the SVA patches generally allow unpriv user
space to allocate a PASID for their process.

If a device implements a mdev shared with a kernel driver (like IDXD)
then it will be sharing that PASID pool across both drivers. In this
case it makes no sense that VFIO has PASID quota logic because it has
an incomplete view. It could only make sense if VFIO is the exclusive
owner of the bus/device/function.

The tracking logic needs to be global. Most probably in some kind of
PASID cgroup controller?

> know whether an assigned device requires PASIDs such that access to
> this dev file is provided to QEMU?

Wouldn't QEMU just open /dev/sva if it needs it? Like other dev files?
Why would it need something special?

> would be an obvious DoS path if any user can create arbitrary
> allocations.  If we can move code out of VFIO, I'm all for it, but I
> think it needs to be better defined than "implement magic universal sva
> uapi interface" before we can really consider it.  Thanks,

Jason began by saying VDPA will need this too, I agree with him.

I'm not sure why it would be "magic"? This series already gives a
pretty solid blueprint for what the interface would need to
have. Interested folks need to sit down and talk about it, not just
default everything to being built inside VFIO.

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Alex Williamson
On Mon, 14 Sep 2020 13:33:54 -0300
Jason Gunthorpe  wrote:

> On Mon, Sep 14, 2020 at 09:22:47AM -0700, Raj, Ashok wrote:
> > Hi Jason,
> > 
> > On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:  
> > > On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > >   
> > > > > Jason suggest something like /dev/sva. There will be a lot of other
> > > > > subsystems that could benefit from this (e.g vDPA).  
> > > > 
> > > > Do you have a more precise idea of the interface /dev/sva would provide,
> > > > how it would interact with VFIO and others?  vDPA could transport the
> > > > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > > > directly without going through an intermediate /dev/sva handle.  
> > > 
> > > Prior to PASID IOMMU really only makes sense as part of vfio-pci
> > > because the iommu can only key on the BDF. That can't work unless the
> > > whole PCI function can be assigned. It is hard to see how a shared PCI
> > > device can work with IOMMU like this, so may as well use vfio.
> > > 
> > > SVA and various vIOMMU models change this, a shared PCI driver can
> > > absolutely work with a PASID that is assigned to a VM safely, and
> > > actually don't need to know if their PASID maps a mm_struct or
> > > something else.  
> > 
> > Well, the IOMMU does care whether it is a native mm_struct or something
> > that belongs to the guest, because you need the ability to forward page
> > requests to and pick up page responses from the guest. Since there is
> > just one PRQ on the IOMMU and responses can't be sent directly, you have
> > to depend on a vIOMMU-type interface in the guest to make all of this
> > magic work, right?
> 
> Yes, IOMMU cares, but not the PCI Driver. It just knows it has a
> PASID. Details of how page faulting is handled or how the mapping is
> set up are abstracted by the PASID.
> 
> > > This new PASID allocator would match the guest memory layout and  
> > 
> > Not sure what you mean by "match guest memory layout"? 
> > Probably, meaning first level is gVA or gIOVA?   
> 
> It means whatever the qemu/viommu/guest/etc needs across all the
> IOMMU/arch implementations.
> 
> Basically, there should only be two ways to get a PASID
>  - From mm_struct that mirrors the creating process
>  - Via '/dev/sva' which has a complete interface to create and
>    control a PASID suitable for virtualization and more
> 
> VFIO should not have its own special way to get a PASID.

"its own special way" is arguable, VFIO is just making use of what's
being proposed as the uapi via its existing IOMMU interface.  PASIDs
are also a system resource, so we require some degree of access control
and quotas for management of PASIDs.  Does libvirt now get involved to
know whether an assigned device requires PASIDs such that access to
this dev file is provided to QEMU?  How does the kernel validate usage
or implement quotas when disconnected from device ownership?  PASIDs
would be an obvious DoS path if any user can create arbitrary
allocations.  If we can move code out of VFIO, I'm all for it, but I
think it needs to be better defined than "implement magic universal sva
uapi interface" before we can really consider it.  Thanks,

Alex



Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 09:22:47AM -0700, Raj, Ashok wrote:
> Hi Jason,
> 
> On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:
> > On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> > 
> > > > Jason suggest something like /dev/sva. There will be a lot of other
> > > > subsystems that could benefit from this (e.g vDPA).
> > > 
> > > Do you have a more precise idea of the interface /dev/sva would provide,
> > > how it would interact with VFIO and others?  vDPA could transport the
> > > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > > directly without going through an intermediate /dev/sva handle.
> > 
> > Prior to PASID IOMMU really only makes sense as part of vfio-pci
> > because the iommu can only key on the BDF. That can't work unless the
> > whole PCI function can be assigned. It is hard to see how a shared PCI
> > device can work with IOMMU like this, so may as well use vfio.
> > 
> > SVA and various vIOMMU models change this, a shared PCI driver can
> > absolutely work with a PASID that is assigned to a VM safely, and
> > actually don't need to know if their PASID maps a mm_struct or
> > something else.
> 
> Well, the IOMMU does care whether it is a native mm_struct or something
> that belongs to the guest, because you need the ability to forward page
> requests to and pick up page responses from the guest. Since there is just
> one PRQ on the IOMMU and responses can't be sent directly, you have to
> depend on a vIOMMU-type interface in the guest to make all of this magic
> work, right?

Yes, IOMMU cares, but not the PCI Driver. It just knows it has a
PASID. Details of how page faulting is handled or how the mapping is
set up are abstracted by the PASID.

> > This new PASID allocator would match the guest memory layout and
> 
> Not sure what you mean by "match guest memory layout"? 
> Probably, meaning first level is gVA or gIOVA? 

It means whatever the qemu/viommu/guest/etc needs across all the
IOMMU/arch implementations.

Basically, there should only be two ways to get a PASID
 - From mm_struct that mirrors the creating process
 - Via '/dev/sva' which has a complete interface to create and
   control a PASID suitable for virtualization and more

VFIO should not have its own special way to get a PASID.
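
For reference, the first of those two paths already exists as an in-kernel
API; roughly (simplified, and from my reading of include/linux/iommu.h around
this time, so treat the exact signatures as approximate):

/*
 * In-kernel sketch of the native SVA path: bind the current process's mm
 * to the device and hand the resulting PASID back to the driver.
 */
#include <linux/err.h>
#include <linux/iommu.h>
#include <linux/sched.h>

static int bind_current_mm(struct device *dev, u32 *pasid)
{
	struct iommu_sva *handle;

	handle = iommu_sva_bind_device(dev, current->mm, NULL);
	if (IS_ERR(handle))
		return PTR_ERR(handle);

	*pasid = iommu_sva_get_pasid(handle);
	/* The driver programs *pasid into its queues/descriptors and calls
	 * iommu_sva_unbind_device(handle) when the work is torn down. */
	return 0;
}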

Jason
 


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Raj, Ashok
Hi Jason,

On Mon, Sep 14, 2020 at 10:47:38AM -0300, Jason Gunthorpe wrote:
> On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:
> 
> > > Jason suggest something like /dev/sva. There will be a lot of other
> > > subsystems that could benefit from this (e.g vDPA).
> > 
> > Do you have a more precise idea of the interface /dev/sva would provide,
> > how it would interact with VFIO and others?  vDPA could transport the
> > generic iommu.h structures via its own uAPI, and call the IOMMU API
> > directly without going through an intermediate /dev/sva handle.
> 
> Prior to PASID IOMMU really only makes sense as part of vfio-pci
> because the iommu can only key on the BDF. That can't work unless the
> whole PCI function can be assigned. It is hard to see how a shared PCI
> device can work with IOMMU like this, so may as well use vfio.
> 
> SVA and various vIOMMU models change this, a shared PCI driver can
> absolutely work with a PASID that is assigned to a VM safely, and
> actually don't need to know if their PASID maps a mm_struct or
> something else.

Well, the IOMMU does care whether it is a native mm_struct or something that
belongs to the guest, because you need the ability to forward page requests
to and pick up page responses from the guest. Since there is just one PRQ on
the IOMMU and responses can't be sent directly, you have to depend on a
vIOMMU-type interface in the guest to make all of this magic work, right?
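
Purely as an illustration of that forwarding problem (invented types and
helpers, not kernel code), the demultiplexing looks conceptually like:

/*
 * Hypothetical sketch of PRQ demultiplexing.  The single physical PRQ has
 * to be demultiplexed, and faults on guest-owned PASIDs can only be
 * resolved by relaying them through the vIOMMU and waiting for the
 * guest's page response.
 */
#include <linux/types.h>

struct page_req {			/* simplified PRQ entry */
	u32 pasid;
	u64 addr;
	u32 perm;
	u32 prg_index;			/* needed to post the response later */
};

bool pasid_owned_by_guest(u32 pasid);			/* invented helper */
void viommu_forward_page_request(struct page_req *req);	/* invented helper */
void sva_handle_mm_fault(struct page_req *req);		/* invented helper */

static void handle_prq_entry(struct page_req *req)
{
	if (pasid_owned_by_guest(req->pasid)) {
		/* Relay to the vIOMMU (e.g. via a VFIO/vDPA fault region or
		 * eventfd); the guest resolves the fault and its response
		 * comes back through the vIOMMU before we can complete the
		 * hardware page request group. */
		viommu_forward_page_request(req);
	} else {
		/* Native SVA: fault directly against the bound mm_struct. */
		sva_handle_mm_fault(req);
	}
}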

> 
> So, some /dev/sva is another way to allocate a PASID that is not 1:1
> with mm_struct, as the existing SVA stuff enforces. ie it is a way to
> program the DMA address map of the PASID.
> 
> This new PASID allocator would match the guest memory layout and

Not sure what you mean by "match guest memory layout"?
Probably meaning the first level is gVA or gIOVA?

Cheers,
Ashok


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 03:31:13PM +0200, Jean-Philippe Brucker wrote:

> > Jason suggest something like /dev/sva. There will be a lot of other
> > subsystems that could benefit from this (e.g vDPA).
> 
> Do you have a more precise idea of the interface /dev/sva would provide,
> how it would interact with VFIO and others?  vDPA could transport the
> generic iommu.h structures via its own uAPI, and call the IOMMU API
> directly without going through an intermediate /dev/sva handle.

Prior to PASID, IOMMU really only makes sense as part of vfio-pci
because the iommu can only key on the BDF. That can't work unless the
whole PCI function can be assigned. It is hard to see how a shared PCI
device can work with IOMMU like this, so may as well use vfio.

SVA and various vIOMMU models change this, a shared PCI driver can
absolutely work with a PASID that is assigned to a VM safely, and
actually don't need to know if their PASID maps a mm_struct or
something else.

So, some /dev/sva is another way to allocate a PASID that is not 1:1
with mm_struct, as the existing SVA stuff enforces; i.e. it is a way to
program the DMA address map of the PASID.

This new PASID allocator would match the guest memory layout and
support the IOMMU nesting stuff needed for vPASID.

This is the common code for the complex cases of virtualization with
PASID, shared by all user DMA drivers, including VFIO.

It doesn't make a lot of sense to build a uAPI exclusive to VFIO just
for PASID and vPASID. We already know everything doing user DMA will
eventually need this stuff.
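
Conceptually, "programming the DMA address map of the PASID" in the nested
case means supplying the first stage of a two-stage walk; a pure illustration
(not an API):

/*
 * Illustration only: the PASID selects the guest-owned first-level table
 * (GVA -> GPA); the host-owned second-level table (GPA -> HPA) stays the
 * same default domain used for ordinary device assignment.
 */
typedef unsigned long long dma_addr;

dma_addr fl_translate(unsigned int pasid, dma_addr gva);  /* guest FL table */
dma_addr sl_translate(dma_addr gpa);                      /* host SL table  */

static dma_addr nested_translate(unsigned int pasid, dma_addr gva)
{
	dma_addr gpa = fl_translate(pasid, gva);	/* stage 1: GVA -> GPA */
	return sl_translate(gpa);			/* stage 2: GPA -> HPA */
}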

Jason


Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jean-Philippe Brucker
On Mon, Sep 14, 2020 at 12:20:10PM +0800, Jason Wang wrote:
> 
> On 2020/9/10 6:45 PM, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance security.
> > 
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> > 
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> > 
> > 
> >  .-------------.  .---------------------------.
> >  |   vIOMMU    |  | Guest process CR3, FL only|
> >  |             |  '---------------------------'
> >  .----------------/
> >  | PASID Entry |--- PASID cache flush --+
> >  '-------------'                        |
> >  |             |                        V
> >  |             |                 CR3 in GPA
> >  '-------------'
> > Guest
> > ------| Shadow |--------------------------|------------
> >       v        v                          v
> > Host
> >  .-------------.  .------------------------------.
> >  |   pIOMMU    |  | Bind FL for GVA-GPA          |
> >  |             |  '------------------------------'
> >  .----------------/  |
> >  | PASID Entry |     V (Nested xlate)
> >  '----------------\.------------------------------.
> >  |             |   |SL for GPA-HPA, default domain|
> >  |             |   '------------------------------'
> >  '-------------'
> > Where:
> >   - FL = First level/stage one page tables
> >   - SL = Second level/stage two page tables
> > 
> > Patch Overview:
> >   1. reports IOMMU nesting info to userspace ( patch 0001, 0002, 0003, 0015 
> > , 0016)
> >   2. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 
> > 0007)
> >   3. a fix to a revisit in intel iommu driver (patch 0006)
> >   4. vfio support for binding guest page table to host (patch 0008, 0009, 
> > 0010)
> >   5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
> >   6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
> >   7. expose PASID capability to VM (patch 0013)
> >   8. add doc for VFIO dual stage control (patch 0014)
> 
> 
> If it's possible, I would suggest a generic uAPI instead of a VFIO specific
> one.

A large part of this work is already generic uAPI, in
include/uapi/linux/iommu.h. This patchset connects that generic interface
to the pre-existing VFIO uAPI that deals with IOMMU mappings of an
assigned device. But the bulk of the work is done by the IOMMU subsystem,
and is available to all device drivers.

> Jason suggest something like /dev/sva. There will be a lot of other
> subsystems that could benefit from this (e.g vDPA).

Do you have a more precise idea of the interface /dev/sva would provide,
how it would interact with VFIO and others?  vDPA could transport the
generic iommu.h structures via its own uAPI, and call the IOMMU API
directly without going through an intermediate /dev/sva handle.

Thanks,
Jean

> Have you ever considered this approach?
> 
> Thanks
> 

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Gunthorpe
On Mon, Sep 14, 2020 at 10:38:10AM +, Tian, Kevin wrote:

> is widely used and thus can better help verify the core logic with
> many existing devices. For vSVA, vDPA support has not been started,
> while VFIO support is close to being accepted. It doesn't make much
> sense to block the VFIO part until vDPA is ready.

You keep saying that, but if we keep ignoring the right architecture
we end up with a mess inside VFIO just to save some development
time. That is usually not how the kernel process works.

Jason


RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Tian, Kevin
> From: Jason Wang
> Sent: Monday, September 14, 2020 4:57 PM
> 
> On 2020/9/14 4:01 PM, Tian, Kevin wrote:
> >> From: Jason Wang 
> >> Sent: Monday, September 14, 2020 12:20 PM
> >>
> >> On 2020/9/10 6:45 PM, Liu Yi L wrote:
> >>> Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> >>> Intel platforms allows address space sharing between device DMA and
> >>> applications. SVA can reduce programming complexity and enhance
> >> security.
> >>> This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> >>> guest application address space with passthru devices. This is called
> >>> vSVA in this series. The whole vSVA enabling requires
> QEMU/VFIO/IOMMU
> >>> changes. For IOMMU and QEMU changes, they are in separate series
> (listed
> >>> in the "Related series").
> >>>
> >>> The high-level architecture for SVA virtualization is as below, the key
> >>> design of vSVA support is to utilize the dual-stage IOMMU translation (
> >>> also known as IOMMU nesting translation) capability in host IOMMU.
> >>>
> >>>
> >>>   .-------------.  .---------------------------.
> >>>   |   vIOMMU    |  | Guest process CR3, FL only|
> >>>   |             |  '---------------------------'
> >>>   .----------------/
> >>>   | PASID Entry |--- PASID cache flush --+
> >>>   '-------------'                        |
> >>>   |             |                        V
> >>>   |             |                 CR3 in GPA
> >>>   '-------------'
> >>> Guest
> >>> ------| Shadow |--------------------------|------------
> >>>       v        v                          v
> >>> Host
> >>>   .-------------.  .------------------------------.
> >>>   |   pIOMMU    |  | Bind FL for GVA-GPA          |
> >>>   |             |  '------------------------------'
> >>>   .----------------/  |
> >>>   | PASID Entry |     V (Nested xlate)
> >>>   '----------------\.------------------------------.
> >>>   |             |   |SL for GPA-HPA, default domain|
> >>>   |             |   '------------------------------'
> >>>   '-------------'
> >>> Where:
> >>>- FL = First level/stage one page tables
> >>>- SL = Second level/stage two page tables
> >>>
> >>> Patch Overview:
> >>>1. reports IOMMU nesting info to userspace ( patch 0001, 0002, 0003,
> >> 0015 , 0016)
> >>>2. vfio support for PASID allocation and free for VMs (patch 0004, 
> >>> 0005,
> >> 0007)
> >>>3. a fix to a revisit in intel iommu driver (patch 0006)
> >>>4. vfio support for binding guest page table to host (patch 0008, 0009,
> >> 0010)
> >>>5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
> >>>6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
> >>>7. expose PASID capability to VM (patch 0013)
> >>>8. add doc for VFIO dual stage control (patch 0014)
> >>
> >> If it's possible, I would suggest a generic uAPI instead of a VFIO
> >> specific one.
> >>
> >> Jason suggest something like /dev/sva. There will be a lot of other
> >> subsystems that could benefit from this (e.g vDPA).
> >>
> > Just be curious. When does vDPA subsystem plan to support vSVA and
> > when could one expect a SVA-capable vDPA device in market?
> >
> > Thanks
> > Kevin
> 
> 
> vSVA is in the plan but there's no ETA. I think we might start the work
> after control vq support.  It will probably start from SVA first and
> then vSVA (since it might require platform support).
> 
> For the device part, it really depends on the chipset and other device
> vendors. We plan to do the prototype in virtio by introducing PASID
> support in the spec.
> 

Thanks for the info. Then here are my thoughts.

First, I don't think /dev/sva is the right interface. Once we start
considering such a generic uAPI, it had better behave as the one interface
for all kinds of DMA requirements on device/subdevice passthrough.
Nested page tables through vSVA are one way; manual map/unmap is
another. It doesn't make sense to have one through a generic
uAPI and the other through a subsystem-specific uAPI. In the end
the interface might become /dev/iommu, for delegating certain
IOMMU operations to userspace.

In addition, delegated IOMMU operations have different scopes.
PASID allocation is per process/VM. Pgtbl bind/unbind, map/unmap
and cache invalidation are per IOMMU domain. Page request/
response are per device/subdevice. This requires the uAPI to also
understand and manage the association between domain/group/
device/subdevice (such as group attach/detach), instead of doing
it separately in VFIO or vDPA as today.

Based on the above, I feel a more reasonable way is to first make a
/dev/iommu uAPI supporting DMA map/unmap usage and then
introduce vSVA to it. Doing it in this order is because DMA map/unmap
is widely used and thus can better help verify the core logic with
many existing devices. For vSVA, vDPA support has not been started,
while VFIO support is close to being accepted. It doesn't make much
sense to block the VFIO part until vDPA is ready.
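
To make that ordering concrete, a hypothetical /dev/iommu surface might start
with just the map/unmap half and grow the vSVA half later. Nothing below
exists; the names and layouts are invented placeholders for the idea:

/*
 * Hypothetical /dev/iommu sketch for the order suggested above.  Step 1
 * is the classic per-domain DMA map/unmap path; step 2 layers the vSVA
 * pieces (PASID allocation per VM, page-table bind and cache
 * invalidation per domain) on top of the same device/domain association.
 */
#include <linux/ioctl.h>
#include <linux/types.h>

struct iommu_dma_map {			/* step 1: per-domain map/unmap */
	__u32 argsz, flags;
	__u64 iova, vaddr, size;
};

struct iommu_bind_pgtbl {		/* step 2: per-domain nesting bind */
	__u32 argsz, flags;
	__u32 pasid;
	__u64 gpgd;			/* guest page-table pointer (GPA) */
};

#define IOMMU_IOCTL_TYPE	(';')
#define IOMMU_DEVICE_ATTACH	_IOW(IOMMU_IOCTL_TYPE, 0, __u32)
#define IOMMU_MAP_DMA		_IOW(IOMMU_IOCTL_TYPE, 1, struct iommu_dma_map)
#define IOMMU_UNMAP_DMA		_IOW(IOMMU_IOCTL_TYPE, 2, struct iommu_dma_map)
#define IOMMU_PASID_ALLOC	_IOWR(IOMMU_IOCTL_TYPE, 3, __u32)
#define IOMMU_BIND_PGTBL	_IOW(IOMMU_IOCTL_TYPE, 4, struct iommu_bind_pgtbl)
#define IOMMU_CACHE_INVALIDATE	_IOW(IOMMU_IOCTL_TYPE, 5, struct iommu_bind_pgtbl)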

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Jason Wang


On 2020/9/14 4:01 PM, Tian, Kevin wrote:

From: Jason Wang 
Sent: Monday, September 14, 2020 12:20 PM

On 2020/9/10 6:45 PM, Liu Yi L wrote:

Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance

security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


  .-------------.  .---------------------------.
  |   vIOMMU    |  | Guest process CR3, FL only|
  |             |  '---------------------------'
  .----------------/
  | PASID Entry |--- PASID cache flush --+
  '-------------'                        |
  |             |                        V
  |             |                 CR3 in GPA
  '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
  .-------------.  .------------------------------.
  |   pIOMMU    |  | Bind FL for GVA-GPA          |
  |             |  '------------------------------'
  .----------------/  |
  | PASID Entry |     V (Nested xlate)
  '----------------\.------------------------------.
  |             |   |SL for GPA-HPA, default domain|
  |             |   '------------------------------'
  '-------------'
Where:
   - FL = First level/stage one page tables
   - SL = Second level/stage two page tables

Patch Overview:
   1. reports IOMMU nesting info to userspace ( patch 0001, 0002, 0003,

0015 , 0016)

   2. vfio support for PASID allocation and free for VMs (patch 0004, 0005,

0007)

   3. a fix to a revisit in intel iommu driver (patch 0006)
   4. vfio support for binding guest page table to host (patch 0008, 0009,

0010)

   5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
   6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
   7. expose PASID capability to VM (patch 0013)
   8. add doc for VFIO dual stage control (patch 0014)


If it's possible, I would suggest a generic uAPI instead of a VFIO
specific one.

Jason suggest something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g vDPA).


Just be curious. When does vDPA subsystem plan to support vSVA and
when could one expect a SVA-capable vDPA device in market?

Thanks
Kevin



vSVA is in the plan but there's no ETA. I think we might start the work 
after control vq support.  It will probably start from SVA first and 
then vSVA (since it might require platform support).


For the device part, it really depends on the chipset and other device 
vendors. We plan to do the prototype in virtio by introducing PASID 
support in the spec.


Thanks



RE: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-14 Thread Tian, Kevin
> From: Jason Wang 
> Sent: Monday, September 14, 2020 12:20 PM
> 
> On 2020/9/10 6:45 PM, Liu Yi L wrote:
> > Shared Virtual Addressing (SVA), a.k.a, Shared Virtual Memory (SVM) on
> > Intel platforms allows address space sharing between device DMA and
> > applications. SVA can reduce programming complexity and enhance
> security.
> >
> > This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
> > guest application address space with passthru devices. This is called
> > vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
> > changes. For IOMMU and QEMU changes, they are in separate series (listed
> > in the "Related series").
> >
> > The high-level architecture for SVA virtualization is as below, the key
> > design of vSVA support is to utilize the dual-stage IOMMU translation (
> > also known as IOMMU nesting translation) capability in host IOMMU.
> >
> >
> >  .-------------.  .---------------------------.
> >  |   vIOMMU    |  | Guest process CR3, FL only|
> >  |             |  '---------------------------'
> >  .----------------/
> >  | PASID Entry |--- PASID cache flush --+
> >  '-------------'                        |
> >  |             |                        V
> >  |             |                 CR3 in GPA
> >  '-------------'
> > Guest
> > ------| Shadow |--------------------------|------------
> >       v        v                          v
> > Host
> >  .-------------.  .------------------------------.
> >  |   pIOMMU    |  | Bind FL for GVA-GPA          |
> >  |             |  '------------------------------'
> >  .----------------/  |
> >  | PASID Entry |     V (Nested xlate)
> >  '----------------\.------------------------------.
> >  |             |   |SL for GPA-HPA, default domain|
> >  |             |   '------------------------------'
> >  '-------------'
> > Where:
> >   - FL = First level/stage one page tables
> >   - SL = Second level/stage two page tables
> >
> > Patch Overview:
> >   1. reports IOMMU nesting info to userspace ( patch 0001, 0002, 0003,
> 0015 , 0016)
> >   2. vfio support for PASID allocation and free for VMs (patch 0004, 0005,
> 0007)
> >   3. a fix to a revisit in intel iommu driver (patch 0006)
> >   4. vfio support for binding guest page table to host (patch 0008, 0009,
> 0010)
> >   5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
> >   6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
> >   7. expose PASID capability to VM (patch 0013)
> >   8. add doc for VFIO dual stage control (patch 0014)
> 
> 
> If it's possible, I would suggest a generic uAPI instead of a VFIO
> specific one.
> 
> Jason suggest something like /dev/sva. There will be a lot of other
> subsystems that could benefit from this (e.g vDPA).
> 

Just being curious: when does the vDPA subsystem plan to support vSVA, and
when could one expect an SVA-capable vDPA device on the market?

Thanks
Kevin

Re: [PATCH v7 00/16] vfio: expose virtual Shared Virtual Addressing to VMs

2020-09-13 Thread Jason Wang


On 2020/9/10 6:45 PM, Liu Yi L wrote:

Shared Virtual Addressing (SVA), a.k.a. Shared Virtual Memory (SVM), on
Intel platforms allows address space sharing between device DMA and
applications. SVA can reduce programming complexity and enhance security.

This VFIO series is intended to expose SVA usage to VMs. i.e. Sharing
guest application address space with passthru devices. This is called
vSVA in this series. The whole vSVA enabling requires QEMU/VFIO/IOMMU
changes. For IOMMU and QEMU changes, they are in separate series (listed
in the "Related series").

The high-level architecture for SVA virtualization is as below, the key
design of vSVA support is to utilize the dual-stage IOMMU translation (
also known as IOMMU nesting translation) capability in host IOMMU.


 .-------------.  .---------------------------.
 |   vIOMMU    |  | Guest process CR3, FL only|
 |             |  '---------------------------'
 .----------------/
 | PASID Entry |--- PASID cache flush --+
 '-------------'                        |
 |             |                        V
 |             |                 CR3 in GPA
 '-------------'
Guest
------| Shadow |--------------------------|------------
      v        v                          v
Host
 .-------------.  .------------------------------.
 |   pIOMMU    |  | Bind FL for GVA-GPA          |
 |             |  '------------------------------'
 .----------------/  |
 | PASID Entry |     V (Nested xlate)
 '----------------\.------------------------------.
 |             |   |SL for GPA-HPA, default domain|
 |             |   '------------------------------'
 '-------------'
Where:
  - FL = First level/stage one page tables
  - SL = Second level/stage two page tables

Patch Overview:
  1. reports IOMMU nesting info to userspace (patch 0001, 0002, 0003, 0015, 0016)
  2. vfio support for PASID allocation and free for VMs (patch 0004, 0005, 0007)
  3. a fix to a revisit in intel iommu driver (patch 0006)
  4. vfio support for binding guest page table to host (patch 0008, 0009, 0010)
  5. vfio support for IOMMU cache invalidation from VMs (patch 0011)
  6. vfio support for vSVA usage on IOMMU-backed mdevs (patch 0012)
  7. expose PASID capability to VM (patch 0013)
  8. add doc for VFIO dual stage control (patch 0014)



If it's possible, I would suggest a generic uAPI instead of a VFIO-specific
one.

Jason suggests something like /dev/sva. There will be a lot of other
subsystems that could benefit from this (e.g. vDPA).


Have you ever considered this approach?

Thanks

