Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Laine Stump

On 8/5/22 2:20 PM, Jason Gunthorpe wrote:

On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:

On Thu, 4 Aug 2022 21:11:07 -0300
Jason Gunthorpe  wrote:


On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:


That is reasonable, but I'd say those three kernels only have two
drivers and they both have vfio as a substring in their name - so the
simple thing of just substring searching 'vfio' would get us over that
gap.


Looking at the aliases for exactly "vfio_pci" isn't that much more
complicated, and "feels" a lot more reliable than just doing a substring
search for "vfio" in the driver's name. (It would be, uh,  "not
smart" to name a driver "vfio" if it wasn't actually a vfio
variant driver (or the opposite), but I could imagine it happening; :-/)


This is still pretty hacky. I'm worried about what happens to the
kernel if this becomes some crazy unintended uAPI that we never really
thought about carefully... This was not a use case when we designed
the modules.alias stuff at least.

BTW - why not do things the normal way?

1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)

It has a small edge case where a multi-device group might give a false
positive for a driverless device, but for the purposes of libvirt
that seems pretty obscure... (while the above has false negative
issues, obviously)
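
For illustration, the whole check amounts to something like this minimal
C sketch (the helper name is hypothetical and error handling is kept
minimal; the BDF in the comment is just an example):

    #include <libgen.h>     /* basename */
    #include <limits.h>     /* PATH_MAX */
    #include <stdio.h>      /* snprintf */
    #include <unistd.h>     /* readlink, access */

    /* Return 1 if some device in this device's IOMMU group is bound to a
     * vfio driver, 0 if not, -1 on error (e.g. no iommu_group link). */
    int group_has_vfio(const char *bdf /* e.g. "0000:01:00.0" */)
    {
        char link[PATH_MAX], target[PATH_MAX], devpath[PATH_MAX];
        ssize_t n;

        snprintf(link, sizeof(link),
                 "/sys/bus/pci/devices/%s/iommu_group", bdf);
        if ((n = readlink(link, target, sizeof(target) - 1)) < 0)
            return -1;
        target[n] = '\0';

        /* the basename of the link target is the group number */
        snprintf(devpath, sizeof(devpath), "/dev/vfio/%s", basename(target));
        return access(devpath, F_OK) == 0 ? 1 : 0;
    }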


This is not a small edge case, it's extremely common.  We have a *lot*
of users assigning desktop GPUs and other consumer grade hardware, which
are usually multi-function devices without isolation exposed via ACS or
quirks.


The edge case is that the user has created a multi-device group,
manually assigned device 1 in the group to VFIO, left device 2 with no
driver and then told libvirt to manually use device 2. With the above
approach libvirt won't detect this misconfiguration and qemu will
fail.


libvirt will see that there is no driver at all, and recognize that, by 
definition, "no driver" == "not a vfio variant driver" :-). So in this 
case libvirt catches the misconfiguration.





The vfio group exists if any devices in the group are bound to a vfio
driver, but the device is not accessible from the group unless the
viability test passes.  That means QEMU may not be able to get access
to the device because the device we want isn't actually bound to a vfio
driver or another device in the group is not in a viable state.  Thanks,


This is a different misconfiguration that libvirt also won't detect,
right? In this case ownership claiming in the kernel will fail and
qemu will fail too, like above.


Right. If we're relying on "iommu group matching" as you suggest, then 
libvirt will mistakenly conclude that the driver is a vfio variant, but 
then qemu will fail.




This, and the above, could be handled by having libvirt also open the
group FD and get the device. It would prove both correct binding and
viability.

I had understood the point of this logic was to give better error
reporting to users so that common misconfigurations could be diagnosed
earlier.


Correct. In the end QEMU and the kernel have the final say of course, 
but if we can detect the problem sooner then it's more likely the user 
will get a meaningful error message.




When I say 'small edge case' I mean it seems like an unlikely
misconfiguration that someone would know to set up VFIO but then use
the wrong BDFs to do it - arguably less likely than that someone would
know to set up VFIO but forget to unbind the other drivers in the group?


You obviously haven't spent enough time trying to remotely troubleshoot 
the setups of noobs on IRC :-). If anything can be done wrong, there is 
certainly someone around who will do it that way.


Of course we can't eliminate 100% of these, but the more 
misconfigurations we can catch, and the earlier we can catch them, the 
better.


All of this without any false error reports of course. It's better that
some edge cases get through (and are shot down by QEMU) than that we
mistakenly prevent someone from using a totally viable config (which is
the case right now).



But maybe I don't get it at all ...


Nah, you get it. We just have minor differences of opinion about which
choice will be the best combination of simple to implement + fewer user
headaches (and in the end any of us could be right; personally my
opinion sways from minute to minute).




Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Laine Stump

On 8/5/22 2:56 PM, Alex Williamson wrote:

On Fri, 5 Aug 2022 15:20:24 -0300
Jason Gunthorpe  wrote:


On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:

On Thu, 4 Aug 2022 21:11:07 -0300
Jason Gunthorpe  wrote:
   

On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
   

That is reasonable, but I'd say those three kernels only have two
drivers and they both have vfio as a substring in their name - so the
simple thing of just substring searching 'vfio' would get us over that
gap.


Looking at the aliases for exactly "vfio_pci" isn't that much more
complicated, and "feels" a lot more reliable than just doing a substring
search for "vfio" in the driver's name. (It would be, uh,  "not
smart" to name a driver "vfio" if it wasn't actually a vfio
variant driver (or the opposite), but I could imagine it happening; :-/)


This is still pretty hacky. I'm worried about what happens to the
kernel if this becomes some crazy unintended uAPI that we never really
thought about carefully... This was not a use case when we designed
the modules.alias stuff at least.

BTW - why not do things the normal way?

1. readlink /sys/bus/pci/devices/XX/iommu_group
2. Compute basename of #1
3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)

It has a small edge case where a multi-device group might give a false
positive for a driverless device, but for the purposes of libvirt
that seems pretty obscure... (while the above has false negative
issues, obviously)


This is not a small edge case, it's extremely common.  We have a *lot*
of users assigning desktop GPUs and other consumer grade hardware, which
are usually multi-function devices without isolation exposed via ACS or
quirks.


The edge case is that the user has created a multi-device group,
manually assigned device 1 in the group to VFIO, left device 2 with no
driver and then told libvirt to manually use device 2. With the above
approach libvirt won't detect this misconfiguration and qemu will
fail.


The vfio group exists if any devices in the group are bound to a vfio
driver, but the device is not accessible from the group unless the
viability test passes.  That means QEMU may not be able to get access
to the device because the device we want isn't actually bound to a vfio
driver or another device in the group is not in a viable state.  Thanks,


This is a different misconfiguration that libvirt also won't detect,
right? In this case ownership claiming in the kernel will fail and
qemu will fail too, like above.

This, and the above, could be handled by having libvirt also open the
group FD and get the device. It would prove both correct binding and
viability.


libvirt cannot do this in the group model because the group must be
isolated in a container before the device can be accessed and libvirt
cannot presume the QEMU container configuration.  For direct device
access, this certainly becomes a possibility and I've been trying to
steer things in that direction, libvirt has the option to pass an fd for
the iommufd and can then pass fds for each of the devices in the new
uAPI paradigm.


I had understood the point of this logic was to give better error
reporting to users so that common misconfigurations could be diagnosed
earlier. When I say 'small edge case' I mean it seems like an unlikely
misconfiguration that someone would know to set up VFIO but then use
the wrong BDFs to do it - arguably less likely than that someone would
know to set up VFIO but forget to unbind the other drivers in the group?


I'm not sure how much testing libvirt does of other devices in a group,
Laine?


It had been so long since I looked at that, and the code was so obtuse, 
that I had to set something up to see what happened.


1) if there is another device in the same group, and that device is 
bound to vfio-pci and in use by a different QEMU process, then libvirt 
refuses to assign the device in question (it will allow assigning the 
device to the same guest as other devices in the same group).


2) if there is another device in the same group, and that device is 
bound to some other driver than vfio-pci, then libvirt doesn't notice, 
tells QEMU to do the assignment, and QEMU fails.


Without looking/trying, I would have said that libvirt would check for 
(2), but I guess nobody ever tried it :-/




AIUI here, libvirt has a managed='yes|no' option per device.  In the
'yes' case libvirt will unbind devices from their host driver and bind
them to vfio-pci.  In the 'no' case, I believe libvirt is still doing a
sanity test on the driver, but only knows about vfio-pci.


Correct.



The initial step is to then enlighten libvirt that other drivers can be
compatible for the 'no' case and later we can make smarter choices
about which driver to use or allow the user to specify (i.e. a user
should be able to use vfio-pci rather than a variant driver if they
choose) in the 'yes' case.


Yes, that's the next step. I just wanted to first add a simple (i.e. 
difficult to botch up) 

Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Alex Williamson
On Fri, 5 Aug 2022 15:20:24 -0300
Jason Gunthorpe  wrote:

> On Fri, Aug 05, 2022 at 11:24:08AM -0600, Alex Williamson wrote:
> > On Thu, 4 Aug 2022 21:11:07 -0300
> > Jason Gunthorpe  wrote:
> >   
> > > On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
> > >   
> > > > > > That is reasonable, but I'd say those three kernels only have two
> > > > > > drivers and they both have vfio as a substring in their name - so
> > > > > > the simple thing of just substring searching 'vfio' would get us
> > > > > > over that gap.
> > > > > 
> > > > > Looking at the aliases for exactly "vfio_pci" isn't that much more
> > > > > complicated, and "feels" a lot more reliable than just doing a
> > > > > substring search for "vfio" in the driver's name. (It would be, uh,
> > > > > "not smart" to give a driver a name containing "vfio" if it wasn't
> > > > > actually a vfio variant driver (or the opposite), but I could
> > > > > imagine it happening; :-/)
> > > 
> > > This is still pretty hacky. I'm worried about what happens to the
> > > kernel if this becomes some crazy unintended uAPI that we never really
> > > thought about carefully... This was not a use case when we designed
> > > the modules.alias stuff at least.
> > > 
> > > BTW - why not do things the normal way?
> > > 
> > > 1. readlink /sys/bus/pci/devices/XX/iommu_group
> > > 2. Compute basename of #1
> > > 3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
> > > 
> > > It has a small edge case where a multi-device group might give a false
> > > positive for a driverless device, but for the purposes of libvirt
> > > that seems pretty obscure... (while the above has false negative
> > > issues, obviously)  
> > 
> > This is not a small edge case, it's extremely common.  We have a *lot*
> > of users assigning desktop GPUs and other consumer grade hardware, which
> > are usually multi-function devices without isolation exposed via ACS or
> > quirks.  
> 
> The edge case is that the user has created a multi-device group,
> manually assigned device 1 in the group to VFIO, left device 2 with no
> driver and then told libvirt to manually use device 2. With the above
> approach libvirt won't detect this misconfiguration and qemu will
> fail.
> 
> > The vfio group exists if any devices in the group are bound to a vfio
> > driver, but the device is not accessible from the group unless the
> > viability test passes.  That means QEMU may not be able to get access
> > to the device because the device we want isn't actually bound to a vfio
> > driver or another device in the group is not in a viable state.  Thanks,  
> 
> This is a different misconfiguration that libvirt also won't detect,
> right? In this case ownership claiming in the kernel will fail and
> qemu will fail too, like above.
> 
> This, and the above, could be handled by having libvirt also open the
> group FD and get the device. It would prove both correct binding and
> viability.

libvirt cannot do this in the group model because the group must be
isolated in a container before the device can be accessed and libvirt
cannot presume the QEMU container configuration.  For direct device
access, this certainly becomes a possibility and I've been trying to
steer things in that direction, libvirt has the option to pass an fd for
the iommufd and can then pass fds for each of the devices in the new
uAPI paradigm.

> I had understood the point of this logic was to give better error
> reporting to users so that common misconfigurations could be diagnosed
> earlier. When I say 'small edge case' I mean it seems like an unlikely
> misconfiguration that someone would know to set up VFIO but then use
> the wrong BDFs to do it - arguably less likely than that someone would
> know to set up VFIO but forget to unbind the other drivers in the group?

I'm not sure how much testing libvirt does of other devices in a group,
Laine?

AIUI here, libvirt has a managed='yes|no' option per device.  In the
'yes' case libvirt will unbind devices from their host driver and bind
them to vfio-pci.  In the 'no' case, I believe libvirt is still doing a
sanity test on the driver, but only knows about vfio-pci.

The initial step is to then enlighten libvirt that other drivers can be
compatible for the 'no' case and later we can make smarter choices
about which driver to use or allow the user to specify (i.e. a user
should be able to use vfio-pci rather than a variant driver if they
choose) in the 'yes' case.

If libvirt is currently testing that only the target device is bound to
vfio-pci, then maybe we do have gaps for the ancillary devices in the
group, but that gap changes if instead we only test that a vfio group
exists relative to the iommu group of the target device.  Thanks,

Alex



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Alex Williamson
On Thu, 4 Aug 2022 21:11:07 -0300
Jason Gunthorpe  wrote:

> On Thu, Aug 04, 2022 at 01:36:24PM -0600, Alex Williamson wrote:
> 
> > > > That is reasonable, but I'd say those three kernels only have two
> > > > drivers and they both have vfio as a substring in their name - so the
> > > > simple thing of just substring searching 'vfio' would get us over that
> > > > gap.
> > > 
> > > Looking at the aliases for exactly "vfio_pci" isn't that much more 
> > > complicated, and "feels" a lot more reliable than just doing a substring 
> > > search for "vfio" in the driver's name. (It would be, uh,  "not 
> > > smart" to name a driver "vfio" if it wasn't actually a vfio 
> > > variant driver (or the opposite), but I could imagine it happening; :-/)  
> 
> This is still pretty hacky. I'm worried about what happens to the
> kernel if this becomes some crazy unintended uAPI that we never really
> thought about carefully... This was not a use case when we designed
> the modules.alias stuff at least.
> 
> BTW - why not do things the normal way?
> 
> 1. readlink /sys/bus/pci/devices/XX/iommu_group
> 2. Compute basename of #1
> 3. Check if /dev/vfio/#2 exists (or /sys/class/vfio/#2)
> 
> It has a small edge case where a multi-device group might give a false
> positive for a driverless device, but for the purposes of libvirt
> that seems pretty obscure... (while the above has false negative
> issues, obviously)

This is not a small edge case, it's extremely common.  We have a *lot*
of users assigning desktop GPUs and other consumer grade hardware, which
are usually multi-function devices without isolation exposed via ACS or
quirks.

The vfio group exists if any devices in the group are bound to a vfio
driver, but the device is not accessible from the group unless the
viability test passes.  That means QEMU may not be able to get access
to the device because the device we want isn't actually bound to a vfio
driver or another device in the group is not in a viable state.  Thanks,

Alex



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Daniel P . Berrangé
On Fri, Aug 05, 2022 at 11:46:17AM -0400, Laine Stump wrote:
> On 8/5/22 5:40 AM, Daniel P. Berrangé wrote:
> > On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
> > > On 8/1/22 7:58 AM, Erik Skultety wrote:
> > > > 
> > > > Instead of calling an external program and then grepping its output
> > > > which technically could change in the future, wouldn't it be better
> > > > if we read /lib/modules/`uname -r`/modules.alias and filtered
> > > > whatever line had the 'vfio-pci' substring and compared the module
> > > > name with the user-provided device driver?
> > > 
> > > Again, although I was hesitant about calling an external command, and 
> > > asked
> > > if there was something simpler, Alex still suggested modinfo, so I'll let
> > > him answer that. Alex?
> > > 
> > > (Also, although the format of the output of "uname -r" is pretty much
> > > written in stone, you're still running an external command :-))
> > 
> > You wouldn't actually call 'uname -r', you'd invoke the uname(2) function
> > and use the 'release' field in 'struct utsname'.
> 
> Yeah, I wasn't thinking clearly when I said that :-P
> 
> > 
> > I'd favour reading modules.alias directly over invoking modinfo for
> > sure, though I'd be even more in favour of the kernel just exposing
> > the sysfs attribute and in the meanwhile just hardcoding the only 2
> > driver names that exist so far.
> 
> The problem with hardcoding the 2 existing driver names is that it wouldn't
> do any good to anyone developing a new driver, and part of the aim of doing
> this is to make it possible for developers to test their new drivers using
> libvirt (and management systems based on libvirt).

I'm only suggesting hardcoding the driver names, *if* the kernel folks
agree to expose the sysfs directory.  Anyone developing new drivers
is unlikely to have their drivers merged before this new sysfs dir
is added, so it is not a significant enough real-world blocker for us
to worry about. Best to focus on what the best long-term
approach is, and not worry about problems that will only exist for
a couple of kernel releases today.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Laine Stump

On 8/5/22 5:40 AM, Daniel P. Berrangé wrote:

On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:

On 8/1/22 7:58 AM, Erik Skultety wrote:


Instead of calling an external program and then grepping its output which
technically could change in the future, wouldn't it be better if we read
/lib/modules/`uname -r`/modules.alias and filtered whatever line had the
'vfio-pci' substring and compared the module name with the user-provided device
driver?


Again, although I was hesitant about calling an external command, and asked
if there was something simpler, Alex still suggested modinfo, so I'll let
him answer that. Alex?

(Also, although the format of the output of "uname -r" is pretty much
written in stone, you're still running an external command :-))


You wouldn't actually call 'uname -r', you'd invoke the uname(2) function
and use the 'release' field in 'struct utsname'.


Yeah, I wasn't thinking clearly when I said that :-P



I'd favour reading modules.alias directly over invoking modinfo for
sure, though I'd be even more in favour of the kernel just exposing
the sysfs attribute and in the meanwhile just hardcoding the only 2
driver names that exist so far.


The problem with hardcoding the 2 existing driver names is that it 
wouldn't do any good to anyone developing a new driver, and part of the 
aim of doing this is to make it possible for developers to test their 
new drivers using libvirt (and management systems based on libvirt).




Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Daniel P . Berrangé
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
> On 8/1/22 7:58 AM, Erik Skultety wrote:
> > 
> > Instead of calling an external program and then grepping its output which
> > technically could change in the future, wouldn't it be better if we read
> > /lib/modules/`uname -r`/modules.alias and filtered whatever line had the
> > 'vfio-pci' substring and compared the module name with the
> > user-provided device driver?
> 
> Again, although I was hesitant about calling an external command, and asked
> if there was something simpler, Alex still suggested modinfo, so I'll let
> him answer that. Alex?
> 
> (Also, although the format of the output of "uname -r" is pretty much
> written in stone, you're still running an external command :-))

You wouldn't actually call 'uname -r', you'd invoke the uname(2) function
and use the 'release' field in 'struct utsname'.

I'd favour reading modules.alias directly over invoking modinfo for
sure, though I'd be even more in favour of the kernel just exposing
the sysfs attribute and in the meanwhile just hardcoding the only 2
driver names that exist so far.
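
As a rough illustration of that direct-read approach (a sketch only: the
"vfio_pci:" alias prefix is an assumption based on this thread, and real
code would also need to normalize '-' vs '_' in module names):

    #include <stdio.h>
    #include <string.h>
    #include <sys/utsname.h>

    /* Return 1 if modules.alias lists a "vfio_pci:"-prefixed alias for
     * the module 'drv', 0 if not, -1 on error. */
    int driver_has_vfio_pci_alias(const char *drv)
    {
        struct utsname uts;
        char path[256], line[1024];
        FILE *fp;
        int found = 0;

        if (uname(&uts) < 0)   /* uname(2), not the external command */
            return -1;
        snprintf(path, sizeof(path),
                 "/lib/modules/%s/modules.alias", uts.release);
        if (!(fp = fopen(path, "r")))
            return -1;

        while (!found && fgets(line, sizeof(line), fp)) {
            char alias[512], module[128];
            /* each line looks like: "alias <pattern> <module>" */
            if (sscanf(line, "alias %511s %127s", alias, module) == 2 &&
                strncmp(alias, "vfio_pci:", 9) == 0 &&
                strcmp(module, drv) == 0)
                found = 1;
        }
        fclose(fp);
        return found;
    }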

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Daniel P . Berrangé
On Thu, Aug 04, 2022 at 01:51:20PM -0300, Jason Gunthorpe wrote:
> On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
> 
> > > > > > Fortunately these new vendor/device-specific drivers can be easily
> > > > > > identified as being "vfio-pci + extra stuff" - all that's needed
> > > > > > is to look at the output of the "modinfo $driver_name" command to
> > > > > > see if "vfio_pci" is in the alias list for the driver.
> 
> We are moving in a direction on the kernel side to expose a sysfs
> directory under the PCI device that definitively says it is VFIO
> enabled, eg something like
> 
>  /sys/devices/pci0000:00/0000:00:1f.6/vfio/
> 
> Which is how every other subsystem in the kernel works. When this
> lands libvirt can simply stat the vfio directory and confirm that the
> device handle it is looking at is vfio enabled, for all things that
> vfio support.
> 
> My thinking had been to do the above work a bit later, but if libvirt
> needs it right now then let's do it right away so we don't have to
> worry about this hacky modprobe stuff down the road?

I wouldn't go so far as to say libvirt "needs" it, as obviously we can
make it work using modules.alias information. I would say that exposing
this in sysfs though makes it simpler and faster, because the check then
essentially turns into a single stat() call. So from that POV libvirt
would be happy to see that improvement.
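
For instance, with the proposed interface the whole test could be as
simple as this sketch (the per-device "vfio" directory follows Jason's
example above and is not yet a merged kernel interface; the helper name
is hypothetical):

    #include <stdio.h>      /* snprintf */
    #include <sys/stat.h>   /* stat */

    /* Return 1 if the driver bound to this device exposes the proposed
     * per-device vfio sysfs directory, 0 otherwise. */
    int device_is_vfio(const char *bdf /* e.g. "0000:00:1f.6" */)
    {
        char path[256];
        struct stat st;

        snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/vfio", bdf);
        return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
    }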

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-05 Thread Daniel P . Berrangé
On Thu, Aug 04, 2022 at 03:11:07PM -0400, Laine Stump wrote:
> On 8/4/22 2:36 PM, Jason Gunthorpe wrote:
> > On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:
> > > On Thu, 4 Aug 2022 13:51:20 -0300
> > > Jason Gunthorpe  wrote:
> > > 
> > > > On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
> > > > 
> > > > > > > > > Fortunately these new vendor/device-specific drivers can be
> > > > > > > > > easily identified as being "vfio-pci + extra stuff" - all
> > > > > > > > > that's needed is to look at the output of the "modinfo
> > > > > > > > > $driver_name" command to see if "vfio_pci" is in the alias
> > > > > > > > > list for the driver.
> > > > 
> > > > We are moving in a direction on the kernel side to expose a sysfs
> > > > directory under the PCI device that definitively says it is VFIO
> > > > enabled, eg something like
> > > > 
> > > >   /sys/devices/pci0000:00/0000:00:1f.6/vfio/
> > > > 
> > > > Which is how every other subsystem in the kernel works. When this
> > > > lands libvirt can simply stat the vfio directory and confirm that the
> > > > device handle it is looking at is vfio enabled, for all things that
> > > > vfio support.
> > > > 
> > > > My thinking had been to do the above work a bit later, but if libvirt
> > > > needs it right now then let's do it right away so we don't have to
> > > > worry about this hacky modprobe stuff down the road?
> > > 
> > > That seems like a pretty long gap, there are vfio-pci variant drivers
> > > since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20)
> > > midway through the merge window.  We therefore have at least 3 kernels
> > > exposing devices in a way that libvirt can't make use of simply due to
> > > a driver matching test.
> > 
> > That is reasonable, but I'd say those three kernels only have two
> > drivers and they both have vfio as a substring in their name - so the
> > simple thing of just substring searching 'vfio' would get us over that
> > gap.
> 
> Looking at the aliases for exactly "vfio_pci" isn't that much more
> complicated, and "feels" a lot more reliable than just doing a substring
> search for "vfio" in the driver's name. (It would be, uh,  "not smart"
> to name a driver "vfio" if it wasn't actually a vfio variant
> driver (or the opposite), but I could imagine it happening; :-/)

If it is just 2 drivers so far then we don't even need to do a
substring match. We should do a precise full string match for just
those couple of drivers that exist. We don't need to care about
out of tree drivers IMHO.

With regards,
Daniel
-- 
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-04 Thread Alex Williamson
On Thu, 4 Aug 2022 15:11:07 -0400
Laine Stump  wrote:

> On 8/4/22 2:36 PM, Jason Gunthorpe wrote:
> > On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:  
> >> On Thu, 4 Aug 2022 13:51:20 -0300
> >> Jason Gunthorpe  wrote:
> >>  
> >>> On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
> >>>  
>  Fortunately these new vendor/device-specific drivers can be easily
>  identified as being "vfio-pci + extra stuff" - all that's needed is
>  to look at the output of the "modinfo $driver_name" command to see if
>  "vfio_pci" is in the alias list for the driver.
> >>>
> >>> We are moving in a direction on the kernel side to expose a sysfs
> >>> directory under the PCI device that definitively says it is VFIO
> >>> enabled, eg something like
> >>>
> >>>   /sys/devices/pci0000:00/0000:00:1f.6/vfio/
> >>>
> >>> Which is how every other subsystem in the kernel works. When this
> >>> lands libvirt can simply stat the vfio directory and confirm that the
> >>> device handle it is looking at is vfio enabled, for all things that
> >>> vfio support.
> >>>
> >>> My thinking had been to do the above work a bit later, but if libvirt
> >>> needs it right now then let's do it right away so we don't have to
> >>> worry about this hacky modprobe stuff down the road?  
> >>
> >> That seems like a pretty long gap, there are vfio-pci variant drivers
> >> since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20)
> >> midway through the merge window.  We therefore have at least 3 kernels
> >> exposing devices in a way that libvirt can't make use of simply due to
> >> a driver matching test.  
> > 
> > That is reasonable, but I'd say those three kernels only have two
> > drivers and they both have vfio as a substring in their name - so the
> > simple thing of just substring searching 'vfio' would get us over that
> > gap.  
> 
> Looking at the aliases for exactly "vfio_pci" isn't that much more 
> complicated, and "feels" a lot more reliable than just doing a substring 
> search for "vfio" in the driver's name. (It would be, uh,  "not 
> smart" to name a driver "vfio" if it wasn't actually a vfio 
> variant driver (or the opposite), but I could imagine it happening; :-/)
> 
> >   
> >> might be leveraged for managed='yes' with variant drivers.  Once vfio
> >> devices expose a chardev themselves, libvirt might order the tests as:  
> > 
> > I wasn't thinking to include the chardev part if we are to expedite
> > this. The struct device bit alone is enough and it doesn't have the
> > complex bits needed to make the cdev.
> > 
> > If you say you want to do it we'll do it for v6.1..  
> 
> Since we already need to do something else as a stop-gap for the interim 
> (in order to avoid making driver developers wait any longer if for no 
> other reason), my opinion would be to not spend extra time splitting up 
> patches just to give us this functionality slightly sooner; we'll have
> something at least workable in place anyway.

We also need to be careful in adding things piecemeal that libvirt can
determine when new functionality, such as vfio device chardevs, is
actually available and not simply a placeholder to fill a gap
elsewhere.  Thanks,

Alex



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-04 Thread Laine Stump

On 8/4/22 2:36 PM, Jason Gunthorpe wrote:

On Thu, Aug 04, 2022 at 12:18:26PM -0600, Alex Williamson wrote:

On Thu, 4 Aug 2022 13:51:20 -0300
Jason Gunthorpe  wrote:


On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:


Fortunately these new vendor/device-specific drivers can be easily
identified as being "vfio-pci + extra stuff" - all that's needed is to
look at the output of the "modinfo $driver_name" command to see if
"vfio_pci" is in the alias list for the driver.


We are moving in a direction on the kernel side to expose a sysfs
directory under the PCI device that definitively says it is VFIO
enabled, eg something like

  /sys/devices/pci0000:00/0000:00:1f.6/vfio/

Which is how every other subsystem in the kernel works. When this
lands libvirt can simply stat the vfio directory and confirm that the
device handle it is looking at is vfio enabled, for all things that
vfio support.

My thinking had been to do the above work a bit later, but if libvirt
needs it right now then let's do it right away so we don't have to
worry about this hacky modprobe stuff down the road?


That seems like a pretty long gap, there are vfio-pci variant drivers
since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20)
midway through the merge window.  We therefore have at least 3 kernels
exposing devices in a way that libvirt can't make use of simply due to
a driver matching test.


That is reasonable, but I'd say those three kernels only have two
drivers and they both have vfio as a substring in their name - so the
simple thing of just substring searching 'vfio' would get us over that
gap.


Looking at the aliases for exactly "vfio_pci" isn't that much more 
complicated, and "feels" a lot more reliable than just doing a substring 
search for "vfio" in the driver's name. (It would be, uh,  "not 
smart" to name a driver "vfio" if it wasn't actually a vfio 
variant driver (or the opposite), but I could imagine it happening; :-/)





might be leveraged for managed='yes' with variant drivers.  Once vfio
devices expose a chardev themselves, libvirt might order the tests as:


I wasn't thinking to include the chardev part if we are to expedite
this. The struct device bit alone is enough and it doesn't have the
complex bits needed to make the cdev.

If you say you want to do it we'll do it for v6.1..


Since we already need to do something else as a stop-gap for the interim 
(in order to avoid making driver developers wait any longer if for no 
other reason), my opinion would be to not spend extra time splitting up 
patches just to give us this functionality slightly sooner; we'll anyway 
have something at least workable in place.


Definitely once it is there, libvirt should check for it, since it would 
be quicker and just "feels even more reliable".


I'm updating my patches to directly look at modules.alias and will 
resend based on that.




Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-04 Thread Alex Williamson
On Thu, 4 Aug 2022 13:51:20 -0300
Jason Gunthorpe  wrote:

> On Mon, Aug 01, 2022 at 09:49:28AM -0600, Alex Williamson wrote:
> 
> > > > > > Fortunately these new vendor/device-specific drivers can be easily
> > > > > > identified as being "vfio-pci + extra stuff" - all that's needed
> > > > > > is to look at the output of the "modinfo $driver_name" command to
> > > > > > see if "vfio_pci" is in the alias list for the driver.
> 
> We are moving in a direction on the kernel side to expose a sysfs
> directory under the PCI device that definitively says it is VFIO
> enabled, eg something like
> 
>  /sys/devices/pci0000:00/0000:00:1f.6/vfio/
> 
> Which is how every other subsystem in the kernel works. When this
> lands libvirt can simply stat the vfio directory and confirm that the
> device handle it is looking at is vfio enabled, for all things that
> vfio support.
> 
> My thinking had been to do the above work a bit later, but if libvirt
> needs it right now then let's do it right away so we don't have to
> worry about this hacky modprobe stuff down the road?

That seems like a pretty long gap, there are vfio-pci variant drivers
since v5.18 and this hasn't even been proposed for v6.0 (aka v5.20)
midway through the merge window.  We therefore have at least 3 kernels
exposing devices in a way that libvirt can't make use of simply due to
a driver matching test.

Libvirt needs backwards compatibility, so we'll need it to look for the
vfio-pci driver through some long deprecation period.  In the interim,
it can look at module aliases, support for which will be necessary and
might be leveraged for managed='yes' with variant drivers.  Once vfio
devices expose a chardev themselves, libvirt might order the tests as:

 a) vfio device chardev present
 b) driver is a vfio-pci modalias
 c) driver is vfio-pci

The current state of the world though is that variant drivers exist and
libvirt can't make use of them.  Thanks,
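
A sketch of that ordering (the helper names are hypothetical: (a) stands
in for the not-yet-existing chardev interface, (b) for an alias lookup
like the one discussed elsewhere in this thread):

    #include <string.h>

    int device_has_vfio_chardev(const char *bdf);    /* hypothetical */
    int driver_has_vfio_pci_alias(const char *drv);  /* hypothetical */

    /* Return nonzero if 'drv' can be treated as a vfio-pci driver for
     * the device at 'bdf', trying the tests in the a/b/c order above. */
    int driver_is_vfio_capable(const char *bdf, const char *drv)
    {
        if (device_has_vfio_chardev(bdf) == 1)    /* (a) chardev present */
            return 1;
        if (driver_has_vfio_pci_alias(drv) == 1)  /* (b) vfio-pci modalias */
            return 1;
        return strcmp(drv, "vfio-pci") == 0;      /* (c) exact name match */
    }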

Alex



Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-01 Thread Alex Williamson
On Mon, 1 Aug 2022 16:02:05 +0200
Erik Skultety  wrote:

> Putting Alex on CC since I don't see him there:
> +alex.william...@redhat.com

Hmm, Laine cc'd me on the initial post but it seems it got dropped
somewhere.
 
> On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
> > On 8/1/22 7:58 AM, Erik Skultety wrote:  
> > > On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:  
> > > > Before a PCI device can be assigned to a guest with VFIO, that device
> > > > must be bound to the vfio-pci driver rather than to the device's
> > > > normal driver. The vfio-pci driver provides APIs that permit QEMU to
> > > > perform all the necessary operations to make the device accessible to
> > > > the guest.
> > > > 
> > > > There has been kernel work recently to support vendor/device-specific
> > > > VFIO drivers that provide the basic vfio-pci driver functionality
> > > > while adding support for device-specific operations (for example these
> > > > device-specific drivers are planned to support live migration of
> > > > certain devices). All that will be needed to make this functionality
> > > > available will be to bind the new vendor-specific driver to the device
> > > > (rather than the generic vfio-pci driver, which will continue to work
> > > > just without the extra functionality).
> > > > 
> > > > But until now libvirt has required that all PCI devices being assigned
> > > > to a guest with VFIO specifically have the "vfio-pci" driver bound to
> > > > the device. So even if the user manually binds a shiny new
> > > > vendor-specific driver to the device (and puts "managed='no'" in the
> > > > config to prevent libvirt from changing that), libvirt will just fail
> > > > during startup of the guest (or during hotplug) because the driver
> > > > bound to the device isn't named exactly "vfio-pci".
> > > > 
> > > > Fortunately these new vendor/device-specific drivers can be easily
> > > > identified as being "vfio-pci + extra stuff" - all that's needed is to
> > > > look at the output of the "modinfo $driver_name" command to see if
> > > > "vfio_pci" is in the alias list for the driver.
> > > > 
> > > > That's what this patch does. When libvirt checks the driver bound to a
> > > > device (either to decide if it needs to bind to a different driver or
> > > > perform some other operation, or if the current driver is acceptable
> > > > as-is), if the driver isn't specifically "vfio-pci", then it will look
> > > > at the output of modinfo for the driver that *is* bound to the device;
> > > > if modinfo shows vfio_pci as an alias for that device, then we'll
> > > > behave as if the driver was exactly "vfio-pci".  
> > > 
> > > Regarding these vendor/device-specific drivers: does each such driver
> > > implement the base vfio-pci functionality itself, or does it simply
> > > call into the base driver? The reason why I'm asking is that if each
> > > of the vendor-specific drivers depends on the vfio-pci module being
> > > loaded as well, then reading /proc/modules should suffice, as vfio-pci
> > > should be listed right next to the vendor-specific one. What am I
> > > missing?
> > I don't know the definitive answer to that, as I have no example of a
> > working vendor-specific driver to look at and only know about the kernel
> > work going on second-hand from Alex. It looks like even the vfio_pci driver
> > itself depends on other presumably lower level vfio-* modules (it directly
> > uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly
> > these new drivers would be depending on one or more of those lower level
> > modules rather than vfio_pci. Also I would imagine it would be possible for
> > other drivers to also depend on the vfio-pci driver while not themselves
> > being a vfio driver.

A module dependency on vfio-pci (actually vfio-pci-core) is a pretty
loose requirement; *any* symbol dependency generates such a linkage,
without necessarily exposing a vfio-pci uAPI.  The alias support
introduced to the kernel is intended to allow userspace to determine
the most appropriate vfio-pci driver for a device, whether that's
vfio-pci itself or a variant driver that augments device specific
features.  See the upstream commit here:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cc6711b0bf36de068b10490198d05ac168377989

All we're doing here is extending libvirt to say that if the driver is
vfio-pci or the modalias for the driver is prefixed with vfio-pci, then
the driver exposes a vfio-pci compatible uAPI.  I expect in the future
libvirt, or some other utility, may take on the role as described in
the above commit log to not only detect that a driver supports a
vfio-pci uAPI, but also to identify the most appropriate driver for the
device which exposes a vfio-uAPI.
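
For concreteness, a modules.alias entry for a variant driver would look
something like the line below (the vendor/device IDs and module name are
made up for illustration; only the "vfio_pci:" prefix and the trailing
module name matter for this check):

    alias vfio_pci:v000015B3d0000101Esv*sd*bc*sc*i* mlx5_vfio_pci

A matcher only has to confirm that the second field starts with
"vfio_pci:" and that the third field names the driver bound to the
device.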

> > > The 'alias' field is optional so do we have any support guarantees
> > > from the vendors that it will always be filled in correctly? I mean
> > > you surely
> > > 

Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-01 Thread Erik Skultety
Putting Alex on CC since I don't see him there:
+alex.william...@redhat.com

On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
> On 8/1/22 7:58 AM, Erik Skultety wrote:
> > On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
> > > Before a PCI device can be assigned to a guest with VFIO, that device
> > > must be bound to the vfio-pci driver rather than to the device's
> > > normal driver. The vfio-pci driver provides APIs that permit QEMU to
> > > perform all the necessary operations to make the device accessible to
> > > the guest.
> > > 
> > > There has been kernel work recently to support vendor/device-specific
> > > VFIO drivers that provide the basic vfio-pci driver functionality
> > > while adding support for device-specific operations (for example these
> > > device-specific drivers are planned to support live migration of
> > > certain devices). All that will be needed to make this functionality
> > > available will be to bind the new vendor-specific driver to the device
> > > (rather than the generic vfio-pci driver, which will continue to work
> > > just without the extra functionality).
> > > 
> > > But until now libvirt has required that all PCI devices being assigned
> > > to a guest with VFIO specifically have the "vfio-pci" driver bound to
> > > the device. So even if the user manually binds a shiny new
> > > vendor-specific driver to the device (and puts "managed='no'" in the
> > > config to prevent libvirt from changing that), libvirt will just fail
> > > during startup of the guest (or during hotplug) because the driver
> > > bound to the device isn't named exactly "vfio-pci".
> > > 
> > > Fortunately these new vendor/device-specific drivers can be easily
> > > identified as being "vfio-pci + extra stuff" - all that's needed is to
> > > look at the output of the "modinfo $driver_name" command to see if
> > > "vfio_pci" is in the alias list for the driver.
> > > 
> > > That's what this patch does. When libvirt checks the driver bound to a
> > > device (either to decide if it needs to bind to a different driver or
> > > perform some other operation, or if the current driver is acceptable
> > > as-is), if the driver isn't specifically "vfio-pci", then it will look
> > > at the output of modinfo for the driver that *is* bound to the device;
> > > if modinfo shows vfio_pci as an alias for that device, then we'll
> > > behave as if the driver was exactly "vfio-pci".
> > 
> > Regarding these vendor/device-specific drivers: does each such driver
> > implement the base vfio-pci functionality itself, or does it simply call
> > into the base driver? The reason why I'm asking is that if each of the
> > vendor-specific drivers depends on the vfio-pci module being loaded as
> > well, then reading /proc/modules should suffice, as vfio-pci should be
> > listed right next to the vendor-specific one. What am I missing?
> I don't know the definitive answer to that, as I have no example of a
> working vendor-specific driver to look at and only know about the kernel
> work going on second-hand from Alex. It looks like even the vfio_pci driver
> itself depends on other presumably lower level vfio-* modules (it directly
> uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly
> these new drivers would be depending on one or more of those lower level
> modules rather than vfio_pci. Also I would imagine it would be possible for
> other drivers to also depend on the vfio-pci driver while not themselves
> being a vfio driver.
> 
> > 
> > The 'alias' field is optional so do we have any support guarantees from the
> > vendors that it will always be filled in correctly? I mean you surely
> > handle that case in the code, but once we start supporting this there's
> > no way
> > back and we already know how painful it can be to convince the vendors to
> > follow some kind of standard so that we don't need to maintain several code
> > paths based on a vendor-matrix.
> 
> The aliases are what is used to determine the "best" vfio driver for a
> particular device, so I don't think it would be possible for a driver to not
> implement it, and the method I've used here to determine if a driver is a
> vfio driver was recommended by Alex after a couple of discussions on the
> subject.
> 
> > 
> > ...
> > 
> > > +int
> > > +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
> > > +                                 char **drvName,
> > > +                                 virPCIStubDriver *drvType)
> > > +{
> > > +    g_autofree char *drvPath = NULL;
> > > +    g_autoptr(virCommand) cmd = NULL;
> > > +    g_autofree char *output = NULL;
> > > +    g_autoptr(GRegex) regex = NULL;
> > > +    g_autoptr(GError) err = NULL;
> > > +    g_autoptr(GMatchInfo) info = NULL;
> > > +    int exit;
> > > +    int tmpType;
> > > +
> > > +    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
> > > +        return -1;
> > > +
> > > +    if (!*drvName) {
> > > +        *drvType = 

Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-01 Thread Erik Skultety
On Mon, Aug 01, 2022 at 09:30:38AM -0400, Laine Stump wrote:
> On 8/1/22 7:58 AM, Erik Skultety wrote:
> > On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
> > > Before a PCI device can be assigned to a guest with VFIO, that device
> > > must be bound to the vfio-pci driver rather than to the device's
> > > normal driver. The vfio-pci driver provides APIs that permit QEMU to
> > > perform all the necessary operations to make the device accessible to
> > > the guest.
> > > 
> > > There has been kernel work recently to support vendor/device-specific
> > > VFIO drivers that provide the basic vfio-pci driver functionality
> > > while adding support for device-specific operations (for example these
> > > device-specific drivers are planned to support live migration of
> > > certain devices). All that will be needed to make this functionality
> > > available will be to bind the new vendor-specific driver to the device
> > > (rather than the generic vfio-pci driver, which will continue to work
> > > just without the extra functionality).
> > > 
> > > But until now libvirt has required that all PCI devices being assigned
> > > to a guest with VFIO specifically have the "vfio-pci" driver bound to
> > > the device. So even if the user manually binds a shiny new
> > > vendor-specific driver to the device (and puts "managed='no'" in the
> > > config to prevent libvirt from changing that), libvirt will just fail
> > > during startup of the guest (or during hotplug) because the driver
> > > bound to the device isn't named exactly "vfio-pci".
> > > 
> > > Fortunately these new vendor/device-specific drivers can be easily
> > > identified as being "vfio-pci + extra stuff" - all that's needed is to
> > > look at the output of the "modinfo $driver_name" command to see if
> > > "vfio_pci" is in the alias list for the driver.
> > > 
> > > That's what this patch does. When libvirt checks the driver bound to a
> > > device (either to decide if it needs to bind to a different driver or
> > > perform some other operation, or if the current driver is acceptable
> > > as-is), if the driver isn't specifically "vfio-pci", then it will look
> > > at the output of modinfo for the driver that *is* bound to the device;
> > > if modinfo shows vfio_pci as an alias for that device, then we'll
> > > behave as if the driver was exactly "vfio-pci".
> > 
> > Regarding these vendor/device-specific drivers: does each such driver
> > implement the base vfio-pci functionality itself, or does it simply call
> > into the base driver? The reason why I'm asking is that if each of the
> > vendor-specific drivers depends on the vfio-pci module being loaded as
> > well, then reading /proc/modules should suffice, as vfio-pci should be
> > listed right next to the vendor-specific one. What am I missing?
> I don't know the definitive answer to that, as I have no example of a
> working vendor-specific driver to look at and only know about the kernel
> work going on second-hand from Alex. It looks like even the vfio_pci driver
> itself depends on other presumably lower level vfio-* modules (it directly
> uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), so possibly
> these new drivers would be depending on one or more of those lower level
> modules rather than vfio_pci. Also I would imagine it would be possible for
> other drivers to also depend on the vfio-pci driver while not themselves
> being a vfio driver.
> 
> > 
> > The 'alias' field is optional so do we have any support guarantees from the
> > vendors that it will always be filled in correctly? I mean you surely
> > handle that case in the code, but once we start supporting this there's
> > no way
> > back and we already know how painful it can be to convince the vendors to
> > follow some kind of standard so that we don't need to maintain several code
> > paths based on a vendor-matrix.
> 
> The aliases are what is used to determine the "best" vfio driver for a
> particular device, so I don't think it would be possible for a driver to not
> implement it, and the method I've used here to determine if a driver is a
> vfio driver was recommended by Alex after a couple of discussions on the
> subject.
> 
> > 
> > ...
> > 
> > > +int
> > > +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
> > > +                                 char **drvName,
> > > +                                 virPCIStubDriver *drvType)
> > > +{
> > > +    g_autofree char *drvPath = NULL;
> > > +    g_autoptr(virCommand) cmd = NULL;
> > > +    g_autofree char *output = NULL;
> > > +    g_autoptr(GRegex) regex = NULL;
> > > +    g_autoptr(GError) err = NULL;
> > > +    g_autoptr(GMatchInfo) info = NULL;
> > > +    int exit;
> > > +    int tmpType;
> > > +
> > > +    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
> > > +        return -1;
> > > +
> > > +    if (!*drvName) {
> > > +        *drvType = VIR_PCI_STUB_DRIVER_NONE;
> > > +        return 0;
> > > +    }
> > > +
> > > +    tmpType = 

Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-01 Thread Laine Stump

On 8/1/22 7:58 AM, Erik Skultety wrote:

On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:

Before a PCI device can be assigned to a guest with VFIO, that device
must be bound to the vfio-pci driver rather than to the device's
normal driver. The vfio-pci driver provides APIs that permit QEMU to
perform all the necessary operations to make the device accessible to
the guest.

There has been kernel work recently to support vendor/device-specific
VFIO drivers that provide the basic vfio-pci driver functionality
while adding support for device-specific operations (for example these
device-specific drivers are planned to support live migration of
certain devices). All that will be needed to make this functionality
available will be to bind the new vendor-specific driver to the device
(rather than the generic vfio-pci driver, which will continue to work
just without the extra functionality).

But until now libvirt has required that all PCI devices being assigned
to a guest with VFIO specifically have the "vfio-pci" driver bound to
the device. So even if the user manually binds a shiny new
vendor-specific driver to the device (and puts "managed='no'" in the
config to prevent libvirt from changing that), libvirt will just fail
during startup of the guest (or during hotplug) because the driver
bound to the device isn't named exactly "vfio-pci".

Fortunately these new vendor/device-specific drivers can be easily
identified as being "vfio-pci + extra stuff" - all that's needed is to
look at the output of the "modinfo $driver_name" command to see if
"vfio_pci" is in the alias list for the driver.

That's what this patch does. When libvirt checks the driver bound to a
device (either to decide if it needs to bind to a different driver or
perform some other operation, or if the current driver is acceptable
as-is), if the driver isn't specifically "vfio-pci", then it will look
at the output of modinfo for the driver that *is* bound to the device;
if modinfo shows vfio_pci as an alias for that device, then we'll
behave as if the driver was exactly "vfio-pci".


Regarding these vendor/device-specific drivers: does each such driver
implement the base vfio-pci functionality itself, or does it simply call
into the base driver? The reason why I'm asking is that if each of the
vendor-specific drivers depends on the vfio-pci module being loaded as
well, then reading /proc/modules should suffice, as vfio-pci should be
listed right next to the vendor-specific one. What am I missing?
I don't know the definitive answer to that, as I have no example of a 
working vendor-specific driver to look at and only know about the kernel 
work going on second-hand from Alex. It looks like even the vfio_pci 
driver itself depends on other presumably lower level vfio-* modules (it 
directly uses vfio_pci_core, which in turn uses vfio and vfio_virqfd), 
so possibly these new drivers would be depending on one or more of those 
lower level modules rather than vfio_pci. Also I would imagine it would 
be possible for other drivers to also depend on the vfio-pci driver 
while not themselves being a vfio driver.




The 'alias' field is optional so do we have any support guarantees from the
vendors that it will always be filled in correctly? I mean you surely
handle that case in the code, but once we start supporting this there's no way
back and we already know how painful it can be to convince the vendors to
follow some kind of standard so that we don't need to maintain several code
paths based on a vendor-matrix.


The aliases are what is used to determine the "best" vfio driver for a 
particular device, so I don't think it would be possible for a driver to 
not implement it, and the method I've used here to determine if a driver 
is a vfio driver was recommended by Alex after a couple of discussions 
on the subject.




...


+int
+virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
+                                 char **drvName,
+                                 virPCIStubDriver *drvType)
+{
+    g_autofree char *drvPath = NULL;
+    g_autoptr(virCommand) cmd = NULL;
+    g_autofree char *output = NULL;
+    g_autoptr(GRegex) regex = NULL;
+    g_autoptr(GError) err = NULL;
+    g_autoptr(GMatchInfo) info = NULL;
+    int exit;
+    int tmpType;
+
+    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
+        return -1;
+
+    if (!*drvName) {
+        *drvType = VIR_PCI_STUB_DRIVER_NONE;
+        return 0;
+    }
+
+    tmpType = virPCIStubDriverTypeFromString(*drvName);
+
+    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
+        *drvType = tmpType;
+        return 0; /* exact match of a known driver name (or no name) */
+    }
+
+    /* Check the output of "modinfo $drvName" to see if it has
+     * "vfio_pci" as an alias. If it does, then this driver should
+     * also be considered as a vfio-pci driver, because it implements
+     * all the functionality of the basic vfio-pci (plus additional
+     * device-specific 

Re: [PATCH] util: basic support for vendor-specific vfio drivers

2022-08-01 Thread Erik Skultety
On Mon, Aug 01, 2022 at 12:02:22AM -0400, Laine Stump wrote:
> Before a PCI device can be assigned to a guest with VFIO, that device
> must be bound to the vfio-pci driver rather than to the device's
> normal driver. The vfio-pci driver provides APIs that permit QEMU to
> perform all the necessary operations to make the device accessible to
> the guest.
> 
> There has been kernel work recently to support vendor/device-specific
> VFIO drivers that provide the basic vfio-pci driver functionality
> while adding support for device-specific operations (for example these
> device-specific drivers are planned to support live migration of
> certain devices). All that will be needed to make this functionality
> available will be to bind the new vendor-specific driver to the device
> (rather than the generic vfio-pci driver, which will continue to work
> just without the extra functionality).
> 
> But until now libvirt has required that all PCI devices being assigned
> to a guest with VFIO specifically have the "vfio-pci" driver bound to
> the device. So even if the user manually binds a shiny new
> vendor-specific driver to the device (and puts "managed='no'" in the
> config to prevent libvirt from changing that), libvirt will just fail
> during startup of the guest (or during hotplug) because the driver
> bound to the device isn't named exactly "vfio-pci".
> 
> Fortunately these new vendor/device-specific drivers can be easily
> identified as being "vfio-pci + extra stuff" - all that's needed is to
> look at the output of the "modinfo $driver_name" command to see if
> "vfio_pci" is in the alias list for the driver.
> 
> That's what this patch does. When libvirt checks the driver bound to a
> device (either to decide if it needs to bind to a different driver or
> perform some other operation, or if the current driver is acceptable
> as-is), if the driver isn't specifically "vfio-pci", then it will look
> at the output of modinfo for the driver that *is* bound to the device;
> if modinfo shows vfio_pci as an alias for that driver, then we'll
> behave as if the driver was exactly "vfio-pci".

Since you say these are vendor/device-specific drivers: does each of them
implement the base vfio-pci functionality itself, or do they simply call into
the base driver? The reason I'm asking is that if each of the vendor-specific
drivers depends on the vfio-pci module being loaded as well, then reading
/proc/modules should suffice, as vfio-pci should be listed right next to the
vendor-specific one. What am I missing?

The 'alias' field is optional so do we have any support guarantees from the
vendors that it will always be filled in correctly? I mean you surely
handle that case in the code, but once we start supporting this there's no way
back and we already know how painful it can be to convince the vendors to
follow some kind of standard so that we don't need to maintain several code
paths based on a vendor-matrix.

...

> +int
> +virPCIDeviceGetDriverNameAndType(virPCIDevice *dev,
> +                                 char **drvName,
> +                                 virPCIStubDriver *drvType)
> +{
> +    g_autofree char *drvPath = NULL;
> +    g_autoptr(virCommand) cmd = NULL;
> +    g_autofree char *output = NULL;
> +    g_autoptr(GRegex) regex = NULL;
> +    g_autoptr(GError) err = NULL;
> +    g_autoptr(GMatchInfo) info = NULL;
> +    int exit;
> +    int tmpType;
> +
> +    if (virPCIDeviceGetDriverPathAndName(dev, &drvPath, drvName) < 0)
> +        return -1;
> +
> +    if (!*drvName) {
> +        *drvType = VIR_PCI_STUB_DRIVER_NONE;
> +        return 0;
> +    }
> +
> +    tmpType = virPCIStubDriverTypeFromString(*drvName);
> +
> +    if (tmpType > VIR_PCI_STUB_DRIVER_NONE) {
> +        *drvType = tmpType;
> +        return 0; /* exact match of a known driver name (or no name) */
> +    }
> +
> +    /* Check the output of "modinfo $drvName" to see if it has
> +     * "vfio_pci" as an alias. If it does, then this driver should
> +     * also be considered as a vfio-pci driver, because it implements
> +     * all the functionality of the basic vfio-pci (plus additional
> +     * device-specific stuff).
> +     */

Instead of calling an external program and then grepping its output, which
technically could change in the future, wouldn't it be better if we read
/lib/modules/`uname -r`/modules.alias and filtered whatever line had the
'vfio-pci' substring and compared the module name with the user-provided device
driver?
If not, then I think you should pass '-F alias' to the command to speed up the
regex just a tiny bit.

Regards,
Erik



[PATCH] util: basic support for vendor-specific vfio drivers

2022-07-31 Thread Laine Stump
Before a PCI device can be assigned to a guest with VFIO, that device
must be bound to the vfio-pci driver rather than to the device's
normal driver. The vfio-pci driver provides APIs that permit QEMU to
perform all the necessary operations to make the device accessible to
the guest.

There has been kernel work recently to support vendor/device-specific
VFIO drivers that provide the basic vfio-pci driver functionality
while adding support for device-specific operations (for example these
device-specific drivers are planned to support live migration of
certain devices). All that will be needed to make this functionality
available will be to bind the new vendor-specific driver to the device
(rather than the generic vfio-pci driver, which will continue to work
just without the extra functionality).

But until now libvirt has required that all PCI devices being assigned
to a guest with VFIO specifically have the "vfio-pci" driver bound to
the device. So even if the user manually binds a shiny new
vendor-specific driver to the device (and puts "managed='no'" in the
config to prevent libvirt from changing that), libvirt will just fail
during startup of the guest (or during hotplug) because the driver
bound to the device isn't named exactly "vfio-pci".

Fortunately these new vendor/device-specific drivers can be easily
identified as being "vfio-pci + extra stuff" - all that's needed is to
look at the output of the "modinfo $driver_name" command to see if
"vfio_pci" is in the alias list for the driver.

That's what this patch does. When libvirt checks the driver bound to a
device (either to decide if it needs to bind to a different driver or
perform some other operation, or if the current driver is acceptable
as-is), if the driver isn't specifically "vfio-pci", then it will look
at the output of modinfo for the driver that *is* bound to the device;
if modinfo shows vfio_pci as an alias for that driver, then we'll
behave as if the driver was exactly "vfio-pci".
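
For anyone who wants to experiment with the idea outside of libvirt,
here's a minimal standalone sketch of that check (plain GLib rather than
libvirt's virCommand helpers; the exact pattern matched against the alias
list is an assumption):

#include <glib.h>

/* Minimal sketch: run "modinfo -F alias <driver>" and report whether
 * "vfio_pci" shows up in the alias list. Hypothetical code - the patch
 * itself uses libvirt's virCommand wrappers and error reporting. */
static gboolean
driver_is_vfio_variant(const char *drvName)
{
    g_autofree char *output = NULL;
    g_autoptr(GRegex) regex = NULL;
    int status = 0;
    char *argv[] = { "modinfo", "-F", "alias", (char *)drvName, NULL };

    if (!g_spawn_sync(NULL, argv, NULL, G_SPAWN_SEARCH_PATH,
                      NULL, NULL, &output, NULL, &status, NULL))
        return FALSE;

    /* "-F alias" prints one alias per line */
    regex = g_regex_new("^vfio_pci", G_REGEX_MULTILINE, 0, NULL);
    return output != NULL && g_regex_match(regex, output, 0, NULL);
}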

The effect of this patch is that users will now be able to pre-setup a
device to be bound to a vendor-specific driver, then put
"managed='no'" in the config and libvirt will allow that driver.

What this patch does *not* do is handle automatically determining the
proper/best vendor-specific driver and binding to it in the case of
"managed='yes'". This will be implemented later when there is a widely
available driver / device combo we can use for testing. This initial
patch is just something simple that will permit testing of the new
drivers' functionality.

(I personally had to add an extra patch playing with driver names to
my build just to test that everything was working as expected; that's
okay for a patch as simple as this, but wouldn't be acceptable testing
for anything more complex.)

Signed-off-by: Laine Stump 
---
 meson.build |  1 +
 src/hypervisor/virhostdev.c | 26 ---
 src/util/virpci.c   | 90 ++---
 src/util/virpci.h   |  3 ++
 4 files changed, 97 insertions(+), 23 deletions(-)

diff --git a/meson.build b/meson.build
index de59b1be9c..9d96eb3ee3 100644
--- a/meson.build
+++ b/meson.build
@@ -822,6 +822,7 @@ optional_programs = [
   'iscsiadm',
   'mdevctl',
   'mm-ctl',
+  'modinfo',
   'modprobe',
   'ovs-vsctl',
   'pdwtags',
diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c0ce867596..15b35fa75e 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -747,9 +747,8 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
                                                  mgr->inactivePCIHostdevs) < 0)
                 goto reattachdevs;
         } else {
-            g_autofree char *driverPath = NULL;
-            g_autofree char *driverName = NULL;
-            int stub;
+            g_autofree char *drvName = NULL;
+            virPCIStubDriver drvType;
 
             /* Unmanaged devices should already have been marked as
              * inactive: if that's the case, we can simply move on */
@@ -769,18 +768,14 @@ virHostdevPreparePCIDevicesImpl(virHostdevManager *mgr,
              *   information about active / inactive device across
              *   daemon restarts has been implemented */
 
-            if (virPCIDeviceGetDriverPathAndName(pci,
-                                                 &driverPath, &driverName) < 0)
+            if (virPCIDeviceGetDriverNameAndType(pci, &drvName, &drvType) < 0)
                 goto reattachdevs;
 
-            stub = virPCIStubDriverTypeFromString(driverName);
-
-            if (stub > VIR_PCI_STUB_DRIVER_NONE &&
-                stub < VIR_PCI_STUB_DRIVER_LAST) {
+            if (drvType > VIR_PCI_STUB_DRIVER_NONE) {
 
                 /* The device is bound to a known stub driver: store this
                  * information and add a copy to the inactive list */
-                virPCIDeviceSetStubDriver(pci, stub);
+                virPCIDeviceSetStubDriver(pci, drvType);