subject:"Re\: \[RFC PATCH\] s390x\/pci\: vfio\-pci breakage with disabled mem enforcement"

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-28 Thread Niklas Schnelle




On 7/28/20 2:52 PM, Alex Williamson wrote:
> On Tue, 28 Jul 2020 11:33:55 +0200
> Niklas Schnelle  wrote:
> 
>> On 7/27/20 6:47 PM, Alex Williamson wrote:
>>> On Mon, 27 Jul 2020 17:40:39 +0200
>>> Pierre Morel  wrote:
>>>   
 On 2020-07-23 18:29, Alex Williamson wrote:  
> On Thu, 23 Jul 2020 11:13:55 -0400
> Matthew Rosato  wrote:
> 
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:  
>> ... snip ...

 Alex, Matt,

 in s390 we have the possibility to assign a virtual function to a 
 logical partition as function 0.
 In this case it can not be treated as a virtual function but must be 
 treated as a physical function.
 This is currently working very well.
 However, these functions do not set PCI_COMMAND_MEMORY as we need.  
>>>
>>> Where is the vendor and device ID virtualization done for these
>>> devices, we can't have a PF with IDs :.  
>> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
>> so it is the mandatory function zero on it's PCI bus, where until recently
>> we always had only one function per bus but with the recent multi-function
>> support it can act more like on other platforms with several PCI functions
>> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
>> That's why I'm saying that having devfn == 0 should not be very special for 
>> a VF
>> passed to a VM and I really don't see where it would not act like a VF passed
>> from any other Hypervisor.
> 
> My question is relative to other registers on VFs that are not
> implemented in hardware, not the device address.  In addition to the
> memory bit of the command register, SR-IOV VFs do not implement the
> vendor and device IDs, these are to be virtualized from the values
> provided in the PF SR-IOV capability.  It wouldn't make much sense to
> expose a device with no PCI vendor or device ID, so I assume the z/VM
> hypervisor is virtualizing these, but not the memory bit.
Ahh, yes I see. On Z these are actually already virtualized at the LPAR
level as part of the firmware based scanning I mentioned that is the
reason for pdev->no_vf_scan. This is true even for VFs created
through sriov_numvfs in the host which is why I did not realize these
don't come from hardware, even though looking at drivers/pci/iov.c it's
pretty obvious and I did touch that code before. Sorry for the confusion.
>  
>> The only really tricky part in my opinion is where during the "probing"
>> we do set is_virtfn so it happens both for VFs passed-through from z/VM
>> or LPAR and VFs created through sriov_numvfs which unlike on other platforms
>> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side 
>> scanning).
>> With the fix I'm currently testing I had to do this in 
>> pcibios_enable_device()
>> because I also create sysfs links between VFs and their parent PFs and those
>> need the sysfs entries to be already created, which makes the more 
>> apropriately
>> sound pcibios_bus_add_device() too early.
> 
> 
> I don't think it would be wise to set is_virtfn without a physfn being
> present in the OS view.  I believe pci_dev.is_virtfn implies
> pci_dev.physfn points to the PF device.  Thanks> 
> Alex
Thank you a lot for that hint, you're absolutely right, while the
drivers do work with is_virtfn == 1 && physffn == NULL
vfio-pci gets very confused. And sorry Pierre for doubting
your is_virtfn_detached idea, this thing really is confusing.
> 
 Shouldn't we fix this inside the kernel, to keep older QMEU working?

 Then would it be OK to add a new bit/boolean inside the 
 pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
 enumeration and test inside __vfio_pci_memory_enabled() to return true?  
>>>
>>> Probably each instance of is_virtfn in vfio-pci should be looked at to
>>> see if it applies to s390.  If we're going to recognize this as a VF,
>>> I'd rather we complete the emulation that the lower level hypervisor
>>> has missed.  If we can enable all the is_virtfn code on s390, then we
>>> should probably cache is_virtfn on the vfio_pci_device object and allow
>>> s390 a place to set it once at probe or enable time.
>>>   
 In the enumeration we have the possibility to know if the function is a 
 HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.

 It seems an easy fix without side effects.

 What do you think?  
>>>
>>> It sure seems preferable to recognize that it is a VF in the kernel
>>> than to require userspace to have arch specific hacks.  Thanks,
>>>
>>> Alex
>>>   
>>
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-28 Thread Alex Williamson

On Tue, 28 Jul 2020 11:33:55 +0200
Niklas Schnelle  wrote:

> On 7/27/20 6:47 PM, Alex Williamson wrote:
> > On Mon, 27 Jul 2020 17:40:39 +0200
> > Pierre Morel  wrote:
> >   
> >> On 2020-07-23 18:29, Alex Williamson wrote:  
> >>> On Thu, 23 Jul 2020 11:13:55 -0400
> >>> Matthew Rosato  wrote:
> >>> 
>  I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>  and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>  fails spectacularly, with errors in qemu like:  
> ... snip ...
> >>
> >> Alex, Matt,
> >>
> >> in s390 we have the possibility to assign a virtual function to a 
> >> logical partition as function 0.
> >> In this case it can not be treated as a virtual function but must be 
> >> treated as a physical function.
> >> This is currently working very well.
> >> However, these functions do not set PCI_COMMAND_MEMORY as we need.  
> > 
> > Where is the vendor and device ID virtualization done for these
> > devices, we can't have a PF with IDs :.  
> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
> so it is the mandatory function zero on it's PCI bus, where until recently
> we always had only one function per bus but with the recent multi-function
> support it can act more like on other platforms with several PCI functions
> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
> That's why I'm saying that having devfn == 0 should not be very special for a 
> VF
> passed to a VM and I really don't see where it would not act like a VF passed
> from any other Hypervisor.

My question is relative to other registers on VFs that are not
implemented in hardware, not the device address.  In addition to the
memory bit of the command register, SR-IOV VFs do not implement the
vendor and device IDs, these are to be virtualized from the values
provided in the PF SR-IOV capability.  It wouldn't make much sense to
expose a device with no PCI vendor or device ID, so I assume the z/VM
hypervisor is virtualizing these, but not the memory bit.
 
> The only really tricky part in my opinion is where during the "probing"
> we do set is_virtfn so it happens both for VFs passed-through from z/VM
> or LPAR and VFs created through sriov_numvfs which unlike on other platforms
> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side 
> scanning).
> With the fix I'm currently testing I had to do this in pcibios_enable_device()
> because I also create sysfs links between VFs and their parent PFs and those
> need the sysfs entries to be already created, which makes the more 
> apropriately
> sound pcibios_bus_add_device() too early.


I don't think it would be wise to set is_virtfn without a physfn being
present in the OS view.  I believe pci_dev.is_virtfn implies
pci_dev.physfn points to the PF device.  Thanks,

Alex

> >> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> >>
> >> Then would it be OK to add a new bit/boolean inside the 
> >> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> >> enumeration and test inside __vfio_pci_memory_enabled() to return true?  
> > 
> > Probably each instance of is_virtfn in vfio-pci should be looked at to
> > see if it applies to s390.  If we're going to recognize this as a VF,
> > I'd rather we complete the emulation that the lower level hypervisor
> > has missed.  If we can enable all the is_virtfn code on s390, then we
> > should probably cache is_virtfn on the vfio_pci_device object and allow
> > s390 a place to set it once at probe or enable time.
> >   
> >> In the enumeration we have the possibility to know if the function is a 
> >> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> >>
> >> It seems an easy fix without side effects.
> >>
> >> What do you think?  
> > 
> > It sure seems preferable to recognize that it is a VF in the kernel
> > than to require userspace to have arch specific hacks.  Thanks,
> > 
> > Alex
> >   
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-28 Thread Niklas Schnelle




On 7/27/20 6:47 PM, Alex Williamson wrote:
> On Mon, 27 Jul 2020 17:40:39 +0200
> Pierre Morel  wrote:
> 
>> On 2020-07-23 18:29, Alex Williamson wrote:
>>> On Thu, 23 Jul 2020 11:13:55 -0400
>>> Matthew Rosato  wrote:
>>>   
 I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
 and block MMIO access on disabled memory' vfio-pci via qemu on s390x
 fails spectacularly, with errors in qemu like:
... snip ...
>>
>> Alex, Matt,
>>
>> in s390 we have the possibility to assign a virtual function to a 
>> logical partition as function 0.
>> In this case it can not be treated as a virtual function but must be 
>> treated as a physical function.
>> This is currently working very well.
>> However, these functions do not set PCI_COMMAND_MEMORY as we need.
> 
> Where is the vendor and device ID virtualization done for these
> devices, we can't have a PF with IDs :.
Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0
so it is the mandatory function zero on it's PCI bus, where until recently
we always had only one function per bus but with the recent multi-function
support it can act more like on other platforms with several PCI functions
sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs.
That's why I'm saying that having devfn == 0 should not be very special for a VF
passed to a VM and I really don't see where it would not act like a VF passed
from any other Hypervisor.

The only really tricky part in my opinion is where during the "probing"
we do set is_virtfn so it happens both for VFs passed-through from z/VM
or LPAR and VFs created through sriov_numvfs which unlike on other platforms
are also scanned by Firmware (pdev->no_vf_scan disables the Linux side 
scanning).
With the fix I'm currently testing I had to do this in pcibios_enable_device()
because I also create sysfs links between VFs and their parent PFs and those
need the sysfs entries to be already created, which makes the more apropriately
sound pcibios_bus_add_device() too early.
> 
>> Shouldn't we fix this inside the kernel, to keep older QMEU working?
>>
>> Then would it be OK to add a new bit/boolean inside the 
>> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
>> enumeration and test inside __vfio_pci_memory_enabled() to return true?
> 
> Probably each instance of is_virtfn in vfio-pci should be looked at to
> see if it applies to s390.  If we're going to recognize this as a VF,
> I'd rather we complete the emulation that the lower level hypervisor
> has missed.  If we can enable all the is_virtfn code on s390, then we
> should probably cache is_virtfn on the vfio_pci_device object and allow
> s390 a place to set it once at probe or enable time.
> 
>> In the enumeration we have the possibility to know if the function is a 
>> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
>>
>> It seems an easy fix without side effects.
>>
>> What do you think?
> 
> It sure seems preferable to recognize that it is a VF in the kernel
> than to require userspace to have arch specific hacks.  Thanks,
> 
> Alex
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-28 Thread Niklas Schnelle




On 7/27/20 5:40 PM, Pierre Morel wrote:
> 
> 
> On 2020-07-23 18:29, Alex Williamson wrote:
>> On Thu, 23 Jul 2020 11:13:55 -0400
>> Matthew Rosato  wrote:
>>
>>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>>> fails spectacularly, with errors in qemu like:
>>>
>>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
>>> Input/output error
>>>
>>>  From read to bar 0 originating out of 
>>> hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>>
>>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>>> a bit of tracing, we seem to be triggering the new trap in
>>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>>> return 'true' as a test case, things work again.
>>> The included patch attempts to enforce the setting, which restores 
>>> everything
>>> to working order but also triggers vfio_bar_restore() in the process  So
>>> this isn't the right answer, more of a proof-of-concept.
>>>
>>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with 
>>> this
>>> recent kernel change?
>>
>> Bummer!  I won't claim to understand s390 PCI, but if we have a VF
>> exposed to the "host" (ie. the first level where vfio-pci is being
>> used), but we can't tell that it's a VF, how do we know whether the
>> memory bit in the command register is unimplemented because it's a VF
>> or unimplemented because the device doesn't support MMIO?  How are the
>> device ID, vendor ID, and BAR registers virtualized to the host?  Could
>> the memory enable bit also be emulated by that virtualization, much
>> like vfio-pci does for userspace?  If the other registers are
>> virtualized, but these command register bits are left unimplemented, do
>> we need code to deduce that we have a VF based on the existence of MMIO
>> BARs, but lack of memory enable bit?  Thanks,
>>
>> Alex
> 
> Alex, Matt,
> 
> in s390 we have the possibility to assign a virtual function to a logical 
> partition as function 0.
> In this case it can not be treated as a virtual function but must be treated 
> as a physical function.
> This is currently working very well.
Can you explain why it must be treated as a physical function and must not have 
is_virtfn set?
I'm currently reworking my fix for PF/VF linking not happening for all ways to 
attach a
VF and in that I intend to set is_virtfn = 1 also for VFs that are not linked 
with a PF
including those attached to an LPAR.

So far I really can not see a reason why that should not work since I was wrong 
before
and Firmware does tell us that these are indeed VFs (zdev->is_physfn == 0).
AFAIK on nearly all platforms guests will often have a VF as function zero on a 
bus
because that is what I expect to happen if you pass it through as a PCI 
function.
So unless I'm missing something, that just makes LPAR look more like a QEMU 
guest on
another platform which is very likely much more well tested than treating a VF 
as
a PF as we have been doing.
> However, these functions do not set PCI_COMMAND_MEMORY as we need.
> 
> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> 
> Then would it be OK to add a new bit/boolean inside the 
> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> enumeration and test inside __vfio_pci_memory_enabled() to return true?
This does not make sense to me, as I wrote above it's totally normal for VMs to 
see VFs detached
from the PF as they are passed-through to a QEMU guest so IMHO that's already 
covered by the meaning
of is_virtfn.
> 
> In the enumeration we have the possibility to know if the function is a 
> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-27 Thread Alex Williamson

On Mon, 27 Jul 2020 17:40:39 +0200
Pierre Morel  wrote:

> On 2020-07-23 18:29, Alex Williamson wrote:
> > On Thu, 23 Jul 2020 11:13:55 -0400
> > Matthew Rosato  wrote:
> >   
> >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> >> fails spectacularly, with errors in qemu like:
> >>
> >> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
> >> Input/output error
> >>
> >>  From read to bar 0 originating out of 
> >> hw/s390x/s390-pci-inst.c:zpci_read_bar().
> >>
> >> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  
> >> From
> >> a bit of tracing, we seem to be triggering the new trap in
> >> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> >> return 'true' as a test case, things work again.
> >> The included patch attempts to enforce the setting, which restores 
> >> everything
> >> to working order but also triggers vfio_bar_restore() in the process  
> >> So
> >> this isn't the right answer, more of a proof-of-concept.
> >>
> >> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with 
> >> this
> >> recent kernel change?  
> > 
> > Bummer!  I won't claim to understand s390 PCI, but if we have a VF
> > exposed to the "host" (ie. the first level where vfio-pci is being
> > used), but we can't tell that it's a VF, how do we know whether the
> > memory bit in the command register is unimplemented because it's a VF
> > or unimplemented because the device doesn't support MMIO?  How are the
> > device ID, vendor ID, and BAR registers virtualized to the host?  Could
> > the memory enable bit also be emulated by that virtualization, much
> > like vfio-pci does for userspace?  If the other registers are
> > virtualized, but these command register bits are left unimplemented, do
> > we need code to deduce that we have a VF based on the existence of MMIO
> > BARs, but lack of memory enable bit?  Thanks,
> > 
> > Alex  
> 
> Alex, Matt,
> 
> in s390 we have the possibility to assign a virtual function to a 
> logical partition as function 0.
> In this case it can not be treated as a virtual function but must be 
> treated as a physical function.
> This is currently working very well.
> However, these functions do not set PCI_COMMAND_MEMORY as we need.

Where is the vendor and device ID virtualization done for these
devices, we can't have a PF with IDs :.

> Shouldn't we fix this inside the kernel, to keep older QMEU working?
> 
> Then would it be OK to add a new bit/boolean inside the 
> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
> enumeration and test inside __vfio_pci_memory_enabled() to return true?

Probably each instance of is_virtfn in vfio-pci should be looked at to
see if it applies to s390.  If we're going to recognize this as a VF,
I'd rather we complete the emulation that the lower level hypervisor
has missed.  If we can enable all the is_virtfn code on s390, then we
should probably cache is_virtfn on the vfio_pci_device object and allow
s390 a place to set it once at probe or enable time.

> In the enumeration we have the possibility to know if the function is a 
> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.
> 
> It seems an easy fix without side effects.
> 
> What do you think?

It sure seems preferable to recognize that it is a VF in the kernel
than to require userspace to have arch specific hacks.  Thanks,

Alex

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-27 Thread Matthew Rosato


On 7/27/20 11:40 AM, Pierre Morel wrote:



On 2020-07-23 18:29, Alex Williamson wrote:

On Thu, 23 Jul 2020 11:13:55 -0400
Matthew Rosato  wrote:


I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
and block MMIO access on disabled memory' vfio-pci via qemu on s390x
fails spectacularly, with errors in qemu like:

qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) 
failed: Input/output error


 From read to bar 0 originating out of 
hw/s390x/s390-pci-inst.c:zpci_read_bar().


So, I'm trying to figure out how to get vfio-pci happy again on 
s390x.  From

a bit of tracing, we seem to be triggering the new trap in
__vfio_pci_memory_enabled().  Sure enough, if I just force this 
function to

return 'true' as a test case, things work again.
The included patch attempts to enforce the setting, which restores 
everything
to working order but also triggers vfio_bar_restore() in the 
process  So

this isn't the right answer, more of a proof-of-concept.

@Alex: Any guidance on what needs to happen to make qemu-s390x happy 
with this

recent kernel change?


Bummer!  I won't claim to understand s390 PCI, but if we have a VF
exposed to the "host" (ie. the first level where vfio-pci is being
used), but we can't tell that it's a VF, how do we know whether the
memory bit in the command register is unimplemented because it's a VF
or unimplemented because the device doesn't support MMIO?  How are the
device ID, vendor ID, and BAR registers virtualized to the host?  Could
the memory enable bit also be emulated by that virtualization, much
like vfio-pci does for userspace?  If the other registers are
virtualized, but these command register bits are left unimplemented, do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?  Thanks,

Alex


Alex, Matt,

in s390 we have the possibility to assign a virtual function to a 
logical partition as function 0.
In this case it can not be treated as a virtual function but must be 
treated as a physical function.

This is currently working very well.
However, these functions do not set PCI_COMMAND_MEMORY as we need.



Thanks for the explanation.


Shouldn't we fix this inside the kernel, to keep older QMEU working?


Yes, ideally.



Then would it be OK to add a new bit/boolean inside the 
pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
enumeration and test inside __vfio_pci_memory_enabled() to return true?


In the enumeration we have the possibility to know if the function is a 
HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.


It seems an easy fix without side effects.

What do you think?



From my perspective at least, this would resolve the issue and is 
in-line with the sort of suggestion Alex floated the other day of "do

we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?" -- it just sounds like a different 
way of coming to the conclusion.  Effectively we need a way to say: 'for 
the purposes of memory_enable detection, treat this thing like a virtual 
function because it is one, even though it doesn't always act like one'






@Nilkas/@Pierre: I wonder if this might be related to host device 
is_virtfn?

I note that my host device lspci output looks like:

:00:00.0 Ethernet controller: Mellanox Technologies MT27710 
Family [ConnectX-4 Lx Virtual Function]


But the device is not marked as is_virtfn..  Otherwise, Alex's fix
from htps://lkml.org/lkml/2020/6/25/628 should cover the case.



I do not think we can fix this problem by forcing the is_virtfn bit.
AFAIU, our HW virtual function works a lot like a physical function.




Matthew Rosato (1):
   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci

  hw/s390x/s390-pci-inst.c | 10 ++
  1 file changed, 10 insertions(+)






Regards,
Pierre

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-27 Thread Pierre Morel





On 2020-07-23 18:29, Alex Williamson wrote:

On Thu, 23 Jul 2020 11:13:55 -0400
Matthew Rosato  wrote:


I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
and block MMIO access on disabled memory' vfio-pci via qemu on s390x
fails spectacularly, with errors in qemu like:

qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
Input/output error

 From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().

So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
a bit of tracing, we seem to be triggering the new trap in
__vfio_pci_memory_enabled().  Sure enough, if I just force this function to
return 'true' as a test case, things work again.
The included patch attempts to enforce the setting, which restores everything
to working order but also triggers vfio_bar_restore() in the process  So
this isn't the right answer, more of a proof-of-concept.

@Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
recent kernel change?


Bummer!  I won't claim to understand s390 PCI, but if we have a VF
exposed to the "host" (ie. the first level where vfio-pci is being
used), but we can't tell that it's a VF, how do we know whether the
memory bit in the command register is unimplemented because it's a VF
or unimplemented because the device doesn't support MMIO?  How are the
device ID, vendor ID, and BAR registers virtualized to the host?  Could
the memory enable bit also be emulated by that virtualization, much
like vfio-pci does for userspace?  If the other registers are
virtualized, but these command register bits are left unimplemented, do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?  Thanks,

Alex


Alex, Matt,

in s390 we have the possibility to assign a virtual function to a 
logical partition as function 0.
In this case it can not be treated as a virtual function but must be 
treated as a physical function.

This is currently working very well.
However, these functions do not set PCI_COMMAND_MEMORY as we need.

Shouldn't we fix this inside the kernel, to keep older QMEU working?

Then would it be OK to add a new bit/boolean inside the 
pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during 
enumeration and test inside __vfio_pci_memory_enabled() to return true?


In the enumeration we have the possibility to know if the function is a 
HW/Firmware virtual function on devfn 0 or if it is created by SRIOV.


It seems an easy fix without side effects.

What do you think?





@Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
I note that my host device lspci output looks like:

:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
[ConnectX-4 Lx Virtual Function]

But the device is not marked as is_virtfn..  Otherwise, Alex's fix
from htps://lkml.org/lkml/2020/6/25/628 should cover the case.



I do not think we can fix this problem by forcing the is_virtfn bit.
AFAIU, our HW virtual function works a lot like a physical function.




Matthew Rosato (1):
   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci

  hw/s390x/s390-pci-inst.c | 10 ++
  1 file changed, 10 insertions(+)






Regards,
Pierre

--
Pierre Morel
IBM Lab Boeblingen

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-24 Thread Niklas Schnelle




On 7/24/20 11:46 AM, Niklas Schnelle wrote:
> 
> 
> On 7/23/20 5:13 PM, Matthew Rosato wrote:
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
>> Input/output error
>>
>> From read to bar 0 originating out of 
>> hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with 
>> this
>> recent kernel change?
>>
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
>> [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I 
> introduced
> the is_physfn field to struct zpci_dev which gets set through the
> CLP Query PCI Function. Also with that commit this being 0 will set
> is_virtfn to 1.
> Interestingly looking at s390-pci-inst.c in QEMU I'd think that
> on QEMU this should already be 0 and thus is_virtfn should be set
> with Linux >5.8-rc1 and the missing case is actually for passing through
> a PF where it would wrongly be 0 too. 
> Note: If the Linux instance does not see the
> parent PF however the only way I know to test if it is a VF from userspace
> is checking if /sys/bus/pci/devices//vfn is non-zero which is platform
> specific and currently wrongly set 0 on QEMU for VFs.
> If the PF is known the mentioned commit will also create the
> /sys/bus/pci/devices//physfn symlink as on other platforms.
Ok I've been looking into this more and is_physfn is correctly set
to 0 on RoCE VFs, there is just a Bug in zPCI preventing the
is_virtfn from beeing set when the device is found in
CLP List PCI devices (equivalent too boot time bus probing
on other architectures) instead of beeing hot plugged later
which is the case for sriov_numvfs.
I'm working on a fix for a bug with PF/VF linking anyway so
should get that fixed then.
>> Matthew Rosato (1):
>>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>  hw/s390x/s390-pci-inst.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-24 Thread Niklas Schnelle




On 7/24/20 11:46 AM, Niklas Schnelle wrote:
> 
> 
> On 7/23/20 5:13 PM, Matthew Rosato wrote:
>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
>> fails spectacularly, with errors in qemu like:
>>
>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
>> Input/output error
>>
>> From read to bar 0 originating out of 
>> hw/s390x/s390-pci-inst.c:zpci_read_bar().
>>
>> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
>> a bit of tracing, we seem to be triggering the new trap in
>> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
>> return 'true' as a test case, things work again.
>> The included patch attempts to enforce the setting, which restores everything
>> to working order but also triggers vfio_bar_restore() in the process  So
>> this isn't the right answer, more of a proof-of-concept.
>>
>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with 
>> this
>> recent kernel change?
>>
>> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
>> I note that my host device lspci output looks like:
>>
>> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
>> [ConnectX-4 Lx Virtual Function]
>>
>> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
>> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I 
> introduced
> the is_physfn field to struct zpci_dev which gets set through the
> CLP Query PCI Function. Also with that commit this being 0 will set
> is_virtfn to 1.
> Interestingly looking at s390-pci-inst.c in QEMU I'd think that
> on QEMU this should already be 0 and thus is_virtfn should be set
> with Linux >5.8-rc1 and the missing case is actually for passing through
> a PF where it would wrongly be 0 too. 
> Note: If the Linux instance does not see the
> parent PF however the only way I know to test if it is a VF from userspace
> is checking if /sys/bus/pci/devices//vfn is non-zero which is platform
> specific and currently wrongly set 0 on QEMU for VFs.
> If the PF is known the mentioned commit will also create the
> /sys/bus/pci/devices//physfn symlink as on other platforms.
Arghh, sorry the problem is of course that is_virtfn is not set in
the host. I thought it should be but testing this the is_physfn
bit is actually non-zero for RoCEs on z/VM and LPAR. I will
try figuring out why that is, I guess I should have used the
vfn field instead but I thought is_physfn would be more explicit :-(
>> Matthew Rosato (1):
>>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
>>
>>  hw/s390x/s390-pci-inst.c | 10 ++
>>  1 file changed, 10 insertions(+)
>>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-24 Thread Niklas Schnelle




On 7/23/20 5:13 PM, Matthew Rosato wrote:
> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> fails spectacularly, with errors in qemu like:
> 
> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
> Input/output error
> 
> From read to bar 0 originating out of 
> hw/s390x/s390-pci-inst.c:zpci_read_bar().
> 
> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
> a bit of tracing, we seem to be triggering the new trap in
> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> return 'true' as a test case, things work again.
> The included patch attempts to enforce the setting, which restores everything
> to working order but also triggers vfio_bar_restore() in the process  So
> this isn't the right answer, more of a proof-of-concept.
> 
> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
> recent kernel change?
> 
> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
> I note that my host device lspci output looks like:
> 
> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
> [ConnectX-4 Lx Virtual Function]
> 
> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I 
introduced
the is_physfn field to struct zpci_dev which gets set through the
CLP Query PCI Function. Also with that commit this being 0 will set
is_virtfn to 1.
Interestingly looking at s390-pci-inst.c in QEMU I'd think that
on QEMU this should already be 0 and thus is_virtfn should be set
with Linux >5.8-rc1 and the missing case is actually for passing through
a PF where it would wrongly be 0 too. 
Note: If the Linux instance does not see the
parent PF however the only way I know to test if it is a VF from userspace
is checking if /sys/bus/pci/devices//vfn is non-zero which is platform
specific and currently wrongly set 0 on QEMU for VFs.
If the PF is known the mentioned commit will also create the
/sys/bus/pci/devices//physfn symlink as on other platforms.
> Matthew Rosato (1):
>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
> 
>  hw/s390x/s390-pci-inst.c | 10 ++
>  1 file changed, 10 insertions(+)
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-23 Thread Matthew Rosato


On 7/23/20 12:29 PM, Alex Williamson wrote:

On Thu, 23 Jul 2020 11:13:55 -0400
Matthew Rosato  wrote:


I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
and block MMIO access on disabled memory' vfio-pci via qemu on s390x
fails spectacularly, with errors in qemu like:

qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
Input/output error

 From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar().

So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
a bit of tracing, we seem to be triggering the new trap in
__vfio_pci_memory_enabled().  Sure enough, if I just force this function to
return 'true' as a test case, things work again.
The included patch attempts to enforce the setting, which restores everything
to working order but also triggers vfio_bar_restore() in the process  So
this isn't the right answer, more of a proof-of-concept.

@Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
recent kernel change?


Bummer!  I won't claim to understand s390 PCI, but if we have a VF
exposed to the "host" (ie. the first level where vfio-pci is being
used), but we can't tell that it's a VF, how do we know whether the
memory bit in the command register is unimplemented because it's a VF
or unimplemented because the device doesn't support MMIO?  How are the
device ID, vendor ID, and BAR registers virtualized to the host?  Could


On s390 this info is all advertised/accessed via special instructions 
(my PoC qemu patch was intercepting one of these sorts of instructions 
from the guest and modifying it).



the memory enable bit also be emulated by that virtualization, much
like vfio-pci does for userspace?  If the other registers are
virtualized, but these command register bits are left unimplemented, do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?  Thanks,


Yeah, I'm thinking we might need something to this effect for s390 at 
least.  But I'm curious if Niklas/Pierre have any add'l thoughts about 
that since they touched the virtfn stuff recently for s390 PCI.




Alex


@Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
I note that my host device lspci output looks like:

:00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
[ConnectX-4 Lx Virtual Function]

But the device is not marked as is_virtfn..  Otherwise, Alex's fix
from htps://lkml.org/lkml/2020/6/25/628 should cover the case.



Matthew Rosato (1):
   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci

  hw/s390x/s390-pci-inst.c | 10 ++
  1 file changed, 10 insertions(+)

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

2020-07-23 Thread Alex Williamson

On Thu, 23 Jul 2020 11:13:55 -0400
Matthew Rosato  wrote:

> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps
> and block MMIO access on disabled memory' vfio-pci via qemu on s390x
> fails spectacularly, with errors in qemu like:
> 
> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: 
> Input/output error
> 
> From read to bar 0 originating out of 
> hw/s390x/s390-pci-inst.c:zpci_read_bar().
> 
> So, I'm trying to figure out how to get vfio-pci happy again on s390x.  From
> a bit of tracing, we seem to be triggering the new trap in
> __vfio_pci_memory_enabled().  Sure enough, if I just force this function to
> return 'true' as a test case, things work again.
> The included patch attempts to enforce the setting, which restores everything
> to working order but also triggers vfio_bar_restore() in the process  So
> this isn't the right answer, more of a proof-of-concept.
> 
> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this
> recent kernel change?

Bummer!  I won't claim to understand s390 PCI, but if we have a VF
exposed to the "host" (ie. the first level where vfio-pci is being
used), but we can't tell that it's a VF, how do we know whether the
memory bit in the command register is unimplemented because it's a VF
or unimplemented because the device doesn't support MMIO?  How are the
device ID, vendor ID, and BAR registers virtualized to the host?  Could
the memory enable bit also be emulated by that virtualization, much
like vfio-pci does for userspace?  If the other registers are
virtualized, but these command register bits are left unimplemented, do
we need code to deduce that we have a VF based on the existence of MMIO
BARs, but lack of memory enable bit?  Thanks,

Alex

> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn?
> I note that my host device lspci output looks like:
> 
> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family 
> [ConnectX-4 Lx Virtual Function]
> 
> But the device is not marked as is_virtfn..  Otherwise, Alex's fix
> from htps://lkml.org/lkml/2020/6/25/628 should cover the case.
> 
> 
> 
> Matthew Rosato (1):
>   s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci
> 
>  hw/s390x/s390-pci-inst.c | 10 ++
>  1 file changed, 10 insertions(+)
>

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement

12 matches

Site Navigation

Mail list logo

Footer information