Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/28/20 2:52 PM, Alex Williamson wrote: > On Tue, 28 Jul 2020 11:33:55 +0200 > Niklas Schnelle wrote: > >> On 7/27/20 6:47 PM, Alex Williamson wrote: >>> On Mon, 27 Jul 2020 17:40:39 +0200 >>> Pierre Morel wrote: >>> On 2020-07-23 18:29, Alex Williamson wrote: > On Thu, 23 Jul 2020 11:13:55 -0400 > Matthew Rosato wrote: > >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x >> fails spectacularly, with errors in qemu like: >> ... snip ... Alex, Matt, in s390 we have the possibility to assign a virtual function to a logical partition as function 0. In this case it can not be treated as a virtual function but must be treated as a physical function. This is currently working very well. However, these functions do not set PCI_COMMAND_MEMORY as we need. >>> >>> Where is the vendor and device ID virtualization done for these >>> devices, we can't have a PF with IDs :. >> Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0 >> so it is the mandatory function zero on it's PCI bus, where until recently >> we always had only one function per bus but with the recent multi-function >> support it can act more like on other platforms with several PCI functions >> sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs. >> That's why I'm saying that having devfn == 0 should not be very special for >> a VF >> passed to a VM and I really don't see where it would not act like a VF passed >> from any other Hypervisor. > > My question is relative to other registers on VFs that are not > implemented in hardware, not the device address. In addition to the > memory bit of the command register, SR-IOV VFs do not implement the > vendor and device IDs, these are to be virtualized from the values > provided in the PF SR-IOV capability. It wouldn't make much sense to > expose a device with no PCI vendor or device ID, so I assume the z/VM > hypervisor is virtualizing these, but not the memory bit. Ahh, yes I see. On Z these are actually already virtualized at the LPAR level as part of the firmware based scanning I mentioned that is the reason for pdev->no_vf_scan. This is true even for VFs created through sriov_numvfs in the host which is why I did not realize these don't come from hardware, even though looking at drivers/pci/iov.c it's pretty obvious and I did touch that code before. Sorry for the confusion. > >> The only really tricky part in my opinion is where during the "probing" >> we do set is_virtfn so it happens both for VFs passed-through from z/VM >> or LPAR and VFs created through sriov_numvfs which unlike on other platforms >> are also scanned by Firmware (pdev->no_vf_scan disables the Linux side >> scanning). >> With the fix I'm currently testing I had to do this in >> pcibios_enable_device() >> because I also create sysfs links between VFs and their parent PFs and those >> need the sysfs entries to be already created, which makes the more >> apropriately >> sound pcibios_bus_add_device() too early. > > > I don't think it would be wise to set is_virtfn without a physfn being > present in the OS view. I believe pci_dev.is_virtfn implies > pci_dev.physfn points to the PF device. Thanks> > Alex Thank you a lot for that hint, you're absolutely right, while the drivers do work with is_virtfn == 1 && physffn == NULL vfio-pci gets very confused. And sorry Pierre for doubting your is_virtfn_detached idea, this thing really is confusing. > Shouldn't we fix this inside the kernel, to keep older QMEU working? Then would it be OK to add a new bit/boolean inside the pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during enumeration and test inside __vfio_pci_memory_enabled() to return true? >>> >>> Probably each instance of is_virtfn in vfio-pci should be looked at to >>> see if it applies to s390. If we're going to recognize this as a VF, >>> I'd rather we complete the emulation that the lower level hypervisor >>> has missed. If we can enable all the is_virtfn code on s390, then we >>> should probably cache is_virtfn on the vfio_pci_device object and allow >>> s390 a place to set it once at probe or enable time. >>> In the enumeration we have the possibility to know if the function is a HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. It seems an easy fix without side effects. What do you think? >>> >>> It sure seems preferable to recognize that it is a VF in the kernel >>> than to require userspace to have arch specific hacks. Thanks, >>> >>> Alex >>> >> >
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On Tue, 28 Jul 2020 11:33:55 +0200 Niklas Schnelle wrote: > On 7/27/20 6:47 PM, Alex Williamson wrote: > > On Mon, 27 Jul 2020 17:40:39 +0200 > > Pierre Morel wrote: > > > >> On 2020-07-23 18:29, Alex Williamson wrote: > >>> On Thu, 23 Jul 2020 11:13:55 -0400 > >>> Matthew Rosato wrote: > >>> > I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps > and block MMIO access on disabled memory' vfio-pci via qemu on s390x > fails spectacularly, with errors in qemu like: > ... snip ... > >> > >> Alex, Matt, > >> > >> in s390 we have the possibility to assign a virtual function to a > >> logical partition as function 0. > >> In this case it can not be treated as a virtual function but must be > >> treated as a physical function. > >> This is currently working very well. > >> However, these functions do not set PCI_COMMAND_MEMORY as we need. > > > > Where is the vendor and device ID virtualization done for these > > devices, we can't have a PF with IDs :. > Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0 > so it is the mandatory function zero on it's PCI bus, where until recently > we always had only one function per bus but with the recent multi-function > support it can act more like on other platforms with several PCI functions > sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs. > That's why I'm saying that having devfn == 0 should not be very special for a > VF > passed to a VM and I really don't see where it would not act like a VF passed > from any other Hypervisor. My question is relative to other registers on VFs that are not implemented in hardware, not the device address. In addition to the memory bit of the command register, SR-IOV VFs do not implement the vendor and device IDs, these are to be virtualized from the values provided in the PF SR-IOV capability. It wouldn't make much sense to expose a device with no PCI vendor or device ID, so I assume the z/VM hypervisor is virtualizing these, but not the memory bit. > The only really tricky part in my opinion is where during the "probing" > we do set is_virtfn so it happens both for VFs passed-through from z/VM > or LPAR and VFs created through sriov_numvfs which unlike on other platforms > are also scanned by Firmware (pdev->no_vf_scan disables the Linux side > scanning). > With the fix I'm currently testing I had to do this in pcibios_enable_device() > because I also create sysfs links between VFs and their parent PFs and those > need the sysfs entries to be already created, which makes the more > apropriately > sound pcibios_bus_add_device() too early. I don't think it would be wise to set is_virtfn without a physfn being present in the OS view. I believe pci_dev.is_virtfn implies pci_dev.physfn points to the PF device. Thanks, Alex > >> Shouldn't we fix this inside the kernel, to keep older QMEU working? > >> > >> Then would it be OK to add a new bit/boolean inside the > >> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during > >> enumeration and test inside __vfio_pci_memory_enabled() to return true? > > > > Probably each instance of is_virtfn in vfio-pci should be looked at to > > see if it applies to s390. If we're going to recognize this as a VF, > > I'd rather we complete the emulation that the lower level hypervisor > > has missed. If we can enable all the is_virtfn code on s390, then we > > should probably cache is_virtfn on the vfio_pci_device object and allow > > s390 a place to set it once at probe or enable time. > > > >> In the enumeration we have the possibility to know if the function is a > >> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. > >> > >> It seems an easy fix without side effects. > >> > >> What do you think? > > > > It sure seems preferable to recognize that it is a VF in the kernel > > than to require userspace to have arch specific hacks. Thanks, > > > > Alex > > >
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/27/20 6:47 PM, Alex Williamson wrote: > On Mon, 27 Jul 2020 17:40:39 +0200 > Pierre Morel wrote: > >> On 2020-07-23 18:29, Alex Williamson wrote: >>> On Thu, 23 Jul 2020 11:13:55 -0400 >>> Matthew Rosato wrote: >>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps and block MMIO access on disabled memory' vfio-pci via qemu on s390x fails spectacularly, with errors in qemu like: ... snip ... >> >> Alex, Matt, >> >> in s390 we have the possibility to assign a virtual function to a >> logical partition as function 0. >> In this case it can not be treated as a virtual function but must be >> treated as a physical function. >> This is currently working very well. >> However, these functions do not set PCI_COMMAND_MEMORY as we need. > > Where is the vendor and device ID virtualization done for these > devices, we can't have a PF with IDs :. Pierre doesn't mean the Device/Vendor IDs he means it has devfn == 0 so it is the mandatory function zero on it's PCI bus, where until recently we always had only one function per bus but with the recent multi-function support it can act more like on other platforms with several PCI functions sharing the same Bus e.g. a PF and the VFs created through sriov_numvfs. That's why I'm saying that having devfn == 0 should not be very special for a VF passed to a VM and I really don't see where it would not act like a VF passed from any other Hypervisor. The only really tricky part in my opinion is where during the "probing" we do set is_virtfn so it happens both for VFs passed-through from z/VM or LPAR and VFs created through sriov_numvfs which unlike on other platforms are also scanned by Firmware (pdev->no_vf_scan disables the Linux side scanning). With the fix I'm currently testing I had to do this in pcibios_enable_device() because I also create sysfs links between VFs and their parent PFs and those need the sysfs entries to be already created, which makes the more apropriately sound pcibios_bus_add_device() too early. > >> Shouldn't we fix this inside the kernel, to keep older QMEU working? >> >> Then would it be OK to add a new bit/boolean inside the >> pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during >> enumeration and test inside __vfio_pci_memory_enabled() to return true? > > Probably each instance of is_virtfn in vfio-pci should be looked at to > see if it applies to s390. If we're going to recognize this as a VF, > I'd rather we complete the emulation that the lower level hypervisor > has missed. If we can enable all the is_virtfn code on s390, then we > should probably cache is_virtfn on the vfio_pci_device object and allow > s390 a place to set it once at probe or enable time. > >> In the enumeration we have the possibility to know if the function is a >> HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. >> >> It seems an easy fix without side effects. >> >> What do you think? > > It sure seems preferable to recognize that it is a VF in the kernel > than to require userspace to have arch specific hacks. Thanks, > > Alex >
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/27/20 5:40 PM, Pierre Morel wrote: > > > On 2020-07-23 18:29, Alex Williamson wrote: >> On Thu, 23 Jul 2020 11:13:55 -0400 >> Matthew Rosato wrote: >> >>> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps >>> and block MMIO access on disabled memory' vfio-pci via qemu on s390x >>> fails spectacularly, with errors in qemu like: >>> >>> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: >>> Input/output error >>> >>> From read to bar 0 originating out of >>> hw/s390x/s390-pci-inst.c:zpci_read_bar(). >>> >>> So, I'm trying to figure out how to get vfio-pci happy again on s390x. From >>> a bit of tracing, we seem to be triggering the new trap in >>> __vfio_pci_memory_enabled(). Sure enough, if I just force this function to >>> return 'true' as a test case, things work again. >>> The included patch attempts to enforce the setting, which restores >>> everything >>> to working order but also triggers vfio_bar_restore() in the process So >>> this isn't the right answer, more of a proof-of-concept. >>> >>> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with >>> this >>> recent kernel change? >> >> Bummer! I won't claim to understand s390 PCI, but if we have a VF >> exposed to the "host" (ie. the first level where vfio-pci is being >> used), but we can't tell that it's a VF, how do we know whether the >> memory bit in the command register is unimplemented because it's a VF >> or unimplemented because the device doesn't support MMIO? How are the >> device ID, vendor ID, and BAR registers virtualized to the host? Could >> the memory enable bit also be emulated by that virtualization, much >> like vfio-pci does for userspace? If the other registers are >> virtualized, but these command register bits are left unimplemented, do >> we need code to deduce that we have a VF based on the existence of MMIO >> BARs, but lack of memory enable bit? Thanks, >> >> Alex > > Alex, Matt, > > in s390 we have the possibility to assign a virtual function to a logical > partition as function 0. > In this case it can not be treated as a virtual function but must be treated > as a physical function. > This is currently working very well. Can you explain why it must be treated as a physical function and must not have is_virtfn set? I'm currently reworking my fix for PF/VF linking not happening for all ways to attach a VF and in that I intend to set is_virtfn = 1 also for VFs that are not linked with a PF including those attached to an LPAR. So far I really can not see a reason why that should not work since I was wrong before and Firmware does tell us that these are indeed VFs (zdev->is_physfn == 0). AFAIK on nearly all platforms guests will often have a VF as function zero on a bus because that is what I expect to happen if you pass it through as a PCI function. So unless I'm missing something, that just makes LPAR look more like a QEMU guest on another platform which is very likely much more well tested than treating a VF as a PF as we have been doing. > However, these functions do not set PCI_COMMAND_MEMORY as we need. > > Shouldn't we fix this inside the kernel, to keep older QMEU working? > > Then would it be OK to add a new bit/boolean inside the > pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during > enumeration and test inside __vfio_pci_memory_enabled() to return true? This does not make sense to me, as I wrote above it's totally normal for VMs to see VFs detached from the PF as they are passed-through to a QEMU guest so IMHO that's already covered by the meaning of is_virtfn. > > In the enumeration we have the possibility to know if the function is a > HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. >
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On Mon, 27 Jul 2020 17:40:39 +0200 Pierre Morel wrote: > On 2020-07-23 18:29, Alex Williamson wrote: > > On Thu, 23 Jul 2020 11:13:55 -0400 > > Matthew Rosato wrote: > > > >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps > >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x > >> fails spectacularly, with errors in qemu like: > >> > >> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: > >> Input/output error > >> > >> From read to bar 0 originating out of > >> hw/s390x/s390-pci-inst.c:zpci_read_bar(). > >> > >> So, I'm trying to figure out how to get vfio-pci happy again on s390x. > >> From > >> a bit of tracing, we seem to be triggering the new trap in > >> __vfio_pci_memory_enabled(). Sure enough, if I just force this function to > >> return 'true' as a test case, things work again. > >> The included patch attempts to enforce the setting, which restores > >> everything > >> to working order but also triggers vfio_bar_restore() in the process > >> So > >> this isn't the right answer, more of a proof-of-concept. > >> > >> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with > >> this > >> recent kernel change? > > > > Bummer! I won't claim to understand s390 PCI, but if we have a VF > > exposed to the "host" (ie. the first level where vfio-pci is being > > used), but we can't tell that it's a VF, how do we know whether the > > memory bit in the command register is unimplemented because it's a VF > > or unimplemented because the device doesn't support MMIO? How are the > > device ID, vendor ID, and BAR registers virtualized to the host? Could > > the memory enable bit also be emulated by that virtualization, much > > like vfio-pci does for userspace? If the other registers are > > virtualized, but these command register bits are left unimplemented, do > > we need code to deduce that we have a VF based on the existence of MMIO > > BARs, but lack of memory enable bit? Thanks, > > > > Alex > > Alex, Matt, > > in s390 we have the possibility to assign a virtual function to a > logical partition as function 0. > In this case it can not be treated as a virtual function but must be > treated as a physical function. > This is currently working very well. > However, these functions do not set PCI_COMMAND_MEMORY as we need. Where is the vendor and device ID virtualization done for these devices, we can't have a PF with IDs :. > Shouldn't we fix this inside the kernel, to keep older QMEU working? > > Then would it be OK to add a new bit/boolean inside the > pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during > enumeration and test inside __vfio_pci_memory_enabled() to return true? Probably each instance of is_virtfn in vfio-pci should be looked at to see if it applies to s390. If we're going to recognize this as a VF, I'd rather we complete the emulation that the lower level hypervisor has missed. If we can enable all the is_virtfn code on s390, then we should probably cache is_virtfn on the vfio_pci_device object and allow s390 a place to set it once at probe or enable time. > In the enumeration we have the possibility to know if the function is a > HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. > > It seems an easy fix without side effects. > > What do you think? It sure seems preferable to recognize that it is a VF in the kernel than to require userspace to have arch specific hacks. Thanks, Alex
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/27/20 11:40 AM, Pierre Morel wrote: On 2020-07-23 18:29, Alex Williamson wrote: On Thu, 23 Jul 2020 11:13:55 -0400 Matthew Rosato wrote: I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps and block MMIO access on disabled memory' vfio-pci via qemu on s390x fails spectacularly, with errors in qemu like: qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar(). So, I'm trying to figure out how to get vfio-pci happy again on s390x. From a bit of tracing, we seem to be triggering the new trap in __vfio_pci_memory_enabled(). Sure enough, if I just force this function to return 'true' as a test case, things work again. The included patch attempts to enforce the setting, which restores everything to working order but also triggers vfio_bar_restore() in the process So this isn't the right answer, more of a proof-of-concept. @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this recent kernel change? Bummer! I won't claim to understand s390 PCI, but if we have a VF exposed to the "host" (ie. the first level where vfio-pci is being used), but we can't tell that it's a VF, how do we know whether the memory bit in the command register is unimplemented because it's a VF or unimplemented because the device doesn't support MMIO? How are the device ID, vendor ID, and BAR registers virtualized to the host? Could the memory enable bit also be emulated by that virtualization, much like vfio-pci does for userspace? If the other registers are virtualized, but these command register bits are left unimplemented, do we need code to deduce that we have a VF based on the existence of MMIO BARs, but lack of memory enable bit? Thanks, Alex Alex, Matt, in s390 we have the possibility to assign a virtual function to a logical partition as function 0. In this case it can not be treated as a virtual function but must be treated as a physical function. This is currently working very well. However, these functions do not set PCI_COMMAND_MEMORY as we need. Thanks for the explanation. Shouldn't we fix this inside the kernel, to keep older QMEU working? Yes, ideally. Then would it be OK to add a new bit/boolean inside the pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during enumeration and test inside __vfio_pci_memory_enabled() to return true? In the enumeration we have the possibility to know if the function is a HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. It seems an easy fix without side effects. What do you think? From my perspective at least, this would resolve the issue and is in-line with the sort of suggestion Alex floated the other day of "do we need code to deduce that we have a VF based on the existence of MMIO BARs, but lack of memory enable bit?" -- it just sounds like a different way of coming to the conclusion. Effectively we need a way to say: 'for the purposes of memory_enable detection, treat this thing like a virtual function because it is one, even though it doesn't always act like one' @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? I note that my host device lspci output looks like: :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] But the device is not marked as is_virtfn.. Otherwise, Alex's fix from htps://lkml.org/lkml/2020/6/25/628 should cover the case. I do not think we can fix this problem by forcing the is_virtfn bit. AFAIU, our HW virtual function works a lot like a physical function. Matthew Rosato (1): s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci hw/s390x/s390-pci-inst.c | 10 ++ 1 file changed, 10 insertions(+) Regards, Pierre
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 2020-07-23 18:29, Alex Williamson wrote: On Thu, 23 Jul 2020 11:13:55 -0400 Matthew Rosato wrote: I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps and block MMIO access on disabled memory' vfio-pci via qemu on s390x fails spectacularly, with errors in qemu like: qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar(). So, I'm trying to figure out how to get vfio-pci happy again on s390x. From a bit of tracing, we seem to be triggering the new trap in __vfio_pci_memory_enabled(). Sure enough, if I just force this function to return 'true' as a test case, things work again. The included patch attempts to enforce the setting, which restores everything to working order but also triggers vfio_bar_restore() in the process So this isn't the right answer, more of a proof-of-concept. @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this recent kernel change? Bummer! I won't claim to understand s390 PCI, but if we have a VF exposed to the "host" (ie. the first level where vfio-pci is being used), but we can't tell that it's a VF, how do we know whether the memory bit in the command register is unimplemented because it's a VF or unimplemented because the device doesn't support MMIO? How are the device ID, vendor ID, and BAR registers virtualized to the host? Could the memory enable bit also be emulated by that virtualization, much like vfio-pci does for userspace? If the other registers are virtualized, but these command register bits are left unimplemented, do we need code to deduce that we have a VF based on the existence of MMIO BARs, but lack of memory enable bit? Thanks, Alex Alex, Matt, in s390 we have the possibility to assign a virtual function to a logical partition as function 0. In this case it can not be treated as a virtual function but must be treated as a physical function. This is currently working very well. However, these functions do not set PCI_COMMAND_MEMORY as we need. Shouldn't we fix this inside the kernel, to keep older QMEU working? Then would it be OK to add a new bit/boolean inside the pci_dev/vfio_pci_device like, is_detached_vfn, that we could set during enumeration and test inside __vfio_pci_memory_enabled() to return true? In the enumeration we have the possibility to know if the function is a HW/Firmware virtual function on devfn 0 or if it is created by SRIOV. It seems an easy fix without side effects. What do you think? @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? I note that my host device lspci output looks like: :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] But the device is not marked as is_virtfn.. Otherwise, Alex's fix from htps://lkml.org/lkml/2020/6/25/628 should cover the case. I do not think we can fix this problem by forcing the is_virtfn bit. AFAIU, our HW virtual function works a lot like a physical function. Matthew Rosato (1): s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci hw/s390x/s390-pci-inst.c | 10 ++ 1 file changed, 10 insertions(+) Regards, Pierre -- Pierre Morel IBM Lab Boeblingen
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/24/20 11:46 AM, Niklas Schnelle wrote: > > > On 7/23/20 5:13 PM, Matthew Rosato wrote: >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x >> fails spectacularly, with errors in qemu like: >> >> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: >> Input/output error >> >> From read to bar 0 originating out of >> hw/s390x/s390-pci-inst.c:zpci_read_bar(). >> >> So, I'm trying to figure out how to get vfio-pci happy again on s390x. From >> a bit of tracing, we seem to be triggering the new trap in >> __vfio_pci_memory_enabled(). Sure enough, if I just force this function to >> return 'true' as a test case, things work again. >> The included patch attempts to enforce the setting, which restores everything >> to working order but also triggers vfio_bar_restore() in the process So >> this isn't the right answer, more of a proof-of-concept. >> >> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with >> this >> recent kernel change? >> >> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? >> I note that my host device lspci output looks like: >> >> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family >> [ConnectX-4 Lx Virtual Function] >> >> But the device is not marked as is_virtfn.. Otherwise, Alex's fix >> from htps://lkml.org/lkml/2020/6/25/628 should cover the case. > With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I > introduced > the is_physfn field to struct zpci_dev which gets set through the > CLP Query PCI Function. Also with that commit this being 0 will set > is_virtfn to 1. > Interestingly looking at s390-pci-inst.c in QEMU I'd think that > on QEMU this should already be 0 and thus is_virtfn should be set > with Linux >5.8-rc1 and the missing case is actually for passing through > a PF where it would wrongly be 0 too. > Note: If the Linux instance does not see the > parent PF however the only way I know to test if it is a VF from userspace > is checking if /sys/bus/pci/devices//vfn is non-zero which is platform > specific and currently wrongly set 0 on QEMU for VFs. > If the PF is known the mentioned commit will also create the > /sys/bus/pci/devices//physfn symlink as on other platforms. Ok I've been looking into this more and is_physfn is correctly set to 0 on RoCE VFs, there is just a Bug in zPCI preventing the is_virtfn from beeing set when the device is found in CLP List PCI devices (equivalent too boot time bus probing on other architectures) instead of beeing hot plugged later which is the case for sriov_numvfs. I'm working on a fix for a bug with PF/VF linking anyway so should get that fixed then. >> Matthew Rosato (1): >> s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci >> >> hw/s390x/s390-pci-inst.c | 10 ++ >> 1 file changed, 10 insertions(+) >>
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/24/20 11:46 AM, Niklas Schnelle wrote: > > > On 7/23/20 5:13 PM, Matthew Rosato wrote: >> I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps >> and block MMIO access on disabled memory' vfio-pci via qemu on s390x >> fails spectacularly, with errors in qemu like: >> >> qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: >> Input/output error >> >> From read to bar 0 originating out of >> hw/s390x/s390-pci-inst.c:zpci_read_bar(). >> >> So, I'm trying to figure out how to get vfio-pci happy again on s390x. From >> a bit of tracing, we seem to be triggering the new trap in >> __vfio_pci_memory_enabled(). Sure enough, if I just force this function to >> return 'true' as a test case, things work again. >> The included patch attempts to enforce the setting, which restores everything >> to working order but also triggers vfio_bar_restore() in the process So >> this isn't the right answer, more of a proof-of-concept. >> >> @Alex: Any guidance on what needs to happen to make qemu-s390x happy with >> this >> recent kernel change? >> >> @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? >> I note that my host device lspci output looks like: >> >> :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family >> [ConnectX-4 Lx Virtual Function] >> >> But the device is not marked as is_virtfn.. Otherwise, Alex's fix >> from htps://lkml.org/lkml/2020/6/25/628 should cover the case. > With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I > introduced > the is_physfn field to struct zpci_dev which gets set through the > CLP Query PCI Function. Also with that commit this being 0 will set > is_virtfn to 1. > Interestingly looking at s390-pci-inst.c in QEMU I'd think that > on QEMU this should already be 0 and thus is_virtfn should be set > with Linux >5.8-rc1 and the missing case is actually for passing through > a PF where it would wrongly be 0 too. > Note: If the Linux instance does not see the > parent PF however the only way I know to test if it is a VF from userspace > is checking if /sys/bus/pci/devices//vfn is non-zero which is platform > specific and currently wrongly set 0 on QEMU for VFs. > If the PF is known the mentioned commit will also create the > /sys/bus/pci/devices//physfn symlink as on other platforms. Arghh, sorry the problem is of course that is_virtfn is not set in the host. I thought it should be but testing this the is_physfn bit is actually non-zero for RoCEs on z/VM and LPAR. I will try figuring out why that is, I guess I should have used the vfn field instead but I thought is_physfn would be more explicit :-( >> Matthew Rosato (1): >> s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci >> >> hw/s390x/s390-pci-inst.c | 10 ++ >> 1 file changed, 10 insertions(+) >>
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/23/20 5:13 PM, Matthew Rosato wrote: > I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps > and block MMIO access on disabled memory' vfio-pci via qemu on s390x > fails spectacularly, with errors in qemu like: > > qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: > Input/output error > > From read to bar 0 originating out of > hw/s390x/s390-pci-inst.c:zpci_read_bar(). > > So, I'm trying to figure out how to get vfio-pci happy again on s390x. From > a bit of tracing, we seem to be triggering the new trap in > __vfio_pci_memory_enabled(). Sure enough, if I just force this function to > return 'true' as a test case, things work again. > The included patch attempts to enforce the setting, which restores everything > to working order but also triggers vfio_bar_restore() in the process So > this isn't the right answer, more of a proof-of-concept. > > @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this > recent kernel change? > > @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? > I note that my host device lspci output looks like: > > :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family > [ConnectX-4 Lx Virtual Function] > > But the device is not marked as is_virtfn.. Otherwise, Alex's fix > from htps://lkml.org/lkml/2020/6/25/628 should cover the case. With commit e5794cf1a270 ("s390/pci: create links between PFs and VFs") I introduced the is_physfn field to struct zpci_dev which gets set through the CLP Query PCI Function. Also with that commit this being 0 will set is_virtfn to 1. Interestingly looking at s390-pci-inst.c in QEMU I'd think that on QEMU this should already be 0 and thus is_virtfn should be set with Linux >5.8-rc1 and the missing case is actually for passing through a PF where it would wrongly be 0 too. Note: If the Linux instance does not see the parent PF however the only way I know to test if it is a VF from userspace is checking if /sys/bus/pci/devices//vfn is non-zero which is platform specific and currently wrongly set 0 on QEMU for VFs. If the PF is known the mentioned commit will also create the /sys/bus/pci/devices//physfn symlink as on other platforms. > Matthew Rosato (1): > s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci > > hw/s390x/s390-pci-inst.c | 10 ++ > 1 file changed, 10 insertions(+) >
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On 7/23/20 12:29 PM, Alex Williamson wrote: On Thu, 23 Jul 2020 11:13:55 -0400 Matthew Rosato wrote: I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps and block MMIO access on disabled memory' vfio-pci via qemu on s390x fails spectacularly, with errors in qemu like: qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: Input/output error From read to bar 0 originating out of hw/s390x/s390-pci-inst.c:zpci_read_bar(). So, I'm trying to figure out how to get vfio-pci happy again on s390x. From a bit of tracing, we seem to be triggering the new trap in __vfio_pci_memory_enabled(). Sure enough, if I just force this function to return 'true' as a test case, things work again. The included patch attempts to enforce the setting, which restores everything to working order but also triggers vfio_bar_restore() in the process So this isn't the right answer, more of a proof-of-concept. @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this recent kernel change? Bummer! I won't claim to understand s390 PCI, but if we have a VF exposed to the "host" (ie. the first level where vfio-pci is being used), but we can't tell that it's a VF, how do we know whether the memory bit in the command register is unimplemented because it's a VF or unimplemented because the device doesn't support MMIO? How are the device ID, vendor ID, and BAR registers virtualized to the host? Could On s390 this info is all advertised/accessed via special instructions (my PoC qemu patch was intercepting one of these sorts of instructions from the guest and modifying it). the memory enable bit also be emulated by that virtualization, much like vfio-pci does for userspace? If the other registers are virtualized, but these command register bits are left unimplemented, do we need code to deduce that we have a VF based on the existence of MMIO BARs, but lack of memory enable bit? Thanks, Yeah, I'm thinking we might need something to this effect for s390 at least. But I'm curious if Niklas/Pierre have any add'l thoughts about that since they touched the virtfn stuff recently for s390 PCI. Alex @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? I note that my host device lspci output looks like: :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family [ConnectX-4 Lx Virtual Function] But the device is not marked as is_virtfn.. Otherwise, Alex's fix from htps://lkml.org/lkml/2020/6/25/628 should cover the case. Matthew Rosato (1): s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci hw/s390x/s390-pci-inst.c | 10 ++ 1 file changed, 10 insertions(+)
Re: [RFC PATCH] s390x/pci: vfio-pci breakage with disabled mem enforcement
On Thu, 23 Jul 2020 11:13:55 -0400 Matthew Rosato wrote: > I noticed that after kernel commit abafbc55 'vfio-pci: Invalidate mmaps > and block MMIO access on disabled memory' vfio-pci via qemu on s390x > fails spectacularly, with errors in qemu like: > > qemu-system-s390x: vfio_region_read(0001:00:00.0:region0+0x0, 4) failed: > Input/output error > > From read to bar 0 originating out of > hw/s390x/s390-pci-inst.c:zpci_read_bar(). > > So, I'm trying to figure out how to get vfio-pci happy again on s390x. From > a bit of tracing, we seem to be triggering the new trap in > __vfio_pci_memory_enabled(). Sure enough, if I just force this function to > return 'true' as a test case, things work again. > The included patch attempts to enforce the setting, which restores everything > to working order but also triggers vfio_bar_restore() in the process So > this isn't the right answer, more of a proof-of-concept. > > @Alex: Any guidance on what needs to happen to make qemu-s390x happy with this > recent kernel change? Bummer! I won't claim to understand s390 PCI, but if we have a VF exposed to the "host" (ie. the first level where vfio-pci is being used), but we can't tell that it's a VF, how do we know whether the memory bit in the command register is unimplemented because it's a VF or unimplemented because the device doesn't support MMIO? How are the device ID, vendor ID, and BAR registers virtualized to the host? Could the memory enable bit also be emulated by that virtualization, much like vfio-pci does for userspace? If the other registers are virtualized, but these command register bits are left unimplemented, do we need code to deduce that we have a VF based on the existence of MMIO BARs, but lack of memory enable bit? Thanks, Alex > @Nilkas/@Pierre: I wonder if this might be related to host device is_virtfn? > I note that my host device lspci output looks like: > > :00:00.0 Ethernet controller: Mellanox Technologies MT27710 Family > [ConnectX-4 Lx Virtual Function] > > But the device is not marked as is_virtfn.. Otherwise, Alex's fix > from htps://lkml.org/lkml/2020/6/25/628 should cover the case. > > > > Matthew Rosato (1): > s390x/pci: Enforce PCI_COMMAND_MEMORY for vfio-pci > > hw/s390x/s390-pci-inst.c | 10 ++ > 1 file changed, 10 insertions(+) >