Hi Alex,

On 19/09/2017 18:58, Alex Williamson wrote:
> With virtual PCI-Express chipsets, we now see userspace/guest drivers
> trying to match the physical MPS setting to a virtual downstream port.
> Of course a lone physical device surrounded by virtual interconnects
> cannot make a correct decision for a proper MPS setting.  Instead,
> let's virtualize the MPS control register so that writes through to
> hardware are disallowed.  Userspace drivers like QEMU assume they can
> write anything to the device and we'll filter out anything dangerous.
> Since mismatched MPS can lead to AER and other faults, let's add it
> to the kernel side rather than relying on userspace virtualization to
> handle it.
>
> Signed-off-by: Alex Williamson <alex.william...@redhat.com>
> ---
>
> Do we have any reason to suspect that a userspace driver has any
> dependencies on the physical MPS setting or is this only tuning the
> protocol layer and it's transparent to the driver?  Note that per the
> PCI spec, a device supporting only 128B MPS can hardwire the control
> register to 000b, but it doesn't seem PCIe compliant to hardwire it to
> any given value, such as would be the appearance if we exposed this as
> a read-only register rather than virtualizing it.  QEMU would then be
> responsible for virtualizing it, which makes coordinating the upgrade
> troublesome.
>
>  drivers/vfio/pci/vfio_pci_config.c |    6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/vfio/pci/vfio_pci_config.c b/drivers/vfio/pci/vfio_pci_config.c
> index 5628fe114347..91335e6de88a 100644
> --- a/drivers/vfio/pci/vfio_pci_config.c
> +++ b/drivers/vfio/pci/vfio_pci_config.c
> @@ -849,11 +849,13 @@ static int __init init_pci_cap_exp_perm(struct perm_bits *perm)
>
>  	/*
>  	 * Allow writes to device control fields, except devctl_phantom,
> -	 * which could confuse IOMMU, and the ARI bit in devctl2, which
> +	 * which could confuse IOMMU, MPS, which can break communication
> +	 * with other physical devices, and the ARI bit in devctl2, which
>  	 * is set at probe time.  FLR gets virtualized via our writefn.
>  	 */
>  	p_setw(perm, PCI_EXP_DEVCTL,
> -	       PCI_EXP_DEVCTL_BCR_FLR, ~PCI_EXP_DEVCTL_PHANTOM);
> +	       PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD,
> +	       ~PCI_EXP_DEVCTL_PHANTOM);
>  	p_setw(perm, PCI_EXP_DEVCTL2, NO_VIRT, ~PCI_EXP_DEVCTL2_ARI);

Is it correct that the value read back will still be the one written by
the guest?

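To make the question concrete, here is a minimal user-space sketch of how
the two p_setw() masks are assumed to combine. It assumes the default
handlers serve bits set in "virt" from a shadow copy and pass writable
non-virtual bits through to hardware. struct devctl_model and the
model_*() helpers are hypothetical names for illustration, not kernel
code, and the special FLR handling in the writefn is left out.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVCTL_PAYLOAD 0x00e0 /* Max_Payload_Size field */
#define PCI_EXP_DEVCTL_PHANTOM 0x0100 /* Phantom Functions Enable */
#define PCI_EXP_DEVCTL_READRQ  0x7000 /* Max_Read_Request_Size field */
#define PCI_EXP_DEVCTL_BCR_FLR 0x8000 /* Bridge Config Retry / FLR */

/* Hypothetical model of one 16-bit config register (not kernel code). */
struct devctl_model {
	uint16_t virt;    /* bits served from the shadow copy */
	uint16_t write;   /* bits the guest is allowed to write */
	uint16_t vconfig; /* shadow copy seen by the guest */
	uint16_t hw;      /* value held by the physical device */
};

/* Permissions as set by the patch: MPS and FLR virtualized, phantom RO. */
static struct devctl_model model_init(uint16_t hw)
{
	struct devctl_model m = {
		.virt = PCI_EXP_DEVCTL_BCR_FLR | PCI_EXP_DEVCTL_PAYLOAD,
		.write = (uint16_t)~PCI_EXP_DEVCTL_PHANTOM,
		.vconfig = hw,
		.hw = hw,
	};
	return m;
}

/* Guest read: virtual bits from the shadow, the rest from hardware. */
static uint16_t model_read(const struct devctl_model *m)
{
	return (uint16_t)((m->vconfig & m->virt) | (m->hw & ~m->virt));
}

/* Guest write: virtual bits update the shadow only, others reach hardware. */
static void model_write(struct devctl_model *m, uint16_t val)
{
	uint16_t vmask = m->write & m->virt;
	uint16_t pmask = (uint16_t)(m->write & ~m->virt);

	m->vconfig = (uint16_t)((m->vconfig & ~vmask) | (val & vmask));
	m->hw = (uint16_t)((m->hw & ~pmask) | (val & pmask));
}
```

Under these assumptions, a guest that clears both the MPS and MRRS fields
reads back exactly what it wrote, the physical MPS field is left alone,
and the MRRS bits still reach the device.
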
I see that the MRRS can be derived from the MPS value that is read, for
some pcie_bus_config values. So a consequence could be that the applied
MRRS (which is not virtualized) ends up lower than what the host set, due
to a guest PCIe root port MPSS for instance.

So if the above is not totally wrong, shouldn't we virtualize MRRS as
well?

Thanks

Eric
>
>  	return 0;
>  }
>
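
For reference, a small sketch of how the two DEVCTL fields discussed above
decode to byte sizes, and of what a policy that sets MRRS equal to the MPS
it reads would write. The field positions and the 128 << n encoding come
from the PCIe Device Control register definition; the helper names are
hypothetical and the "MRRS follows MPS" step is only an assumed model of
such a policy, not a copy of the PCI core code.

```c
#include <stdint.h>

/* Field masks as defined in include/uapi/linux/pci_regs.h */
#define PCI_EXP_DEVCTL_PAYLOAD 0x00e0 /* Max_Payload_Size, bits 7:5 */
#define PCI_EXP_DEVCTL_READRQ  0x7000 /* Max_Read_Request_Size, bits 14:12 */

/* Decode the MPS field: 000b = 128B ... 101b = 4096B. */
static unsigned int devctl_mps_bytes(uint16_t devctl)
{
	return 128u << ((devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5);
}

/* Decode the MRRS field, which uses the same 128 << n encoding. */
static unsigned int devctl_mrrs_bytes(uint16_t devctl)
{
	return 128u << ((devctl & PCI_EXP_DEVCTL_READRQ) >> 12);
}

/* Hypothetical policy step: copy the MPS encoding into the MRRS field. */
static uint16_t devctl_mrrs_from_mps(uint16_t devctl)
{
	uint16_t mps_field = (devctl & PCI_EXP_DEVCTL_PAYLOAD) >> 5;

	return (uint16_t)((devctl & ~PCI_EXP_DEVCTL_READRQ) | (mps_field << 12));
}
```

If the MPS that such a policy reads is the guest-visible (virtual) value
rather than the physical one, the MRRS it writes follows the virtual MPS,
which is the mismatch raised above.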