On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote: > When bit4 is set in the PCIe Device Control register, it indicates > whether the device is permitted to use relaxed ordering. > On some platforms using relaxed ordering can have performance issues or > due to erratum can cause data-corruption. In such cases devices must avoid > using relaxed ordering. > > This patch checks if there is any node in the hierarchy that indicates that > using relaxed ordering is not safe.
I think you only check the devices between the root port and the target device. For example, you don't check siblings or cousins of the target device. > In such cases the patch turns off the > relaxed ordering by clearing the eapability for this device. s/eapability/capability/ > And if the > device is probably running in a guest machine, we should do nothing. I don't know what this sentence means. "Probably running in a guest machine" doesn't really make sense, and there's nothing in your patch that explicitly checks for being in a guest machine. > Signed-off-by: Ding Tianhong <dingtianh...@huawei.com> > Acked-by: Alexander Duyck <alexander.h.du...@intel.com> > Acked-by: Ashok Raj <ashok....@intel.com> > --- > drivers/pci/pci.c | 29 +++++++++++++++++++++++++++++ > drivers/pci/probe.c | 37 +++++++++++++++++++++++++++++++++++++ > include/linux/pci.h | 2 ++ > 3 files changed, 68 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index af0cc34..4f9d7c1 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) > EXPORT_SYMBOL(pcie_set_mps); > > /** > + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit > + * @dev: PCI device to query > + * > + * If possible clear relaxed ordering Why "If possible"? The bit is required to be RW or hardwired to zero, so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns. > + */ > +int pcie_clear_relaxed_ordering(struct pci_dev *dev) > +{ > + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, > + PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); The current series doesn't add any callers of this except pci_configure_relaxed_ordering(), so it doesn't need to be exported to modules. I think I would put both of these functions in drivers/pci/probe.c. Then this one could be static and you'd only have to add pcie_relaxed_ordering_supported() to include/linux/pci.h. > + > +/** > + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support s/relexed/relaxed/ > + * @dev: PCI device to query > + * > + * Returns true if the device support relaxed ordering attribute. > + */ > +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) > +{ > + u16 v; > + > + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); > + > + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); This is misnamed. This doesn't tell us anything about whether the device *supports* relaxed ordering. It only tells us whether the device is *permitted* to use it. When a device initiates a transaction, the hardware should set the RO bit in the TLP with logic something like this: RO = <this Function supports relaxed ordering> && <this transaction doesn't require strong write ordering> && <PCI_EXP_DEVCTL_RELAX_EN is set> The issue you're fixing is that some Completers don't handle RO correctly. The determining factor is not the Requester, but the Completer (for this series, a Root Port). So I think this should be something like: int pcie_relaxed_ordering_broken(struct pci_dev *completer) { if (!completer) return 0; return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING; } and the caller should do something like this: if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev))) adapter->flags |= ROOT_NO_RELAXED_ORDERING; That way it's obvious where the issue is, and it's obvious that the answer might be different for peer-to-peer transactions than it is for transactions to the root port, i.e., to coherent memory. > + > +/** > * pcie_get_minimum_link - determine minimum link settings of a PCI device > * @dev: PCI device to query > * @speed: storage for minimum speed > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index c31310d..48df012 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev > *dev) > PCI_EXP_DEVCTL_EXT_TAG); > } > > +/** > + * pci_dev_should_disable_relaxed_ordering - check if the PCI device > + * should disable the relaxed ordering attribute. > + * @dev: PCI device > + * > + * Return true if any of the PCI devices above us do not support > + * relaxed ordering. > + */ > +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) > +{ > + while (dev) { > + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) > + return true; > + > + dev = dev->bus->self; > + } > + > + return false; > +} > + > +static void pci_configure_relaxed_ordering(struct pci_dev *dev) > +{ > + /* We should not alter the relaxed ordering bit for the VF */ > + if (dev->is_virtfn) > + return; > + > + /* If the releaxed ordering enable bit is not set, do nothing. */ s/releaxed/relaxed/ > + if (!pcie_relaxed_ordering_supported(dev)) > + return; > + > + if (pci_dev_should_disable_relaxed_ordering(dev)) { > + pcie_clear_relaxed_ordering(dev); > + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); This associates the message with the Requester that may potentially use relaxed ordering. But there's nothing wrong or unusual about the Requester; the issue is with the *Completer*, so I think the message should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING. Maybe it should be both places; I dunno. This implementation assumes the device only initiates transactions to coherent memory, i.e., it assumes the device never does peer-to-peer DMA. I guess we'll have to wait and see if we trip over any peer-to-peer issues, then figure out how to handle them. > + } > +} > + > static void pci_configure_device(struct pci_dev *dev) > { > struct hotplug_params hpp; > @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) > > pci_configure_mps(dev); > pci_configure_extended_tags(dev); > + pci_configure_relaxed_ordering(dev); > > memset(&hpp, 0, sizeof(hpp)); > ret = pci_get_hp_params(dev, &hpp); > diff --git a/include/linux/pci.h b/include/linux/pci.h > index 412ec1c..3aa23a2 100644 > --- a/include/linux/pci.h > +++ b/include/linux/pci.h > @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, > void pci_pme_wakeup_bus(struct pci_bus *bus); > void pci_d3cold_enable(struct pci_dev *dev); > void pci_d3cold_disable(struct pci_dev *dev); > +int pcie_clear_relaxed_ordering(struct pci_dev *dev); > +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); > > /* PCI Virtual Channel */ > int pci_save_vc_state(struct pci_dev *dev); > -- > 1.8.3.1 > > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-ker...@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel