Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
On 2017/8/9 11:25, Bjorn Helgaas wrote: > On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote: >> On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote: >>> When bit4 is set in the PCIe Device Control register, it indicates > After looking at the driver, I wonder if it would be simpler like > this: > > int pcie_relaxed_ordering_enabled(struct pci_dev *dev) > { > u16 ctl; > > pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl); > return ctl & PCI_EXP_DEVCTL_RELAX_EN; > } > EXPORT_SYMBOL(pcie_relaxed_ordering_enabled); > > static void pci_configure_relaxed_ordering(struct pci_dev *dev) > { > struct pci_dev *root; > > if (dev->is_virtfn) > return; /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */ > > if (!pcie_relaxed_ordering_enabled(dev)) > return; > > /* > * For now, we only deal with Relaxed Ordering issues with Root > * Ports. Peer-to-peer DMA is another can of worms. > */ > root = pci_find_pcie_root_port(dev); > if (!root) > return; > > if (root->relaxed_ordering_broken) > pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, > PCI_EXP_DEVCTL_RELAX_EN); > } > > This doesn't check every intervening switch, but I don't think we know > about any issues except with root ports. > Yes > And the driver could do: > > if (!pcie_relaxed_ordering_enabled(pdev)) > adapter->flags |= ROOT_NO_RELAXED_ORDERING; > > The driver code wouldn't show anything about coherent memory vs. > peer-to-peer, but we really don't have a clue about how to handle that > yet anyway. > > I guess this is back to exactly what you proposed, except that I > changed the name of pcie_relaxed_ordering_supported() to > pcie_relaxed_ordering_enabled(), which I think is slightly more > specific from the device's point of view. > OK, looks like we reach a consensus finally, I will follow your new opinion and resend, thanks. Ding > Bjorn > > . >
Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
Hi Bjorn: On 2017/8/9 10:22, Bjorn Helgaas wrote: > On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote: >> When bit4 is set in the PCIe Device Control register, it indicates >> whether the device is permitted to use relaxed ordering. >> On some platforms using relaxed ordering can have performance issues or >> due to erratum can cause data-corruption. In such cases devices must avoid >> using relaxed ordering. >> >> This patch checks if there is any node in the hierarchy that indicates that >> using relaxed ordering is not safe. > > I think you only check the devices between the root port and the > target device. For example, you don't check siblings or cousins of > the target device. > OK, update the description. >> In such cases the patch turns off the >> relaxed ordering by clearing the eapability for this device. > > s/eapability/capability/ > >> And if the >> device is probably running in a guest machine, we should do nothing. > > I don't know what this sentence means. "Probably running in a guest > machine" doesn't really make sense, and there's nothing in your patch > that explicitly checks for being in a guest machine. > Alex noticed that we should do nothing if in the virtual machine because the Root Complex is NULL at that time, so I think this word should be more clearly here. >> Signed-off-by: Ding Tianhong >> Acked-by: Alexander Duyck >> Acked-by: Ashok Raj >> --- >> drivers/pci/pci.c | 29 + >> drivers/pci/probe.c | 37 + >> include/linux/pci.h | 2 ++ >> 3 files changed, 68 insertions(+) >> >> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c >> index af0cc34..4f9d7c1 100644 >> --- a/drivers/pci/pci.c >> +++ b/drivers/pci/pci.c >> @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) >> EXPORT_SYMBOL(pcie_set_mps); >> >> /** >> + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit >> + * @dev: PCI device to query >> + * >> + * If possible clear relaxed ordering > > Why "If possible"? The bit is required to be RW or hardwired to zero, > so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns. > OK >> + */ >> +int pcie_clear_relaxed_ordering(struct pci_dev *dev) >> +{ ... >> >> ___ >> linux-arm-kernel mailing list >> linux-arm-ker...@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel > > . >
Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
| From: Bjorn Helgaas | Sent: Tuesday, August 8, 2017 7:22 PM | ... | and the caller should do something like this: | | if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev))) | adapter->flags |= ROOT_NO_RELAXED_ORDERING; | | That way it's obvious where the issue is, and it's obvious that the | answer might be different for peer-to-peer transactions than it is for | transactions to the root port, i.e., to coherent memory. | ... Which is back to something very close to what I initially suggested in my first patch submission. Because you're right, this isn't about broken Source Devices, it's about broken Completing Devices. Unfortunately, as Alexander Duyck noted, in a Virtual Machine we won't be able to see our Root Port in order to make this determination. (And in some Hypervisor implementations I've seen, there's not even a synthetic Root Port available to the VM at all, let alone read-only access to the real one.) So the current scheme was developed of having the Hypervisor kernel traverse down the PCIe Fabric when it finds a broken Root Port implementation (the issue that we've mostly been primarily focused upon), and turning off the PCIe Capability Device Control[Relaxed Ordering Enable]. This was to serve two purposes: 1. Turn that off in order to prevent sending any Transaction Layer Packets with the Relaxed Ordering Attribute to any Completer. Which unfortunately would also prevent Peer-to-Peer use of the Relaxed Ordering Attribute. 2. Act as a message to Device Drivers for those downstream devices that the they were dealing with a broken Root Port implementation. And this would work even for a driver in a VM with an attached device since it would be able to see the PCIe Configuration Space for the attached device. I haven't been excited about any of this because: A. While so far all of the examples we've talked about are broken Root Port Completers, it's perfectly possible that other devices could be broken -- say an NVMe Device which is not "Coherent Memory". How would this fit into the framework APIs being described? B. I have yet to see a n example of how the currently proposed API framework would be used in a hybrid environment where TLPs to the Root Port would not use Relaxed Ordering, but TLPs to a Peer would use Relaxed Ordering. So far its all been about using a "big hammer" to completely disable the use of Relaxed Ordering. But the VM problem keeps cropping up over and over. A driver in a VM doesn't have access to the Root Port to determine if its on a "Black List" and our only way of communicating with the driver in the VM is to leave the device in a particular state (i.e. initialize the PCIe Capability Device Control[Relaxed Ordering Enable] to "off"). Oh, and also, on the current patch submission's focus on broken Root Port implementations: one could suggest that even if we're stuck with the "Device attached to a VM Conundrum", that what we should really be thinking about is if ANY device within a PCIe Fabric has broken Relaxed Ordering completion problems, and, if so, "poisoning" the entire containing PCIe Fabric by turning off Relaxed Ordering Enable for every device, up, down sideways -- including the Root Port itself. | ... | This associates the message with the Requester that may potentially | use relaxed ordering. But there's nothing wrong or unusual about the | Requester; the issue is with the *Completer*, so I think the message | should be in the quirk where we set PCI_DEV_FLAGS_NO_RELAXED_ORDERING. | Maybe it should be both places; I dunno. | | This implementation assumes the device only initiates transactions to | coherent memory, i.e., it assumes the device never does peer-to-peer | DMA. I guess we'll have to wait and see if we trip over any | peer-to-peer issues, then figure out how to handle them. | ... Yes, as soon as we want to implement the hybrid use of Relaxed Ordering I mentioned above. And that the Intel document mentions itself. Casey
Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
On Tue, Aug 08, 2017 at 09:22:39PM -0500, Bjorn Helgaas wrote: > On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote: > > When bit4 is set in the PCIe Device Control register, it indicates > > whether the device is permitted to use relaxed ordering. > > On some platforms using relaxed ordering can have performance issues or > > due to erratum can cause data-corruption. In such cases devices must avoid > > using relaxed ordering. > > > > This patch checks if there is any node in the hierarchy that indicates that > > using relaxed ordering is not safe. ... > > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); > > This is misnamed. This doesn't tell us anything about whether the > device *supports* relaxed ordering. It only tells us whether the > device is *permitted* to use it. > > When a device initiates a transaction, the hardware should set the RO > bit in the TLP with logic something like this: > > RO = && > && > > > The issue you're fixing is that some Completers don't handle RO > correctly. The determining factor is not the Requester, but the > Completer (for this series, a Root Port). So I think this should be > something like: > > int pcie_relaxed_ordering_broken(struct pci_dev *completer) > { > if (!completer) > return 0; > > return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING; > } > > and the caller should do something like this: > > if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev))) >adapter->flags |= ROOT_NO_RELAXED_ORDERING; > > That way it's obvious where the issue is, and it's obvious that the > answer might be different for peer-to-peer transactions than it is for > transactions to the root port, i.e., to coherent memory. After looking at the driver, I wonder if it would be simpler like this: int pcie_relaxed_ordering_enabled(struct pci_dev *dev) { u16 ctl; pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &ctl); return ctl & PCI_EXP_DEVCTL_RELAX_EN; } EXPORT_SYMBOL(pcie_relaxed_ordering_enabled); static void pci_configure_relaxed_ordering(struct pci_dev *dev) { struct pci_dev *root; if (dev->is_virtfn) return; /* PCI_EXP_DEVCTL_RELAX_EN is RsvdP in VFs */ if (!pcie_relaxed_ordering_enabled(dev)) return; /* * For now, we only deal with Relaxed Ordering issues with Root * Ports. Peer-to-peer DMA is another can of worms. */ root = pci_find_pcie_root_port(dev); if (!root) return; if (root->relaxed_ordering_broken) pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, PCI_EXP_DEVCTL_RELAX_EN); } This doesn't check every intervening switch, but I don't think we know about any issues except with root ports. And the driver could do: if (!pcie_relaxed_ordering_enabled(pdev)) adapter->flags |= ROOT_NO_RELAXED_ORDERING; The driver code wouldn't show anything about coherent memory vs. peer-to-peer, but we really don't have a clue about how to handle that yet anyway. I guess this is back to exactly what you proposed, except that I changed the name of pcie_relaxed_ordering_supported() to pcie_relaxed_ordering_enabled(), which I think is slightly more specific from the device's point of view. Bjorn
Re: [PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
On Sat, Aug 05, 2017 at 03:15:11PM +0800, Ding Tianhong wrote: > When bit4 is set in the PCIe Device Control register, it indicates > whether the device is permitted to use relaxed ordering. > On some platforms using relaxed ordering can have performance issues or > due to erratum can cause data-corruption. In such cases devices must avoid > using relaxed ordering. > > This patch checks if there is any node in the hierarchy that indicates that > using relaxed ordering is not safe. I think you only check the devices between the root port and the target device. For example, you don't check siblings or cousins of the target device. > In such cases the patch turns off the > relaxed ordering by clearing the eapability for this device. s/eapability/capability/ > And if the > device is probably running in a guest machine, we should do nothing. I don't know what this sentence means. "Probably running in a guest machine" doesn't really make sense, and there's nothing in your patch that explicitly checks for being in a guest machine. > Signed-off-by: Ding Tianhong > Acked-by: Alexander Duyck > Acked-by: Ashok Raj > --- > drivers/pci/pci.c | 29 + > drivers/pci/probe.c | 37 + > include/linux/pci.h | 2 ++ > 3 files changed, 68 insertions(+) > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > index af0cc34..4f9d7c1 100644 > --- a/drivers/pci/pci.c > +++ b/drivers/pci/pci.c > @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) > EXPORT_SYMBOL(pcie_set_mps); > > /** > + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit > + * @dev: PCI device to query > + * > + * If possible clear relaxed ordering Why "If possible"? The bit is required to be RW or hardwired to zero, so PCI_EXP_DEVCTL_RELAX_EN should *always* be zero when this returns. > + */ > +int pcie_clear_relaxed_ordering(struct pci_dev *dev) > +{ > + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, > + PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); The current series doesn't add any callers of this except pci_configure_relaxed_ordering(), so it doesn't need to be exported to modules. I think I would put both of these functions in drivers/pci/probe.c. Then this one could be static and you'd only have to add pcie_relaxed_ordering_supported() to include/linux/pci.h. > + > +/** > + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support s/relexed/relaxed/ > + * @dev: PCI device to query > + * > + * Returns true if the device support relaxed ordering attribute. > + */ > +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) > +{ > + u16 v; > + > + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); > + > + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); > +} > +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); This is misnamed. This doesn't tell us anything about whether the device *supports* relaxed ordering. It only tells us whether the device is *permitted* to use it. When a device initiates a transaction, the hardware should set the RO bit in the TLP with logic something like this: RO = && && The issue you're fixing is that some Completers don't handle RO correctly. The determining factor is not the Requester, but the Completer (for this series, a Root Port). So I think this should be something like: int pcie_relaxed_ordering_broken(struct pci_dev *completer) { if (!completer) return 0; return completer->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING; } and the caller should do something like this: if (pcie_relaxed_ordering_broken(pci_find_pcie_root_port(pdev))) adapter->flags |= ROOT_NO_RELAXED_ORDERING; That way it's obvious where the issue is, and it's obvious that the answer might be different for peer-to-peer transactions than it is for transactions to the root port, i.e., to coherent memory. > + > +/** > * pcie_get_minimum_link - determine minimum link settings of a PCI device > * @dev: PCI device to query > * @speed: storage for minimum speed > diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c > index c31310d..48df012 100644 > --- a/drivers/pci/probe.c > +++ b/drivers/pci/probe.c > @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev > *dev) >PCI_EXP_DEVCTL_EXT_TAG); > } > > +/** > + * pci_dev_should_disable_relaxed_ordering - check if the PCI device > + * should disable the relaxed ordering attribute. > + * @dev: PCI device > + * > + * Return true if any of the PCI devices above us do not support > + * relaxed ordering. > + */ > +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) > +{ > + while (dev) { > + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) > + return true; > + > + dev = dev->bus->self; > + }
[PATCH v9 2/4] PCI: Disable PCIe Relaxed Ordering if unsupported
When bit4 is set in the PCIe Device Control register, it indicates whether the device is permitted to use relaxed ordering. On some platforms using relaxed ordering can have performance issues or due to erratum can cause data-corruption. In such cases devices must avoid using relaxed ordering. This patch checks if there is any node in the hierarchy that indicates that using relaxed ordering is not safe. In such cases the patch turns off the relaxed ordering by clearing the eapability for this device. And if the device is probably running in a guest machine, we should do nothing. Signed-off-by: Ding Tianhong Acked-by: Alexander Duyck Acked-by: Ashok Raj --- drivers/pci/pci.c | 29 + drivers/pci/probe.c | 37 + include/linux/pci.h | 2 ++ 3 files changed, 68 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index af0cc34..4f9d7c1 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -4854,6 +4854,35 @@ int pcie_set_mps(struct pci_dev *dev, int mps) EXPORT_SYMBOL(pcie_set_mps); /** + * pcie_clear_relaxed_ordering - clear PCI Express relaxed ordering bit + * @dev: PCI device to query + * + * If possible clear relaxed ordering + */ +int pcie_clear_relaxed_ordering(struct pci_dev *dev) +{ + return pcie_capability_clear_word(dev, PCI_EXP_DEVCTL, + PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_clear_relaxed_ordering); + +/** + * pcie_relaxed_ordering_supported - Probe for PCIe relexed ordering support + * @dev: PCI device to query + * + * Returns true if the device support relaxed ordering attribute. + */ +bool pcie_relaxed_ordering_supported(struct pci_dev *dev) +{ + u16 v; + + pcie_capability_read_word(dev, PCI_EXP_DEVCTL, &v); + + return !!(v & PCI_EXP_DEVCTL_RELAX_EN); +} +EXPORT_SYMBOL(pcie_relaxed_ordering_supported); + +/** * pcie_get_minimum_link - determine minimum link settings of a PCI device * @dev: PCI device to query * @speed: storage for minimum speed diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c index c31310d..48df012 100644 --- a/drivers/pci/probe.c +++ b/drivers/pci/probe.c @@ -1762,6 +1762,42 @@ static void pci_configure_extended_tags(struct pci_dev *dev) PCI_EXP_DEVCTL_EXT_TAG); } +/** + * pci_dev_should_disable_relaxed_ordering - check if the PCI device + * should disable the relaxed ordering attribute. + * @dev: PCI device + * + * Return true if any of the PCI devices above us do not support + * relaxed ordering. + */ +static bool pci_dev_should_disable_relaxed_ordering(struct pci_dev *dev) +{ + while (dev) { + if (dev->dev_flags & PCI_DEV_FLAGS_NO_RELAXED_ORDERING) + return true; + + dev = dev->bus->self; + } + + return false; +} + +static void pci_configure_relaxed_ordering(struct pci_dev *dev) +{ + /* We should not alter the relaxed ordering bit for the VF */ + if (dev->is_virtfn) + return; + + /* If the releaxed ordering enable bit is not set, do nothing. */ + if (!pcie_relaxed_ordering_supported(dev)) + return; + + if (pci_dev_should_disable_relaxed_ordering(dev)) { + pcie_clear_relaxed_ordering(dev); + dev_info(&dev->dev, "Disable Relaxed Ordering\n"); + } +} + static void pci_configure_device(struct pci_dev *dev) { struct hotplug_params hpp; @@ -1769,6 +1805,7 @@ static void pci_configure_device(struct pci_dev *dev) pci_configure_mps(dev); pci_configure_extended_tags(dev); + pci_configure_relaxed_ordering(dev); memset(&hpp, 0, sizeof(hpp)); ret = pci_get_hp_params(dev, &hpp); diff --git a/include/linux/pci.h b/include/linux/pci.h index 412ec1c..3aa23a2 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -1127,6 +1127,8 @@ int pci_add_ext_cap_save_buffer(struct pci_dev *dev, void pci_pme_wakeup_bus(struct pci_bus *bus); void pci_d3cold_enable(struct pci_dev *dev); void pci_d3cold_disable(struct pci_dev *dev); +int pcie_clear_relaxed_ordering(struct pci_dev *dev); +bool pcie_relaxed_ordering_supported(struct pci_dev *dev); /* PCI Virtual Channel */ int pci_save_vc_state(struct pci_dev *dev); -- 1.8.3.1