Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend
On Wed, Jun 19, 2024 at 4:48 AM Bjorn Helgaas wrote: > > On Thu, Apr 25, 2024 at 03:33:01PM +0800, Kai-Heng Feng wrote: > > On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas wrote: > > > > > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > > > When the power rail gets cut off, the hardware can create some electric > > > > noise on the link that triggers AER. If IRQ is shared between AER with > > > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > > > > > When the power rail gets back, the firmware of the device resets itself > > > > and can create unexpected behavior like sending PTM messages. For this > > > > case, the driver will always be too late to toggle off features should > > > > be disabled. > > > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > > > the power will be turned off during suspend process, disable AER service > > > > and re-enable it during the resume process. This should not affect the > > > > basic functionality. > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > > > Signed-off-by: Kai-Heng Feng > > > > > > Thanks for reviving this series. I tried follow the history about > > > this, but there are at least two series that were very similar and I > > > can't put it all together. > > > > > > > --- > > > > v8: > > > > - Add more bug reports. > > > > > > > > v7: > > > > - Wording > > > > - Disable AER completely (again) if power will be turned off > > > > > > > > v6: > > > > v5: > > > > - Wording. > > > > > > > > v4: > > > > v3: > > > > - No change. > > > > > > > > v2: > > > > - Only disable AER IRQ. > > > > - No more check on PME IRQ#. > > > > - Use helper. > > > > > > > > drivers/pci/pcie/aer.c | 25 + > > > > 1 file changed, 25 insertions(+) > > > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > > index ac6293c24976..bea7818c2d1b 100644 > > > > --- a/drivers/pci/pcie/aer.c > > > > +++ b/drivers/pci/pcie/aer.c > > > > @@ -28,6 +28,7 @@ > > > > #include > > > > #include > > > > #include > > > > +#include > > > > #include > > > > #include > > > > #include > > > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > > > return 0; > > > > } > > > > > > > > +static int aer_suspend(struct pcie_device *dev) > > > > +{ > > > > + struct aer_rpc *rpc = get_service_data(dev); > > > > + struct pci_dev *pdev = rpc->rpd; > > > > + > > > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > > > + aer_disable_rootport(rpc); > > > > > > Why do we check pci_ancestor_pr3_present(pdev) and > > > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > > > to disable AER interrupts on suspend in general. I think it will be > > > better if we do that consistently on all platforms, not special cases > > > based on details of how we suspend. > > > > Sure. Will change in next revision. > > > > > Also, why do we use aer_disable_rootport() instead of just > > > aer_disable_irq()? I think it's the interrupt that causes issues on > > > suspend. I see that there *were* some versions that used > > > aer_disable_irq(), but I can't find the reason it changed. > > > > Interrupt can cause system wakeup, if it's shared with PME. > > > > The reason why aer_disable_rootport() is used over aer_disable_irq() > > is that when the latter is used the error still gets logged during > > sleep cycle. Once the pcieport driver resumes, it invokes > > aer_root_reset() to reset the hierarchy, while the hierarchy hasn't
Re: [PATCH v8 2/3] PCI/AER: Disable AER service on suspend
On Fri, Apr 19, 2024 at 4:35 AM Bjorn Helgaas wrote: > > On Tue, Apr 16, 2024 at 12:32:24PM +0800, Kai-Heng Feng wrote: > > When the power rail gets cut off, the hardware can create some electric > > noise on the link that triggers AER. If IRQ is shared between AER with > > PME, such AER noise will cause a spurious wakeup on system suspend. > > > > When the power rail gets back, the firmware of the device resets itself > > and can create unexpected behavior like sending PTM messages. For this > > case, the driver will always be too late to toggle off features should > > be disabled. > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > Management", TLP and DLLP transmission are disabled for a Link in L2/L3 > > Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if > > the power will be turned off during suspend process, disable AER service > > and re-enable it during the resume process. This should not affect the > > basic functionality. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 > > Signed-off-by: Kai-Heng Feng > > Thanks for reviving this series. I tried follow the history about > this, but there are at least two series that were very similar and I > can't put it all together. > > > --- > > v8: > > - Add more bug reports. > > > > v7: > > - Wording > > - Disable AER completely (again) if power will be turned off > > > > v6: > > v5: > > - Wording. > > > > v4: > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 25 + > > 1 file changed, 25 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index ac6293c24976..bea7818c2d1b 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -28,6 +28,7 @@ > > #include > > #include > > #include > > +#include > > #include > > #include > > #include > > @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) > > + aer_disable_rootport(rpc); > > Why do we check pci_ancestor_pr3_present(pdev) and > pm_suspend_via_firmware()? I'm getting pretty convinced that we need > to disable AER interrupts on suspend in general. I think it will be > better if we do that consistently on all platforms, not special cases > based on details of how we suspend. Sure. Will change in next revision. > > Also, why do we use aer_disable_rootport() instead of just > aer_disable_irq()? I think it's the interrupt that causes issues on > suspend. I see that there *were* some versions that used > aer_disable_irq(), but I can't find the reason it changed. Interrupt can cause system wakeup, if it's shared with PME. The reason why aer_disable_rootport() is used over aer_disable_irq() is that when the latter is used the error still gets logged during sleep cycle. Once the pcieport driver resumes, it invokes aer_root_reset() to reset the hierarchy, while the hierarchy hasn't resumed yet. So use aer_disable_rootport() to prevent such issue from happening. Kai-Heng > > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) > > + aer_enable_rootport(rpc); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
Re: [PATCH v8 1/3] PCI: Add helper to check if any of ancestor device support D3cold
On Thu, Apr 18, 2024 at 9:15 AM Kuppuswamy Sathyanarayanan wrote: > > > On 4/15/24 9:32 PM, Kai-Heng Feng wrote: > > In addition to nearest upstream bridge, driver may want to know if the > > entire hierarchy can be powered off to perform different action. > > > > So walk higher up the hierarchy to find out if any device has valid > > _PR3. > > > > The user will be introduced in next patch. > > > > Signed-off-by: Kai-Heng Feng > > --- > > Since it has been a while, I was not sure what this series is about. > > IMO, it is better to include a cover letter with the summary of your > changes. OK, will do in next revision. > > > > v8: > > - No change. > > > > drivers/pci/pci.c | 16 > > include/linux/pci.h | 2 ++ > > 2 files changed, 18 insertions(+) > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > index e5f243dd4288..7a5662f116b8 100644 > > --- a/drivers/pci/pci.c > > +++ b/drivers/pci/pci.c > > @@ -6225,6 +6225,22 @@ bool pci_pr3_present(struct pci_dev *pdev) > > acpi_has_method(adev->handle, "_PR3"); > > } > > EXPORT_SYMBOL_GPL(pci_pr3_present); > > + > > +bool pci_ancestor_pr3_present(struct pci_dev *pdev) > > +{ > > + struct pci_dev *parent = pdev; > > + > > + if (acpi_disabled) > > + return false; > > + > > + while ((parent = pci_upstream_bridge(parent))) { > > + if (pci_pr3_present(pdev)) > > I think it should be "parent" here? Thanks for catching this. But this patch will be dropped in next version for better simplicity. Kai-Heng > > > + return true; > > + } > > + > > + return false; > > +} > > +EXPORT_SYMBOL_GPL(pci_ancestor_pr3_present); > > #endif > > > > /** > > diff --git a/include/linux/pci.h b/include/linux/pci.h > > index 16493426a04f..cd71ebfd0f89 100644 > > --- a/include/linux/pci.h > > +++ b/include/linux/pci.h > > @@ -2620,10 +2620,12 @@ struct irq_domain > > *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus); > > void > > pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device > > *)); > > bool pci_pr3_present(struct pci_dev *pdev); > > +bool pci_ancestor_pr3_present(struct pci_dev *pdev); > > #else > > static inline struct irq_domain * > > pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; } > > static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; } > > +static inline bool pci_ancestor_pr3_present(struct pci_dev *pdev) { return > > false; } > > #endif > > > > #ifdef CONFIG_EEH > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer >
[PATCH v8 3/3] PCI/DPC: Disable DPC service on suspend
When the power rail gets cut off, the hardware can create some electric noise on the link that triggers AER. If IRQ is shared between AER with PME, such AER noise will cause a spurious wakeup on system suspend. When the power rail gets back, the firmware of the device resets itself and can create unexpected behavior like sending PTM messages. If DPC is enabled, the DPC reset happens before driver's resume routine. The DPC reset usually fails because the device is not in the right state, and the resume also fails because the device is being reset by hardware. If the scenario happens to device like NVMe, it means the whole system resume fails. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if the power will be turned off during suspend process, disable DPC service and re-enable it during the resume process. This should not affect the basic functionality. Furthermore, since DPC depends on AER to function, and AER is disabled in previous patch, also disable DPC here. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Signed-off-by: Kai-Heng Feng --- v8: - Wording. - Add more bug reports. v7: - Wording. - Disable DPC completely (again) if power will be turned off v6: v5: - Wording. v4: v3: - No change. v2: - Only disable DPC IRQ. - No more check on PME IRQ#. drivers/pci/pcie/dpc.c | 57 ++ 1 file changed, 46 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a668820696dc..7682ac4d6a89 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -14,6 +14,7 @@ #include #include #include +#include #include "portdrv.h" #include "../pci.h" @@ -412,13 +413,34 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_EN_MASK; + ctl |= PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -433,11 +455,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~PCI_EXP_DPC_CTL_EN_MASK; - ctl |= PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -450,14 +468,29 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; - u16 ctl; - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) + dpc_disable(dev); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) + dpc_enable(dev); + + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { @@ -465,6 +498,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT,
[PATCH v8 2/3] PCI/AER: Disable AER service on suspend
When the power rail gets cut off, the hardware can create some electric noise on the link that triggers AER. If IRQ is shared between AER with PME, such AER noise will cause a spurious wakeup on system suspend. When the power rail gets back, the firmware of the device resets itself and can create unexpected behavior like sending PTM messages. For this case, the driver will always be too late to toggle off features should be disabled. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if the power will be turned off during suspend process, disable AER service and re-enable it during the resume process. This should not affect the basic functionality. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Link: https://bugzilla.kernel.org/show_bug.cgi?id=218090 Signed-off-by: Kai-Heng Feng --- v8: - Add more bug reports. v7: - Wording - Disable AER completely (again) if power will be turned off v6: v5: - Wording. v4: v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 25 + 1 file changed, 25 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index ac6293c24976..bea7818c2d1b 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include #include @@ -1497,6 +1498,28 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) + aer_disable_rootport(rpc); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) + aer_enable_rootport(rpc); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1561,6 +1584,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
[PATCH v8 1/3] PCI: Add helper to check if any of ancestor device support D3cold
In addition to nearest upstream bridge, driver may want to know if the entire hierarchy can be powered off to perform different action. So walk higher up the hierarchy to find out if any device has valid _PR3. The user will be introduced in next patch. Signed-off-by: Kai-Heng Feng --- v8: - No change. drivers/pci/pci.c | 16 include/linux/pci.h | 2 ++ 2 files changed, 18 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index e5f243dd4288..7a5662f116b8 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -6225,6 +6225,22 @@ bool pci_pr3_present(struct pci_dev *pdev) acpi_has_method(adev->handle, "_PR3"); } EXPORT_SYMBOL_GPL(pci_pr3_present); + +bool pci_ancestor_pr3_present(struct pci_dev *pdev) +{ + struct pci_dev *parent = pdev; + + if (acpi_disabled) + return false; + + while ((parent = pci_upstream_bridge(parent))) { + if (pci_pr3_present(pdev)) + return true; + } + + return false; +} +EXPORT_SYMBOL_GPL(pci_ancestor_pr3_present); #endif /** diff --git a/include/linux/pci.h b/include/linux/pci.h index 16493426a04f..cd71ebfd0f89 100644 --- a/include/linux/pci.h +++ b/include/linux/pci.h @@ -2620,10 +2620,12 @@ struct irq_domain *pci_host_bridge_acpi_msi_domain(struct pci_bus *bus); void pci_msi_register_fwnode_provider(struct fwnode_handle *(*fn)(struct device *)); bool pci_pr3_present(struct pci_dev *pdev); +bool pci_ancestor_pr3_present(struct pci_dev *pdev); #else static inline struct irq_domain * pci_host_bridge_acpi_msi_domain(struct pci_bus *bus) { return NULL; } static inline bool pci_pr3_present(struct pci_dev *pdev) { return false; } +static inline bool pci_ancestor_pr3_present(struct pci_dev *pdev) { return false; } #endif #ifdef CONFIG_EEH -- 2.34.1
[PATCH 3/3] PCI/DPC: Disable DPC service on suspend
When the power rail gets cut off, the hardware can create some electric noise on the link that triggers AER. If IRQ is shared between AER with PME, such AER noise will cause a spurious wakeup on system suspend. When the power rail gets back, the firmware of the device resets itself and can create unexpected behavior like sending PTM messages. For this case, the driver will always be too late to toggle off features should be disabled. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if the power will be turned off during suspend process, disable DPC service and re-enable it during the resume process. This should not affect the basic functionality. Since DPC depends on AER to function, also disable DPC. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 v7: - Wording. - Disable DPC completely (again) if power will be turned off v6: v5: - Wording. v4: v3: - No change. v2: - Only disable DPC IRQ. - No more check on PME IRQ#. Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/dpc.c | 56 ++ 1 file changed, 46 insertions(+), 10 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3ceed8e3de41..73426addb2f1 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "portdrv.h" #include "../pci.h" @@ -347,13 +348,34 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -368,10 +390,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -384,14 +403,29 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; - u16 ctl; - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) + dpc_disable(dev); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) + dpc_enable(dev); + + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { @@ -399,6 +433,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH 2/3] PCI/AER: Disable AER service on suspend
When the power rail gets cut off, the hardware can create some electric noise on the link that triggers AER. If IRQ is shared between AER with PME, such AER noise will cause a spurious wakeup on system suspend. When the power rail gets back, the firmware of the device resets itself and can create unexpected behavior like sending PTM messages. For this case, the driver will always be too late to toggle off features should be disabled. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So if the power will be turned off during suspend process, disable AER service and re-enable it during the resume process. This should not affect the basic functionality. Link: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 v7: - Wording - Disable AER completely (again) if power will be turned off v6: v5: - Wording. v4: v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 25 + 1 file changed, 25 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..b5161bfedd8b 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -28,6 +28,7 @@ #include #include #include +#include #include #include @@ -1340,6 +1341,28 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_suspend_via_firmware()) + aer_disable_rootport(rpc); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + if (pci_ancestor_pr3_present(pdev) || pm_resume_via_firmware()) + aer_enable_rootport(rpc); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1411,6 +1434,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Fri, Aug 11, 2023 at 4:00 PM Kai-Heng Feng wrote: > > On Thu, Aug 10, 2023 at 6:51 PM Bjorn Helgaas wrote: > > > > On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote: > > > On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas wrote: > > > > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote: > > > > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas > > > > > wrote: > > > > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote: > > > > > > > PCIe services that share an IRQ with PME, such as AER or DPC, > > > > > > > may cause a spurious wakeup on system suspend. To prevent this, > > > > > > > disable the AER interrupt notification during the system suspend > > > > > > > process. > > > > > > > > > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do > > > > > > share > > > > > > the same IRQ, but I don't think this is true in general. > > > > > > > > > > > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the > > > > > > Interrupt Message Number in the PCIe Capability, but AER uses the > > > > > > one > > > > > > in the AER Root Error Status register, and DPC uses the one in the > > > > > > DPC > > > > > > Capability register. Those potentially correspond to three distinct > > > > > > MSI/MSI-X vectors. > > > > > > > > > > > > I think this probably has nothing to do with the IRQ being *shared*, > > > > > > but just that putting the downstream component into D3cold, where > > > > > > the > > > > > > link state is L3, may cause the upstream component to log and > > > > > > signal a > > > > > > link-related error as the link goes completely down. > > > > > > > > > > That's quite likely a better explanation than my wording. > > > > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up > > > > > by AER IRQ? > > > > > > > > Rafael could answer this better than I can, but > > > > Documentation/power/suspend-and-interrupts.rst says device interrupts > > > > are generally disabled during suspend after the "late" phase of > > > > suspending devices, i.e., > > > > > > > > dpm_suspend_noirq > > > > suspend_device_irqs <-- disable non-wakeup IRQs > > > > dpm_noirq_suspend_devices > > > > ... > > > > pci_pm_suspend_noirq # (I assume) > > > > pci_prepare_to_sleep > > > > > > > > I think the downstream component would be put in D3cold by > > > > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by > > > > then. > > > > > > > > I assume PME would generally *not* be disabled since it's needed for > > > > wakeup, so I think any interrupt that shares the PME IRQ and occurs > > > > during suspend may cause a spurious wakeup. > > > > > > Yes, that's the case here. > > > > > > > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing > > > > the PME IRQ may cause spurious wakeups, and we would have to disable > > > > those other interrupts at the source, e.g., by clearing > > > > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does). > > > > > > So is the series good to be merged now? > > > > If we merge as-is, won't we disable AER & DPC interrupts unnecessarily > > in the case where the link goes to D3hot? In that case, there's no > > reason to expect interrupts related to the link going down, but things > > like PTM messages still work, and they may cause errors that we should > > know about. > > Because the issue can be observed on D3hot as well [0]. > The root port device [0] is power managed by ACPI, so I wonder if it's > reasonable to disable AER & DPC for devices that power managed by > firmware? OK, I think the D3hot case is different to this one, so I'll work on next revision that only disable AER/DPC when power is really off. In additional to disabling interrupt, is it reasonable to disable AER and DPC service completely, so unwanted electric noise wont trigger a DPC reset? Kai-Heng > [0] https://bugzilla.kernel.org/sh
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Thu, Aug 10, 2023 at 6:51 PM Bjorn Helgaas wrote: > > On Thu, Aug 10, 2023 at 04:17:21PM +0800, Kai-Heng Feng wrote: > > On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas wrote: > > > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote: > > > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas > > > > wrote: > > > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote: > > > > > > PCIe services that share an IRQ with PME, such as AER or DPC, > > > > > > may cause a spurious wakeup on system suspend. To prevent this, > > > > > > disable the AER interrupt notification during the system suspend > > > > > > process. > > > > > > > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share > > > > > the same IRQ, but I don't think this is true in general. > > > > > > > > > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the > > > > > Interrupt Message Number in the PCIe Capability, but AER uses the one > > > > > in the AER Root Error Status register, and DPC uses the one in the DPC > > > > > Capability register. Those potentially correspond to three distinct > > > > > MSI/MSI-X vectors. > > > > > > > > > > I think this probably has nothing to do with the IRQ being *shared*, > > > > > but just that putting the downstream component into D3cold, where the > > > > > link state is L3, may cause the upstream component to log and signal a > > > > > link-related error as the link goes completely down. > > > > > > > > That's quite likely a better explanation than my wording. > > > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up > > > > by AER IRQ? > > > > > > Rafael could answer this better than I can, but > > > Documentation/power/suspend-and-interrupts.rst says device interrupts > > > are generally disabled during suspend after the "late" phase of > > > suspending devices, i.e., > > > > > > dpm_suspend_noirq > > > suspend_device_irqs <-- disable non-wakeup IRQs > > > dpm_noirq_suspend_devices > > > ... > > > pci_pm_suspend_noirq # (I assume) > > > pci_prepare_to_sleep > > > > > > I think the downstream component would be put in D3cold by > > > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by > > > then. > > > > > > I assume PME would generally *not* be disabled since it's needed for > > > wakeup, so I think any interrupt that shares the PME IRQ and occurs > > > during suspend may cause a spurious wakeup. > > > > Yes, that's the case here. > > > > > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing > > > the PME IRQ may cause spurious wakeups, and we would have to disable > > > those other interrupts at the source, e.g., by clearing > > > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does). > > > > So is the series good to be merged now? > > If we merge as-is, won't we disable AER & DPC interrupts unnecessarily > in the case where the link goes to D3hot? In that case, there's no > reason to expect interrupts related to the link going down, but things > like PTM messages still work, and they may cause errors that we should > know about. Because the issue can be observed on D3hot as well [0]. The root port device [0] is power managed by ACPI, so I wonder if it's reasonable to disable AER & DPC for devices that power managed by firmware? [0] https://bugzilla.kernel.org/show_bug.cgi?id=216295#c3 Kai-Heng > > > > > > I don't think D0-D3hot should be relevant here because in all those > > > > > states, the link should be active because the downstream config space > > > > > remains accessible. So I'm not sure if it's possible, but I wonder if > > > > > there's a more targeted place we could do this, e.g., in the path that > > > > > puts downstream devices in D3cold. > > > > > > > > Let me try to work on this. > > > > > > > > Kai-Heng > > > > > > > > > > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > > > > Management", > > > > > > TLP and DLLP transmission are disabled
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Thu, Aug 10, 2023 at 2:52 AM Bjorn Helgaas wrote: > > On Fri, Jul 21, 2023 at 11:58:24AM +0800, Kai-Heng Feng wrote: > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas wrote: > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote: > > > > PCIe services that share an IRQ with PME, such as AER or DPC, > > > > may cause a spurious wakeup on system suspend. To prevent this, > > > > disable the AER interrupt notification during the system suspend > > > > process. > > > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share > > > the same IRQ, but I don't think this is true in general. > > > > > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the > > > Interrupt Message Number in the PCIe Capability, but AER uses the one > > > in the AER Root Error Status register, and DPC uses the one in the DPC > > > Capability register. Those potentially correspond to three distinct > > > MSI/MSI-X vectors. > > > > > > I think this probably has nothing to do with the IRQ being *shared*, > > > but just that putting the downstream component into D3cold, where the > > > link state is L3, may cause the upstream component to log and signal a > > > link-related error as the link goes completely down. > > > > That's quite likely a better explanation than my wording. > > Assuming AER IRQ and PME IRQ are not shared, does system get woken up > > by AER IRQ? > > Rafael could answer this better than I can, but > Documentation/power/suspend-and-interrupts.rst says device interrupts > are generally disabled during suspend after the "late" phase of > suspending devices, i.e., > > dpm_suspend_noirq > suspend_device_irqs <-- disable non-wakeup IRQs > dpm_noirq_suspend_devices > ... > pci_pm_suspend_noirq # (I assume) > pci_prepare_to_sleep > > I think the downstream component would be put in D3cold by > pci_prepare_to_sleep(), so non-wakeup interrupts should be disabled by > then. > > I assume PME would generally *not* be disabled since it's needed for > wakeup, so I think any interrupt that shares the PME IRQ and occurs > during suspend may cause a spurious wakeup. Yes, that's the case here. > > If so, it's exactly as you said at the beginning: AER/DPC/etc sharing > the PME IRQ may cause spurious wakeups, and we would have to disable > those other interrupts at the source, e.g., by clearing > PCI_ERR_ROOT_CMD_FATAL_EN etc (exactly as your series does). So is the series good to be merged now? Kai-Heng > > > > I don't think D0-D3hot should be relevant here because in all those > > > states, the link should be active because the downstream config space > > > remains accessible. So I'm not sure if it's possible, but I wonder if > > > there's a more targeted place we could do this, e.g., in the path that > > > puts downstream devices in D3cold. > > > > Let me try to work on this. > > > > Kai-Heng > > > > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > > Management", > > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready > > > > (D3hot), L2 > > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER > > > > notification during suspend and re-enabling them during the resume > > > > process > > > > should not affect the basic functionality. > > > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > > > Signed-off-by: Kai-Heng Feng > > > > --- > > > > v6: > > > > v5: > > > > - Wording. > > > > > > > > v4: > > > > v3: > > > > - No change. > > > > > > > > v2: > > > > - Only disable AER IRQ. > > > > - No more check on PME IRQ#. > > > > - Use helper. > > > > > > > > drivers/pci/pcie/aer.c | 22 ++ > > > > 1 file changed, 22 insertions(+) > > > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > > index 1420e1f27105..9c07fdbeb52d 100644 > > > > --- a/drivers/pci/pcie/aer.c > > > > +++ b/drivers/pci/pcie/aer.c > > > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > > >
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Fri, Jul 21, 2023 at 11:58 AM Kai-Heng Feng wrote: > > On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas wrote: > > > > [+cc Rafael] > > > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote: > > > PCIe services that share an IRQ with PME, such as AER or DPC, may cause a > > > spurious wakeup on system suspend. To prevent this, disable the AER > > > interrupt > > > notification during the system suspend process. > > > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share > > the same IRQ, but I don't think this is true in general. > > > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the > > Interrupt Message Number in the PCIe Capability, but AER uses the one > > in the AER Root Error Status register, and DPC uses the one in the DPC > > Capability register. Those potentially correspond to three distinct > > MSI/MSI-X vectors. > > > > I think this probably has nothing to do with the IRQ being *shared*, > > but just that putting the downstream component into D3cold, where the > > link state is L3, may cause the upstream component to log and signal a > > link-related error as the link goes completely down. > > That's quite likely a better explanation than my wording. > Assuming AER IRQ and PME IRQ are not shared, does system get woken up > by AER IRQ? > > > > > I don't think D0-D3hot should be relevant here because in all those > > states, the link should be active because the downstream config space > > remains accessible. So I'm not sure if it's possible, but I wonder if > > there's a more targeted place we could do this, e.g., in the path that > > puts downstream devices in D3cold. > > Let me try to work on this. We are seeing another case where the issue happens on D3hot [0]. So I wonder if it's possible to disable AER unconditionally? [0] https://bugzilla.kernel.org/show_bug.cgi?id=216295#c3 > > Kai-Heng > > > > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > > Management", > > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), > > > L2 > > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER > > > notification during suspend and re-enabling them during the resume process > > > should not affect the basic functionality. > > > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > Reviewed-by: Mika Westerberg > > > Signed-off-by: Kai-Heng Feng > > > --- > > > v6: > > > v5: > > > - Wording. > > > > > > v4: > > > v3: > > > - No change. > > > > > > v2: > > > - Only disable AER IRQ. > > > - No more check on PME IRQ#. > > > - Use helper. > > > > > > drivers/pci/pcie/aer.c | 22 ++ > > > 1 file changed, 22 insertions(+) > > > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > > index 1420e1f27105..9c07fdbeb52d 100644 > > > --- a/drivers/pci/pcie/aer.c > > > +++ b/drivers/pci/pcie/aer.c > > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > > return 0; > > > } > > > > > > +static int aer_suspend(struct pcie_device *dev) > > > +{ > > > + struct aer_rpc *rpc = get_service_data(dev); > > > + struct pci_dev *pdev = rpc->rpd; > > > + > > > + aer_disable_irq(pdev); > > > + > > > + return 0; > > > +} > > > + > > > +static int aer_resume(struct pcie_device *dev) > > > +{ > > > + struct aer_rpc *rpc = get_service_data(dev); > > > + struct pci_dev *pdev = rpc->rpd; > > > + > > > + aer_enable_irq(pdev); > > > + > > > + return 0; > > > +} > > > + > > > /** > > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > > * @dev: pointer to Root Port, RCEC, or RCiEP > > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > > .service= PCIE_PORT_SERVICE_AER, > > > > > > .probe = aer_probe, > > > + .suspend= aer_suspend, > > > + .resume = aer_resume, > > > .remove = aer_remove, > > > }; > > > > > > -- > > > 2.34.1 > > >
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Tue, Jul 18, 2023 at 7:17 PM Bjorn Helgaas wrote: > > [+cc Rafael] > > On Fri, May 12, 2023 at 08:00:13AM +0800, Kai-Heng Feng wrote: > > PCIe services that share an IRQ with PME, such as AER or DPC, may cause a > > spurious wakeup on system suspend. To prevent this, disable the AER > > interrupt > > notification during the system suspend process. > > I see that in this particular BZ dmesg log, PME, AER, and DPC do share > the same IRQ, but I don't think this is true in general. > > Root Ports usually use MSI or MSI-X. PME and hotplug events use the > Interrupt Message Number in the PCIe Capability, but AER uses the one > in the AER Root Error Status register, and DPC uses the one in the DPC > Capability register. Those potentially correspond to three distinct > MSI/MSI-X vectors. > > I think this probably has nothing to do with the IRQ being *shared*, > but just that putting the downstream component into D3cold, where the > link state is L3, may cause the upstream component to log and signal a > link-related error as the link goes completely down. That's quite likely a better explanation than my wording. Assuming AER IRQ and PME IRQ are not shared, does system get woken up by AER IRQ? > > I don't think D0-D3hot should be relevant here because in all those > states, the link should be active because the downstream config space > remains accessible. So I'm not sure if it's possible, but I wonder if > there's a more targeted place we could do this, e.g., in the path that > puts downstream devices in D3cold. Let me try to work on this. Kai-Heng > > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power > > Management", > > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 > > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER > > notification during suspend and re-enabling them during the resume process > > should not affect the basic functionality. > > > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > v6: > > v5: > > - Wording. > > > > v4: > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
Re: [PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
On Fri, May 12, 2023 at 8:01 AM Kai-Heng Feng wrote: > > PCIe services that share an IRQ with PME, such as AER or DPC, may cause a > spurious wakeup on system suspend. To prevent this, disable the AER interrupt > notification during the system suspend process. > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER > notification during suspend and re-enabling them during the resume process > should not affect the basic functionality. > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > Reviewed-by: Mika Westerberg > Signed-off-by: Kai-Heng Feng A gentle ping... > --- > v6: > v5: > - Wording. > > v4: > v3: > - No change. > > v2: > - Only disable AER IRQ. > - No more check on PME IRQ#. > - Use helper. > > drivers/pci/pcie/aer.c | 22 ++ > 1 file changed, 22 insertions(+) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index 1420e1f27105..9c07fdbeb52d 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > return 0; > } > > +static int aer_suspend(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_disable_irq(pdev); > + > + return 0; > +} > + > +static int aer_resume(struct pcie_device *dev) > +{ > + struct aer_rpc *rpc = get_service_data(dev); > + struct pci_dev *pdev = rpc->rpd; > + > + aer_enable_irq(pdev); > + > + return 0; > +} > + > /** > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > * @dev: pointer to Root Port, RCEC, or RCiEP > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > .service= PCIE_PORT_SERVICE_AER, > > .probe = aer_probe, > + .suspend= aer_suspend, > + .resume = aer_resume, > .remove = aer_remove, > }; > > -- > 2.34.1 >
Re: [PATCH v6 1/3] PCI/AER: Factor out interrupt toggling into helpers
Hi Bjorn, On Fri, May 12, 2023 at 8:01 AM Kai-Heng Feng wrote: > > There are many places that enable and disable AER interrupt, so move > them into helpers. Do you think the series is good to be be merged now? Kai-Heng > > Reviewed-by: Mika Westerberg > Reviewed-by: Kuppuswamy Sathyanarayanan > > Reviewed-by: Jonathan Cameron > Signed-off-by: Kai-Heng Feng > --- > v6: > - No change. > > v5: > - Fix misspelling. > > v4: > - No change. > > v3: > - Correct subject. > > v2: > - New patch. > > drivers/pci/pcie/aer.c | 45 +- > 1 file changed, 27 insertions(+), 18 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index f6c24ded134c..1420e1f27105 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) > return IRQ_WAKE_THREAD; > } > > +static void aer_enable_irq(struct pci_dev *pdev) > +{ > + int aer = pdev->aer_cap; > + u32 reg32; > + > + /* Enable Root Port's interrupt in response to error messages */ > + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > +} > + > +static void aer_disable_irq(struct pci_dev *pdev) > +{ > + int aer = pdev->aer_cap; > + u32 reg32; > + > + /* Disable Root's interrupt in response to error messages */ > + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > +} > + > /** > * aer_enable_rootport - enable Root Port's interrupts when receiving > messages > * @rpc: pointer to a Root Port data structure > @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) > pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); > pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); > > - /* Enable Root Port's interrupt in response to error messages */ > - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > + aer_enable_irq(pdev); > } > > /** > @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) > int aer = pdev->aer_cap; > u32 reg32; > > - /* Disable Root's interrupt in response to error messages */ > - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > + aer_disable_irq(pdev); > > /* Clear Root's error status reg */ > pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); > @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev > *dev) > */ > aer = root ? root->aer_cap : 0; > > - if ((host->native_aer || pcie_ports_native) && aer) { > - /* Disable Root's interrupt in response to error messages */ > - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > ®32); > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > reg32); > - } > + if ((host->native_aer || pcie_ports_native) && aer) > + aer_disable_irq(root); > > if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { > rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); > @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev > *dev) > pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, > ®32); > pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, > reg32); > > - /* Enable Root Port's interrupt in response to error messages > */ > - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > ®32); > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > reg32); > + aer_enable_irq(root); > } > > return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; > -- > 2.34.1 >
[PATCH v6 3/3] PCI/DPC: Disable DPC interrupt during suspend
PCIe services that share an IRQ with PME, such as AER or DPC, may cause a spurious wakeup on system suspend. Since DPC depends on AER to work, disable DPC interrupt notification during the system suspend process as AER interrupt notification is already disabled by previous patch. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So disabling the DPC notification during suspend and re-enabling them during the resume process should not affect the basic functionality. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v6: v5: - Wording. v4: v3: - No change. v2: - Only disable DPC IRQ. - No more check on PME IRQ#. drivers/pci/pcie/dpc.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3ceed8e3de41..d2d845c20438 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -384,6 +384,30 @@ static int dpc_probe(struct pcie_device *dev) return status; } +static int dpc_suspend(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl |= PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + static void dpc_remove(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; @@ -399,6 +423,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH v6 2/3] PCI/AER: Disable AER interrupt on suspend
PCIe services that share an IRQ with PME, such as AER or DPC, may cause a spurious wakeup on system suspend. To prevent this, disable the AER interrupt notification during the system suspend process. As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold) states. So disabling the AER notification during suspend and re-enabling them during the resume process should not affect the basic functionality. Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v6: v5: - Wording. v4: v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
[PATCH v6 1/3] PCI/AER: Factor out interrupt toggling into helpers
There are many places that enable and disable AER interrupt, so move them into helpers. Reviewed-by: Mika Westerberg Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Jonathan Cameron Signed-off-by: Kai-Heng Feng --- v6: - No change. v5: - Fix misspelling. v4: - No change. v3: - Correct subject. v2: - New patch. drivers/pci/pcie/aer.c | 45 +- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..1420e1f27105 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) return IRQ_WAKE_THREAD; } +static void aer_enable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Enable Root Port's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + +static void aer_disable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Disable Root's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + /** * aer_enable_rootport - enable Root Port's interrupts when receiving messages * @rpc: pointer to a Root Port data structure @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(pdev); } /** @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) int aer = pdev->aer_cap; u32 reg32; - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_disable_irq(pdev); /* Clear Root's error status reg */ pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) */ aer = root ? root->aer_cap : 0; - if ((host->native_aer || pcie_ports_native) && aer) { - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); - } + if ((host->native_aer || pcie_ports_native) && aer) + aer_disable_irq(root); if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(root); } return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; -- 2.34.1
Re: [PATCH v5 2/3] PCI/AER: Disable AER interrupt on suspend
On Fri, May 12, 2023 at 6:08 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 5/11/23 6:36 AM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but this time disabling AER IRQ is to prevent immediate PME wakeup when > > AER shares the same IRQ line with PME. > > IMHO, you don't need to mention the previous submission reason. Sure, will remove it in next revision. > > > > > It's okay to disable AER because PCIe Base Spec 5.0, section 5.2 "Link > > State Power Management" states that TLP and DLLP transmission is > > disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) > > and L3 (D3cold), hence we don't lose much here to disable AER IRQ during > > system suspend. > > May be something like below? > > PCIe services that share an IRQ with PME, such as AER or DPC, may cause a > spurious wakeup on system suspend. To prevent this, disable the AER > interrupt notification during the system suspend process. > > As Per PCIe Base Spec 5.0, section 5.2, titled "Link State Power Management", > TLP and DLLP transmission are disabled for a Link in L2/L3 Ready (D3hot), L2 > (D3cold with aux power) and L3 (D3cold) states. So disabling the AER > notification > during suspend and re-enabling them during the resume process should not > affect > the basic functionality. I'll shamelessly use this in the commit message :) Kai-Heng > > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > v5: > > - Wording. > > > > v4: > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
[PATCH v5 3/3] PCI/DPC: Disable DPC interrupt during suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. Since AER is conditionally disabled in previous patch, also apply the same logic to disable DPC which depends on AER to work. It's okay to disable DPC because PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), hence we don't lose much here to disable DPC during system suspend. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v5: - Wording. v4: v3: - No change. v2: - Only disable DPC IRQ. - No more check on PME IRQ#. drivers/pci/pcie/dpc.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3ceed8e3de41..d2d845c20438 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -384,6 +384,30 @@ static int dpc_probe(struct pcie_device *dev) return status; } +static int dpc_suspend(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl |= PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + static void dpc_remove(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; @@ -399,6 +423,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH v5 2/3] PCI/AER: Disable AER interrupt on suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but this time disabling AER IRQ is to prevent immediate PME wakeup when AER shares the same IRQ line with PME. It's okay to disable AER because PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), hence we don't lose much here to disable AER IRQ during system suspend. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v5: - Wording. v4: v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
[PATCH v5 1/3] PCI/AER: Factor out interrupt toggling into helpers
There are many places that enable and disable AER interrupt, so move them into helpers. Reviewed-by: Mika Westerberg Reviewed-by: Kuppuswamy Sathyanarayanan Reviewed-by: Jonathan Cameron Signed-off-by: Kai-Heng Feng --- v5 - Fix misspelling. v4: - No change. v3: - Correct subject. v2: - New patch. drivers/pci/pcie/aer.c | 45 +- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..1420e1f27105 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) return IRQ_WAKE_THREAD; } +static void aer_enable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Enable Root Port's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + +static void aer_disable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Disable Root's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + /** * aer_enable_rootport - enable Root Port's interrupts when receiving messages * @rpc: pointer to a Root Port data structure @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(pdev); } /** @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) int aer = pdev->aer_cap; u32 reg32; - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_disable_irq(pdev); /* Clear Root's error status reg */ pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) */ aer = root ? root->aer_cap : 0; - if ((host->native_aer || pcie_ports_native) && aer) { - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); - } + if ((host->native_aer || pcie_ports_native) && aer) + aer_disable_irq(root); if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(root); } return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; -- 2.34.1
Re: [PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend
On Tue, Apr 25, 2023 at 7:47 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 4/23/23 10:52 PM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > IIUC, you encounter AER errors during the suspend/resume process, which > results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a > spurious wake-up IRQ. So to fix it, you want to disable AER reporting, > right? > > It looks like it is harmless to disable the AER during the suspend/resume > path. But, I am wondering why we get these errors? Did you check what errors > you get during the suspend/resume path? Are these errors valid? AFAIK those errors comes from firmware/hardware side, especially when the device gets put to D3hot/D3cold. Kai-Heng > > > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
Re: [PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend
On Sat, May 6, 2023 at 3:22 AM Bjorn Helgaas wrote: > > On Mon, Apr 24, 2023 at 01:52:48PM +0800, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > What is the reason? I assume it's something to do with the bugzilla > below, but the commit log should outline the user-visible problem this > fixes. The commit log basically makes the case for "why should we > merge this patch." > > I assume it's along the lines of "I tried to suspend this system, but > it immediately woke up again because of an AER interrupt, and > disabling AER during suspend avoids this problem. And disabling > the AER interrupt is not a problem because X" Yes that's the reason :) Will update the message to better reflect what's going on. Kai-Heng > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.34.1 > >
Re: [PATCH v4 1/3] PCI/AER: Factor out interrupt toggling into helpers
On Fri, May 5, 2023 at 11:37 PM Jonathan Cameron wrote: > > On Mon, 24 Apr 2023 13:52:47 +0800 > Kai-Heng Feng wrote: > > > There are many places that enable and disable AER interrput, so move > > interrupt Thanks, will correct that in next revision. Kai-Heng > > > them into helpers. > > Otherwise looks like a good clean up to me. > FWIW > Reviewed-by: Jonathan Cameron > > > > > Reviewed-by: Mika Westerberg > > Reviewed-by: Kuppuswamy Sathyanarayanan > > > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/pci/pcie/aer.c | 45 +- > > 1 file changed, 27 insertions(+), 18 deletions(-) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index f6c24ded134c..1420e1f27105 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) > > return IRQ_WAKE_THREAD; > > } > > > > +static void aer_enable_irq(struct pci_dev *pdev) > > +{ > > + int aer = pdev->aer_cap; > > + u32 reg32; > > + > > + /* Enable Root Port's interrupt in response to error messages */ > > + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > > + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > > + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > > +} > > + > > +static void aer_disable_irq(struct pci_dev *pdev) > > +{ > > + int aer = pdev->aer_cap; > > + u32 reg32; > > + > > + /* Disable Root's interrupt in response to error messages */ > > + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > > + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > > + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > > +} > > + > > /** > > * aer_enable_rootport - enable Root Port's interrupts when receiving > > messages > > * @rpc: pointer to a Root Port data structure > > @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) > > pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); > > pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); > > > > - /* Enable Root Port's interrupt in response to error messages */ > > - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > > - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > > + aer_enable_irq(pdev); > > } > > > > /** > > @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) > > int aer = pdev->aer_cap; > > u32 reg32; > > > > - /* Disable Root's interrupt in response to error messages */ > > - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); > > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > > - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); > > + aer_disable_irq(pdev); > > > > /* Clear Root's error status reg */ > > pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); > > @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct > > pci_dev *dev) > >*/ > > aer = root ? root->aer_cap : 0; > > > > - if ((host->native_aer || pcie_ports_native) && aer) { > > - /* Disable Root's interrupt in response to error messages */ > > - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > > ®32); > > - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; > > - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > > reg32); > > - } > > + if ((host->native_aer || pcie_ports_native) && aer) > > + aer_disable_irq(root); > > > > if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { > > rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); > > @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct > > pci_dev *dev) > > pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, > > ®32); > > pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, > > reg32); > > > > - /* Enable Root Port's interrupt in response to error messages > > */ > > - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > > ®32); > > - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; > > - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, > > reg32); > > + aer_enable_irq(root); > > } > > > > return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; >
Re: [PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend
On Tue, Apr 25, 2023 at 7:47 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 4/23/23 10:52 PM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > IIUC, you encounter AER errors during the suspend/resume process, which > results in AER IRQ. Because AER and PME share an IRQ, it is regarded as a > spurious wake-up IRQ. So to fix it, you want to disable AER reporting, > right? Yes. That's exactly what happened. > > It looks like it is harmless to disable the AER during the suspend/resume > path. But, I am wondering why we get these errors? Did you check what errors > you get during the suspend/resume path? Are these errors valid? I really don't know. I think it's similar to the reasoning in commit b07461a8e45b ("PCI/AER: Clear error status registers during enumeration and restore"): "AER errors might be recorded when powering-on devices. These errors can be ignored, ...". For this case, it happens when powering-off the device (D3cold) via turning off power resources. Kai-Heng > > > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
[PATCH v4 3/3] PCI/DPC: Disable DPC interrupt during suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. Since AER is conditionally disabled in previous patch, also apply the same logic to disable DPC which depends on AER to work. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable DPC during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/dpc.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a5d7c69b764e..98bdefde6df1 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -385,6 +385,30 @@ static int dpc_probe(struct pcie_device *dev) return status; } +static int dpc_suspend(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl |= PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + static void dpc_remove(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; @@ -400,6 +424,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH v4 1/3] PCI/AER: Factor out interrupt toggling into helpers
There are many places that enable and disable AER interrput, so move them into helpers. Reviewed-by: Mika Westerberg Reviewed-by: Kuppuswamy Sathyanarayanan Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 45 +- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..1420e1f27105 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) return IRQ_WAKE_THREAD; } +static void aer_enable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Enable Root Port's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + +static void aer_disable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Disable Root's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + /** * aer_enable_rootport - enable Root Port's interrupts when receiving messages * @rpc: pointer to a Root Port data structure @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(pdev); } /** @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) int aer = pdev->aer_cap; u32 reg32; - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_disable_irq(pdev); /* Clear Root's error status reg */ pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) */ aer = root ? root->aer_cap : 0; - if ((host->native_aer || pcie_ports_native) && aer) { - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); - } + if ((host->native_aer || pcie_ports_native) && aer) + aer_disable_irq(root); if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(root); } return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; -- 2.34.1
[PATCH v4 2/3] PCI/AER: Disable AER interrupt on suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable AER during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
Re: [PATCH v3 3/4] PCI/AER: Disable AER interrupt on suspend
On Thu, Apr 20, 2023 at 10:53 PM Sathyanarayanan Kuppuswamy wrote: > > > > On 4/20/23 5:59 AM, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > In Patch #1, you skip clearing AER errors in the resume path, right? So if we > disable > interrupts here, will AER driver not be notified on resume path error? I agree the driver should report the error via aer_isr_one_error() on resume path. But on the system I am using (Intel ADL PCH), once the interrupt is disabled, PCI_ERR_ROOT_STATUS doesn't record error anymore. Not sure if it's intended though. Kai-Heng > > > v3: > > - No change. > > > > v2: > > - Only disable AER IRQ. > > - No more check on PME IRQ#. > > - Use helper. > > > > drivers/pci/pcie/aer.c | 22 ++ > > 1 file changed, 22 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 1420e1f27105..9c07fdbeb52d 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_disable_irq(pdev); > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + struct pci_dev *pdev = rpc->rpd; > > + > > + aer_enable_irq(pdev); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { > > .service= PCIE_PORT_SERVICE_AER, > > > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
[PATCH v3 4/4] PCI/DPC: Disable DPC interrupt during suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. Since AER is conditionally disabled in previous patch, also apply the same logic to disable DPC which depends on AER to work. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable DPC during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v3: - No change. v2: - Only disable DPC IRQ. - No more check on PME IRQ#. drivers/pci/pcie/dpc.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a5d7c69b764e..98bdefde6df1 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -385,6 +385,30 @@ static int dpc_probe(struct pcie_device *dev) return status; } +static int dpc_suspend(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl |= PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + static void dpc_remove(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; @@ -400,6 +424,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH v3 3/4] PCI/AER: Disable AER interrupt on suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable AER during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v3: - No change. v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
[PATCH v3 2/4] PCI/AER: Factor out interrupt toggling into helpers
There are many places that enable and disable AER interrput, so move them into helpers. Signed-off-by: Kai-Heng Feng --- v3: - Correct subject. v2: - New patch. drivers/pci/pcie/aer.c | 45 +- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..1420e1f27105 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) return IRQ_WAKE_THREAD; } +static void aer_enable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Enable Root Port's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + +static void aer_disable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Disable Root's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + /** * aer_enable_rootport - enable Root Port's interrupts when receiving messages * @rpc: pointer to a Root Port data structure @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(pdev); } /** @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) int aer = pdev->aer_cap; u32 reg32; - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_disable_irq(pdev); /* Clear Root's error status reg */ pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) */ aer = root ? root->aer_cap : 0; - if ((host->native_aer || pcie_ports_native) && aer) { - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); - } + if ((host->native_aer || pcie_ports_native) && aer) + aer_disable_irq(root); if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(root); } return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; -- 2.34.1
[PATCH v2 4/4] PCI/DPC: Disable DPC interrupt during suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. Since AER is conditionally disabled in previous patch, also apply the same logic to disable DPC which depends on AER to work. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable DPC during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Signed-off-by: Kai-Heng Feng --- v2: - Only disable DPC IRQ. - No more check on PME IRQ#. drivers/pci/pcie/dpc.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index a5d7c69b764e..98bdefde6df1 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -385,6 +385,30 @@ static int dpc_probe(struct pcie_device *dev) return status; } +static int dpc_suspend(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl |= PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + + return 0; +} + static void dpc_remove(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; @@ -400,6 +424,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.34.1
[PATCH v2 3/4] PCI/AER: Disable AER interrupt on suspend
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable AER during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Link: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Signed-off-by: Kai-Heng Feng --- v2: - Only disable AER IRQ. - No more check on PME IRQ#. - Use helper. drivers/pci/pcie/aer.c | 22 ++ 1 file changed, 22 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 1420e1f27105..9c07fdbeb52d 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1356,6 +1356,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_disable_irq(pdev); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + struct pci_dev *pdev = rpc->rpd; + + aer_enable_irq(pdev); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1420,6 +1440,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.34.1
[PATCH v2 2/4] PCI/AER: Factor out interrput toggling into helpers
There are many places that enable and disable AER interrput, so move them into helpers. Signed-off-by: Kai-Heng Feng --- v2: - New patch. drivers/pci/pcie/aer.c | 45 +- 1 file changed, 27 insertions(+), 18 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index f6c24ded134c..1420e1f27105 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1227,6 +1227,28 @@ static irqreturn_t aer_irq(int irq, void *context) return IRQ_WAKE_THREAD; } +static void aer_enable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Enable Root Port's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + +static void aer_disable_irq(struct pci_dev *pdev) +{ + int aer = pdev->aer_cap; + u32 reg32; + + /* Disable Root's interrupt in response to error messages */ + pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); + reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; + pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); +} + /** * aer_enable_rootport - enable Root Port's interrupts when receiving messages * @rpc: pointer to a Root Port data structure @@ -1256,10 +1278,7 @@ static void aer_enable_rootport(struct aer_rpc *rpc) pci_read_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, ®32); pci_write_config_dword(pdev, aer + PCI_ERR_UNCOR_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(pdev); } /** @@ -1274,10 +1293,7 @@ static void aer_disable_rootport(struct aer_rpc *rpc) int aer = pdev->aer_cap; u32 reg32; - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(pdev, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_disable_irq(pdev); /* Clear Root's error status reg */ pci_read_config_dword(pdev, aer + PCI_ERR_ROOT_STATUS, ®32); @@ -1372,12 +1388,8 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) */ aer = root ? root->aer_cap : 0; - if ((host->native_aer || pcie_ports_native) && aer) { - /* Disable Root's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 &= ~ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); - } + if ((host->native_aer || pcie_ports_native) && aer) + aer_disable_irq(root); if (type == PCI_EXP_TYPE_RC_EC || type == PCI_EXP_TYPE_RC_END) { rc = pcie_reset_flr(dev, PCI_RESET_DO_RESET); @@ -1396,10 +1408,7 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) pci_read_config_dword(root, aer + PCI_ERR_ROOT_STATUS, ®32); pci_write_config_dword(root, aer + PCI_ERR_ROOT_STATUS, reg32); - /* Enable Root Port's interrupt in response to error messages */ - pci_read_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, ®32); - reg32 |= ROOT_PORT_INTR_ON_MESG_MASK; - pci_write_config_dword(root, aer + PCI_ERR_ROOT_COMMAND, reg32); + aer_enable_irq(root); } return rc ? PCI_ERS_RESULT_DISCONNECT : PCI_ERS_RESULT_RECOVERED; -- 2.34.1
Re: [PATCH 3/3] PCI/DPC: Disable DPC service on suspend when IRQ is shared with PME
On Thu, Sep 29, 2022 at 5:24 AM Bjorn Helgaas wrote: > > On Wed, Jul 27, 2022 at 09:32:52AM +0800, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > Since AER is conditionally disabled in previous patch, also apply the > > same condition to disable DPC which depends on AER to work. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable DPC during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/pci/pcie/dpc.c | 52 +- > > 1 file changed, 41 insertions(+), 11 deletions(-) > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > > index 3e9afee02e8d1..542f282c43f75 100644 > > --- a/drivers/pci/pcie/dpc.c > > +++ b/drivers/pci/pcie/dpc.c > > @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev) > > } > > } > > > > +static void dpc_enable(struct pcie_device *dev) > > +{ > > + struct pci_dev *pdev = dev->port; > > + u16 ctl; > > + > > + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); > > + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | > > PCI_EXP_DPC_CTL_INT_EN; > > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); > > +} > Sorry for the belated response. > I guess the reason we need this is because we disable interupts in > pci_pm_suspend() first, then we call pci_save_dpc_state() from > pci_pm_suspend_noirq(), so we save the *disabled* control register. > Then when we resume, we restore that disabled control register so we > need to enable DPC again. Right? Sorry for the belated response. Yes, and the same logic applies to AER too. > > I think we should save a "dpc_enabled" bit in the pci_dev and > conditionally set PCI_EXP_DPC_CTL_INT_EN here. If we unconditionally > set it here, we depend on portdrv *not* calling dpc_resume() if we > didn't enable DPC at enumeration-time for some reason. Does this scenario really happen? Once the port is marked with PCIE_PORT_SERVICE_DPC, DPC will be enabled by dpc_probe(). So an additional bit seems to be unnecessary. > > And I would leave PCI_EXP_DPC_CTL_EN_FATAL alone; see below. > > > +static void dpc_disable(struct pcie_device *dev) > > +{ > > + struct pci_dev *pdev = dev->port; > > + u16 ctl; > > + > > + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); > > + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); > > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); > > #define PCI_EXP_DPC_CTL_EN_FATAL 0x0001 > #define PCI_EXP_DPC_CTL_INT_EN 0x0008 > > Clearing PCI_EXP_DPC_CTL_INT_EN makes sense to me, but I don't > understand the PCI_EXP_DPC_CTL_EN_FATAL part. > > PCI_EXP_DPC_CTL_EN_FATAL is one of the four values of the two-bit DPC > Trigger Enable, so clearing that bit leaves the field as either 00b > (DPC is disabled) or 10b (DPC enabled and triggered when the port > detects an uncorrectable error or receives an ERR_NONFATAL or > ERR_FATAL message). > > I think we should only clear PCI_EXP_DPC_CTL_INT_EN. Yes, clearing PCI_EXP_DPC_CTL_INT_EN should be sufficient. > > > +} > > + > > #define FLAG(x, y) (((x) & (y)) ? '+' : '-') > > static int dpc_probe(struct pcie_device *dev) > > { > > struct pci_dev *pdev = dev->port; > > struct device *device = &dev->device; > > int status; > > - u16 ctl, cap; > > + u16 cap; > > > > if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) > > return -ENOTSUPP; > > @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev) > > } > > > > pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); > > - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); > > - > > - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | > > PC
Re: [PATCH 2/3] PCI/AER: Disable AER service on suspend when IRQ is shared with PME
On Thu, Sep 29, 2022 at 5:46 AM Bjorn Helgaas wrote: > > On Wed, Jul 27, 2022 at 09:32:51AM +0800, Kai-Heng Feng wrote: > > PCIe service that shares IRQ with PME may cause spurious wakeup on > > system suspend. > > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose > > much here to disable AER during system suspend. > > > > This is very similar to previous attempts to suspend AER and DPC [1], > > but with a different reason. > > > > [1] > > https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295 > > > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/pci/pcie/aer.c | 23 ++- > > 1 file changed, 22 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 7952e5efd6cf3..60cc373754af2 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1372,6 +1372,26 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + if (dev->shared_pme_irq) > > + aer_disable_rootport(rpc); > > aer_disable_rootport() seems like it might be overkill. IIUC, what > we want to do here is disable AER interrupts, which should only > require clearing ROOT_PORT_INTR_ON_MESG_MASK in PCI_ERR_ROOT_COMMAND. > > In addition to clearing ROOT_PORT_INTR_ON_MESG_MASK, > aer_disable_rootport() traverses the whole hierarchy, clearing > PCI_EXP_AER_FLAGS (CERE | NFERE | FERE | URRE) in PCI_EXP_DEVCTL. > I don't think these DEVCTL bits control interrupt generation, so I > don't know why we need to touch them. > > aer_disable_rootport() also clears PCI_ERR_ROOT_STATUS, which I think > we should not do during suspend either. We might want to clear it > on resume (which we already do in pci_restore_state()), but I think > generally we should preserve error information as long as it doesn't > cause trouble. > > Your thoughts please :) Sorry for the belated response. Clearing ROOT_PORT_INTR_ON_MESG_MASK along to disable interrupt can solve the issue too. And I agree that the AER information should be preserved too. Kai-Heng > > > + > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + if (dev->shared_pme_irq) > > + aer_enable_rootport(rpc); > > + > > + return 0; > > +} > > + > > /** > > * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > > * @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1441,8 +1461,9 @@ static struct pcie_port_service_driver aerdriver = { > > .name = "aer", > > .port_type = PCIE_ANY_PORT, > > .service= PCIE_PORT_SERVICE_AER, > > - > > .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > .remove = aer_remove, > > }; > > > > -- > > 2.36.1 > >
[PATCH 3/3] PCI/DPC: Disable DPC service on suspend when IRQ is shared with PME
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. Since AER is conditionally disabled in previous patch, also apply the same condition to disable DPC which depends on AER to work. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable DPC during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/dpc.c | 52 +- 1 file changed, 41 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3e9afee02e8d1..542f282c43f75 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -380,14 +397,25 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + if (dev->shared_pme_irq) + dpc_disable(dev); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + return 0; +} + +static int dpc_resume(struct pcie_device *dev) +{ + if (dev->shared_pme_irq) + dpc_enable(dev); + + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { @@ -395,6 +423,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.36.1
[PATCH 2/3] PCI/AER: Disable AER service on suspend when IRQ is shared with PME
PCIe service that shares IRQ with PME may cause spurious wakeup on system suspend. PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so we don't lose much here to disable AER during system suspend. This is very similar to previous attempts to suspend AER and DPC [1], but with a different reason. [1] https://lore.kernel.org/linux-pci/20220408153159.106741-1-kai.heng.f...@canonical.com/ Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=216295 Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 23 ++- 1 file changed, 22 insertions(+), 1 deletion(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 7952e5efd6cf3..60cc373754af2 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1372,6 +1372,26 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + if (dev->shared_pme_irq) + aer_disable_rootport(rpc); + + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + if (dev->shared_pme_irq) + aer_enable_rootport(rpc); + + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1441,8 +1461,9 @@ static struct pcie_port_service_driver aerdriver = { .name = "aer", .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_AER, - .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.36.1
Re: [PATCH v4 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Sat, Apr 23, 2022 at 6:26 AM Bjorn Helgaas wrote: > > [+cc Rafael, linux-pm; sorry forgot this last time] > > On Fri, Apr 22, 2022 at 05:24:36PM -0500, Bjorn Helgaas wrote: > > On Fri, Apr 08, 2022 at 11:31:58PM +0800, Kai-Heng Feng wrote: > > > On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause > > > some errors reported by AER: > > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > > received: :00:1d.0 > > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: > > > severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) > > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > > status/mask=0010/4000 > > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq > > > (First) > > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 > > > 0852 > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > > error_detected callback) > > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no > > > error_detected callback) > > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > > > > > So disable AER service to avoid the noises from turning power rails > > > on/off when the device is in low power states (D3hot and D3cold), as > > > PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states > > > that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready > > > (D3hot), L2 (D3cold with aux power) and L3 (D3cold). > > > > Help me walk through what's happening here, because I'm never very > > confident about how error reporting works. I *think* the Unsupported > > Request error means some request was in progress and was not > > completed. I don't think a link going down should by itself cause > > an Unsupported Request error because there's no *request*. > > > > I have a theory about what happened here. Decoding the TLP Header > > (from PCIe r6.0, sec 2.2.1.1, 2.2.8.10) gives: > > > > 3400 (0011 0100 ...): > > Fmt 0014 DW header, no data > > Type 1 0100Msg, Local - Terminate at Receiver > > > > 0852 (0800 ... 0101 0010) > > Requester ID 080000:08.0 > > Message Code 0101 0010 PTM Request Is there any TLP decoder software available? That will be really helpful for debugging. > > > > From your lspci in bugzilla, 08:00 has PTM enabled. So my theory is > > that: > > > > - 08:00.0 sent a PTM Request Message (a Posted Request) > > - 00:1d.0 received the PTM Request Message > > - The link transitioned to DL_Down > > - Per sec 2.9.1, 00:1d.0 discarded the Request and reported an > > Unsupported Request > > - Or, per sec 6.21.3, if 00:1d.0 received a PTM Request when its > > own PTM Enable was clear, it would also be treated as an > > Unsupported Request > > > > So I suspect we should disable PTM on 08:00.0 before putting it in a > > low-power state. If you manually disable PTM on 08:00.0, do these > > errors stop happening? Yes, disabling PTM on upstream port can solve the issue. Thanks for find the root cause! > > > > David did something like this [1], but just for Root Ports. That > > looks wrong to me because sec 6.21.3 says we should not have PTM > > enabled in an Upstream Port (i.e., in a downstream device like > > 08:00.0) unless it is already enabled in the Downstream Port (i.e., in > > the Root Port 00:1d.0). So I think it should be like this? diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index cfaf40a540a82..8ba8a0e12946e 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -2717,7 +2717,8 @@ int pci_prepare_to_sleep(struct pci_dev *dev) * port to enter a lower-power PM state and the SoC to reach a * lower-power idle state as a whole. */ - if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) + if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_TYPE_UPSTREAM) pci_disable_ptm(dev); pci_enable_wake(dev, target_state, wakeup); @@ -2775,7 +2776,8 @@ int pci_finish_runtime_suspend(struct pci_dev *dev) * port to enter a lower-power PM state and the SoC to reach a * lower-power idle state as a whole. */ - if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT) + if (pci_pcie_type(dev) == PCI_EXP_TYPE_ROOT_PORT || + pci_pcie_type(dev) == PCI_EXP_
Re: [PATCH v4 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state
On Mon, Apr 18, 2022 at 10:41 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 4/8/22 8:31 AM, Kai-Heng Feng wrote: > > On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause > > some errors reported by AER: > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > received: :00:1d.0 > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Requester ID) > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > status/mask=0010/4000 > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 > > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > error_detected callback) > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected > > callback) > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > > > Since AER is disabled in previous patch for a Link in L2/L3 Ready, L2 > > and L3, also disable DPC here as DPC depends on AER to work. > > > > Bugzilla:https://bugzilla.kernel.org/show_bug.cgi?id=215453 > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > Reviewed-by: Kuppuswamy Sathyanarayanan > A gentle ping... > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
[PATCH v4 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state
On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause some errors reported by AER: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed Since AER is disabled in previous patch for a Link in L2/L3 Ready, L2 and L3, also disable DPC here as DPC depends on AER to work. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v4: - Wording change. v3: - Wording change to make the patch more clear. v2: - Wording change. - Empty line dropped. drivers/pci/pcie/dpc.c | 60 +++--- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3e9afee02e8d1..414258967f08e 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -380,22 +397,33 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + dpc_disable(dev); + return 0; +} - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +static int dpc_resume(struct pcie_device *dev) +{ + dpc_enable(dev); + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { - .name = "dpc", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_DPC, - .probe = dpc_probe, - .remove = dpc_remove, + .name = "dpc", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_DPC, + .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, + .runtime_suspend= dpc_suspend, + .runtime_resume = dpc_resume, + .remove = dpc_remove, }; int __init pcie_dpc_init(void) -- 2.34.1
[PATCH v4 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Intel Alder Lake platforms, Thunderbolt entering D3cold can cause some errors reported by AER: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed So disable AER service to avoid the noises from turning power rails on/off when the device is in low power states (D3hot and D3cold), as PCIe Base Spec 5.0, section 5.2 "Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v4: - Explicitly states the spec version. - Wording change. v3: - Remove reference to ACS. - Wording change. v2: - Wording change. drivers/pci/pcie/aer.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b270..e4e9d4a3098d7 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) } static struct pcie_port_service_driver aerdriver = { - .name = "aer", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_AER, - - .probe = aer_probe, - .remove = aer_remove, + .name = "aer", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_AER, + .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, + .runtime_suspend= aer_suspend, + .runtime_resume = aer_resume, + .remove = aer_remove, }; /** -- 2.34.1
Re: [PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Thu, Mar 31, 2022 at 3:40 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 3/29/22 1:31 AM, Kai-Heng Feng wrote: > > On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause > > some errors reported by AER: > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > received: :00:1d.0 > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Requester ID) > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > status/mask=0010/4000 > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 > > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > error_detected callback) > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected > > callback) > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > Include details about in which platform you have seen it and whether > this is a generic power issue? _All_ Alder Lake platforms I worked on have this issue. I don't think have hardware to analyze if it's a power issue though. > > > > > So disable AER service to avoid the noises from turning power rails > > on/off when the device is in low power states (D3hot and D3cold), as > > PCIe spec "5.2 Link State Power Management" states that TLP and DLLP > > Also include PCIe specification version number. Will add in next revision. Kai-Heng > > > transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold > > with aux power) and L3 (D3cold). > > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 > > Reviewed-by: Mika Westerberg > > Signed-off-by: Kai-Heng Feng > > --- > > v3: > > - Remove reference to ACS. > > - Wording change. > > > > v2: > > - Wording change. > > > > drivers/pci/pcie/aer.c | 31 +-- > > 1 file changed, 25 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 9fa1f97e5b270..e4e9d4a3098d7 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + aer_disable_rootport(rpc); > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + aer_enable_rootport(rpc); > > + return 0; > > +} > > + > > /** > >* aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > >* @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct > > pci_dev *dev) > > } > > > > static struct pcie_port_service_driver aerdriver = { > > - .name = "aer", > > - .port_type = PCIE_ANY_PORT, > > - .service= PCIE_PORT_SERVICE_AER, > > - > > - .probe = aer_probe, > > - .remove = aer_remove, > > + .name = "aer", > > + .port_type = PCIE_ANY_PORT, > > + .service= PCIE_PORT_SERVICE_AER, > > + .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > + .runtime_suspend= aer_suspend, > > + .runtime_resume = aer_resume, > > + .remove = aer_remove, > > }; > > > > /** > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
[PATCH v3 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state
On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause some errors reported by AER: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed Since AER is disabled in previous patch for a Link in L2/L3 Ready, L2 and L3, also disable DPC here as DPC depends on AER to work. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v3: - Wording change to make the patch more clear. v2: - Wording change. - Empty line dropped. drivers/pci/pcie/dpc.c | 60 +++--- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3e9afee02e8d1..414258967f08e 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -380,22 +397,33 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + dpc_disable(dev); + return 0; +} - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +static int dpc_resume(struct pcie_device *dev) +{ + dpc_enable(dev); + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { - .name = "dpc", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_DPC, - .probe = dpc_probe, - .remove = dpc_remove, + .name = "dpc", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_DPC, + .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, + .runtime_suspend= dpc_suspend, + .runtime_resume = dpc_resume, + .remove = dpc_remove, }; int __init pcie_dpc_init(void) -- 2.34.1
[PATCH v3 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On some Intel AlderLake platforms, Thunderbolt entering D3cold can cause some errors reported by AER: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed So disable AER service to avoid the noises from turning power rails on/off when the device is in low power states (D3hot and D3cold), as PCIe spec "5.2 Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Reviewed-by: Mika Westerberg Signed-off-by: Kai-Heng Feng --- v3: - Remove reference to ACS. - Wording change. v2: - Wording change. drivers/pci/pcie/aer.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b270..e4e9d4a3098d7 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) } static struct pcie_port_service_driver aerdriver = { - .name = "aer", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_AER, - - .probe = aer_probe, - .remove = aer_remove, + .name = "aer", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_AER, + .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, + .runtime_suspend= aer_suspend, + .runtime_resume = aer_resume, + .remove = aer_remove, }; /** -- 2.34.1
Re: [PATCH v2 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Sun, Mar 20, 2022 at 4:38 AM Sathyanarayanan Kuppuswamy wrote: > > > > On 1/26/22 6:54 PM, Kai-Heng Feng wrote: > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > > hint") enables ACS, and some platforms lose its NVMe after resume from > > Why enabling ACS makes platform lose NVMe? Can you add more details > about the problem? I don't have a hardware analyzer, so the only detail I can provide is the symptom. I believe the affected system was sent Intel, and there wasn't any feedback since then. > > > S3: > > [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 > > source:0x > > [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > > detected > > [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Receiver ID) > > [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > > status/mask=0020/0001 > > [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) > > [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > > > It happens right after ACS gets enabled during resume. > > > > There's another case, when Thunderbolt reaches D3cold: > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > received: :00:1d.0 > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Requester ID) > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > status/mask=0010/4000 > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 > > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > error_detected callback) > > no callback message means one or more devices in the given port does not > support error handler. How is this related to ACS? This case is about D3cold, not related to ACS. And no error_detected is just part of the message. The whole AER message is more important. Kai-Heng > > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected > > callback) > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > > > So disable AER service to avoid the noises from turning power rails > > on/off when the device is in low power states (D3hot and D3cold), as > > PCIe spec "5.2 Link State Power Management" states that TLP and DLLP > > transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold > > with aux power) and L3 (D3cold). > > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 > > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") > > Signed-off-by: Kai-Heng Feng > > --- > > v2: > > - Wording change. > > > > drivers/pci/pcie/aer.c | 31 +-- > > 1 file changed, 25 insertions(+), 6 deletions(-) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 9fa1f97e5b270..e4e9d4a3098d7 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + aer_disable_rootport(rpc); > > + return 0; > > +} > > + > > +static int aer_resume(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + aer_enable_rootport(rpc); > > + return 0; > > +} > > + > > /** > >* aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP > >* @dev: pointer to Root Port, RCEC, or RCiEP > > @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct > > pci_dev *dev) > > } > > > > static struct pcie_port_service_driver aerdriver = { > > - .name = "aer", > > - .port_type = PCIE_ANY_PORT, > > - .service= PCIE_PORT_SERVICE_AER, > > - > > - .probe = aer_probe, > > - .remove = aer_remove, > > + .name = "aer", > > + .port_type = PCIE_ANY_PORT, > > + .service= PCIE_PORT_SERVICE_AER, > > + .probe = aer_probe, > > + .suspend= aer_suspend, > > + .resume = aer_resume, > > + .runtime_suspend= aer_suspend, > > + .runtime_resume = aer_resume, > > + .remove = aer_remove, > > }; > > > > /** > > -- > Sathyanarayanan Kuppuswamy > Linux Kernel Developer
Re: [PATCH v2 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Fri, Jan 28, 2022 at 10:57 AM Lu Baolu wrote: > > On 1/27/22 7:14 PM, Kai-Heng Feng wrote: > > On Thu, Jan 27, 2022 at 3:01 PM Lu Baolu wrote: > >> > >> On 2022/1/27 10:54, Kai-Heng Feng wrote: > >>> Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > >>> hint") enables ACS, and some platforms lose its NVMe after resume from > >>> S3: > >>> [ 50.947816] pcieport :00:1b.0: DPC: containment event, > >>> status:0x1f01 source:0x > >>> [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > >>> detected > >>> [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: > >>> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) > >>> [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > >>> status/mask=0020/0001 > >>> [ 50.947831] pcieport :00:1b.0:[21] ACSViol > >>> (First) > >>> [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected > >>> message > >>> [ 50.947843] nvme nvme0: frozen state error detected, reset controller > >>> > >>> It happens right after ACS gets enabled during resume. > >>> > >>> There's another case, when Thunderbolt reaches D3cold: > >>> [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > >>> received: :00:1d.0 > >>> [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: > >>> severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) > >>> [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > >>> status/mask=0010/4000 > >>> [ 30.100262] pcieport :00:1d.0:[20] UnsupReq > >>> (First) > >>> [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 > >>> 0852 > >>> [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > >>> error_detected callback) > >>> [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no > >>> error_detected callback) > >>> [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > >>> > >>> So disable AER service to avoid the noises from turning power rails > >>> on/off when the device is in low power states (D3hot and D3cold), as > >>> PCIe spec "5.2 Link State Power Management" states that TLP and DLLP > >>> transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold > >>> with aux power) and L3 (D3cold). > >>> > >>> Bugzilla:https://bugzilla.kernel.org/show_bug.cgi?id=209149 > >>> Bugzilla:https://bugzilla.kernel.org/show_bug.cgi?id=215453 > >>> Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > >>> hint") > >> > >> I don't know what this fix has to do with the commit 50310600ebda. > > > > Commit 50310600ebda only exposed the underlying issue. Do you think > > "Fixes:" tag should change to other commits? > > > >> Commit 50310600ebda only makes sure that PCI ACS is enabled whenever > >> Intel IOMMU is on. Before this commit, PCI ACS could also be enabled > >> and result in the same problem. Or anything I missed? > > > > The system in question didn't enable ACS before commit 50310600ebda. > > This commit exposed the issue on your configuration doesn't mean the > fix should be back ported as far as that commit. I believe if you add > intel-iommu=on in the kernel parameter, the issue still exists even you > revert commit 50310600ebda or checkout a tag before it. That's true. I guess it's better to drop the "Fixes:" tag. Bjorn, should I send another version of it? Kai-Heng > > Best regards, > baolu
Re: [PATCH v2 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
On Thu, Jan 27, 2022 at 3:01 PM Lu Baolu wrote: > > On 2022/1/27 10:54, Kai-Heng Feng wrote: > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > > hint") enables ACS, and some platforms lose its NVMe after resume from > > S3: > > [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 > > source:0x > > [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > > detected > > [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Receiver ID) > > [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > > status/mask=0020/0001 > > [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) > > [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > > > It happens right after ACS gets enabled during resume. > > > > There's another case, when Thunderbolt reaches D3cold: > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > received: :00:1d.0 > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Requester ID) > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > status/mask=0010/4000 > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 > > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > error_detected callback) > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected > > callback) > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > > > So disable AER service to avoid the noises from turning power rails > > on/off when the device is in low power states (D3hot and D3cold), as > > PCIe spec "5.2 Link State Power Management" states that TLP and DLLP > > transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold > > with aux power) and L3 (D3cold). > > > > Bugzilla:https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Bugzilla:https://bugzilla.kernel.org/show_bug.cgi?id=215453 > > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") > > I don't know what this fix has to do with the commit 50310600ebda. Commit 50310600ebda only exposed the underlying issue. Do you think "Fixes:" tag should change to other commits? > Commit 50310600ebda only makes sure that PCI ACS is enabled whenever > Intel IOMMU is on. Before this commit, PCI ACS could also be enabled > and result in the same problem. Or anything I missed? The system in question didn't enable ACS before commit 50310600ebda. Kai-Heng > > Best regards, > baolu
[PATCH v2 1/2] PCI/AER: Disable AER service when link is in L2/L3 ready, L2 and L3 state
Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") enables ACS, and some platforms lose its NVMe after resume from S3: [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error status/mask=0020/0001 [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller It happens right after ACS gets enabled during resume. There's another case, when Thunderbolt reaches D3cold: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed So disable AER service to avoid the noises from turning power rails on/off when the device is in low power states (D3hot and D3cold), as PCIe spec "5.2 Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold). Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") Signed-off-by: Kai-Heng Feng --- v2: - Wording change. drivers/pci/pcie/aer.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b270..e4e9d4a3098d7 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) } static struct pcie_port_service_driver aerdriver = { - .name = "aer", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_AER, - - .probe = aer_probe, - .remove = aer_remove, + .name = "aer", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_AER, + .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, + .runtime_suspend= aer_suspend, + .runtime_resume = aer_resume, + .remove = aer_remove, }; /** -- 2.33.1
[PATCH v2 2/2] PCI/DPC: Disable DPC service when link is in L2/L3 ready, L2 and L3 state
Since TLP and DLLP transmission is disabled for a Link in L2/L3 Ready, L2 and L3 (i.e. device in D3hot and D3cold), and DPC depends on AER, so also disable DPC here. Signed-off-by: Kai-Heng Feng --- v2: - Wording change. - Empty line dropped. drivers/pci/pcie/dpc.c | 60 +++--- 1 file changed, 44 insertions(+), 16 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3e9afee02e8d1..414258967f08e 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -343,13 +343,33 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -364,10 +384,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -380,22 +397,33 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + dpc_disable(dev); + return 0; +} - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +static int dpc_resume(struct pcie_device *dev) +{ + dpc_enable(dev); + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { - .name = "dpc", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_DPC, - .probe = dpc_probe, - .remove = dpc_remove, + .name = "dpc", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_DPC, + .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, + .runtime_suspend= dpc_suspend, + .runtime_resume = dpc_resume, + .remove = dpc_remove, }; int __init pcie_dpc_init(void) -- 2.33.1
Re: [PATCH 2/2] PCI/DPC: Disable DPC when link is in L2/L3 ready, L2 and L3 state
On Wed, Jan 26, 2022 at 7:10 PM Mika Westerberg wrote: > > Hi, > > On Wed, Jan 26, 2022 at 03:18:52PM +0800, Kai-Heng Feng wrote: > > Since TLP and DLLP transmission is disabled for a Link in L2/L3 Ready, > > L2 and L3, and DPC depends on AER, so also disable DPC here. > > Here too I think it is good to mention that the DPC "service" never > implemented the PM hooks in the first place Will amend the commit message a bit. > > > Signed-off-by: Kai-Heng Feng > > One minor comment below, but other than that looks good, > > Reviewed-by: Mika Westerberg > > > --- > > drivers/pci/pcie/dpc.c | 61 +++--- > > 1 file changed, 45 insertions(+), 16 deletions(-) > > > > diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c > > index 3e9afee02e8d1..9585c10b7c577 100644 > > --- a/drivers/pci/pcie/dpc.c > > +++ b/drivers/pci/pcie/dpc.c > > @@ -343,13 +343,34 @@ void pci_dpc_init(struct pci_dev *pdev) > > } > > } > > > > +static void dpc_enable(struct pcie_device *dev) > > +{ > > + struct pci_dev *pdev = dev->port; > > + u16 ctl; > > + > > + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); > > + > > Drop the empty line here. OK, will do. Kai-Heng > > > + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | > > PCI_EXP_DPC_CTL_INT_EN; > > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); > > +} > > + > > +static void dpc_disable(struct pcie_device *dev) > > +{ > > + struct pci_dev *pdev = dev->port; > > + u16 ctl; > > + > > + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); > > + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); > > + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); > > +}
Re: [PATCH 1/2] PCI/AER: Disable AER when link is in L2/L3 ready, L2 and L3 state
On Wed, Jan 26, 2022 at 7:03 PM Mika Westerberg wrote: > > Hi, > > On Wed, Jan 26, 2022 at 03:18:51PM +0800, Kai-Heng Feng wrote: > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > > hint") enables ACS, and some platforms lose its NVMe after resume from > > S3: > > [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 > > source:0x > > [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > > detected > > [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Receiver ID) > > [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > > status/mask=0020/0001 > > [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) > > [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > > > It happens right after ACS gets enabled during resume. > > Is this really because of the above commit of due the fact that AER > "service" never implemented the PM hooks in the first place ;-) >From what I can understand, all services other than PME should be disabled during suspend. For example, should we convert commit a697f072f5da8 ("PCI: Disable PTM during suspend to save power") to PM hooks in PTM service? > > > > There's another case, when Thunderbolt reaches D3cold: > > [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error > > received: :00:1d.0 > > [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Requester ID) > > [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error > > status/mask=0010/4000 > > [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) > > [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 > > > > [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no > > error_detected callback) > > [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected > > callback) > > [ 30.100427] pcieport :00:1d.0: AER: device recovery failed > > > > Since PCIe spec "5.2 Link State Power Management" states that TLP and DLLP > > transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with > > aux > > power) and L3 (D3cold), so disable AER to avoid the noises from turning > > power > > rails on/off. > > I think more accurate here is to say when the topology behind the root > port enters low power states. Reason here is that you can't really tell > from the OS standpoint whether the link went into L1 or L2/3 before the > ACPI power resource is turned off. OK, let me change the commit message a bit. My intention is to state "transmission is disabled" in those Link states. Kai-Heng > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 > > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") > > Signed-off-by: Kai-Heng Feng > > Thanks for fixing this! > > Reviewed-by: Mika Westerberg
[PATCH 2/2] PCI/DPC: Disable DPC when link is in L2/L3 ready, L2 and L3 state
Since TLP and DLLP transmission is disabled for a Link in L2/L3 Ready, L2 and L3, and DPC depends on AER, so also disable DPC here. Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/dpc.c | 61 +++--- 1 file changed, 45 insertions(+), 16 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index 3e9afee02e8d1..9585c10b7c577 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -343,13 +343,34 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { struct pci_dev *pdev = dev->port; struct device *device = &dev->device; int status; - u16 ctl, cap; + u16 cap; if (!pcie_aer_is_native(pdev) && !pcie_ports_dpc_native) return -ENOTSUPP; @@ -364,10 +385,7 @@ static int dpc_probe(struct pcie_device *dev) } pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -380,22 +398,33 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + dpc_disable(dev); + return 0; +} - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +static int dpc_resume(struct pcie_device *dev) +{ + dpc_enable(dev); + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { - .name = "dpc", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_DPC, - .probe = dpc_probe, - .remove = dpc_remove, + .name = "dpc", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_DPC, + .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, + .runtime_suspend= dpc_suspend, + .runtime_resume = dpc_resume, + .remove = dpc_remove, }; int __init pcie_dpc_init(void) -- 2.33.1
[PATCH 1/2] PCI/AER: Disable AER when link is in L2/L3 ready, L2 and L3 state
Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") enables ACS, and some platforms lose its NVMe after resume from S3: [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error status/mask=0020/0001 [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller It happens right after ACS gets enabled during resume. There's another case, when Thunderbolt reaches D3cold: [ 30.100211] pcieport :00:1d.0: AER: Uncorrected (Non-Fatal) error received: :00:1d.0 [ 30.100251] pcieport :00:1d.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Requester ID) [ 30.100256] pcieport :00:1d.0: device [8086:7ab0] error status/mask=0010/4000 [ 30.100262] pcieport :00:1d.0:[20] UnsupReq (First) [ 30.100267] pcieport :00:1d.0: AER: TLP Header: 3400 0852 [ 30.100372] thunderbolt :0a:00.0: AER: can't recover (no error_detected callback) [ 30.100401] xhci_hcd :3e:00.0: AER: can't recover (no error_detected callback) [ 30.100427] pcieport :00:1d.0: AER: device recovery failed Since PCIe spec "5.2 Link State Power Management" states that TLP and DLLP transmission is disabled for a Link in L2/L3 Ready (D3hot), L2 (D3cold with aux power) and L3 (D3cold), so disable AER to avoid the noises from turning power rails on/off. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=215453 Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 31 +-- 1 file changed, 25 insertions(+), 6 deletions(-) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 9fa1f97e5b270..e4e9d4a3098d7 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1367,6 +1367,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1433,12 +1449,15 @@ static pci_ers_result_t aer_root_reset(struct pci_dev *dev) } static struct pcie_port_service_driver aerdriver = { - .name = "aer", - .port_type = PCIE_ANY_PORT, - .service= PCIE_PORT_SERVICE_AER, - - .probe = aer_probe, - .remove = aer_remove, + .name = "aer", + .port_type = PCIE_ANY_PORT, + .service= PCIE_PORT_SERVICE_AER, + .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, + .runtime_suspend= aer_suspend, + .runtime_resume = aer_resume, + .remove = aer_remove, }; /** -- 2.33.1
Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
On Fri, Jul 23, 2021 at 1:24 PM Christoph Hellwig wrote: > > On Thu, Jul 22, 2021 at 05:23:51PM -0500, Bjorn Helgaas wrote: > > Marking both of these as "not applicable" for now because I don't > > think we really understand what's going on. > > > > Apparently a DMA occurs during suspend or resume and triggers an ACS > > violation. I don't think think such a DMA should occur in the first > > place. > > > > Or maybe, since you say the problem happens right after ACS is enabled > > during resume, we're doing the ACS enable incorrectly? Although I > > would think we should not be doing DMA at the same time we're enabling > > ACS, either. > > > > If this really is a system firmware issue, both HP and Dell should > > have the knowledge and equipment to figure out what's going on. > > DMA on resume sounds really odd. OTOH the below mentioned case of > a DMA during suspend seems very like in some setup. NVMe has the > concept of a host memory buffer (HMB) that allows the PCIe device > to use arbitrary host memory for internal purposes. Combine this > with the "Storage D3" misfeature in modern x86 platforms that force > a slot into d3cold without consulting the driver first and you'd see > symptoms like this. Another case would be the NVMe equivalent of the > AER which could lead to a completion without host activity. The issue can also be observed on non-HMB NVMe. > > We now have quirks in the ACPI layer and NVMe to fully shut down the > NVMe controllers on these messed up systems with the "Storage D3" > misfeature which should avoid such "spurious" DMAs at the cost of > wearning out the device much faster. Since the issue is on S3, I think the NVMe always fully shuts down. Kai-Heng
Re: [PATCH] PCI: Try to find two continuous regions for child resource
On Tue, Mar 30, 2021 at 12:23 AM Bjorn Helgaas wrote: > > On Mon, Mar 29, 2021 at 04:47:59PM +0800, Kai-Heng Feng wrote: > > Built-in grahpics on HP EliteDesk 805 G6 doesn't work because graphics > > can't get the BAR it needs: > > [0.611504] pci_bus :00: root bus resource [mem > > 0x1002020-0x100303f window] > > [0.611505] pci_bus :00: root bus resource [mem > > 0x1003040-0x100401f window] > > ... > > [0.638083] pci :00:08.1: bridge window [mem 0xd200-0xd23f] > > [0.638086] pci :00:08.1: bridge window [mem > > 0x1003000-0x100401f 64bit pref] > > [0.962086] pci :00:08.1: can't claim BAR 15 [mem > > 0x1003000-0x100401f 64bit pref]: no compatible bridge window > > [0.962086] pci :00:08.1: [mem 0x1003000-0x100401f 64bit > > pref] clipped to [mem 0x1003000-0x100303f 64bit pref] > > [0.962086] pci :00:08.1: bridge window [mem > > 0x1003000-0x100303f 64bit pref] > > [0.962086] pci :07:00.0: can't claim BAR 0 [mem > > 0x1003000-0x1003fff 64bit pref]: no compatible bridge window > > [0.962086] pci :07:00.0: can't claim BAR 2 [mem > > 0x1004000-0x100401f 64bit pref]: no compatible bridge window > > > > However, the root bus has two continuous regions that can contain the > > child resource requested. > > > > So try to find another parent region if two regions are continuous and > > can contain child resource. This change makes the grahpics works on the > > system in question. > > The BIOS description of PCI0 is interesting: > > pci_bus :00: root bus resource [mem 0x100-0x100201f window] > pci_bus :00: root bus resource [mem 0x1002020-0x100303f window] > pci_bus :00: root bus resource [mem 0x1003040-0x100401f window] > > So the PCI0 _CRS apparently gave us: > > [mem 0x100-0x100201f] size 0x2020 (512MB + 2MB) > [mem 0x1002020-0x100303f] size 0x1020 (256MB + 2MB) > [mem 0x1003040-0x100401f] size 0x0fe0 (254MB) > > These are all contiguous, so we'd have no problem if we coalesced them > into a single window: > > [mem 0x100-0x100401f window] size 0x4020 (1GB + 2MB) > > I think we currently keep these root bus resources separate because if > we ever support _SRS for host bridges, the argument we give to _SRS > must be exactly the same format as what we got from _CRS (see ACPI > v6.3, sec 6.2.16, and pnpacpi_set_resources()). > > pnpacpi_encode_resources() is currently very simple-minded and copies > each device resource back into a single _SRS entry. But (1) we don't > support _SRS for host bridges, and (2) if we ever do, we can make > pnpacpi_encode_resources() smarter so it breaks things back up. > > So I think we should try to fix this by coalescing these adjacent > resources from _CRS so we end up with a single root bus resource that > covers all contiguous regions. Thanks for the tip! Working on v2 patch. > > Typos, etc: > - No need for the timestamps; they're not relevant to the problem. > - s/grahpics/graphics/ (two occurrences above) > - s/continuous/contiguous/ (three occurrences above) Will also update those in v2. Kai-Heng > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013 > > Signed-off-by: Kai-Heng Feng > > --- > > arch/microblaze/pci/pci-common.c | 4 +-- > > arch/powerpc/kernel/pci-common.c | 8 ++--- > > arch/sparc/kernel/pci.c | 4 +-- > > drivers/pci/pci.c| 60 +++- > > drivers/pci/setup-res.c | 21 +++ > > drivers/pcmcia/rsrc_nonstatic.c | 4 +-- > > include/linux/pci.h | 6 ++-- > > 7 files changed, 80 insertions(+), 27 deletions(-) > > > > diff --git a/arch/microblaze/pci/pci-common.c > > b/arch/microblaze/pci/pci-common.c > > index 557585f1be41..8e65832fb510 100644 > > --- a/arch/microblaze/pci/pci-common.c > > +++ b/arch/microblaze/pci/pci-common.c > > @@ -669,7 +669,7 @@ static void pcibios_allocate_bus_resources(struct > > pci_bus *bus) > > { > > struct pci_bus *b; > > int i; > > - struct resource *res, *pr; > > + struct resource *res, *pr = NULL; > > > > pr_debug("PCI: Allocating bus resources for %04x:%02x...\n", > >pci_domain_nr(bus), bus->number); > > @@ -688,7 +688,7 @@ static void pcibios_allocate_bus_resources(struct > > pci_bus *bus) > >
[PATCH] PCI: Try to find two continuous regions for child resource
Built-in grahpics on HP EliteDesk 805 G6 doesn't work because graphics can't get the BAR it needs: [0.611504] pci_bus :00: root bus resource [mem 0x1002020-0x100303f window] [0.611505] pci_bus :00: root bus resource [mem 0x1003040-0x100401f window] ... [0.638083] pci :00:08.1: bridge window [mem 0xd200-0xd23f] [0.638086] pci :00:08.1: bridge window [mem 0x1003000-0x100401f 64bit pref] [0.962086] pci :00:08.1: can't claim BAR 15 [mem 0x1003000-0x100401f 64bit pref]: no compatible bridge window [0.962086] pci :00:08.1: [mem 0x1003000-0x100401f 64bit pref] clipped to [mem 0x1003000-0x100303f 64bit pref] [0.962086] pci :00:08.1: bridge window [mem 0x1003000-0x100303f 64bit pref] [0.962086] pci :07:00.0: can't claim BAR 0 [mem 0x1003000-0x1003fff 64bit pref]: no compatible bridge window [0.962086] pci :07:00.0: can't claim BAR 2 [mem 0x1004000-0x100401f 64bit pref]: no compatible bridge window However, the root bus has two continuous regions that can contain the child resource requested. So try to find another parent region if two regions are continuous and can contain child resource. This change makes the grahpics works on the system in question. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=212013 Signed-off-by: Kai-Heng Feng --- arch/microblaze/pci/pci-common.c | 4 +-- arch/powerpc/kernel/pci-common.c | 8 ++--- arch/sparc/kernel/pci.c | 4 +-- drivers/pci/pci.c| 60 +++- drivers/pci/setup-res.c | 21 +++ drivers/pcmcia/rsrc_nonstatic.c | 4 +-- include/linux/pci.h | 6 ++-- 7 files changed, 80 insertions(+), 27 deletions(-) diff --git a/arch/microblaze/pci/pci-common.c b/arch/microblaze/pci/pci-common.c index 557585f1be41..8e65832fb510 100644 --- a/arch/microblaze/pci/pci-common.c +++ b/arch/microblaze/pci/pci-common.c @@ -669,7 +669,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) { struct pci_bus *b; int i; - struct resource *res, *pr; + struct resource *res, *pr = NULL; pr_debug("PCI: Allocating bus resources for %04x:%02x...\n", pci_domain_nr(bus), bus->number); @@ -688,7 +688,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) * and as such ensure proper re-allocation * later. */ - pr = pci_find_parent_resource(bus->self, res); + pci_find_parent_resource(bus->self, res, &pr, NULL); if (pr == res) { /* this happens when the generic PCI * code (wrongly) decides that this diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c index 001e90cd8948..f865354b746d 100644 --- a/arch/powerpc/kernel/pci-common.c +++ b/arch/powerpc/kernel/pci-common.c @@ -1196,7 +1196,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) { struct pci_bus *b; int i; - struct resource *res, *pr; + struct resource *res, *pr = NULL; pr_debug("PCI: Allocating bus resources for %04x:%02x...\n", pci_domain_nr(bus), bus->number); @@ -1213,7 +1213,7 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) pr = (res->flags & IORESOURCE_IO) ? &ioport_resource : &iomem_resource; else { - pr = pci_find_parent_resource(bus->self, res); + pci_find_parent_resource(bus->self, res, &pr, NULL); if (pr == res) { /* this happens when the generic PCI * code (wrongly) decides that this @@ -1265,12 +1265,12 @@ static void pcibios_allocate_bus_resources(struct pci_bus *bus) static inline void alloc_resource(struct pci_dev *dev, int idx) { - struct resource *pr, *r = &dev->resource[idx]; + struct resource *pr = NULL, *r = &dev->resource[idx]; pr_debug("PCI: Allocating %s: Resource %d: %pR\n", pci_name(dev), idx, r); - pr = pci_find_parent_resource(dev, r); + pci_find_parent_resource(dev, r, &pr, NULL); if (!pr || (pr->flags & IORESOURCE_UNSET) || request_resource(pr, r) < 0) { printk(KERN_WARNING "PCI: Cannot allocate resource region %d" diff --git a/arch/sparc/kernel/pci.c b/arch/sparc/kernel/pci.c index 9c2b720bfd20..b4006798e4e1 100644 --- a/arch/sparc/kernel/pci.c +++ b/arch/sparc/kernel/pci.c @@ -621,7 +621,7 @@ static void pci_bus_register_of_sysfs(str
Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
On Fri, Feb 5, 2021 at 7:28 AM Bjorn Helgaas wrote: > > [+cc Alex] > > On Thu, Jan 28, 2021 at 12:09:37PM +0800, Kai-Heng Feng wrote: > > On Thu, Jan 28, 2021 at 4:51 AM Bjorn Helgaas wrote: > > > On Thu, Jan 28, 2021 at 01:31:00AM +0800, Kai-Heng Feng wrote: > > > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > > > > hint") enables ACS, and some platforms lose its NVMe after resume from > > > > firmware: > > > > [ 50.947816] pcieport :00:1b.0: DPC: containment event, > > > > status:0x1f01 source:0x > > > > [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > > > > detected > > > > [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: > > > > severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) > > > > [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > > > > status/mask=0020/0001 > > > > [ 50.947831] pcieport :00:1b.0:[21] ACSViol > > > > (First) > > > > [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected > > > > message > > > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > > > > > > > It happens right after ACS gets enabled during resume. > > > > > > > > To prevent that from happening, disable AER interrupt and enable it on > > > > system suspend and resume, respectively. > > > > > > Lots of questions here. Maybe this is what we'll end up doing, but I > > > am curious about why the error is reported in the first place. > > > > > > Is this a consequence of the link going down and back up? > > > > Could be. From the observations, it only happens when firmware suspend > > (S3) is used. > > Maybe it happens when it's gets powered up, but I don't have equipment > > to debug at hardware level. > > > > If we use non-firmware suspend method, enabling ACS after resume won't > > trip AER and DPC. > > > > > Is it consequence of the device doing a DMA when it shouldn't? > > > > If it's doing DMA while suspending, the same error should also happen > > after NVMe is suspended and before PCIe port suspending. > > Furthermore, if non-firmware suspend method is used, there's so such > > issue, so less likely to be any DMA operation. > > > > > Are we doing something in the wrong order during suspend? Or maybe > > > resume, since I assume the error is reported during resume? > > > > Yes the error is reported during resume. The suspend/resume order > > seems fine as non-firmware suspend doesn't have this issue. > > I really feel like we need a better understanding of what's going on > here. Disabling the AER interrupt is like closing our eyes and > pretending that because we don't see it, it didn't happen. > > An ACS error is triggered by a DMA, right? I'm assuming an MMIO > access from the CPU wouldn't trigger this error. And it sounds like > the error is triggered before we even start running the driver after > resume. > > If we're powering up an NVMe device from D3cold and it DMAs before the > driver touches it, something would be seriously broken. I doubt > that's what's happening. Maybe a device could resume some previously > programmed DMA after powering up from D3hot. I am not that familiar with PCIe ACS/AER/DPC, so I can't really answer questions you raised. PCIe spec doesn't say the suspend/resume order is also not helping here. However, I really think it's a system firmware issue. I've seen some suspend-to-idle platforms with NVMe can reach D3cold, those are unaffected. > > Or maybe the error occurred on suspend, like if the device wasn't > quiesced or something, but we didn't notice it until resume? The > AER error status bits are RW1CS, which means they can be preserved > across hot/warm/cold resets. > > Can you instrument the code to see whether the AER error status bit is > set before enabling ACS? I'm not sure that merely enabling ACS (I > assume you mean pci_std_enable_acs(), where we write PCI_ACS_CTRL) > should cause an interrupt for a previously-logged error. I suspect > that could happen when enabling *AER*, but I wouldn't think it would > happen when enabling *ACS*. Diff to print AER status: https://bugzilla.kernel.org/show_bug.cgi?id=209149#c11 And dmesg: https://bugzilla.kernel.org/show_bug.cgi?id=209149#c12 Looks like the read before suspend and after r
Re: [PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
On Thu, Jan 28, 2021 at 4:51 AM Bjorn Helgaas wrote: > > On Thu, Jan 28, 2021 at 01:31:00AM +0800, Kai-Heng Feng wrote: > > Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in > > hint") enables ACS, and some platforms lose its NVMe after resume from > > firmware: > > [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 > > source:0x > > [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error > > detected > > [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected > > (Non-Fatal), type=Transaction Layer, (Receiver ID) > > [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error > > status/mask=0020/0001 > > [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) > > [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message > > [ 50.947843] nvme nvme0: frozen state error detected, reset controller > > > > It happens right after ACS gets enabled during resume. > > > > To prevent that from happening, disable AER interrupt and enable it on > > system suspend and resume, respectively. > > Lots of questions here. Maybe this is what we'll end up doing, but I > am curious about why the error is reported in the first place. > > Is this a consequence of the link going down and back up? Could be. From the observations, it only happens when firmware suspend (S3) is used. Maybe it happens when it's gets powered up, but I don't have equipment to debug at hardware level. If we use non-firmware suspend method, enabling ACS after resume won't trip AER and DPC. > > Is it consequence of the device doing a DMA when it shouldn't? If it's doing DMA while suspending, the same error should also happen after NVMe is suspended and before PCIe port suspending. Furthermore, if non-firmware suspend method is used, there's so such issue, so less likely to be any DMA operation. > > Are we doing something in the wrong order during suspend? Or maybe > resume, since I assume the error is reported during resume? Yes the error is reported during resume. The suspend/resume order seems fine as non-firmware suspend doesn't have this issue. > > If we *do* take the error, why doesn't DPC recovery work? It works for the root port, but not for the NVMe drive: [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error status/mask=0020/0001 [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller [ 50.948400] ACPI: EC: event unblocked [ 50.948432] xhci_hcd :00:14.0: PME# disabled [ 50.948444] xhci_hcd :00:14.0: enabling bus mastering [ 50.949056] pcieport :00:1b.0: PME# disabled [ 50.949068] pcieport :00:1c.0: PME# disabled [ 50.949416] e1000e :00:1f.6: PME# disabled [ 50.949463] e1000e :00:1f.6: enabling bus mastering [ 50.951606] sd 0:0:0:0: [sda] Starting disk [ 50.951610] nvme :01:00.0: can't change power state from D3hot to D0 (config space inaccessible) [ 50.951730] nvme nvme0: Removing after probe failure status: -19 [ 50.952360] nvme nvme0: failed to set APST feature (-19) [ 50.971136] snd_hda_intel :00:1f.3: PME# disabled [ 51.089330] pcieport :00:1b.0: AER: broadcast resume message [ 51.089345] pcieport :00:1b.0: AER: device recovery successful But I think why recovery doesn't work for NVMe is for another discussion... Kai-Heng > > > Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 > > Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") > > Signed-off-by: Kai-Heng Feng > > --- > > drivers/pci/pcie/aer.c | 18 ++ > > 1 file changed, 18 insertions(+) > > > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > > index 77b0f2c45bc0..0e9a85530ae6 100644 > > --- a/drivers/pci/pcie/aer.c > > +++ b/drivers/pci/pcie/aer.c > > @@ -1365,6 +1365,22 @@ static int aer_probe(struct pcie_device *dev) > > return 0; > > } > > > > +static int aer_suspend(struct pcie_device *dev) > > +{ > > + struct aer_rpc *rpc = get_service_data(dev); > > + > > + aer_disable_rootport(rpc); > > + return 0; > > +} > > + > > +static
[PATCH 2/2] PCI/DPC: Disable DPC interrupt during suspend
Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") enables ACS, and some platforms lose its NVMe after resume from firmware: [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error status/mask=0020/0001 [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller Like what previous patch does to AER, introduce new helpers to disable DPC interrupt and enable it on system suspend and resume, respectively. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/dpc.c | 49 -- 1 file changed, 38 insertions(+), 11 deletions(-) diff --git a/drivers/pci/pcie/dpc.c b/drivers/pci/pcie/dpc.c index e05aba86a317..d12289cb5d44 100644 --- a/drivers/pci/pcie/dpc.c +++ b/drivers/pci/pcie/dpc.c @@ -279,6 +279,28 @@ void pci_dpc_init(struct pci_dev *pdev) } } +static void dpc_enable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 cap, ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + + ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + +static void dpc_disable(struct pcie_device *dev) +{ + struct pci_dev *pdev = dev->port; + u16 ctl; + + pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); + ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); + pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +} + #define FLAG(x, y) (((x) & (y)) ? '+' : '-') static int dpc_probe(struct pcie_device *dev) { @@ -299,11 +321,7 @@ static int dpc_probe(struct pcie_device *dev) return status; } - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CAP, &cap); - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - - ctl = (ctl & 0xfff4) | PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN; - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); + dpc_enable(dev); pci_info(pdev, "enabled with IRQ %d\n", dev->irq); pci_info(pdev, "error containment capabilities: Int Msg #%d, RPExt%c PoisonedTLP%c SwTrigger%c RP PIO Log %d, DL_ActiveErr%c\n", @@ -316,14 +334,21 @@ static int dpc_probe(struct pcie_device *dev) return status; } -static void dpc_remove(struct pcie_device *dev) +static int dpc_suspend(struct pcie_device *dev) { - struct pci_dev *pdev = dev->port; - u16 ctl; + dpc_disable(dev); + return 0; +} - pci_read_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, &ctl); - ctl &= ~(PCI_EXP_DPC_CTL_EN_FATAL | PCI_EXP_DPC_CTL_INT_EN); - pci_write_config_word(pdev, pdev->dpc_cap + PCI_EXP_DPC_CTL, ctl); +static int dpc_resume(struct pcie_device *dev) +{ + dpc_enable(dev); + return 0; +} + +static void dpc_remove(struct pcie_device *dev) +{ + dpc_disable(dev); } static struct pcie_port_service_driver dpcdriver = { @@ -331,6 +356,8 @@ static struct pcie_port_service_driver dpcdriver = { .port_type = PCIE_ANY_PORT, .service= PCIE_PORT_SERVICE_DPC, .probe = dpc_probe, + .suspend= dpc_suspend, + .resume = dpc_resume, .remove = dpc_remove, }; -- 2.29.2
[PATCH 1/2] PCI/AER: Disable AER interrupt during suspend
Commit 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") enables ACS, and some platforms lose its NVMe after resume from firmware: [ 50.947816] pcieport :00:1b.0: DPC: containment event, status:0x1f01 source:0x [ 50.947817] pcieport :00:1b.0: DPC: unmasked uncorrectable error detected [ 50.947829] pcieport :00:1b.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Receiver ID) [ 50.947830] pcieport :00:1b.0: device [8086:06ac] error status/mask=0020/0001 [ 50.947831] pcieport :00:1b.0:[21] ACSViol(First) [ 50.947841] pcieport :00:1b.0: AER: broadcast error_detected message [ 50.947843] nvme nvme0: frozen state error detected, reset controller It happens right after ACS gets enabled during resume. To prevent that from happening, disable AER interrupt and enable it on system suspend and resume, respectively. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=209149 Fixes: 50310600ebda ("iommu/vt-d: Enable PCI ACS for platform opt in hint") Signed-off-by: Kai-Heng Feng --- drivers/pci/pcie/aer.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c index 77b0f2c45bc0..0e9a85530ae6 100644 --- a/drivers/pci/pcie/aer.c +++ b/drivers/pci/pcie/aer.c @@ -1365,6 +1365,22 @@ static int aer_probe(struct pcie_device *dev) return 0; } +static int aer_suspend(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_disable_rootport(rpc); + return 0; +} + +static int aer_resume(struct pcie_device *dev) +{ + struct aer_rpc *rpc = get_service_data(dev); + + aer_enable_rootport(rpc); + return 0; +} + /** * aer_root_reset - reset Root Port hierarchy, RCEC, or RCiEP * @dev: pointer to Root Port, RCEC, or RCiEP @@ -1437,6 +1453,8 @@ static struct pcie_port_service_driver aerdriver = { .service= PCIE_PORT_SERVICE_AER, .probe = aer_probe, + .suspend= aer_suspend, + .resume = aer_resume, .remove = aer_remove, }; -- 2.29.2