On Tuesday, August 6, 2019 3:36:38 PM CEST Bjorn Helgaas wrote: > On Mon, Aug 05, 2019 at 11:02:51PM +0200, Rafael J. Wysocki wrote: > > On Mon, Aug 5, 2019 at 10:52 PM Bjorn Helgaas <helg...@kernel.org> wrote: > > > > > > pci_check_pme_status() reads the Power Management capability to determine > > > whether a device has generated a PME. The capability is in config space, > > > which is accessible in D0, D1, D2, and D3hot, but not in D3cold. > > > > > > If we call pci_check_pme_status() on a device that's in D3cold, config > > > reads fail and return ~0 data, which we erroneously interpreted as "the > > > device has generated a PME". > > > > > > 000dd5316e1c ("PCI: Do not poll for PME if the device is in D3cold") > > > avoided many of these problems by avoiding pci_check_pme_status() if we > > > think the device is in D3cold. However, it is not a complete fix because > > > the device may go to D3cold after we check its power state but before > > > pci_check_pme_status() reads the Power Management Status Register. > > > > > > Return false ("device has not generated a PME") if we get an error > > > response > > > reading the Power Management Status Register. > > > > > > Fixes: 000dd5316e1c ("PCI: Do not poll for PME if the device is in > > > D3cold") > > > Fixes: 71a83bd727cc ("PCI/PM: add runtime PM support to PCIe port") > > > Signed-off-by: Bjorn Helgaas <bhelg...@google.com> > > > --- > > > drivers/pci/pci.c | 3 +++ > > > 1 file changed, 3 insertions(+) > > > > > > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c > > > index 984171d40858..af6a97d7012b 100644 > > > --- a/drivers/pci/pci.c > > > +++ b/drivers/pci/pci.c > > > @@ -2008,6 +2008,9 @@ bool pci_check_pme_status(struct pci_dev *dev) > > > > > > pmcsr_pos = dev->pm_cap + PCI_PM_CTRL; > > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > > > + if (pmcsr == (u16) PCI_ERROR_RESPONSE) > > > + return false; > > > > No, sorry. > > > > We tried that and it didn't work. > > > > pcie_pme_handle_request() depends on this returning "true" for all > > bits set, as from its perspective "device is not accessible" may very > > well mean "device may have signaled PME". > > Right, it's obviously wrong in the case of devices that advertise > D3cold in PME_Support, i.e., devices that can generate PME even with > main power off. Also, we may want to try to wake up devices if the > config read fails for a reason other than the device being in D3cold. > > What I don't like about the current code is that it checks > PCI_PM_CTRL_PME_STATUS in data that may be completely bogus.
Whether or not the other bits in the register make sense doesn't matter here. Only the PME_STATUS bit matters. > Do you think it would be better to do something like this: > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > if (pmcsr == (u16) PCI_ERROR_RESPONSE) { > if (pci_pme_capable(dev, PCI_PM_CAP_PME_D3cold)) > return true; > return false; > } > > or maybe this: > > pci_read_config_word(dev, pmcsr_pos, &pmcsr); > if (pmcsr == (u16) PCI_ERROR_RESPONSE) > return true; In this case it still would be prudent to check PME_ENABLE before returning true and so there is no practical difference between ERROR_RESPONSE and the valid data with PME_STATUS set. Except that in the ERROR_RESPONSE case we may as well avoid the PMCSR write which is not going to make a difference. > We should get PCI_ERROR_RESPONSE pretty reliably from devices in > D3cold, so the first possibility would cover that case. > > But since pci_check_pme_status() basically returns a hint ("true" > means a device *may* have generated a PME), and even if the hint is > wrong, the worst that happens is an unnecessary wakeup, maybe the > second possibility is safer. > > What do you think? So if you really want to avoid the PMCSR write in the ERROR_RESPONSE case, something like this can be done IMO: return false; /* Clear PME status. */ - pmcsr |= PCI_PM_CTRL_PME_STATUS; if (pmcsr & PCI_PM_CTRL_PME_ENABLE) { + if (pmcsr == (u16) PCI_ERROR_RESPONSE) + return true; + /* Disable PME to avoid interrupt flood. */ pmcsr &= ~PCI_PM_CTRL_PME_ENABLE; ret = true;