On Tue, May 1, 2018 at 9:55 PM, Bjorn Helgaas <helg...@kernel.org> wrote:
> On Tue, May 01, 2018 at 10:34:29AM +0200, Rafael J. Wysocki wrote:
>> On Mon, Apr 30, 2018 at 4:22 PM, Joseph Salisbury
>> <joseph.salisb...@canonical.com> wrote:
>> > On 04/16/2018 11:58 AM, Rafael J. Wysocki wrote:
>> >> On Mon, Apr 16, 2018 at 5:31 PM, Joseph Salisbury
>> >> <joseph.salisb...@canonical.com> wrote:
>> >>> On 04/13/2018 05:34 PM, Rafael J. Wysocki wrote:
>> >>>> On Fri, Apr 13, 2018 at 7:56 PM, Joseph Salisbury
>> >>>> <joseph.salisb...@canonical.com> wrote:
>> >>>>> Hi Rafael,
>> >>>>>
>> >>>>> A kernel bug report was opened against Ubuntu [0]. After a kernel
>> >>>>> bisect, it was found that reverting the following two commits resolved
>> >>>>> this bug:
>> >>>>>
>> >>>>> 0ce3fcaff929 ("PCI / PM: Restore PME Enable after config space
>> >>>>> restoration")
>> >>>>> 0847684cfc5f ("PCI / PM: Simplify device wakeup settings code")
>> >>>>>
>> >>>>> This is a regression introduced in v4.13-rc1 and still exists in
>> >>>>> mainline. The bug causes the battery to drain when the system is
>> >>>>> powered down and unplugged, which does not happen prior to these two
>> >>>>> commits.
>> >>>> What system and what do you mean by "powered down"? How much time
>> >>>> does it take for the battery to drain now?
>> >>> By powered down, the bug reporter is saying physically powered off and
>> >>> unplugged. The system is an HP laptop:
>> >>>
>> >>> dmi.chassis.vendor: HP
>> >>> dmi.product.family: 103C_5335KV HP Notebook
>> >>> dmi.product.name: HP Notebook
>> >>> vendor_id : GenuineIntel
>> >>> cpu family : 6
>> >>>
>> >>>
>> >>>>> The bisect actually pointed to commit de3ef1e, but reverting
>> >>>>> these two commits fixes the issue.
>> >>>>>
>> >>>>> I was hoping to get your feedback, since you are the patch author. Do
>> >>>>> you think gathering any additional data will help diagnose this issue,
>> >>>>> or would it be best to submit a revert request?
>> >>>> First, reverting these is not an option or you will break systems
>> >>>> relying on them now. 4.13 is three releases back at this point.
>> >>>>
>> >>>> Second, your issue appears to be related to the suspend/shutdown path
>> >>>> whereas commit 0ce3fcaff929 is mostly about resume, so presumably the
>> >>>> change in pci_enable_wake() causes the problem to happen. Can you try
>> >>>> to revert this one alone and see if that helps?
>> >>> A test kernel with commits 0ce3fcaff929 and de3ef1eb1cd0 reverted was
>> >>> tested. However, the test kernel still exhibited the bug.
>> >> So essentially the bisection result cannot be trusted.
>> >
>> > We performed some more testing and confirmed just a revert of the
>> > following commit resolves the bug:
>> >
>> > 0847684cfc5f0 ("PCI / PM: Simplify device wakeup settings code")
>>
>> Thanks for confirming this!
>>
>> > Can you think of any suggestions to help debug further?
>>
>> The root cause of the regression is likely the change in
>> pci_enable_wake() removing the device_may_wakeup() check from it.
>>
>> Probably, one of the drivers in the platform calls pci_enable_wake()
>> directly from its ->shutdown() callback and that causes the device to
>> be set up for system wakeup which in turn causes the power draw while
>> the system is off to increase.
>>
>> I would look at the PCI drivers used on that platform to find which of
>> them call pci_enable_wake() directly from ->shutdown() and I would
>> make these calls conditional on device_may_wakeup().
>
> I took a quick look with
>
>   git grep -E "pci_enable_wake\(.*[^0]\);|device_may_wakeup"
>
> and didn't notice any pci_enable_wake() callers that called
> device_may_wakeup() first.

I've just looked at a bunch of network drivers doing that.

It looks like I may need to restore __pci_enable_wake() with an extra
"runtime" argument for internal use.

Joseph, can you ask the reporter to test Bjorn's patch, please?
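
For illustration only, the guard suggested in the quoted text (making a
->shutdown() call to pci_enable_wake() conditional on device_may_wakeup())
could look roughly like the sketch below. This is a user-space mock-up, not
kernel code: the struct layouts, the may_wakeup and wake_armed fields,
example_shutdown() and example_selfcheck() are all hypothetical stand-ins,
and the mock pci_enable_wake() only models the behavior described in this
thread (no device_may_wakeup() check inside it).

```c
#include <assert.h>
#include <stdbool.h>

/* User-space stand-ins for the kernel types; not the real definitions. */
struct device {
	bool may_wakeup;	/* mock of the state device_may_wakeup() reports */
};

struct pci_dev {
	struct device dev;
	bool wake_armed;	/* mock: set when wakeup has been enabled */
};

/* Mock of device_may_wakeup(): is the device allowed to wake the system? */
static bool device_may_wakeup(const struct device *dev)
{
	return dev->may_wakeup;
}

/*
 * Mock of pci_enable_wake() as described in this thread: it does not
 * check device_may_wakeup() itself, so the caller has to.
 */
static int pci_enable_wake(struct pci_dev *pdev, int state, bool enable)
{
	(void)state;
	pdev->wake_armed = enable;
	return 0;
}

/* Hypothetical driver ->shutdown() callback with the guard in place. */
static void example_shutdown(struct pci_dev *pdev)
{
	if (device_may_wakeup(&pdev->dev))
		pci_enable_wake(pdev, 3 /* stand-in for PCI_D3hot */, true);
}

/* Both cases: wakeup is armed only when the device is allowed to wake. */
static void example_selfcheck(void)
{
	struct pci_dev allowed = { .dev = { .may_wakeup = true } };
	struct pci_dev denied = { .dev = { .may_wakeup = false } };

	example_shutdown(&allowed);
	example_shutdown(&denied);

	assert(allowed.wake_armed);
	assert(!denied.wake_armed);
}
```

In a real driver the same check would sit in the driver's own shutdown
path; whether that is the right fix, as opposed to restoring the check
inside the PCI core, is exactly what is still open above.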