On Fri, Nov 14, 2025 at 07:58:19PM +0100, Lukas Wunner wrote: > On Thu, Nov 13, 2025 at 10:15:56AM -0600, Bjorn Helgaas wrote: > > It seems like there are two things going on here, and I'm not sure > > they're completely compatible: > > > > 1) Driver calls pci_save_state() to take over device power > > management and prevent the PCI core from doing it. > > > > 2) Driver calls pci_save_state() to capture the device state it > > wants to restore when recovering from an error. > > > > Shouldn't a driver be able to do 2) without also getting 1)? > > In general, it can: > > A number of drivers already call pci_save_state() on probe to capture > the state for subsequent error recovery. If the driver has modified > config space in its probe hook, then calling pci_save_state() continues > to make sense. If the driver has *not* modified config space, then the > call becomes obsolete once this patch is accepted.
So I guess "state_saved == true" means "driver does its own power management and PCI core shouldn't do it", and drivers that want 2) but not 1) just need to set state_saved = false after they call pci_save_state()? That makes sense in sort of a weird way that makes my head hurt every time I try to understand it. I think it's the sequence of "pci_save_state(); dev->state_saved = false" that seems counter-intuitive: we just saved the state; why would we immediately turn around and say we didn't? Maybe "state_saved" isn't the most appropriate name. After error recovery, those drivers will see the state the driver identified when it called pci_save_state(). But after resume, they will see the state the PCI core saved at suspend time. Right? > The reason I'm using the "in general" qualifier: > > I've identified two corner cases where the PCI core neglects to set > state_saved = false before commencing the suspend sequence: > > * If a driver has legacy PCI PM callbacks, pci_legacy_suspend() neglects > to set state_saved = false. Yet both pci_legacy_suspend() and > pci_legacy_suspend_late() subsequently query the state_saved flag. > * If a device is unbound or its driver has no PM callbacks > (driver->pm == NULL), pci_pm_freeze() neglects to set state_saved = false. > Yet pci_pm_freeze_noirq() subsequently queries the state_saved flag. > > In these corner cases, pci_legacy_suspend() and pci_pm_freeze() depend > on some other part of the PCI core to set state_saved = false. > For a freshly enumerated device, the flag is initialized to false by > kzalloc() and pci_device_add() also explicitly sets it to false for good > measure. On resume (or thaw or restore), the flag is set to false by > pci_restore_state(). The latter is preserved as is by my patch and the > former is moved to pci_bus_add_device() to retain the current behavior. > > Clearly, the two corner cases should be fixed and then setting > state_saved = false in pci_bus_add_device() becomes unnecessary. > > I'd prefer doing that in a separate step though. > > So drivers which use legacy PCI PM callbacks or have no PM callbacks > should currently not call pci_save_state() on probe without manually > setting state_saved = false afterwards. If they neglect that, then > pci_legacy_suspend_late() and pci_pm_freeze_noirq() will not call > pci_save_state() on the next suspend cycle and so the state that > will be restored on resume is the one recorded on probe, not the > one that the device had on suspend. If these two states happen > to be identical, there's no problem. > > > > > > +++ b/drivers/pci/bus.c > > > > > @@ -358,6 +358,13 @@ void pci_bus_add_device(struct pci_dev *dev) > > > > > pci_bridge_d3_update(dev); > > > > > > > > > > /* > > > > > + * Save config space for error recoverability. Clear > > > > > state_saved > > > > > + * to detect whether drivers invoked pci_save_state() on suspen > [...] > > > > Can we expand this a little to explain how this is detected and what > > > > drivers *should* be doing? > [...] > > Yes. I should have proposed some text for the comment, e.g., > > > > Save config space for error recoverability. Clear state_saved. If > > driver calls pci_save_state() again, state_saved will be set and > > we'll know that on suspend, the PCI core shouldn't call > > pci_save_state() or change the device power state. > > I'm fine with rewording the code comment like this, as well as splitting > the code comment as suggested by Rafael. Would you prefer amending the > code comment when applying or should I respin with a reworded comment? > > Again, clearing state_saved in pci_bus_add_device() is just a temporary > measure to retain the existing behavior. It (and an accompanying code > comment) can be dropped once pci_legacy_suspend() and pci_pm_freeze() > are fixed. > > > I'm just wishing for a more concrete mention of "pci_save_state()", > > since that's where the critical "state_saved" flag is updated. > > > > And I'm not sure Documentation/ includes anything about the idea of > > a driver using pci_save_state() to capture the state it wants to > > restore after an error. I guess that's obvious now that I write it > > down, but I'm sure learning a lot from this conversation :) > > Okay, noted that the documentation could be improved. I'd be glad > if this could be postponed to a separate step as well though. > I can only address problems one at a time. :) Absolutely. Would you mind posting an update with the tweaks above? I'm not at all confident about doing it myself. Bjorn
