On Tue, Jun 24, 2014 at 2:34 PM, Rajat Jain <[email protected]> wrote: > Hello, > > Sorry, I missed this email. > > Please see below. > >> -----Original Message----- >> From: [email protected] [mailto:linux-pci- >> [email protected]] On Behalf Of Guenter Roeck >> Sent: Tuesday, June 17, 2014 3:56 PM >> To: Bjorn Helgaas; Myron Stowe >> Cc: [email protected]; Kenji Kaneshige; linux- >> [email protected]; Rajat Jain >> Subject: Re: [PATCH] PCI: pciehp: Include the Data Link Layer State Changed >> bit when clearing the Slot Status register's event bits >> >> On 06/17/2014 02:07 PM, Bjorn Helgaas wrote: >> > On Tue, Jun 17, 2014 at 1:27 PM, Myron Stowe >> <[email protected]> wrote: >> >> During PCIe hot-plug initialization - pciehp_probe - data structures >> >> related to slot capabilities are set up. As part of this set up, >> >> ISRs are put in place to handle slot events and all event bits are cleared >> out. >> >> >> >> This patch adds the Data Link Layer State Changed >> >> (PCI_EXP_SLTSTA_DLLSC) Slot Status bit to the event bits that are >> >> cleared out during initialization. >> > >> > I think this is related to e48f1b67f668 ("PCI: pciehp: Use link change >> > notifications for hot-plug and removal"). Prior to that, pcie_isr() >> > didn't look at the PCI_EXP_SLTSTA_DLLSC bit. >> > >> > Apparently there's a non-public report of spurious messages like this >> > at boot-time, with no actual hotplug events: >> > >> > pciehp 0000:82:04.0:pcie24: slot(4): Link Up event >> > pciehp 0000:82:04.0:pcie24: Device 0000:83:00.0 already exists at >> > 0000:83:00, cannot hot-add >> > pciehp 0000:82:04.0:pcie24: Cannot add device at 0000:83:00 >> > >> >> I think I have seen that message once in a while in our systems. >> Rajat, didn't we talk about this a while ago ? > > Essentially my hiccup was that I was not sure whether the driver should or > should not take care of the link change events that have happened BEFORE the > driver gets loaded. This has more impact if the pciehp is built as a kernel > module. > > As an example: > > It is common for the platform init code built into the kernel, to take the > PCI devices on the board out of reset. And that can happen after the PCI > enumeration but before the pciehp driver gets loaded. Thus in this condition, > with this patch, the pciehp will ignore the linkup event, thus device will > not be visible. The only way (other than a rescan) to do hot-add the device > would be to apply-and-remove-reset-signal to the device again. At which point > pciehp may give a warning about about an attempt to remove a non-existent > card, and then will proceed fine with hot-add.
When you saw these problems, was pciehp a module? We changed it (c10cc483bf3f in v3.11) so it can't be a module any more, but if there can still be a significant window between enumeration and pciehp init, I'd like to understand more about it. > The patch looks good, only wanted to make sure that we understand and agree > that the pciehp should NOT process and link events that have happened before > the pciehp was loaded. Currently (before this patch), we clear the Attention Button Pressed, Power Fault Detected, MRL Sensor Changed, Presence Detect Changed, and Command Completed bits, but we *don't* clear Data Link Layer State Changed. That asymmetry seems hard to justify. For example, if a card were inserted after enumeration but before pciehp is initialized, we'd miss the PDC indication, so I think we would fail to notice the new device. That seems basically the same as missing the linkup event in your example. In both cases, I think we *should* notice the new device, so the fact that we don't is a problem. But since pciehp can't be a module any more, the window between PCI enumeration and pciehp initialization should be relatively small. I think the best way to fix this would be to integrate pciehp more closely into PCI enumeration, e.g., by doing pcie_init() at the point where we discover the bridge, before we enumerate any devices below the bridge. This is somewhat tangled up with acpihp, so I don't know how complicated it would be to do this. So I guess my argument is: - Ignoring events that happen before pciehp init decreases our dependency on BIOS - Handling all events consistently is very important - We currently have a problem with missing pre-pciehp events, but there is a way to fix this > Acked-by: Rajat Jain <[email protected]> Thanks for taking a look at this! Bjorn -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [email protected] More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/

