> -----Original Message----- > From: Manivannan Sadhasivam <[email protected]> > Sent: 2026年4月14日 0:35 > To: Brian Norris <[email protected]>; Hongxing Zhu > <[email protected]> > Cc: Hongxing Zhu <[email protected]>; > [email protected]; Bjorn Helgaas > <[email protected]>; Mahesh J Salgaonkar <[email protected]>; > Oliver O'Halloran <[email protected]>; Will Deacon <[email protected]>; > Lorenzo Pieralisi <[email protected]>; Krzysztof Wilczyński > <[email protected]>; Rob Herring <[email protected]>; Heiko Stuebner > <[email protected]>; Philipp Zabel <[email protected]>; > [email protected]; [email protected]; > [email protected]; [email protected]; > [email protected]; [email protected]; Niklas > Cassel <[email protected]>; Wilfred Mallawa <[email protected]>; > Krishna Chaitanya Chundru <[email protected]>; Lukas > Wunner <[email protected]>; Wilson Ding <[email protected]>; Miles > Chen <[email protected]> > Subject: Re: [PATCH v7 0/4] PCI: Add support for resetting the Root Ports in a > platform specific way > > Hi Brian, > > On Wed, Apr 08, 2026 at 06:58:34PM -0700, Brian Norris wrote: > > Hi Richard and Mani, > > > > For the record, I've been using a form of an earlier version of this > > patchset in my environment for some time now, and I've run across > > problems that *might* relate to what Richard is reporting, but I'm not > > quite sure at the moment. Details below. > > > > On Wed, Mar 25, 2026 at 07:06:49AM +0000, Hongxing Zhu wrote: > > > Hi Mani: > > > I've accidentally encountered a new issue based on the reset root port > patch-set. > > > After performing a few hot-reset operations, the PCIe link enters a > continuous up/down cycling pattern. > > > > > > I found that calling pci_reset_secondary_bus() first in > pcibios_reset_secondary_bus() appears to resolve this issue. > > > Have you experienced a similar problem? > > > > > > " > > > ... > > > [ 141.897701] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000700) > > > link up detected [ 142.086341] imx6q-pcie 4c300000.pcie: PCIe Gen.3 > > > x1 link up [ 142.092038] imx6q-pcie 4c300000.pcie: > > > PCIe(LNK_STS:0x00000c00) link down detected ... > > > " > > > > > > Platform: i.MX95 EVK board plus local Root Ports reset supports based > on the #1 and #2 patches of v7 patch-set. > > > Notes of the logs: > > > - One Gen3 NVME device is connected. > > > - "./memtool 4c341058=0;./memtool 4c341058=1;" is used to toggle the > LTSSM_EN bit to trigger the link down. > > > - Toggle BIT6 of Bridge Control Register to trigger hot reset by > "./memtool 4c30003c=004001ff; ./memtool 4c30003c=000001ff;" > > > - The Root Port reset patches works correctly at first. > > > However, after several hot-reset triggers, the link enters a repeated > down/up cycling state. > > > > > > Logs: > > > [ 3.553188] imx6q-pcie 4c300000.pcie: host bridge > /soc/pcie@4c300000 ranges: > > > [ 3.560308] imx6q-pcie 4c300000.pcie: IO > 0x006ff00000..0x006fffffff -> 0x0000000000 > > > [ 3.568525] imx6q-pcie 4c300000.pcie: MEM > 0x0910000000..0x091fffffff -> 0x0010000000 > > > [ 3.577314] imx6q-pcie 4c300000.pcie: config reg[1] 0x60100000 == > cpu 0x60100000 > > > [ 3.796029] imx6q-pcie 4c300000.pcie: iATU: unroll T, 128 ob, 128 ib, > align 4K, limit 1024G > > > [ 4.003746] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up > > > [ 4.009553] imx6q-pcie 4c300000.pcie: PCI host bridge to bus 0000:00 > > > root@imx95evk:~# > > > root@imx95evk:~# > > > root@imx95evk:~# ./memtool 4c341058=0;./memtool 4c341058=1; > Writing > > > 32-bit value 0x0 to address 0x4C341058 Writing 32-bit v > > > [ 87.265348] imx6q-pcie 4c300000.pcie: PCIe(LNK_STS:0x00000d01) > link down detected > > > alue 0x1 to adder > > > [ 87.273106] imx6q-pcie 4c300000.pcie: Stop root bus and handle link > down > > > ss 0x4C341058 > > > [ 87.281264] pcieport 0000:00:00.0: Recovering Root Port due to Link > Down > > > [ 87.289245] pci 0000:01:00.0: AER: can't recover (no error_detected > callback) > > > root@imx95evk:~# [ 87.514216] imx6q-pcie 4c300000.pcie: > PCIe(LNK_STS:0x00000700) link up detected > > > [ 87.702968] imx6q-pcie 4c300000.pcie: PCIe Gen.3 x1 link up > > > [ 87.834983] pcieport 0000:00:00.0: Root Port has been reset > > > [ 87.840714] pcieport 0000:00:00.0: AER: device recovery failed > > > [ 87.846592] imx6q-pcie 4c300000.pcie: Rescan bus after link up is > detected > > > [ 87.855947] pcieport 0000:00:00.0: bridge configuration invalid ([bus > 00-00]), reconfiguring > > > > I've seen this same line ("bridge configuration invalid") before, and > > I believe that's because the saved state (pci_save_state(); more about > > this below) is invalid -- it contains 0 values in places where they > > should be non-zero. So when those values are restored > > (pci_restore_state()), we get confused. > > > > I believe we've pinned down one reason this invalid state occurs -- > > it's because of an automatic (mis)feature in the DesignWare PCIe > hardware. > > Specifically, it's because of what the controller does during a > > surprise link-down error. > > > > From the Designware docs: > > > > "[...] during normal operation, the link might fail and go down. After > > this link-down event, the controller requests the DWC_pcie_clkrst.v > > module to hot-reset the controller. There is no difference in the > > handling of a link-down reset or a hot reset; the controller asserts > > the link_req_rst_not output requesting the DWC_pcie_clkrst.v module > to > > reset the controller." > > > > In some of the adjacent documentation (and confirmed in local > > testing), it suggests that this automatic reset will also reset > > various DBI (i.e., PCIe config space) registers. It also seems as if > > there's not really a good way to completely stop this automatic reset > > -- the docs mention some SW methods prevent the reset, but they all > seem racy or incomplete. > > > > Anyway, I think this implies that patch 1 is somewhat wrong [1]. It > > includes some code like this: > > > > pci_save_state(dev); > > ret = host->reset_root_port(host, dev); > > if (ret) > > pci_err(dev, "Failed to reset Root Port: %d\n", ret); > > else > > /* Now restore it on success */ > > pci_restore_state(dev); > > > > That first line (pci_save_state()) is prone to saving invalid state, > > depending on whether the link-down event has finished flushing and > > resetting the controller yet or not. The resulting impact is a bit > > hard to judge, depending on what (mis)configuration you end up with. > > > > Thanks a lot for your investigation. I think your observation makes sense and > could be the culprit in saving the corrupted state. Even on non-DWC > controllers, there is no guarantee that the Root Port config registers state > will > be preserved after LDn (before Root Port reset). > > > I also noticed commit a2f1e22390ac ("PCI/ERR: Ensure error > > recoverability at all times") was merged recently. With that change, I > > believe it is now safe to perform pci_restore_state() even without > > pci_save_state() here. > > > > So ... can we remove pci_save_state() from > > pcibios_reset_secondary_bus()? Might that help? > > I think so. I will also test it locally and report back soon. > > > It sounds like my above > > observations *may* match Richard's reports, but I'm not sure. And > > anyway, the documented hardware behavior is racy, so it's hard to > > propose a foolproof solution. > > > > @Richard: Can you confirm if removing 'pci_save_state(dev);' from > pcibios_reset_secondary_bus() fixes your issue? I have tested the hot reset trigger hundreds of times, and it works consistently. @Brian Norris @Manivannan Sadhasivam Thanks a lot for your helps.
Best Regards Richard Zhu > > - Mani > > -- > மணிவண்ணன் சதாசிவம்
