Re: rtsx_pci not restoring ASPM state after suspend/resume
Hello, On Tue, 2020-07-28 at 18:06 -0500, Bjorn Helgaas wrote: > > I tried to deduce the problem from the code in aspm.c, but I didn't > see the problem. If you have the ability to build a kernel with a > debug patch, can you boot with the patch below and collect the dmesg > log? I've built such a kernel and attached the dmesg output at https://bugzilla.kernel.org/attachment.cgi?id=290713 For this, the machine was booted from off, no funny udev rules or sysfs tinkering. Thanks, -James
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Tue, Jul 28, 2020 at 09:57:55PM +0100, James Ettle wrote: > On Mon, 2020-07-27 at 16:47 -0500, Bjorn Helgaas wrote: > > > > I don't see anything in rtsx that enables L0s. Can you collect > > the dmesg log when booting with "pci=earlydump"? That will show > > whether the BIOS left it this way. The PCI core isn't supposed to > > do this, so if it did, we need to fix that. > > dmesg log attached to the bugzilla as: > > https://bugzilla.kernel.org/attachment.cgi?id=290655 Thank you! BIOS left the Root Port and both Endpoints with L1 enabled but L0s disabled. Next question is how L0s got enabled. We don't see anything in the driver that does that. And lspci claims L0s was enabled on the Endpoints even without any udev or other manual fiddling. I tried to deduce the problem from the code in aspm.c, but I didn't see the problem. If you have the ability to build a kernel with a debug patch, can you boot with the patch below and collect the dmesg log? FWIW, here's the analysis of the earlydump output that shows L0s disabled: #define PCI_CAPABILITY_LIST 0x34 #define PCI_CAP_ID_MSI 0x05 #define PCI_CAP_ID_EXP 0x10/* PCI Express */ #define PCI_EXP_LNKCTL 16 #define PCI_EXP_LNKCTL_ASPM_L0S 0x0001 #define PCI_EXP_LNKCTL_ASPM_L1 0x0002 #define PCI_EXP_LNKCTL_CCC 0x0040 #define PCI_EXP_LNKCTL_CLKREQ_EN 0x0100 00:1d.0: 0034: 40Offset of first capability 0040: 8010 PCIe capability ID, next cap at 0x80 0050: 0042 PCIe Link Control: CCC L1 01:00.0: 0034: 40Offset of first capability 0040: 7005 MSI capability ID (0x05), next cap at 0x70 0070: b010 PCIe capability ID (0x10), next cap at 0xb0 0080: 0142 PCIe Link Control: CLKREQ_EN CCC L1 01:00.1: 0034: 40Offset of first capability 0040: 7005 MSI capability ID (0x05), next cap at 0x70 0070: b010 PCIe capability ID (0x10), next cap at 0xb0 0080: 0142 PCIe Link Control: CLKREQ_EN CCC L1 commit 555db6d25963 ("debug") Author: Bjorn Helgaas Date: Tue Jul 28 17:52:58 2020 -0500 debug diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c index b17e5ffd31b1..262f35883a2a 100644 --- a/drivers/pci/pcie/aspm.c +++ b/drivers/pci/pcie/aspm.c @@ -577,6 +577,11 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) pcie_get_aspm_reg(parent, ); pcie_get_aspm_reg(child, ); + pci_info(parent, "%s support %x enabled %x\n", __func__, upreg.support, +upreg.enabled); + pci_info(child, "%s support %x enabled %x\n", __func__, dwreg.support, +dwreg.enabled); + /* * Setup L0s state * @@ -629,6 +634,10 @@ static void pcie_aspm_cap_init(struct pcie_link_state *link, int blacklist) /* Setup initial capable state. Will be updated later */ link->aspm_capable = link->aspm_support; + pci_info(parent, "%s link support %x enabled %x capable %x default %x disable %x\n", +__func__, link->aspm_support, link->aspm_enabled, +link->aspm_capable, link->aspm_default, link->aspm_disable); + /* Get and check endpoint acceptable latencies */ list_for_each_entry(child, >devices, bus_list) { u32 reg32, encoding; @@ -744,6 +753,7 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state) static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val) { + pci_info(pdev, "%s set state %x\n", __func__, val); pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_ASPMC, val); } @@ -754,6 +764,9 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state) struct pci_dev *child = link->downstream, *parent = link->pdev; struct pci_bus *linkbus = parent->subordinate; + pci_info(parent, "%s requesting state %x (%s)\n", __func__, +state, policy_str[state]); + /* Enable only the states that were not explicitly disabled */ state &= (link->aspm_capable & ~link->aspm_disable); @@ -767,6 +780,10 @@ static void pcie_config_aspm_link(struct pcie_link_state *link, u32 state) state |= (link->aspm_enabled & ASPM_STATE_L1_SS_PCIPM); } + pci_info(parent, "%s link support %x enabled %x capable %x default %x disable %x\n", +__func__, link->aspm_support, link->aspm_enabled, +link->aspm_capable, link->aspm_default, link->aspm_disable); + /* Nothing to do if the link is already in the requested state */ if (link->aspm_enabled == state) return;
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Mon, 2020-07-27 at 16:47 -0500, Bjorn Helgaas wrote: > > I don't see anything in rtsx that enables L0s. Can you collect the > dmesg log when booting with "pci=earlydump"? That will show whether > the BIOS left it this way. The PCI core isn't supposed to do this, > so > if it did, we need to fix that. > dmesg log attached to the bugzilla as: https://bugzilla.kernel.org/attachment.cgi?id=290655 (to keep everything in one place). Thanks, -James
RE: rtsx_pci not restoring ASPM state after suspend/resume
> On Mon, Jul 27, 2020 at 08:52:25PM +0100, James Ettle wrote: > > On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote: > > > I don't know the connection between ASPM and package C-states, so I > > > need to simplify this even more. All I want to do right now is > > > verify > > > that if we don't have any outside influences on the ASPM > > > configuration > > > (eg, no manual changes and no udev rules), it stays the same across > > > suspend/resume. > > > > Basically this started from me observing deep package C-states weren't > > being used, until I went and fiddled with the ASPM state of the > > rtsx_pci card reader under sysfs -- so phenomenological poking on my > > part. > > > > > So let's read the ASPM state directly from the > > > hardware like this: > > > > > > sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop" > > > sudo lspci -vvs 01:00 | egrep "^0|Lnk|L1|LTR|snoop" > > > > > > Can you try that before and after suspend/resume? > > > > I've attached these to the bugzilla entry at: > > > > https://bugzilla.kernel.org/show_bug.cgi?id=208117 > > > > Spoiler: With no udev rules or suspend hooks, things are the same > > before and after suspend/resume. One thing I do see (both before and > > after) is that ASPM L0s and L1 is enabled for the card reader, but > > disabled for the ethernet chip (does r8169 fiddle with ASPM too?). > > Thank you! It's good that this stays the same across suspend/resume. > Do you see different C-state behavior before vs after? > > This is the config I see: > > 00:1d.0 bridge to [bus 01]: ASPM L1 supported; ASPM Disabled > 01:00.0 card reader:ASPM L0s L1 supported; L0s L1 Enabled > 01:00.1 GigE NIC: ASPM L0s L1 supported; ASPM Disabled > > This is actually illegal because PCIe r5.0, sec 5.4.1.3, says software > must not enable L0s in either direction unless components on both ends > of the link support L0s. The bridge (00:1d.0) does not support L0s, > so it's illegal to enable L0s on 01:00.0. I don't know whether this > causes problems in practice. > If system want to entry deep C-state, system have to support L1. Host bridge handshake with device to determine whether to enter the L1 state. Our card reader driver did not set L0s, here need to check who set this, but we thought this L0s enable should not cause Host bridge ASPM disable > I don't see anything in rtsx that enables L0s. Can you collect the > dmesg log when booting with "pci=earlydump"? That will show whether > the BIOS left it this way. The PCI core isn't supposed to do this, so > if it did, we need to fix that. > > I don't know whether r8169 mucks with ASPM. It is legal to have > different configurations for the two functions, even though they share > the same link. Sec 5.4.1 has rules about how hardware resolves > differences. > > > [Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM > > enabled after a suspend/resume, but still no deep package C-states > > until I manually fiddle via sysfs on the card reader. Sorry if this > > only muddies the water further!] > > Let's defer this for now. It sounds worth pursuing, but I can't keep > everything in my head at once. > > Bjorn > > --Please consider the environment before printing this e-mail.
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Mon, Jul 27, 2020 at 08:52:25PM +0100, James Ettle wrote: > On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote: > > I don't know the connection between ASPM and package C-states, so I > > need to simplify this even more. All I want to do right now is > > verify > > that if we don't have any outside influences on the ASPM > > configuration > > (eg, no manual changes and no udev rules), it stays the same across > > suspend/resume. > > Basically this started from me observing deep package C-states weren't > being used, until I went and fiddled with the ASPM state of the > rtsx_pci card reader under sysfs -- so phenomenological poking on my > part. > > > So let's read the ASPM state directly from the > > hardware like this: > > > > sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop" > > sudo lspci -vvs 01:00 | egrep "^0|Lnk|L1|LTR|snoop" > > > > Can you try that before and after suspend/resume? > > I've attached these to the bugzilla entry at: > > https://bugzilla.kernel.org/show_bug.cgi?id=208117 > > Spoiler: With no udev rules or suspend hooks, things are the same > before and after suspend/resume. One thing I do see (both before and > after) is that ASPM L0s and L1 is enabled for the card reader, but > disabled for the ethernet chip (does r8169 fiddle with ASPM too?). Thank you! It's good that this stays the same across suspend/resume. Do you see different C-state behavior before vs after? This is the config I see: 00:1d.0 bridge to [bus 01]: ASPM L1 supported; ASPM Disabled 01:00.0 card reader:ASPM L0s L1 supported; L0s L1 Enabled 01:00.1 GigE NIC: ASPM L0s L1 supported; ASPM Disabled This is actually illegal because PCIe r5.0, sec 5.4.1.3, says software must not enable L0s in either direction unless components on both ends of the link support L0s. The bridge (00:1d.0) does not support L0s, so it's illegal to enable L0s on 01:00.0. I don't know whether this causes problems in practice. I don't see anything in rtsx that enables L0s. Can you collect the dmesg log when booting with "pci=earlydump"? That will show whether the BIOS left it this way. The PCI core isn't supposed to do this, so if it did, we need to fix that. I don't know whether r8169 mucks with ASPM. It is legal to have different configurations for the two functions, even though they share the same link. Sec 5.4.1 has rules about how hardware resolves differences. > [Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM > enabled after a suspend/resume, but still no deep package C-states > until I manually fiddle via sysfs on the card reader. Sorry if this > only muddies the water further!] Let's defer this for now. It sounds worth pursuing, but I can't keep everything in my head at once. Bjorn
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote: > I don't know the connection between ASPM and package C-states, so I > need to simplify this even more. All I want to do right now is > verify > that if we don't have any outside influences on the ASPM > configuration > (eg, no manual changes and no udev rules), it stays the same across > suspend/resume. Basically this started from me observing deep package C-states weren't being used, until I went and fiddled with the ASPM state of the rtsx_pci card reader under sysfs -- so phenomenological poking on my part. > So let's read the ASPM state directly from the > hardware like this: > > sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop" > sudo lspci -vvs 01:00 | egrep "^0|Lnk|L1|LTR|snoop" > > Can you try that before and after suspend/resume? I've attached these to the bugzilla entry at: https://bugzilla.kernel.org/show_bug.cgi?id=208117 Spoiler: With no udev rules or suspend hooks, things are the same before and after suspend/resume. One thing I do see (both before and after) is that ASPM L0s and L1 is enabled for the card reader, but disabled for the ethernet chip (does r8169 fiddle with ASPM too?). [Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM enabled after a suspend/resume, but still no deep package C-states until I manually fiddle via sysfs on the card reader. Sorry if this only muddies the water further!] Thanks, -James
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Sat, Jul 25, 2020 at 09:27:11PM +0100, James Ettle wrote: > On Fri, 2020-07-24 at 18:13 -0500, Bjorn Helgaas wrote: > > > > Maybe we should simplify this a little bit more. James, if you don't > > touch ASPM config at all, either manually or via udev, does the ASPM > > configuration stay the same across suspend/resume? > > Yes, it stays the same. Explicitly: > > With the udev rule disabled, immediately following clean boot from > power-off (and no additional tinkering), ASPM is OFF to the best of my > knowledge: > > - link/l1_aspm in sysfs is 0 for PCI devices :01:00.[01]; > - the processor sleeps no deeper than package C3. > > The situation above is the same following a suspend/resume cycle -- > both in terms of sysfs, and observed package C-state occupancy. > > [Tested on kernel 5.7.10, but the behaviour is the same as prior > kernels.] I don't know the connection between ASPM and package C-states, so I need to simplify this even more. All I want to do right now is verify that if we don't have any outside influences on the ASPM configuration (eg, no manual changes and no udev rules), it stays the same across suspend/resume. In https://bugzilla.kernel.org/show_bug.cgi?id=208117#c12, we saw that ASPM L0s was disabled before suspend but was enabled after resume. That should not happen. You're looking at the sysfs link/l1_aspm file, which tells us what the PCI core thinks the state is, but I'm not confident that's accurate, especially because the driver fiddles with the state behind the back of the PCI core. So let's read the ASPM state directly from the hardware like this: sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop" sudo lspci -vvs 01:00 | egrep "^0|Lnk|L1|LTR|snoop" Can you try that before and after suspend/resume?
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Fri, 2020-07-24 at 18:13 -0500, Bjorn Helgaas wrote: > > Maybe we should simplify this a little bit more. James, if you don't > touch ASPM config at all, either manually or via udev, does the ASPM > configuration stay the same across suspend/resume? > Yes, it stays the same. Explicitly: With the udev rule disabled, immediately following clean boot from power-off (and no additional tinkering), ASPM is OFF to the best of my knowledge: - link/l1_aspm in sysfs is 0 for PCI devices :01:00.[01]; - the processor sleeps no deeper than package C3. The situation above is the same following a suspend/resume cycle -- both in terms of sysfs, and observed package C-state occupancy. [Tested on kernel 5.7.10, but the behaviour is the same as prior kernels.] Thanks, James.
Re: rtsx_pci not restoring ASPM state after suspend/resume
On Fri, Jul 24, 2020 at 07:16:26AM +, 吳昊澄 Ricky wrote: > Hi James, Bjorn, > > The Card reader(10ec:5287) is a combo chip with Ethernet(10ec:8168), > we think it is not cause by setting our device config space in idle > time. > > We dis/enable the ASPM(setting config space) at busy/idle time, it > can make our R/W performances well not a work around function > > PCI Host and Device setting self config space and do handshaking, we > think it does not affect the system Are you able to reproduce the problem? Specifically, James observes that before suspend, ASPM L1 is enabled, but after resume, L0s and L1 are enabled. The ASPM state should be the same after resume. See: https://bugzilla.kernel.org/show_bug.cgi?id=208117#c8 https://bugzilla.kernel.org/show_bug.cgi?id=208117#c9 James *is* manually enabling ASPM L1 via a udev rule, which complicates things a little. The sysfs link/l1_aspm functionality he's using is new and could well be buggy. Maybe we should simplify this a little bit more. James, if you don't touch ASPM config at all, either manually or via udev, does the ASPM configuration stay the same across suspend/resume? > > -Original Message- > > From: Bjorn Helgaas [mailto:helg...@kernel.org] > > Sent: Friday, July 24, 2020 1:13 AM > > To: 吳昊澄 Ricky; Rui Feng > > Cc: Arnd Bergmann; Greg Kroah-Hartman; James Ettle; Len Brown; Puranjay > > Mohan; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Jacopo De > > Simoi > > Subject: Re: rtsx_pci not restoring ASPM state after suspend/resume > > > > [+cc Jacopo] > > > > On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote: > > > James reported this issue with rtsx_pci; can you guys please take a > > > look at it? https://bugzilla.kernel.org/show_bug.cgi?id=208117 > > > > > > There's a lot of good info in the bugzilla already. > > > > Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951 > > > > Jacopo, could you please attach a complete dmesg log and "sudo lspci > > -vv" output to your bugzilla? > > > > --Please consider the environment before printing this e-mail.
RE: rtsx_pci not restoring ASPM state after suspend/resume
Hi James, Bjorn, The Card reader(10ec:5287) is a combo chip with Ethernet(10ec:8168), we think it is not cause by setting our device config space in idle time. We dis/enable the ASPM(setting config space) at busy/idle time, it can make our R/W performances well not a work around function PCI Host and Device setting self config space and do handshaking, we think it does not affect the system Ricky > -Original Message- > From: Bjorn Helgaas [mailto:helg...@kernel.org] > Sent: Friday, July 24, 2020 1:13 AM > To: 吳昊澄 Ricky; Rui Feng > Cc: Arnd Bergmann; Greg Kroah-Hartman; James Ettle; Len Brown; Puranjay > Mohan; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Jacopo De > Simoi > Subject: Re: rtsx_pci not restoring ASPM state after suspend/resume > > [+cc Jacopo] > > On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote: > > James reported this issue with rtsx_pci; can you guys please take a > > look at it? https://bugzilla.kernel.org/show_bug.cgi?id=208117 > > > > There's a lot of good info in the bugzilla already. > > Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951 > > Jacopo, could you please attach a complete dmesg log and "sudo lspci > -vv" output to your bugzilla? > > --Please consider the environment before printing this e-mail.
Re: rtsx_pci not restoring ASPM state after suspend/resume
[+cc Jacopo] On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote: > James reported this issue with rtsx_pci; can you guys please take a > look at it? https://bugzilla.kernel.org/show_bug.cgi?id=208117 > > There's a lot of good info in the bugzilla already. Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951 Jacopo, could you please attach a complete dmesg log and "sudo lspci -vv" output to your bugzilla?
rtsx_pci not restoring ASPM state after suspend/resume
James reported this issue with rtsx_pci; can you guys please take a look at it? https://bugzilla.kernel.org/show_bug.cgi?id=208117 There's a lot of good info in the bugzilla already. Bjorn