Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-08-02 Thread James Ettle
Hello,

On Tue, 2020-07-28 at 18:06 -0500, Bjorn Helgaas wrote:
> 
> I tried to deduce the problem from the code in aspm.c, but I didn't
> see the problem.  If you have the ability to build a kernel with a
> debug patch, can you boot with the patch below and collect the dmesg
> log?

I've built such a kernel and attached the dmesg output at

https://bugzilla.kernel.org/attachment.cgi?id=290713

For this, the machine was booted from off, no funny udev rules or sysfs
tinkering.

Thanks,
-James



Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-28 Thread Bjorn Helgaas
On Tue, Jul 28, 2020 at 09:57:55PM +0100, James Ettle wrote:
> On Mon, 2020-07-27 at 16:47 -0500, Bjorn Helgaas wrote:
> > 
> > I don't see anything in rtsx that enables L0s.  Can you collect
> > the dmesg log when booting with "pci=earlydump"?  That will show
> > whether the BIOS left it this way.  The PCI core isn't supposed to
> > do this, so if it did, we need to fix that.
> 
> dmesg log attached to the bugzilla as:
> 
> https://bugzilla.kernel.org/attachment.cgi?id=290655

Thank you!  BIOS left the Root Port and both Endpoints with L1 enabled
but L0s disabled.

Next question is how L0s got enabled.  We don't see anything in the
driver that does that.  And lspci claims L0s was enabled on the
Endpoints even without any udev or other manual fiddling.

I tried to deduce the problem from the code in aspm.c, but I didn't
see the problem.  If you have the ability to build a kernel with a
debug patch, can you boot with the patch below and collect the dmesg
log?

FWIW, here's the analysis of the earlydump output that shows L0s
disabled:

  #define PCI_CAPABILITY_LIST 0x34
  #define  PCI_CAP_ID_MSI 0x05
  #define  PCI_CAP_ID_EXP 0x10/* PCI Express */
  #define PCI_EXP_LNKCTL  16
  #define  PCI_EXP_LNKCTL_ASPM_L0S 0x0001
  #define  PCI_EXP_LNKCTL_ASPM_L1  0x0002
  #define  PCI_EXP_LNKCTL_CCC 0x0040
  #define  PCI_EXP_LNKCTL_CLKREQ_EN 0x0100

  00:1d.0:
0034: 40Offset of first capability
0040: 8010  PCIe capability ID, next cap at 0x80
0050: 0042  PCIe Link Control: CCC L1
  01:00.0:
0034: 40Offset of first capability
0040: 7005  MSI capability ID (0x05), next cap at 0x70
0070: b010  PCIe capability ID (0x10), next cap at 0xb0
0080: 0142  PCIe Link Control: CLKREQ_EN CCC L1
  01:00.1:
0034: 40Offset of first capability
0040: 7005  MSI capability ID (0x05), next cap at 0x70
0070: b010  PCIe capability ID (0x10), next cap at 0xb0
0080: 0142  PCIe Link Control: CLKREQ_EN CCC L1


commit 555db6d25963 ("debug")
Author: Bjorn Helgaas 
Date:   Tue Jul 28 17:52:58 2020 -0500

debug


diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
index b17e5ffd31b1..262f35883a2a 100644
--- a/drivers/pci/pcie/aspm.c
+++ b/drivers/pci/pcie/aspm.c
@@ -577,6 +577,11 @@ static void pcie_aspm_cap_init(struct pcie_link_state 
*link, int blacklist)
pcie_get_aspm_reg(parent, );
pcie_get_aspm_reg(child, );
 
+   pci_info(parent, "%s support %x enabled %x\n", __func__, upreg.support,
+upreg.enabled);
+   pci_info(child, "%s support %x enabled %x\n", __func__, dwreg.support,
+dwreg.enabled);
+
/*
 * Setup L0s state
 *
@@ -629,6 +634,10 @@ static void pcie_aspm_cap_init(struct pcie_link_state 
*link, int blacklist)
/* Setup initial capable state. Will be updated later */
link->aspm_capable = link->aspm_support;
 
+   pci_info(parent, "%s link support %x enabled %x capable %x default %x 
disable %x\n",
+__func__, link->aspm_support, link->aspm_enabled,
+link->aspm_capable, link->aspm_default, link->aspm_disable);
+
/* Get and check endpoint acceptable latencies */
list_for_each_entry(child, >devices, bus_list) {
u32 reg32, encoding;
@@ -744,6 +753,7 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state 
*link, u32 state)
 
 static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
 {
+   pci_info(pdev, "%s set state %x\n", __func__, val);
pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
   PCI_EXP_LNKCTL_ASPMC, val);
 }
@@ -754,6 +764,9 @@ static void pcie_config_aspm_link(struct pcie_link_state 
*link, u32 state)
struct pci_dev *child = link->downstream, *parent = link->pdev;
struct pci_bus *linkbus = parent->subordinate;
 
+   pci_info(parent, "%s requesting state %x (%s)\n", __func__,
+state, policy_str[state]);
+
/* Enable only the states that were not explicitly disabled */
state &= (link->aspm_capable & ~link->aspm_disable);
 
@@ -767,6 +780,10 @@ static void pcie_config_aspm_link(struct pcie_link_state 
*link, u32 state)
state |= (link->aspm_enabled & ASPM_STATE_L1_SS_PCIPM);
}
 
+   pci_info(parent, "%s link support %x enabled %x capable %x default %x 
disable %x\n",
+__func__, link->aspm_support, link->aspm_enabled,
+link->aspm_capable, link->aspm_default, link->aspm_disable);
+
/* Nothing to do if the link is already in the requested state */
if (link->aspm_enabled == state)
return;


Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-28 Thread James Ettle
On Mon, 2020-07-27 at 16:47 -0500, Bjorn Helgaas wrote:
> 
> I don't see anything in rtsx that enables L0s.  Can you collect the
> dmesg log when booting with "pci=earlydump"?  That will show whether
> the BIOS left it this way.  The PCI core isn't supposed to do this,
> so
> if it did, we need to fix that.
> 

dmesg log attached to the bugzilla as:

https://bugzilla.kernel.org/attachment.cgi?id=290655

(to keep everything in one place).

Thanks,
-James



RE: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-27 Thread 吳昊澄 Ricky

> On Mon, Jul 27, 2020 at 08:52:25PM +0100, James Ettle wrote:
> > On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote:
> > > I don't know the connection between ASPM and package C-states, so I
> > > need to simplify this even more.  All I want to do right now is
> > > verify
> > > that if we don't have any outside influences on the ASPM
> > > configuration
> > > (eg, no manual changes and no udev rules), it stays the same across
> > > suspend/resume.
> >
> > Basically this started from me observing deep package C-states weren't
> > being used, until I went and fiddled with the ASPM state of the
> > rtsx_pci card reader under sysfs -- so phenomenological poking on my
> > part.
> >
> > > So let's read the ASPM state directly from the
> > > hardware like this:
> > >
> > >   sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop"
> > >   sudo lspci -vvs 01:00   | egrep "^0|Lnk|L1|LTR|snoop"
> > >
> > > Can you try that before and after suspend/resume?
> >
> > I've attached these to the bugzilla entry at:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=208117
> >
> > Spoiler: With no udev rules or suspend hooks, things are the same
> > before and after suspend/resume. One thing I do see (both before and
> > after) is that ASPM L0s and L1 is enabled for the card reader, but
> > disabled for the ethernet chip (does r8169 fiddle with ASPM too?).
> 
> Thank you!  It's good that this stays the same across suspend/resume.
> Do you see different C-state behavior before vs after?
> 
> This is the config I see:
> 
>   00:1d.0 bridge to [bus 01]: ASPM L1 supported; ASPM Disabled
>   01:00.0 card reader:ASPM L0s L1 supported; L0s L1 Enabled
>   01:00.1 GigE NIC:   ASPM L0s L1 supported; ASPM Disabled
> 
> This is actually illegal because PCIe r5.0, sec 5.4.1.3, says software
> must not enable L0s in either direction unless components on both ends
> of the link support L0s.  The bridge (00:1d.0) does not support L0s,
> so it's illegal to enable L0s on 01:00.0.  I don't know whether this
> causes problems in practice.
> 

If system want to entry deep C-state, system have to support L1. Host bridge 
handshake with device to determine whether to enter the L1 state.
Our card reader driver did not set L0s, here need to check who set this, but we 
thought this L0s enable should not cause Host bridge ASPM disable
 

> I don't see anything in rtsx that enables L0s.  Can you collect the
> dmesg log when booting with "pci=earlydump"?  That will show whether
> the BIOS left it this way.  The PCI core isn't supposed to do this, so
> if it did, we need to fix that.
> 
> I don't know whether r8169 mucks with ASPM.  It is legal to have
> different configurations for the two functions, even though they share
> the same link.  Sec 5.4.1 has rules about how hardware resolves
> differences.
> 
> > [Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM
> > enabled after a suspend/resume, but still no deep package C-states
> > until I manually fiddle via sysfs on the card reader. Sorry if this
> > only muddies the water further!]
> 
> Let's defer this for now.  It sounds worth pursuing, but I can't keep
> everything in my head at once.
> 
> Bjorn
> 
> --Please consider the environment before printing this e-mail.


Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-27 Thread Bjorn Helgaas
On Mon, Jul 27, 2020 at 08:52:25PM +0100, James Ettle wrote:
> On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote:
> > I don't know the connection between ASPM and package C-states, so I
> > need to simplify this even more.  All I want to do right now is
> > verify
> > that if we don't have any outside influences on the ASPM
> > configuration
> > (eg, no manual changes and no udev rules), it stays the same across
> > suspend/resume.
> 
> Basically this started from me observing deep package C-states weren't
> being used, until I went and fiddled with the ASPM state of the
> rtsx_pci card reader under sysfs -- so phenomenological poking on my
> part.
> 
> > So let's read the ASPM state directly from the
> > hardware like this:
> > 
> >   sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop"
> >   sudo lspci -vvs 01:00   | egrep "^0|Lnk|L1|LTR|snoop"
> > 
> > Can you try that before and after suspend/resume?
> 
> I've attached these to the bugzilla entry at:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=208117
> 
> Spoiler: With no udev rules or suspend hooks, things are the same
> before and after suspend/resume. One thing I do see (both before and
> after) is that ASPM L0s and L1 is enabled for the card reader, but
> disabled for the ethernet chip (does r8169 fiddle with ASPM too?).

Thank you!  It's good that this stays the same across suspend/resume.
Do you see different C-state behavior before vs after?

This is the config I see:

  00:1d.0 bridge to [bus 01]: ASPM L1 supported; ASPM Disabled
  01:00.0 card reader:ASPM L0s L1 supported; L0s L1 Enabled
  01:00.1 GigE NIC:   ASPM L0s L1 supported; ASPM Disabled

This is actually illegal because PCIe r5.0, sec 5.4.1.3, says software
must not enable L0s in either direction unless components on both ends
of the link support L0s.  The bridge (00:1d.0) does not support L0s,
so it's illegal to enable L0s on 01:00.0.  I don't know whether this
causes problems in practice.

I don't see anything in rtsx that enables L0s.  Can you collect the
dmesg log when booting with "pci=earlydump"?  That will show whether
the BIOS left it this way.  The PCI core isn't supposed to do this, so
if it did, we need to fix that.

I don't know whether r8169 mucks with ASPM.  It is legal to have
different configurations for the two functions, even though they share
the same link.  Sec 5.4.1 has rules about how hardware resolves
differences.

> [Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM
> enabled after a suspend/resume, but still no deep package C-states
> until I manually fiddle via sysfs on the card reader. Sorry if this
> only muddies the water further!]

Let's defer this for now.  It sounds worth pursuing, but I can't keep
everything in my head at once.

Bjorn


Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-27 Thread James Ettle
On Mon, 2020-07-27 at 09:14 -0500, Bjorn Helgaas wrote:
> I don't know the connection between ASPM and package C-states, so I
> need to simplify this even more.  All I want to do right now is
> verify
> that if we don't have any outside influences on the ASPM
> configuration
> (eg, no manual changes and no udev rules), it stays the same across
> suspend/resume.

Basically this started from me observing deep package C-states weren't
being used, until I went and fiddled with the ASPM state of the
rtsx_pci card reader under sysfs -- so phenomenological poking on my
part.

> So let's read the ASPM state directly from the
> hardware like this:
> 
>   sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop"
>   sudo lspci -vvs 01:00   | egrep "^0|Lnk|L1|LTR|snoop"
> 
> Can you try that before and after suspend/resume?

I've attached these to the bugzilla entry at:

https://bugzilla.kernel.org/show_bug.cgi?id=208117

Spoiler: With no udev rules or suspend hooks, things are the same
before and after suspend/resume. One thing I do see (both before and
after) is that ASPM L0s and L1 is enabled for the card reader, but
disabled for the ethernet chip (does r8169 fiddle with ASPM too?).

[Oddly when I set ASPM (e.g. using udev) the lspci tools show ASPM
enabled after a suspend/resume, but still no deep package C-states
until I manually fiddle via sysfs on the card reader. Sorry if this
only muddies the water further!]

Thanks,
-James





Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-27 Thread Bjorn Helgaas
On Sat, Jul 25, 2020 at 09:27:11PM +0100, James Ettle wrote:
> On Fri, 2020-07-24 at 18:13 -0500, Bjorn Helgaas wrote:
> > 
> > Maybe we should simplify this a little bit more.  James, if you don't
> > touch ASPM config at all, either manually or via udev, does the ASPM
> > configuration stay the same across suspend/resume?
> 
> Yes, it stays the same. Explicitly: 
> 
> With the udev rule disabled, immediately following clean boot from
> power-off (and no additional tinkering), ASPM is OFF to the best of my
> knowledge:
> 
>  - link/l1_aspm in sysfs is 0 for PCI devices :01:00.[01];
>  - the processor sleeps no deeper than package C3.
> 
> The situation above is the same following a suspend/resume cycle --
> both in terms of sysfs, and observed package C-state occupancy.
> 
> [Tested on kernel 5.7.10, but the behaviour is the same as prior
> kernels.]

I don't know the connection between ASPM and package C-states, so I
need to simplify this even more.  All I want to do right now is verify
that if we don't have any outside influences on the ASPM configuration
(eg, no manual changes and no udev rules), it stays the same across
suspend/resume.

In https://bugzilla.kernel.org/show_bug.cgi?id=208117#c12, we saw that
ASPM L0s was disabled before suspend but was enabled after resume.
That should not happen.

You're looking at the sysfs link/l1_aspm file, which tells us what the
PCI core thinks the state is, but I'm not confident that's accurate,
especially because the driver fiddles with the state behind the back
of the PCI core.  So let's read the ASPM state directly from the
hardware like this:

  sudo lspci -vvs 00:1d.0 | egrep "^0|Lnk|L1|LTR|snoop"
  sudo lspci -vvs 01:00   | egrep "^0|Lnk|L1|LTR|snoop"

Can you try that before and after suspend/resume?


Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-25 Thread James Ettle
On Fri, 2020-07-24 at 18:13 -0500, Bjorn Helgaas wrote:
> 
> Maybe we should simplify this a little bit more.  James, if you don't
> touch ASPM config at all, either manually or via udev, does the ASPM
> configuration stay the same across suspend/resume?
> 

Yes, it stays the same. Explicitly: 

With the udev rule disabled, immediately following clean boot from
power-off (and no additional tinkering), ASPM is OFF to the best of my
knowledge:

 - link/l1_aspm in sysfs is 0 for PCI devices :01:00.[01];
 - the processor sleeps no deeper than package C3.

The situation above is the same following a suspend/resume cycle --
both in terms of sysfs, and observed package C-state occupancy.

[Tested on kernel 5.7.10, but the behaviour is the same as prior
kernels.]

Thanks,
James.



Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-24 Thread Bjorn Helgaas
On Fri, Jul 24, 2020 at 07:16:26AM +, 吳昊澄 Ricky wrote:
> Hi James, Bjorn,
> 
> The Card reader(10ec:5287) is a combo chip with Ethernet(10ec:8168),
> we think it is not cause by setting our device config space in idle
> time.
>
> We dis/enable the ASPM(setting config space) at busy/idle time, it
> can make our R/W performances well not a work around function
>
> PCI Host and Device setting self config space and do handshaking, we
> think it does not affect the system

Are you able to reproduce the problem?  Specifically, James observes
that before suspend, ASPM L1 is enabled, but after resume, L0s and L1
are enabled.  The ASPM state should be the same after resume.

See:

  https://bugzilla.kernel.org/show_bug.cgi?id=208117#c8
  https://bugzilla.kernel.org/show_bug.cgi?id=208117#c9

James *is* manually enabling ASPM L1 via a udev rule, which
complicates things a little.  The sysfs link/l1_aspm functionality
he's using is new and could well be buggy.

Maybe we should simplify this a little bit more.  James, if you don't
touch ASPM config at all, either manually or via udev, does the ASPM
configuration stay the same across suspend/resume?

> > -Original Message-
> > From: Bjorn Helgaas [mailto:helg...@kernel.org]
> > Sent: Friday, July 24, 2020 1:13 AM
> > To: 吳昊澄 Ricky; Rui Feng
> > Cc: Arnd Bergmann; Greg Kroah-Hartman; James Ettle; Len Brown; Puranjay
> > Mohan; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Jacopo De
> > Simoi
> > Subject: Re: rtsx_pci not restoring ASPM state after suspend/resume
> > 
> > [+cc Jacopo]
> > 
> > On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote:
> > > James reported this issue with rtsx_pci; can you guys please take a
> > > look at it?  https://bugzilla.kernel.org/show_bug.cgi?id=208117
> > >
> > > There's a lot of good info in the bugzilla already.
> > 
> > Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951
> > 
> > Jacopo, could you please attach a complete dmesg log and "sudo lspci
> > -vv" output to your bugzilla?
> > 
> > --Please consider the environment before printing this e-mail.


RE: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-24 Thread 吳昊澄 Ricky
Hi James, Bjorn,

The Card reader(10ec:5287) is a combo chip with Ethernet(10ec:8168), we think 
it is not cause by setting our device config space in idle time.
We dis/enable the ASPM(setting config space) at busy/idle time, it can make our 
R/W performances well not a work around function
PCI Host and Device setting self config space and do handshaking, we think it 
does not affect the system


Ricky 



> -Original Message-
> From: Bjorn Helgaas [mailto:helg...@kernel.org]
> Sent: Friday, July 24, 2020 1:13 AM
> To: 吳昊澄 Ricky; Rui Feng
> Cc: Arnd Bergmann; Greg Kroah-Hartman; James Ettle; Len Brown; Puranjay
> Mohan; linux-...@vger.kernel.org; linux-kernel@vger.kernel.org; Jacopo De
> Simoi
> Subject: Re: rtsx_pci not restoring ASPM state after suspend/resume
> 
> [+cc Jacopo]
> 
> On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote:
> > James reported this issue with rtsx_pci; can you guys please take a
> > look at it?  https://bugzilla.kernel.org/show_bug.cgi?id=208117
> >
> > There's a lot of good info in the bugzilla already.
> 
> Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951
> 
> Jacopo, could you please attach a complete dmesg log and "sudo lspci
> -vv" output to your bugzilla?
> 
> --Please consider the environment before printing this e-mail.


Re: rtsx_pci not restoring ASPM state after suspend/resume

2020-07-23 Thread Bjorn Helgaas
[+cc Jacopo]

On Thu, Jul 23, 2020 at 11:56:22AM -0500, Bjorn Helgaas wrote:
> James reported this issue with rtsx_pci; can you guys please take a
> look at it?  https://bugzilla.kernel.org/show_bug.cgi?id=208117
> 
> There's a lot of good info in the bugzilla already.

Likely duplicate: https://bugzilla.kernel.org/show_bug.cgi?id=198951

Jacopo, could you please attach a complete dmesg log and "sudo lspci
-vv" output to your bugzilla?