> On Oct 7, 2020, at 21:30, Bjorn Helgaas <helg...@kernel.org> wrote:
> 
> On Wed, Oct 07, 2020 at 12:26:19PM +0800, Kai-Heng Feng wrote:
>>> On Oct 6, 2020, at 03:19, Bjorn Helgaas <helg...@kernel.org> wrote:
>>> On Tue, Oct 06, 2020 at 02:40:32AM +0800, Kai-Heng Feng wrote:
>>>>> On Oct 3, 2020, at 06:18, Bjorn Helgaas <helg...@kernel.org> wrote:
>>>>> On Wed, Sep 30, 2020 at 04:24:54PM +0800, Kai-Heng Feng wrote:
> 
> ...
>>>> I wonder whether other devices that add PCIe domain have the same
>>>> behavior?  Maybe it's not a special case at all...
>>> 
>>> What other devices are these?
>> 
>> Controllers which add PCIe domain.
> 
> I was looking for specific examples, not just a restatement of what
> you said before.  I'm just curious because there are a lot of
> controllers I'm not familiar with, and I can't think of an example.
> 
>>>> I understand the end goal is to keep consistency for the entire ASPM
>>>> logic. However I can't think of any possible solution right now.
>>>> 
>>>>> - If we built with CONFIG_PCIEASPM_POWERSAVE=y, would that solve the
>>>>>  SoC power state problem?
>>>> 
>>>> Yes.
>>>> 
>>>>> - What issues would CONFIG_PCIEASPM_POWERSAVE=y introduce?
>>>> 
>>>> This will break many systems, at least for the 1st Gen Ryzen
>>>> desktops and laptops.
>>>> 
>>>> All PCIe ASPM are not enabled by BIOS, and those systems immediately
>>>> freeze once ASPM is enabled.
>>> 
>>> That indicates a defect in the Linux ASPM code.  We should fix that.
>>> It should be safe to use CONFIG_PCIEASPM_POWERSAVE=y on every system.
>> 
>> On those systems ASPM are also not enabled on Windows. So I think
>> ASPM are disabled for a reason.
> 
> If the platform knows ASPM needs to be disabled, it should be using
> ACPI_FADT_NO_ASPM or _OSC to prevent the OS from using it.  And if
> CONFIG_PCIEASPM_POWERSAVE=y means Linux enables ASPM when it
> shouldn't, that's a Linux bug that we need to fix.

Yes that's a bug which fixed by Ian's new patch.

> 
>>> Are there bug reports for these? The info we would need to start with
>>> includes "lspci -vv" and dmesg log (with CONFIG_PCIEASPM_DEFAULT=y).
>>> If a console log with CONFIG_PCIEASPM_POWERSAVE=y is available, that
>>> might be interesting, too.  We'll likely need to add some
>>> instrumentation and do some experimentation, but in principle, this
>>> should be fixable.
>> 
>> Doing this is asking users to use hardware settings that ODM/OEM
>> never tested, and I think the risk is really high.
> 
> What?  That's not what I said at all.  I'm asking for information
> about these hangs so we can fix them.  I'm not suggesting that you
> should switch to CONFIG_PCIEASPM_POWERSAVE=y for the distro.

Ah, I thought your suggestion is switching to CONFIG_PCIEASPM_POWERSAVE=y, 
because I sense you want to use that to cover the VMD ASPM this patch tries to 
solve.

Do we have a conclusion how to enable VMD ASPM with CONFIG_PCIEASPM_DEFAULT=y?

> 
> Let's back up.  You said:
> 
>  CONFIG_PCIEASPM_POWERSAVE=y ... will break many systems, at least
>  for the 1st Gen Ryzen desktops and laptops.
> 
>  All PCIe ASPM are not enabled by BIOS, and those systems immediately
>  freeze once ASPM is enabled.
> 
> These system hangs might be caused by (1) some hardware issue that
> causes a hang when ASPM is enabled even if it is configured correctly
> or (2) Linux configuring ASPM incorrectly.

It's (2) here.

> 
> For case (1), the platform should be using ACPI_FADT_NO_ASPM or _OSC
> to prevent the OS from enabling ASPM.  Linux should pay attention to
> that even when CONFIG_PCIEASPM_POWERSAVE=y.
> 
> If the platform *should* use these mechanisms but doesn't, the
> solution is a quirk, not the folklore that "we can't use
> CONFIG_PCIEASPM_POWERSAVE=y because it breaks some systems."

The platform in question doesn't prevent OS from enabling ASPM.

> 
> For case (2), we should fix Linux so it configures ASPM correctly.
> 
> We cannot use the build-time CONFIG_PCIEASPM settings to avoid these
> hangs.  We need to fix the Linux run-time code so the system operates
> correctly no matter what CONFIG_PCIEASPM setting is used.
> 
> We have sysfs knobs to control ASPM (see 72ea91afbfb0 ("PCI/ASPM: Add
> sysfs attributes for controlling ASPM link states")).  They can do the
> same thing at run-time as CONFIG_PCIEASPM_POWERSAVE=y does at
> build-time.  If those knobs cause hangs on 1st Gen Ryzen systems, we
> need to fix that.

Ian's patch solves the issue, at least for the systems I have.

Kai-Heng

> 
> Bjorn

Reply via email to