On Fri, Nov 22, 2019 at 1:22 AM Karol Herbst wrote:
>
> Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> states.
>
> v2: convert to pci_dev quirk
> put a proper technical explanation of the issue as a in-code comment
> v3: disable it only for certain combinations of intel and nvidia hardware
> v4: simplify quirk by setting flag on the GPU itself
> v5: restructure quirk to make it easier to add new IDs
> fix whitespace issues
> fix potential NULL pointer access
> update the quirk documentation
>
> Signed-off-by: Karol Herbst
> Cc: Bjorn Helgaas
> Cc: Lyude Paul
> Cc: Rafael J. Wysocki
> Cc: Mika Westerberg
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205623
> ---
> drivers/pci/pci.c| 7 ++
> drivers/pci/quirks.c | 51
> include/linux/pci.h | 1 +
> 3 files changed, 59 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 57f15a7e6f0b..e08db2daa924 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -850,6 +850,13 @@ static int pci_raw_set_power_state(struct pci_dev *dev,
> pci_power_t state)
>|| (state == PCI_D2 && !dev->d2_support))
> return -EIO;
>
> + /*
> +* Check if we have a bad combination of bridge controller and nvidia
> +* GPU, see quirk_broken_nv_runpm for more info
> +*/
> + if (state != PCI_D0 && dev->broken_nv_runpm)
> + return 0;
The result of this change in the suspend-to-idle path will be leaving
the device and its PCIe port in D0 while suspended, unless the device
itself has power management methods in the ACPI tables (according to
Mika that is not the case).
I don't think that this is desirable.
> +
> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>
> /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44c4ae1abd00..24e3f247d291 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5268,3 +5268,54 @@ static void
> quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
> DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
> PCI_CLASS_DISPLAY_VGA, 8,
> quirk_reset_lenovo_thinkpad_p50_nvgpu);
> +
> +/*
> + * Some Intel PCIe bridge controllers cause devices to not reappear doing a
> + * D0 -> D3hot -> D3cold -> D0 sequence.
This is inaccurate and not entirely fair AFAICS.
First off, what is a "PCIe bridge controller"? A PCIe root complex?
Second, I don't think that you really can blame hardware here, because
the problem is related to AML (forcing a different code path in AML
makes it go away, so the same hardware with different AML would work).
More precisely, the behavior of the kernel is not what is expected by
AML associated with the PCIe port holding the device.
> Skipping the intermediate D3hot step
> + * seems to make it work again.
Yes, but the change would need to cover both the PM-runtime and
suspend-to-idle code paths.
Also it may be driver-induced rather than quirk-based.
> + *
> + * This leads to various manifestations of this issue:
> + * - AIML code execution hits an infinite loop (as the coe waits on device
Typo: coe -> code
> + *memory to change).
Which AML code is this, the power-off part or power-on part? Is this
AML code associated with the GPU or with the PCIe port holding it (I
guess the latter from what Mika said)?
Also IIRC ACPICA has a mechanism to break infinite loops in AML by
aborting the looping method after a timeout.
> + * - kernel crashes, as all PCI reads return -1, which most code isn't able
> + *to handle well enough.
> + * - sudden shutdowns, as the kernel identified an unrecoverable error after
> + *userspace tries to access the GPU.
IMO it would be enough to say that the GPU is not accessible after an
attempt to remove power from it.
> + *
> + * In all cases dmesg will contain at least one line like this:
> + * 'nouveau :01:00.0: Refused to change power state, currently in D3'
> + * followed by a lot of nouveau timeouts.
> + *
> + * ACPI code
Which ACPI code?
> writes bit 0x80 to the not documented PCI register 0x248 of the
0x248 relative to what? A PCI bar (if so then which one) or the PCI
config space (and which part of it if so)?
> + * Intel PCIe bridge controller (0x1901) in order to power down the GPU.
This doesn't seem accurate. It rather writes to this register to
change the state of the PCIe link between the GPU and the PCIe port
holding it, which is not the same as powering off.
> + * Nonetheless, there are other code paths inside the ACPI firmware which use
> + * other registers, which seem to work fine:
The meaning of the above is unclear.
Does "other" mean "alternative"?
> + * - 0xbc bit 0x20