Re: [PATCH v5] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-22 Thread Rafael J. Wysocki
On Fri, Nov 22, 2019 at 1:22 AM Karol Herbst  wrote:
>
> Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
> states.
>
> v2: convert to pci_dev quirk
> put a proper technical explanation of the issue as a in-code comment
> v3: disable it only for certain combinations of intel and nvidia hardware
> v4: simplify quirk by setting flag on the GPU itself
> v5: restructure quirk to make it easier to add new IDs
> fix whitespace issues
> fix potential NULL pointer access
> update the quirk documentation
>
> Signed-off-by: Karol Herbst 
> Cc: Bjorn Helgaas 
> Cc: Lyude Paul 
> Cc: Rafael J. Wysocki 
> Cc: Mika Westerberg 
> Cc: linux-...@vger.kernel.org
> Cc: linux...@vger.kernel.org
> Cc: dri-devel@lists.freedesktop.org
> Cc: nouv...@lists.freedesktop.org
> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205623
> ---
>  drivers/pci/pci.c|  7 ++
>  drivers/pci/quirks.c | 51 
>  include/linux/pci.h  |  1 +
>  3 files changed, 59 insertions(+)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 57f15a7e6f0b..e08db2daa924 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -850,6 +850,13 @@ static int pci_raw_set_power_state(struct pci_dev *dev, 
> pci_power_t state)
>|| (state == PCI_D2 && !dev->d2_support))
> return -EIO;
>
> +   /*
> +* Check if we have a bad combination of bridge controller and nvidia
> +* GPU, see quirk_broken_nv_runpm for more info
> +*/
> +   if (state != PCI_D0 && dev->broken_nv_runpm)
> +   return 0;

The result of this change in the suspend-to-idle path will be leaving
the device and its PCIe port in D0 while suspended, unless the device
itself has power management methods in the ACPI tables (according to
Mika that is not the case).

I don't think that this is desirable.

> +
> pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
>
> /*
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 44c4ae1abd00..24e3f247d291 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5268,3 +5268,54 @@ static void 
> quirk_reset_lenovo_thinkpad_p50_nvgpu(struct pci_dev *pdev)
>  DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
>   PCI_CLASS_DISPLAY_VGA, 8,
>   quirk_reset_lenovo_thinkpad_p50_nvgpu);
> +
> +/*
> + * Some Intel PCIe bridge controllers cause devices to not reappear doing a
> + * D0 -> D3hot -> D3cold -> D0 sequence.

This is inaccurate and not entirely fair AFAICS.

First off, what is a "PCIe bridge controller"?  A PCIe root complex?

Second, I don't think that you really can blame hardware here, because
the problem is related to AML (forcing a different code path in AML
makes it go away, so the same hardware with different AML would work).

More precisely, the behavior of the kernel is not what is expected by
AML associated with the PCIe port holding the device.

> Skipping the intermediate D3hot step
> + * seems to make it work again.

Yes, but the change would need to cover both the PM-runtime and
suspend-to-idle code paths.

Also it may be driver-induced rather than quirk-based.

> + *
> + * This leads to various manifestations of this issue:
> + *  - AIML code execution hits an infinite loop (as the coe waits on device

Typo: coe -> code

> + *memory to change).

Which AML code is this, the power-off part or power-on part?  Is this
AML code associated with the GPU or with the PCIe port holding it (I
guess the latter from what Mika said)?

Also IIRC ACPICA has a mechanism to break infinite loops in AML by
aborting the looping method after a timeout.

> + *  - kernel crashes, as all PCI reads return -1, which most code isn't able
> + *to handle well enough.
> + *  - sudden shutdowns, as the kernel identified an unrecoverable error after
> + *userspace tries to access the GPU.

IMO it would be enough to say that the GPU is not accessible after an
attempt to remove power from it.

> + *
> + * In all cases dmesg will contain at least one line like this:
> + * 'nouveau :01:00.0: Refused to change power state, currently in D3'
> + * followed by a lot of nouveau timeouts.
> + *
> + * ACPI code

Which ACPI code?

> writes bit 0x80 to the not documented PCI register 0x248 of the

0x248 relative to what?  A PCI bar (if so then which one) or the PCI
config space (and which part of it if so)?

> + * Intel PCIe bridge controller (0x1901) in order to power down the GPU.

This doesn't seem accurate.  It rather writes to this register to
change the state of the PCIe link between the GPU and the PCIe port
holding it, which is not the same as powering off.

> + * Nonetheless, there are other code paths inside the ACPI firmware which use
> + * other registers, which seem to work fine:

The meaning of the above is unclear.

Does "other" mean "alternative"?

> + *  - 0xbc bit 0x20

[PATCH v5] pci: prevent putting nvidia GPUs into lower device states on certain intel bridges

2019-11-21 Thread Karol Herbst
Fixes state transitions of Nvidia Pascal GPUs from D3cold into higher device
states.

v2: convert to pci_dev quirk
put a proper technical explanation of the issue as a in-code comment
v3: disable it only for certain combinations of intel and nvidia hardware
v4: simplify quirk by setting flag on the GPU itself
v5: restructure quirk to make it easier to add new IDs
fix whitespace issues
fix potential NULL pointer access
update the quirk documentation

Signed-off-by: Karol Herbst 
Cc: Bjorn Helgaas 
Cc: Lyude Paul 
Cc: Rafael J. Wysocki 
Cc: Mika Westerberg 
Cc: linux-...@vger.kernel.org
Cc: linux...@vger.kernel.org
Cc: dri-devel@lists.freedesktop.org
Cc: nouv...@lists.freedesktop.org
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=205623
---
 drivers/pci/pci.c|  7 ++
 drivers/pci/quirks.c | 51 
 include/linux/pci.h  |  1 +
 3 files changed, 59 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 57f15a7e6f0b..e08db2daa924 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -850,6 +850,13 @@ static int pci_raw_set_power_state(struct pci_dev *dev, 
pci_power_t state)
   || (state == PCI_D2 && !dev->d2_support))
return -EIO;
 
+   /*
+* Check if we have a bad combination of bridge controller and nvidia
+* GPU, see quirk_broken_nv_runpm for more info
+*/
+   if (state != PCI_D0 && dev->broken_nv_runpm)
+   return 0;
+
pci_read_config_word(dev, dev->pm_cap + PCI_PM_CTRL, &pmcsr);
 
/*
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 44c4ae1abd00..24e3f247d291 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5268,3 +5268,54 @@ static void quirk_reset_lenovo_thinkpad_p50_nvgpu(struct 
pci_dev *pdev)
 DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
  PCI_CLASS_DISPLAY_VGA, 8,
  quirk_reset_lenovo_thinkpad_p50_nvgpu);
+
+/*
+ * Some Intel PCIe bridge controllers cause devices to not reappear doing a
+ * D0 -> D3hot -> D3cold -> D0 sequence. Skipping the intermediate D3hot step
+ * seems to make it work again.
+ *
+ * This leads to various manifestations of this issue:
+ *  - AIML code execution hits an infinite loop (as the coe waits on device
+ *memory to change).
+ *  - kernel crashes, as all PCI reads return -1, which most code isn't able
+ *to handle well enough.
+ *  - sudden shutdowns, as the kernel identified an unrecoverable error after
+ *userspace tries to access the GPU.
+ *
+ * In all cases dmesg will contain at least one line like this:
+ * 'nouveau :01:00.0: Refused to change power state, currently in D3'
+ * followed by a lot of nouveau timeouts.
+ *
+ * ACPI code writes bit 0x80 to the not documented PCI register 0x248 of the
+ * Intel PCIe bridge controller (0x1901) in order to power down the GPU.
+ * Nonetheless, there are other code paths inside the ACPI firmware which use
+ * other registers, which seem to work fine:
+ *  - 0xbc bit 0x20 (publicly available documentation claims 'reserved')
+ *  - 0xb0 bit 0x10 (link disable)
+ * Changing the conditions inside the firmware by poking into the relevant
+ * addresses does resolve the issue, but it seemed to be ACPI private memory
+ * and not any device accessible memory at all, so there is no portable way of
+ * changing the conditions.
+ *
+ * The only systems where this behavior can be seen are hybrid graphics laptops
+ * with a secondary Nvidia Maxwell, Pascal or Turing GPU. It cannot be ruled
+ * out that this issue only occurs in combination with listed Intel PCIe
+ * bridge controllers and the mentioned GPUs or if it's only a hw bug in the
+ * bridge controller.
+ */
+
+static void quirk_broken_nv_runpm(struct pci_dev *dev)
+{
+   struct pci_dev *bridge = pci_upstream_bridge(dev);
+
+   if (!bridge || bridge->vendor != PCI_VENDOR_ID_INTEL)
+   return;
+
+   switch (bridge->device) {
+   case 0x1901:
+   dev->broken_nv_runpm = 1;
+   }
+}
+DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, PCI_ANY_ID,
+ PCI_BASE_CLASS_DISPLAY, 16,
+ quirk_broken_nv_runpm);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index ac8a6c4e1792..903a0b3a39ec 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -416,6 +416,7 @@ struct pci_dev {
unsigned int__aer_firmware_first_valid:1;
unsigned int__aer_firmware_first:1;
unsigned intbroken_intx_masking:1;  /* INTx masking can't be used */
+   unsigned intbroken_nv_runpm:1;  /* some combinations of intel 
bridge controller and nvidia GPUs break rtd3 */
unsigned intio_window_1k:1; /* Intel bridge 1K I/O windows 
*/
unsigned intirq_managed:1;
unsigned inthas_secondary_link:1;
-- 
2.23.0

___