Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-02 Thread Bjorn Helgaas
On Tue, Oct 2, 2018 at 4:27 PM Thomas Martitz  wrote:
>
> Am 02.10.18 um 22:03 schrieb Bjorn Helgaas:
> > Hi Thomas,
> >
> > On Mon, Oct 01, 2018 at 04:25:06PM +0200, Thomas Martitz wrote:
> >> Am 01.10.18 um 06:57 schrieb Daniel Drake:
> >>> On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:
>  The latest iteration does not work on my HP system. The GPU fails to
>  power up just like the unpatched kernel.
> >>>
> >>> That's weird, I would not expect a behaviour change in the latest
> >>> patch. pci_restore_config_dword() has some debug messages, could you
> >>> please make them visible and show logs again?
> >>> Also remind us of the PCI device address of the parent bridge (lspci -vt)
> >>
> >> I'll follow up with more the requested information on bugzilla
> >> (Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).
> >>
> >> On a quick re-check, it seems to depend on if I used the eGPU before
> >> the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend it
> >> seems fine.
> >
> > Does the patch ([1]) make things *worse* compared to v4.19-rc5?
> >
>
> No, certainly not. It does look like a different issue since resuming now
> works at least if I used the eGPU in some way before suspend
> (DRI_PRIME=1 glxgears seems to be enough, I assume glxinfo would work as
> well).
>
> Without the patch resuming the eGPU does not work whatsoever.
>
> Please ship the patch. I'll hopefully sort this other issue out.

Great, thanks!  That's what I thought, but just wanted to double check.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-02 Thread Thomas Martitz

Am 02.10.18 um 22:03 schrieb Bjorn Helgaas:

Hi Thomas,

On Mon, Oct 01, 2018 at 04:25:06PM +0200, Thomas Martitz wrote:

Am 01.10.18 um 06:57 schrieb Daniel Drake:

On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:

The latest iteration does not work on my HP system. The GPU fails to
power up just like the unpatched kernel.


That's weird, I would not expect a behaviour change in the latest
patch. pci_restore_config_dword() has some debug messages, could you
please make them visible and show logs again?
Also remind us of the PCI device address of the parent bridge (lspci -vt)


I'll follow up with more the requested information on bugzilla
(Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).

On a quick re-check, it seems to depend on if I used the eGPU before
the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend it
seems fine.


Does the patch ([1]) make things *worse* compared to v4.19-rc5?



No, certainly not. It does look like a different issue since resuming now
works at least if I used the eGPU in some way before suspend 
(DRI_PRIME=1 glxgears seems to be enough, I assume glxinfo would work as 
well).


Without the patch resuming the eGPU does not work whatsoever.

Please ship the patch. I'll hopefully sort this other issue out.

Best regards.

___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-02 Thread Bjorn Helgaas
Hi Thomas,

On Mon, Oct 01, 2018 at 04:25:06PM +0200, Thomas Martitz wrote:
> Am 01.10.18 um 06:57 schrieb Daniel Drake:
> > On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:
> > > The latest iteration does not work on my HP system. The GPU fails to
> > > power up just like the unpatched kernel.
> > 
> > That's weird, I would not expect a behaviour change in the latest
> > patch. pci_restore_config_dword() has some debug messages, could you
> > please make them visible and show logs again?
> > Also remind us of the PCI device address of the parent bridge (lspci -vt)
> 
> I'll follow up with more the requested information on bugzilla
> (Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).
> 
> On a quick re-check, it seems to depend on if I used the eGPU before
> the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend it
> seems fine.

Does the patch ([1]) make things *worse* compared to v4.19-rc5?

If so, I'll drop the patch until we figure this out.  But if the GPU
power issue also occurs in v4.19-rc5, I think we can assume it's a
different problem and we can go ahead and merge [1].

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci.git/commit/?h=for-linus=083874549fdfefa629dfa752785e20427dde1511
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-10-01 Thread Thomas Martitz

Am 01.10.18 um 06:57 schrieb Daniel Drake:

On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:

The latest iteration does not work on my HP system. The GPU fails to
power up just like the unpatched kernel.


That's weird, I would not expect a behaviour change in the latest
patch. pci_restore_config_dword() has some debug messages, could you
please make them visible and show logs again?
Also remind us of the PCI device address of the parent bridge (lspci -vt)



I'll follow up with more the requested information on bugzilla
(Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069).

On a quick re-check, it seems to depend on if I used the eGPU before
the initial suspend. If I run glxgears (with DRI_PRIME=1) before suspend 
it seems fine.


Best regards.
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-30 Thread Daniel Drake
On Sun, Sep 30, 2018 at 5:07 AM Thomas Martitz  wrote:
> The latest iteration does not work on my HP system. The GPU fails to
> power up just like the unpatched kernel.

That's weird, I would not expect a behaviour change in the latest
patch. pci_restore_config_dword() has some debug messages, could you
please make them visible and show logs again?
Also remind us of the PCI device address of the parent bridge (lspci -vt)

Thanks
Daniel
___
Nouveau mailing list
Nouveau@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/nouveau


Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-29 Thread Thomas Martitz

Am 27.09.18 um 22:52 schrieb Bjorn Helgaas:

[+cc LKML]

On Tue, Sep 18, 2018 at 04:32:44PM -0500, Bjorn Helgaas wrote:

On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:

On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

 fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
 DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already
has value 0 (the correct, pre-suspend value).

Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register.
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

Based on that, rewrite the prefetch register values even when that
appears unnecessary.

We have confirmed this solution on all the affected models we have
in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this solves an issue where r8169 MSI-X interrupts were
broken after S3 suspend/resume on Asus X441UAR. This issue was recently
worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
Aimfor-tech laptop that we had not yet patched. I suspect it will also
fix the issue that was worked around in commit 7c53a722459c ("r8169:
don't use MSI-X on RTL8168g").

Thomas Martitz reports that this change also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake 


Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
Thanks for the the huge investigative effort!


Since this looks low-risk and fixes several painful issues, I think
this merits a stable tag and being included in v4.19 (instead of
waiting for v4.20).

I moved it to for-linus for v4.19.  Let me know if you object.



The latest iteration does not work on my HP system. The GPU fails to 
power up just like the unpatched kernel.


[  516.833580] amdgpu :01:00.0: Refused to change power state, 
currently in D3
[  516.912885] amdgpu :01:00.0: Refused to change power state, 
currently in D3
[  516.929175] amdgpu :01:00.0: Refused to change power state, 
currently in D3
[  521.932435] [drm:atom_op_jump] *ERROR* atombios stuck in loop for 
more than 5secs aborting
[  521.932440] [drm:amdgpu_atom_execute_table_locked] *ERROR* atombios 
stuck executing C392 (len 62, WS 0, PS 0) @ 0xC3AE
[  521.932442] [drm:amdgpu_atom_execute_table_locked] *ERROR* atombios 
stuck executing ADB8 (len 140, WS 0, PS 8) @ 0xADD3

[  521.932444] [drm:amdgpu_device_resume] *ERROR* amdgpu asic init failed
[  522.883309] amdgpu :01:00.0: Wait for MC idle timedout !
[  523.831676] amdgpu :01:00.0: Wait for MC idle timedout !
[  523.832931] [drm] PCIE GART of 256M enabled (table at 
0x00F4).

[  523.836807] amdgpu: [powerplay] Failed to send Message.
[  523.836862] amdgpu: [powerplay] SMC address must be 4 byte aligned.
[  523.836863] amdgpu: [powerplay] [AVFS][Polaris10_SetupGfxLvlStruct] 
Problems copying VRConfig value over to SMC
[  523.836864] amdgpu: [powerplay] [AVFS][Polaris10_AVFSEventMgr] Could 
not Copy Graphics Level table over to SMU

[  523.836908] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.836924] amdgpu: [powerplay]
failed to send message 252 ret is 65535
[  523.836949] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.836965] amdgpu: [powerplay]
failed to send message 253 ret is 65535
[  523.836989] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.837006] amdgpu: [powerplay]
failed to send message 250 ret is 65535
[  523.837029] amdgpu: [powerplay]
last message was failed ret is 65535
[  523.837045] amdgpu: [powerplay]






---
  drivers/pci/pci.c | 25 +
  1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..5d58220b6997 100644
--- a/drivers/pci/pci.c
+++ 

Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-27 Thread Bjorn Helgaas
[+cc LKML]

On Tue, Sep 18, 2018 at 04:32:44PM -0500, Bjorn Helgaas wrote:
> On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:
> > On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> > after S3 suspend/resume. The affected products include multiple
> > generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> > many errors such as:
> > 
> > fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
> >   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> > DRM: failed to idle channel 0 [DRM]
> > 
> > Similarly, the nvidia proprietary driver also fails after resume
> > (black screen, 100% CPU usage in Xorg process). We shipped a sample
> > to Nvidia for diagnosis, and their response indicated that it's a
> > problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> > 
> > Runtime suspend/resume works fine, only S3 suspend is affected.
> > 
> > We found a workaround: on resume, rewrite the Intel PCI bridge
> > 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> > the cases that I checked, this register has value 0 and we just have to
> > rewrite that value.
> > 
> > Linux already saves and restores PCI config space during suspend/resume,
> > but this register was being skipped because upon resume, it already
> > has value 0 (the correct, pre-suspend value).
> > 
> > Intel appear to have previously acknowledged this behaviour and the
> > requirement to rewrite this register.
> > https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
> > 
> > Based on that, rewrite the prefetch register values even when that
> > appears unnecessary.
> > 
> > We have confirmed this solution on all the affected models we have
> > in-hands (X542UQ, UX533FD, X530UN, V272UN).
> > 
> > Additionally, this solves an issue where r8169 MSI-X interrupts were
> > broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> > worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> > RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> > Aimfor-tech laptop that we had not yet patched. I suspect it will also
> > fix the issue that was worked around in commit 7c53a722459c ("r8169:
> > don't use MSI-X on RTL8168g").
> > 
> > Thomas Martitz reports that this change also solves an issue where
> > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > after S3 suspend/resume.
> > 
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> > Signed-off-by: Daniel Drake 
> 
> Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
> Thanks for the the huge investigative effort!

Since this looks low-risk and fixes several painful issues, I think
this merits a stable tag and being included in v4.19 (instead of
waiting for v4.20).  

I moved it to for-linus for v4.19.  Let me know if you object.

> > ---
> >  drivers/pci/pci.c | 25 +
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> > 
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 29ff9619b5fa..5d58220b6997 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
> >  EXPORT_SYMBOL(pci_save_state);
> >  
> >  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> > -u32 saved_val, int retry)
> > +u32 saved_val, int retry, bool force)
> >  {
> > u32 val;
> >  
> > pci_read_config_dword(pdev, offset, );
> > -   if (val == saved_val)
> > +   if (!force && val == saved_val)
> > return;
> >  
> > for (;;) {
> > @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
> > *pdev, int offset,
> >  }
> >  
> >  static void pci_restore_config_space_range(struct pci_dev *pdev,
> > -  int start, int end, int retry)
> > +  int start, int end, int retry,
> > +  bool force)
> >  {
> > int index;
> >  
> > for (index = end; index >= start; index--)
> > pci_restore_config_dword(pdev, 4 * index,
> >  pdev->saved_config_space[index],
> > -retry);
> > +retry, force);
> >  }
> >  
> >  static void pci_restore_config_space(struct pci_dev *pdev)
> >  {
> > if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> > -   pci_restore_config_space_range(pdev, 10, 15, 0);
> > +   pci_restore_config_space_range(pdev, 10, 15, 0, false);
> > /* Restore BARs before the command register. */
> > -   pci_restore_config_space_range(pdev, 4, 9, 10);
> > -   pci_restore_config_space_range(pdev, 0, 3, 0);
> > +   pci_restore_config_space_range(pdev, 4, 9, 10, false);
> > +   pci_restore_config_space_range(pdev, 0, 3, 0, 

Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-18 Thread Bjorn Helgaas
On Thu, Sep 13, 2018 at 11:37:45AM +0800, Daniel Drake wrote:
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
> 
> fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
>   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> DRM: failed to idle channel 0 [DRM]
> 
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> 
> Runtime suspend/resume works fine, only S3 suspend is affected.
> 
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
> 
> Linux already saves and restores PCI config space during suspend/resume,
> but this register was being skipped because upon resume, it already
> has value 0 (the correct, pre-suspend value).
> 
> Intel appear to have previously acknowledged this behaviour and the
> requirement to rewrite this register.
> https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
> 
> Based on that, rewrite the prefetch register values even when that
> appears unnecessary.
> 
> We have confirmed this solution on all the affected models we have
> in-hands (X542UQ, UX533FD, X530UN, V272UN).
> 
> Additionally, this solves an issue where r8169 MSI-X interrupts were
> broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> Aimfor-tech laptop that we had not yet patched. I suspect it will also
> fix the issue that was worked around in commit 7c53a722459c ("r8169:
> don't use MSI-X on RTL8168g").
> 
> Thomas Martitz reports that this change also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> Signed-off-by: Daniel Drake 

Applied with Rafael's and Peter's reviewed-by to pci/enumeration for v4.20.
Thanks for the the huge investigative effort!

> ---
>  drivers/pci/pci.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 29ff9619b5fa..5d58220b6997 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
>  EXPORT_SYMBOL(pci_save_state);
>  
>  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> -  u32 saved_val, int retry)
> +  u32 saved_val, int retry, bool force)
>  {
>   u32 val;
>  
>   pci_read_config_dword(pdev, offset, );
> - if (val == saved_val)
> + if (!force && val == saved_val)
>   return;
>  
>   for (;;) {
> @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
> *pdev, int offset,
>  }
>  
>  static void pci_restore_config_space_range(struct pci_dev *pdev,
> -int start, int end, int retry)
> +int start, int end, int retry,
> +bool force)
>  {
>   int index;
>  
>   for (index = end; index >= start; index--)
>   pci_restore_config_dword(pdev, 4 * index,
>pdev->saved_config_space[index],
> -  retry);
> +  retry, force);
>  }
>  
>  static void pci_restore_config_space(struct pci_dev *pdev)
>  {
>   if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> - pci_restore_config_space_range(pdev, 10, 15, 0);
> + pci_restore_config_space_range(pdev, 10, 15, 0, false);
>   /* Restore BARs before the command register. */
> - pci_restore_config_space_range(pdev, 4, 9, 10);
> - pci_restore_config_space_range(pdev, 0, 3, 0);
> + pci_restore_config_space_range(pdev, 4, 9, 10, false);
> + pci_restore_config_space_range(pdev, 0, 3, 0, false);
> + } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> + pci_restore_config_space_range(pdev, 12, 15, 0, false);
> + /* Force rewriting of prefetch registers to avoid
> +  * S3 resume issues on Intel PCI bridges that occur when
> +  * these registers are not explicitly written.
> +  */
> + pci_restore_config_space_range(pdev, 9, 11, 0, true);
> + 

Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-13 Thread Peter Wu
On Thu, Sep 13, 2018 at 08:52:29AM +0200, Rafael J. Wysocki wrote:
> On Thu, Sep 13, 2018 at 5:37 AM Daniel Drake  wrote:
> >
> > On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> > after S3 suspend/resume. The affected products include multiple
> > generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> > many errors such as:
> >
> > fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
> >   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> > DRM: failed to idle channel 0 [DRM]
> >
> > Similarly, the nvidia proprietary driver also fails after resume
> > (black screen, 100% CPU usage in Xorg process). We shipped a sample
> > to Nvidia for diagnosis, and their response indicated that it's a
> > problem with the parent PCI bridge (on the Intel SoC), not the GPU.
> >
> > Runtime suspend/resume works fine, only S3 suspend is affected.
> >
> > We found a workaround: on resume, rewrite the Intel PCI bridge
> > 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> > the cases that I checked, this register has value 0 and we just have to
> > rewrite that value.
> >
> > Linux already saves and restores PCI config space during suspend/resume,
> > but this register was being skipped because upon resume, it already
> > has value 0 (the correct, pre-suspend value).
> >
> > Intel appear to have previously acknowledged this behaviour and the
> > requirement to rewrite this register.
> > https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
> >
> > Based on that, rewrite the prefetch register values even when that
> > appears unnecessary.
> >
> > We have confirmed this solution on all the affected models we have
> > in-hands (X542UQ, UX533FD, X530UN, V272UN).
> >
> > Additionally, this solves an issue where r8169 MSI-X interrupts were
> > broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> > worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> > RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> > Aimfor-tech laptop that we had not yet patched. I suspect it will also
> > fix the issue that was worked around in commit 7c53a722459c ("r8169:
> > don't use MSI-X on RTL8168g").
> >
> > Thomas Martitz reports that this change also solves an issue where
> > the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> > after S3 suspend/resume.
> >
> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> > Signed-off-by: Daniel Drake 
> 
> Still
> 
> Reviewed-by: Rafael J. Wysocki 

Reviewed-By: Peter Wu 

> > ---
> >  drivers/pci/pci.c | 25 +
> >  1 file changed, 17 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> > index 29ff9619b5fa..5d58220b6997 100644
> > --- a/drivers/pci/pci.c
> > +++ b/drivers/pci/pci.c
> > @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
> >  EXPORT_SYMBOL(pci_save_state);
> >
> >  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> > -u32 saved_val, int retry)
> > +u32 saved_val, int retry, bool force)
> >  {
> > u32 val;
> >
> > pci_read_config_dword(pdev, offset, );
> > -   if (val == saved_val)
> > +   if (!force && val == saved_val)
> > return;
> >
> > for (;;) {
> > @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
> > *pdev, int offset,
> >  }
> >
> >  static void pci_restore_config_space_range(struct pci_dev *pdev,
> > -  int start, int end, int retry)
> > +  int start, int end, int retry,
> > +  bool force)
> >  {
> > int index;
> >
> > for (index = end; index >= start; index--)
> > pci_restore_config_dword(pdev, 4 * index,
> >  pdev->saved_config_space[index],
> > -retry);
> > +retry, force);
> >  }
> >
> >  static void pci_restore_config_space(struct pci_dev *pdev)
> >  {
> > if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> > -   pci_restore_config_space_range(pdev, 10, 15, 0);
> > +   pci_restore_config_space_range(pdev, 10, 15, 0, false);
> > /* Restore BARs before the command register. */
> > -   pci_restore_config_space_range(pdev, 4, 9, 10);
> > -   pci_restore_config_space_range(pdev, 0, 3, 0);
> > +   pci_restore_config_space_range(pdev, 4, 9, 10, false);
> > +   pci_restore_config_space_range(pdev, 0, 3, 0, false);
> > +   } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> > +   pci_restore_config_space_range(pdev, 12, 15, 0, false);
> > +   /* Force rewriting of prefetch registers 

Re: [Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-13 Thread Rafael J. Wysocki
On Thu, Sep 13, 2018 at 5:37 AM Daniel Drake  wrote:
>
> On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
> after S3 suspend/resume. The affected products include multiple
> generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
> many errors such as:
>
> fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
>   [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
> DRM: failed to idle channel 0 [DRM]
>
> Similarly, the nvidia proprietary driver also fails after resume
> (black screen, 100% CPU usage in Xorg process). We shipped a sample
> to Nvidia for diagnosis, and their response indicated that it's a
> problem with the parent PCI bridge (on the Intel SoC), not the GPU.
>
> Runtime suspend/resume works fine, only S3 suspend is affected.
>
> We found a workaround: on resume, rewrite the Intel PCI bridge
> 'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
> the cases that I checked, this register has value 0 and we just have to
> rewrite that value.
>
> Linux already saves and restores PCI config space during suspend/resume,
> but this register was being skipped because upon resume, it already
> has value 0 (the correct, pre-suspend value).
>
> Intel appear to have previously acknowledged this behaviour and the
> requirement to rewrite this register.
> https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23
>
> Based on that, rewrite the prefetch register values even when that
> appears unnecessary.
>
> We have confirmed this solution on all the affected models we have
> in-hands (X542UQ, UX533FD, X530UN, V272UN).
>
> Additionally, this solves an issue where r8169 MSI-X interrupts were
> broken after S3 suspend/resume on Asus X441UAR. This issue was recently
> worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
> RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
> Aimfor-tech laptop that we had not yet patched. I suspect it will also
> fix the issue that was worked around in commit 7c53a722459c ("r8169:
> don't use MSI-X on RTL8168g").
>
> Thomas Martitz reports that this change also solves an issue where
> the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
> after S3 suspend/resume.
>
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
> Signed-off-by: Daniel Drake 

Still

Reviewed-by: Rafael J. Wysocki 

> ---
>  drivers/pci/pci.c | 25 +
>  1 file changed, 17 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> index 29ff9619b5fa..5d58220b6997 100644
> --- a/drivers/pci/pci.c
> +++ b/drivers/pci/pci.c
> @@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
>  EXPORT_SYMBOL(pci_save_state);
>
>  static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
> -u32 saved_val, int retry)
> +u32 saved_val, int retry, bool force)
>  {
> u32 val;
>
> pci_read_config_dword(pdev, offset, );
> -   if (val == saved_val)
> +   if (!force && val == saved_val)
> return;
>
> for (;;) {
> @@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
> *pdev, int offset,
>  }
>
>  static void pci_restore_config_space_range(struct pci_dev *pdev,
> -  int start, int end, int retry)
> +  int start, int end, int retry,
> +  bool force)
>  {
> int index;
>
> for (index = end; index >= start; index--)
> pci_restore_config_dword(pdev, 4 * index,
>  pdev->saved_config_space[index],
> -retry);
> +retry, force);
>  }
>
>  static void pci_restore_config_space(struct pci_dev *pdev)
>  {
> if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
> -   pci_restore_config_space_range(pdev, 10, 15, 0);
> +   pci_restore_config_space_range(pdev, 10, 15, 0, false);
> /* Restore BARs before the command register. */
> -   pci_restore_config_space_range(pdev, 4, 9, 10);
> -   pci_restore_config_space_range(pdev, 0, 3, 0);
> +   pci_restore_config_space_range(pdev, 4, 9, 10, false);
> +   pci_restore_config_space_range(pdev, 0, 3, 0, false);
> +   } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
> +   pci_restore_config_space_range(pdev, 12, 15, 0, false);
> +   /* Force rewriting of prefetch registers to avoid
> +* S3 resume issues on Intel PCI bridges that occur when
> +* these registers are not explicitly written.
> +*/
> +   pci_restore_config_space_range(pdev, 9, 11, 0, true);
> +   pci_restore_config_space_range(pdev, 0, 8, 0, 

[Nouveau] [PATCH v3] PCI: Reprogram bridge prefetch registers on resume

2018-09-12 Thread Daniel Drake
On 38+ Intel-based Asus products, the nvidia GPU becomes unusable
after S3 suspend/resume. The affected products include multiple
generations of nvidia GPUs and Intel SoCs. After resume, nouveau logs
many errors such as:

fifo: fault 00 [READ] at 00555000 engine 00 [GR] client 04
  [HUB/FE] reason 4a [] on channel -1 [007fa91000 unknown]
DRM: failed to idle channel 0 [DRM]

Similarly, the nvidia proprietary driver also fails after resume
(black screen, 100% CPU usage in Xorg process). We shipped a sample
to Nvidia for diagnosis, and their response indicated that it's a
problem with the parent PCI bridge (on the Intel SoC), not the GPU.

Runtime suspend/resume works fine, only S3 suspend is affected.

We found a workaround: on resume, rewrite the Intel PCI bridge
'Prefetchable Base Upper 32 Bits' register (PCI_PREF_BASE_UPPER32). In
the cases that I checked, this register has value 0 and we just have to
rewrite that value.

Linux already saves and restores PCI config space during suspend/resume,
but this register was being skipped because upon resume, it already
has value 0 (the correct, pre-suspend value).

Intel appear to have previously acknowledged this behaviour and the
requirement to rewrite this register.
https://bugzilla.kernel.org/show_bug.cgi?id=116851#c23

Based on that, rewrite the prefetch register values even when that
appears unnecessary.

We have confirmed this solution on all the affected models we have
in-hands (X542UQ, UX533FD, X530UN, V272UN).

Additionally, this solves an issue where r8169 MSI-X interrupts were
broken after S3 suspend/resume on Asus X441UAR. This issue was recently
worked around in commit 7bb05b85bc2d ("r8169: don't use MSI-X on
RTL8106e"). It also fixes the same issue on RTL6186evl/8111evl on an
Aimfor-tech laptop that we had not yet patched. I suspect it will also
fix the issue that was worked around in commit 7c53a722459c ("r8169:
don't use MSI-X on RTL8168g").

Thomas Martitz reports that this change also solves an issue where
the AMD Radeon Polaris 10 GPU on the HP Zbook 14u G5 is unresponsive
after S3 suspend/resume.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=201069
Signed-off-by: Daniel Drake 
---
 drivers/pci/pci.c | 25 +
 1 file changed, 17 insertions(+), 8 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 29ff9619b5fa..5d58220b6997 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1289,12 +1289,12 @@ int pci_save_state(struct pci_dev *dev)
 EXPORT_SYMBOL(pci_save_state);
 
 static void pci_restore_config_dword(struct pci_dev *pdev, int offset,
-u32 saved_val, int retry)
+u32 saved_val, int retry, bool force)
 {
u32 val;
 
pci_read_config_dword(pdev, offset, );
-   if (val == saved_val)
+   if (!force && val == saved_val)
return;
 
for (;;) {
@@ -1313,25 +1313,34 @@ static void pci_restore_config_dword(struct pci_dev 
*pdev, int offset,
 }
 
 static void pci_restore_config_space_range(struct pci_dev *pdev,
-  int start, int end, int retry)
+  int start, int end, int retry,
+  bool force)
 {
int index;
 
for (index = end; index >= start; index--)
pci_restore_config_dword(pdev, 4 * index,
 pdev->saved_config_space[index],
-retry);
+retry, force);
 }
 
 static void pci_restore_config_space(struct pci_dev *pdev)
 {
if (pdev->hdr_type == PCI_HEADER_TYPE_NORMAL) {
-   pci_restore_config_space_range(pdev, 10, 15, 0);
+   pci_restore_config_space_range(pdev, 10, 15, 0, false);
/* Restore BARs before the command register. */
-   pci_restore_config_space_range(pdev, 4, 9, 10);
-   pci_restore_config_space_range(pdev, 0, 3, 0);
+   pci_restore_config_space_range(pdev, 4, 9, 10, false);
+   pci_restore_config_space_range(pdev, 0, 3, 0, false);
+   } else if (pdev->hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   pci_restore_config_space_range(pdev, 12, 15, 0, false);
+   /* Force rewriting of prefetch registers to avoid
+* S3 resume issues on Intel PCI bridges that occur when
+* these registers are not explicitly written.
+*/
+   pci_restore_config_space_range(pdev, 9, 11, 0, true);
+   pci_restore_config_space_range(pdev, 0, 8, 0, false);
} else {
-   pci_restore_config_space_range(pdev, 0, 15, 0);
+   pci_restore_config_space_range(pdev, 0, 15, 0, false);
}
 }
 
-- 
2.17.1

___
Nouveau mailing list
Nouveau@lists.freedesktop.org