Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2020-10-23 Thread Alex Deucher
On Thu, Jan 16, 2020 at 10:14 AM Alex Deucher  wrote:
>
> On Wed, Jan 15, 2020 at 2:44 AM Daniel Drake  wrote:
> >
> > On Thu, Dec 19, 2019 at 10:08 PM Alex Deucher  wrote:
> > > I think there may be some AMD specific handling needed in
> > > drivers/acpi/sleep.c.  My understanding from reading the modern
> > > standby documents from MS is that each vendor needs to provide a
> > > platform specific PEP driver.  I'm not sure how much of that current
> > > code is Intel specific or not.
> >
> > I don't think there is anything Intel-specific in drivers/acpi/sleep.c.
> >
> > Reading more about PEP, I see that Linux supports PEP devices with
> > ACPI ID INT33A1 or PNP0D80. Indeed the Intel platforms we work with
> > have INT33A1 devices in their ACPI tables.
> >
> > This product has a \_SB.PEP ACPI device with _HID AMD0004 and _CID
> > PNP0D80. Full acpidump:
> > https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be (see
> > ssdt7)
> >
> > This PEP device responds to a _DSM with UUID argument
> > "e3f32452-febc-43ce-9039-932122d37721", which is not the one
> > documented at 
> > https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
> >
> > Nevertheless, there is some data about the GPU:
> > Package (0x04)
> > {
> > One,
> > "\\_SB.PCI0.GP17.VGA",
> > Zero,
> > 0x03
> > },
> >
> > However since this data is identical to many other devices that
> > suspend and resume just fine, I wonder if it is really important.
> >
> > The one supported method does offer two calls which may mirror the
> > Display Off/On Notifications in the above spec:
> > Case (0x02)
> > {
> > \_SB.PCI0.SBRG.EC0.CSEE (0xB7)
> > Return (Zero)
> > }
> > Case (0x03)
> > {
> > \_SB.PCI0.SBRG.EC0.CSEE (0xB8)
> > Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) //
> > Status Change
> > Return (Zero)
> > }
> >
> > but I tried executing this code after suspending amdgpu, and the
> > problem still stands, amdgpu cannot wakeup correctly.
> >
> > There's nothing else really interesting in the PEP device as far as I can 
> > see.
> >
> > PEP things aside, I am still quite suspicious about the fact that
> > calling amdgpu_device_suspend() then amdgpu_device_resume() on
> > multiple products (not just this one) fails. It seems that this code
> > flow is relying on the BIOS doing something in the S3 suspend/resume
> > path in order to make the device resumable by amdgpu_device_resume(),
> > which is why we have only encountered this issue for the first time on
> > our first AMD platform that does not support S3 suspend.
> >
>
> It makes sense.  This is an SOC, not a collection of individual
> devices.  There are more devices than power rails so the sbios (via
> ACPI) has to handle the power rail.  All of the devices using that
> power rail have to suspend and tell the sbios before the platform
> microcontroller can turn off the power rail.  Presumably there is
> something missing that prevents the microcontroller for powering down
> the rail.  If the power rail is kept on, the device is never powered
> off and still retains its current state.
>
> > With that in mind, and lacking any better info, wouldn't it make sense
> > for amdgpu_device_resume() to always call reset? Maybe it's not
> > necessary in the S3 case, but it shouldn't harm anything. Or perhaps
> > it could check if the device is alive and reset it if it's not?
>
> It's just papering over the problem.  It would be better from a power
> perspective for the driver to just not suspend and keep running like
> normal.  When the driver is not suspended runtime things like clock
> and power gating are active which keep the GPU power at a minimum.
>
> >
> > Alternatively do you have any other contacts within AMD that could
> > help us figure out the underlying question of how to correctly suspend
> > and resume these devices? Happy to ship an affected product sample
> > your way.
> >
>
> I talked to our sbios team and they seem to think our S0ix
> implementation works pretty differently from Intel's.  I'm not really
> an expert on this area however.  We have a new team ramping on up this
> for Linux however.

Initial patches for S0ix platform support:
https://patchwork.kernel.org/project/platform-driver-x86/patch/20201023080410.458629-1-shyam-sundar@amd.com/
https://patchwork.kernel.org/project/linux-acpi/patch/20201023080315.458570-1-shyam-sundar@amd.com/

You also need the following GPU driver patches:
https://patchwork.freedesktop.org/series/82762/

There is also an audio patch required which will be available soon.

Alex

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2020-02-06 Thread Daniel Drake
On Thu, Jan 16, 2020 at 11:15 PM Alex Deucher  wrote:
> It's just papering over the problem.  It would be better from a power
> perspective for the driver to just not suspend and keep running like
> normal.  When the driver is not suspended runtime things like clock
> and power gating are active which keep the GPU power at a minimum.

Until we have a better solution, are there any strategies we could
apply here to avoid the suspend as you say?
e.g. DMI quirk these products to disable suspend? Or disable suspend
on all s2idle setups?

This would certainly be better than the current situation of the
machine becoming unusable on resume.

> I talked to our sbios team and they seem to think our S0ix
> implementation works pretty differently from Intel's.  I'm not really
> an expert on this area however.  We have a new team ramping on up this
> for Linux however.

Thanks for following up on this internally! Can I lend a product
sample to the new team so that they have direct access?

Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2020-01-16 Thread Alex Deucher
On Wed, Jan 15, 2020 at 2:44 AM Daniel Drake  wrote:
>
> On Thu, Dec 19, 2019 at 10:08 PM Alex Deucher  wrote:
> > I think there may be some AMD specific handling needed in
> > drivers/acpi/sleep.c.  My understanding from reading the modern
> > standby documents from MS is that each vendor needs to provide a
> > platform specific PEP driver.  I'm not sure how much of that current
> > code is Intel specific or not.
>
> I don't think there is anything Intel-specific in drivers/acpi/sleep.c.
>
> Reading more about PEP, I see that Linux supports PEP devices with
> ACPI ID INT33A1 or PNP0D80. Indeed the Intel platforms we work with
> have INT33A1 devices in their ACPI tables.
>
> This product has a \_SB.PEP ACPI device with _HID AMD0004 and _CID
> PNP0D80. Full acpidump:
> https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be (see
> ssdt7)
>
> This PEP device responds to a _DSM with UUID argument
> "e3f32452-febc-43ce-9039-932122d37721", which is not the one
> documented at 
> https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf
>
> Nevertheless, there is some data about the GPU:
> Package (0x04)
> {
> One,
> "\\_SB.PCI0.GP17.VGA",
> Zero,
> 0x03
> },
>
> However since this data is identical to many other devices that
> suspend and resume just fine, I wonder if it is really important.
>
> The one supported method does offer two calls which may mirror the
> Display Off/On Notifications in the above spec:
> Case (0x02)
> {
> \_SB.PCI0.SBRG.EC0.CSEE (0xB7)
> Return (Zero)
> }
> Case (0x03)
> {
> \_SB.PCI0.SBRG.EC0.CSEE (0xB8)
> Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) //
> Status Change
> Return (Zero)
> }
>
> but I tried executing this code after suspending amdgpu, and the
> problem still stands, amdgpu cannot wakeup correctly.
>
> There's nothing else really interesting in the PEP device as far as I can see.
>
> PEP things aside, I am still quite suspicious about the fact that
> calling amdgpu_device_suspend() then amdgpu_device_resume() on
> multiple products (not just this one) fails. It seems that this code
> flow is relying on the BIOS doing something in the S3 suspend/resume
> path in order to make the device resumable by amdgpu_device_resume(),
> which is why we have only encountered this issue for the first time on
> our first AMD platform that does not support S3 suspend.
>

It makes sense.  This is an SOC, not a collection of individual
devices.  There are more devices than power rails so the sbios (via
ACPI) has to handle the power rail.  All of the devices using that
power rail have to suspend and tell the sbios before the platform
microcontroller can turn off the power rail.  Presumably there is
something missing that prevents the microcontroller for powering down
the rail.  If the power rail is kept on, the device is never powered
off and still retains its current state.

> With that in mind, and lacking any better info, wouldn't it make sense
> for amdgpu_device_resume() to always call reset? Maybe it's not
> necessary in the S3 case, but it shouldn't harm anything. Or perhaps
> it could check if the device is alive and reset it if it's not?

It's just papering over the problem.  It would be better from a power
perspective for the driver to just not suspend and keep running like
normal.  When the driver is not suspended runtime things like clock
and power gating are active which keep the GPU power at a minimum.

>
> Alternatively do you have any other contacts within AMD that could
> help us figure out the underlying question of how to correctly suspend
> and resume these devices? Happy to ship an affected product sample
> your way.
>

I talked to our sbios team and they seem to think our S0ix
implementation works pretty differently from Intel's.  I'm not really
an expert on this area however.  We have a new team ramping on up this
for Linux however.

Alex
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2020-01-14 Thread Daniel Drake
On Thu, Dec 19, 2019 at 10:08 PM Alex Deucher  wrote:
> I think there may be some AMD specific handling needed in
> drivers/acpi/sleep.c.  My understanding from reading the modern
> standby documents from MS is that each vendor needs to provide a
> platform specific PEP driver.  I'm not sure how much of that current
> code is Intel specific or not.

I don't think there is anything Intel-specific in drivers/acpi/sleep.c.

Reading more about PEP, I see that Linux supports PEP devices with
ACPI ID INT33A1 or PNP0D80. Indeed the Intel platforms we work with
have INT33A1 devices in their ACPI tables.

This product has a \_SB.PEP ACPI device with _HID AMD0004 and _CID
PNP0D80. Full acpidump:
https://gist.github.com/dsd/ff3dfc0f63cdd9eba4a0fbd9e776e8be (see
ssdt7)

This PEP device responds to a _DSM with UUID argument
"e3f32452-febc-43ce-9039-932122d37721", which is not the one
documented at 
https://uefi.org/sites/default/files/resources/Intel_ACPI_Low_Power_S0_Idle.pdf

Nevertheless, there is some data about the GPU:
Package (0x04)
{
One,
"\\_SB.PCI0.GP17.VGA",
Zero,
0x03
},

However since this data is identical to many other devices that
suspend and resume just fine, I wonder if it is really important.

The one supported method does offer two calls which may mirror the
Display Off/On Notifications in the above spec:
Case (0x02)
{
\_SB.PCI0.SBRG.EC0.CSEE (0xB7)
Return (Zero)
}
Case (0x03)
{
\_SB.PCI0.SBRG.EC0.CSEE (0xB8)
Notify (\_SB.PCI0.SBRG.EC0.LID, 0x80) //
Status Change
Return (Zero)
}

but I tried executing this code after suspending amdgpu, and the
problem still stands, amdgpu cannot wakeup correctly.

There's nothing else really interesting in the PEP device as far as I can see.

PEP things aside, I am still quite suspicious about the fact that
calling amdgpu_device_suspend() then amdgpu_device_resume() on
multiple products (not just this one) fails. It seems that this code
flow is relying on the BIOS doing something in the S3 suspend/resume
path in order to make the device resumable by amdgpu_device_resume(),
which is why we have only encountered this issue for the first time on
our first AMD platform that does not support S3 suspend.

With that in mind, and lacking any better info, wouldn't it make sense
for amdgpu_device_resume() to always call reset? Maybe it's not
necessary in the S3 case, but it shouldn't harm anything. Or perhaps
it could check if the device is alive and reset it if it's not?

Alternatively do you have any other contacts within AMD that could
help us figure out the underlying question of how to correctly suspend
and resume these devices? Happy to ship an affected product sample
your way.

Thanks
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-12-19 Thread Alex Deucher
On Mon, Dec 16, 2019 at 4:00 AM Daniel Drake  wrote:
>
> Hi Alex,
>
> On Mon, Nov 25, 2019 at 1:17 PM Daniel Drake  wrote:
> > Unfortunately not. The original issue still exists (dead gfx after
> > resume from s2idle) and also when I trigger execution of the suspend
> > or runtime suspend routines the power usage increases around 1.5W as
> > before.
> >
> > Have you confirmed that amdgpu s2idle is working on platforms you have in 
> > hand?
>
> Any further ideas here? Or any workarounds that you would consider?

I think there may be some AMD specific handling needed in
drivers/acpi/sleep.c.  My understanding from reading the modern
standby documents from MS is that each vendor needs to provide a
platform specific PEP driver.  I'm not sure how much of that current
code is Intel specific or not.

Alex

>
> This platform has been rather tricky but all of the other problems are
> now solved:
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f897e60a12f0b9146357780d317879bce2a877dc
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d21b8adbd475dba19ac2086d3306327b4a297418
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=406857f773b082bc88edfd24967facf4ed07ac85
> https://patchwork.kernel.org/patch/11263477/
>
> amdgpu is the only breakage left before Linux can be shipped on this
> family of products.
>
> Thanks
> Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-12-16 Thread Daniel Drake
Hi Alex,

On Mon, Nov 25, 2019 at 1:17 PM Daniel Drake  wrote:
> Unfortunately not. The original issue still exists (dead gfx after
> resume from s2idle) and also when I trigger execution of the suspend
> or runtime suspend routines the power usage increases around 1.5W as
> before.
>
> Have you confirmed that amdgpu s2idle is working on platforms you have in 
> hand?

Any further ideas here? Or any workarounds that you would consider?

This platform has been rather tricky but all of the other problems are
now solved:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f897e60a12f0b9146357780d317879bce2a877dc
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d21b8adbd475dba19ac2086d3306327b4a297418
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=406857f773b082bc88edfd24967facf4ed07ac85
https://patchwork.kernel.org/patch/11263477/

amdgpu is the only breakage left before Linux can be shipped on this
family of products.

Thanks
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-11-24 Thread Daniel Drake
On Fri, Nov 22, 2019 at 11:32 PM Alex Deucher  wrote:
> Do these patches help?
> https://patchwork.freedesktop.org/patch/341775/
> https://patchwork.freedesktop.org/patch/341968/

Unfortunately not. The original issue still exists (dead gfx after
resume from s2idle) and also when I trigger execution of the suspend
or runtime suspend routines the power usage increases around 1.5W as
before.

Have you confirmed that amdgpu s2idle is working on platforms you have in hand?

Thanks
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-11-22 Thread Alex Deucher
On Wed, Oct 16, 2019 at 3:24 AM Daniel Drake  wrote:
>
> On Wed, Oct 16, 2019 at 2:43 AM Alex Deucher  wrote:
> > Is s2idle actually powering down the GPU?
>
> My understanding is that s2idle (at a high level) just calls all
> devices suspend routines and then puts the CPU into its deepest
> running state.
> So if there is something special to be done to power off the GPU, I
> believe that amdgpu is responsible for making arrangements for that to
> happen.
> In this case the amdgpu code already does:
>
> pci_disable_device(dev->pdev);
> pci_set_power_state(dev->pdev, PCI_D3hot);
>
> And the PCI layer will call through to any appropriate ACPI methods
> related to that low power state.
>
> > Do you see a difference in power usage?  I think you are just working 
> > around the fact that the
> > GPU never actually gets powered down.
>
> I ran a series of experiments.
>
> Base setup: no UI running, ran "setterm -powersave 1; setterm -blank
> 1" and waited 1 minute for screen to turn off.
> Base power usage in this state is 4.7W as reported by BAT0/power_now
>
> 1. Run amdgpu_device_suspend(ddev, true, true); before my change
> --> Power usage increases to 6.1W
>
> 2. Run amdgpu_device_suspend(ddev, true, true); with my change applied
> --> Power usage increases to 6.0W
>
> 3. Put amdgpu device in runtime suspend
> --> Power usage increases to 6.2W
>
> 4. Try unmodified suspend path but d3cold instead of d3hot
> --> Power usage increases to 6.1W
>
> So, all of the suspend schemes actually increase the power usage by
> roughly the same amount, reset or not, with and without my patch :/
> Any ideas?

Do these patches help?
https://patchwork.freedesktop.org/patch/341775/
https://patchwork.freedesktop.org/patch/341968/

Alex

>
> Thanks,
> Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-10-16 Thread Daniel Drake
On Wed, Oct 16, 2019 at 2:43 AM Alex Deucher  wrote:
> Is s2idle actually powering down the GPU?

My understanding is that s2idle (at a high level) just calls all
devices suspend routines and then puts the CPU into its deepest
running state.
So if there is something special to be done to power off the GPU, I
believe that amdgpu is responsible for making arrangements for that to
happen.
In this case the amdgpu code already does:

pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);

And the PCI layer will call through to any appropriate ACPI methods
related to that low power state.

> Do you see a difference in power usage?  I think you are just working around 
> the fact that the
> GPU never actually gets powered down.

I ran a series of experiments.

Base setup: no UI running, ran "setterm -powersave 1; setterm -blank
1" and waited 1 minute for screen to turn off.
Base power usage in this state is 4.7W as reported by BAT0/power_now

1. Run amdgpu_device_suspend(ddev, true, true); before my change
--> Power usage increases to 6.1W

2. Run amdgpu_device_suspend(ddev, true, true); with my change applied
--> Power usage increases to 6.0W

3. Put amdgpu device in runtime suspend
--> Power usage increases to 6.2W

4. Try unmodified suspend path but d3cold instead of d3hot
--> Power usage increases to 6.1W

So, all of the suspend schemes actually increase the power usage by
roughly the same amount, reset or not, with and without my patch :/
Any ideas?

Thanks,
Daniel
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-10-15 Thread Alex Deucher
On Tue, Oct 15, 2019 at 2:50 AM Daniel Drake  wrote:
>
> On Asus UX434DA (Ryzen7 3700U), upon resume from s2idle, the screen
> turns on again and shows the pre-suspend image, but the display remains
> frozen from that point onwards.
>
> The kernel logs show errors:
>
>  [drm] psp command failed and response status is (0x7)
>  [drm] Fence fallback timer expired on ring sdma0
>  [drm] Fence fallback timer expired on ring gfx
>  amdgpu :03:00.0: [drm:amdgpu_ib_ring_tests] *ERROR* IB test failed on 
> gfx (-22).
>  [drm:process_one_work] *ERROR* ib ring test failed (-22).
>
> This can also be reproduced with pm_test:
>  # echo devices > /sys/power/pm_test
>  # echo freeze > /sys/power/mem
>
> The same reproducer causes the same problem on Asus X512DK (Ryzen5 3500U)
> even though that model is normally able to suspend and resume OK via S3.
>
> Experimenting, I observed that this error condition can be invoked on
> any amdgpu product by executing in succession:
>
>   amdgpu_device_suspend(drm_dev, true, true);
>   amdgpu_device_resume(drm_dev, true, true);
>
> i.e. it appears that the resume routine is unable to get the device out
> of suspended state, except for the S3 suspend case where it presumably has
> a bit of extra help from the firmware or hardware.
>
> However, I also observed that the runtime suspend/resume routines work
> OK when tested like this, which lead me to the key difference in these
> two cases: the ASIC reset, which only happens in the runtime suspend path.
>
> Since it takes less than 1ms, we should do the ASIC reset in all
> suspend paths, fixing resume from s2idle on these products.
>

Is s2idle actually powering down the GPU?  Do you see a difference in
power usage?  I think you are just working around the fact that the
GPU never actually gets powered down.  Leaving the GPU in the reset
state probably uses more power than not suspending it in the first
place.

Alex

> Link: https://bugs.freedesktop.org/show_bug.cgi?id=111811
> Signed-off-by: Daniel Drake 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 5a1939dbd4e3..7f4870e974fb 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3082,15 +3082,16 @@ int amdgpu_device_suspend(struct drm_device *dev, 
> bool suspend, bool fbcon)
>  */
> amdgpu_bo_evict_vram(adev);
>
> +   amdgpu_asic_reset(adev);
> +   r = amdgpu_asic_reset(adev);
> +   if (r)
> +   DRM_ERROR("amdgpu asic reset failed\n");
> +
> pci_save_state(dev->pdev);
> if (suspend) {
> /* Shut down the device */
> pci_disable_device(dev->pdev);
> pci_set_power_state(dev->pdev, PCI_D3hot);
> -   } else {
> -   r = amdgpu_asic_reset(adev);
> -   if (r)
> -   DRM_ERROR("amdgpu asic reset failed\n");
> }
>
> return 0;
> --
> 2.20.1
>
> ___
> amd-gfx mailing list
> amd-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx

[PATCH] drm/amdgpu: always reset asic when going into suspend

2019-10-15 Thread Daniel Drake
On Asus UX434DA (Ryzen7 3700U), upon resume from s2idle, the screen
turns on again and shows the pre-suspend image, but the display remains
frozen from that point onwards.

The kernel logs show errors:

 [drm] psp command failed and response status is (0x7)
 [drm] Fence fallback timer expired on ring sdma0
 [drm] Fence fallback timer expired on ring gfx
 amdgpu :03:00.0: [drm:amdgpu_ib_ring_tests] *ERROR* IB test failed on gfx 
(-22).
 [drm:process_one_work] *ERROR* ib ring test failed (-22).

This can also be reproduced with pm_test:
 # echo devices > /sys/power/pm_test
 # echo freeze > /sys/power/mem

The same reproducer causes the same problem on Asus X512DK (Ryzen5 3500U)
even though that model is normally able to suspend and resume OK via S3.

Experimenting, I observed that this error condition can be invoked on
any amdgpu product by executing in succession:

  amdgpu_device_suspend(drm_dev, true, true);
  amdgpu_device_resume(drm_dev, true, true);

i.e. it appears that the resume routine is unable to get the device out
of suspended state, except for the S3 suspend case where it presumably has
a bit of extra help from the firmware or hardware.

However, I also observed that the runtime suspend/resume routines work
OK when tested like this, which lead me to the key difference in these
two cases: the ASIC reset, which only happens in the runtime suspend path.

Since it takes less than 1ms, we should do the ASIC reset in all
suspend paths, fixing resume from s2idle on these products.

Link: https://bugs.freedesktop.org/show_bug.cgi?id=111811
Signed-off-by: Daniel Drake 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 5a1939dbd4e3..7f4870e974fb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3082,15 +3082,16 @@ int amdgpu_device_suspend(struct drm_device *dev, bool 
suspend, bool fbcon)
 */
amdgpu_bo_evict_vram(adev);
 
+   amdgpu_asic_reset(adev);
+   r = amdgpu_asic_reset(adev);
+   if (r)
+   DRM_ERROR("amdgpu asic reset failed\n");
+
pci_save_state(dev->pdev);
if (suspend) {
/* Shut down the device */
pci_disable_device(dev->pdev);
pci_set_power_state(dev->pdev, PCI_D3hot);
-   } else {
-   r = amdgpu_asic_reset(adev);
-   if (r)
-   DRM_ERROR("amdgpu asic reset failed\n");
}
 
return 0;
-- 
2.20.1

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx