Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-16 Thread Alex Deucher
On Thu, Dec 16, 2021 at 8:43 PM Quan, Evan  wrote:
>
> [AMD Official Use Only]
>
> Hi Alex,
>
> Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) 
> in their ->suspend routine which should prevent them from the issue here.
> if (adev->pm.dpm_enabled)
> amdgpu_dpm_enable_uvd(adev, false);
> So, maybe it's a different story with those newer APUs. Can you forward the 
> bug reports to me?

https://gitlab.freedesktop.org/drm/amd/-/issues/1712#note_1192187

Alex

>
> Thanks,
> Evan
> > -Original Message-
> > From: Alex Deucher 
> > Sent: Thursday, December 16, 2021 11:38 PM
> > To: Quan, Evan 
> > Cc: Zhu, James ; Gong, Curry
> > ; amd-gfx@lists.freedesktop.org; Deucher,
> > Alexander ; Liu, Leo 
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> > FWIW, it looks like all versions of VCN need the same fix.  There have been
> > reports of suspend failing when VCN is in use on other newer APUs as well.
> >
> > Alex
> >
> > On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan  wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Zhu, James 
> > > Sent: Monday, December 13, 2021 9:39 PM
> > > To: Gong, Curry ; Zhu, James
> > ;
> > > amd-gfx@lists.freedesktop.org
> > > Cc: Liu, Leo ; Quan, Evan ;
> > > Deucher, Alexander 
> > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > > powergating is explicitly enabled
> > >
> > >
> > >
> > > Hi Curry, Evan,
> > >
> > > It seems vcn1.0 power gate sequence are special.
> > >
> > > if the original solution is thread safe, then the following solution will 
> > > be
> > thread safe also.
> > >
> > > static int vcn_v1_0_hw_fini(void *handle) {
> > > struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > >
> > > cancel_delayed_work_sync(&adev->vcn.idle_work);
> > >
> > > if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> > > (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> > >  RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> > > +if (adev->pm.dpm_enabled)
> > > +amdgpu_dpm_enable_uvd(adev, false);
> > > +else
> > > +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> > >
> > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true,
> > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will
> > become dead code.
> > >
> > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the
> > suspend/resume discussed here). So, it seems quite risky compared with
> > Curry’s original patch.
> > >
> > > Before we can come up with a better idea, I would suggest to land Curry’s
> > original patch as a quick fix.
> > >
> > >
> > >
> > > BR
> > >
> > > Evan
> > >
> > >
> > > }
> > >
> > > Best Regards!
> > >
> > > James
> > >
> > > On 2021-12-13 3:55 a.m., Gong, Curry wrote:
> > >
> > > [AMD Official Use Only]
> > >
> > >
> > >
> > > Hi James:
> > >
> > >
> > >
> > > With the following patch, an error will be reported when the driver is
> > > loaded
> > >
> > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > >
> > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device
> > > *adev)
> > >
> > > else
> > >
> > > r = vcn_v1_0_stop_spg_mode(adev);
> > >
> > >
> > >
> > > c
> > >
> > > return r;
> > >
> > > }
> > >
> > >
> > >
> > >
> > >
> > > $ dmesg
> > >
> > > [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120
> > seconds.
> > >
> > > [  363.181150]   Tainted: G   OE 5.11.0-41-generic 
> > > #45~20.04.1-
> > Ubuntu
> > >
> > > [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> > disables this message.
> > >
> > > [  363.181266] task:kworker/

RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-16 Thread Quan, Evan
[AMD Official Use Only]

Hi Alex,

Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) in 
their ->suspend routine which should prevent them from the issue here.
if (adev->pm.dpm_enabled)
amdgpu_dpm_enable_uvd(adev, false);
So, maybe it's a different story with those newer APUs. Can you forward the bug 
reports to me?

Thanks,
Evan
> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, December 16, 2021 11:38 PM
> To: Quan, Evan 
> Cc: Zhu, James ; Gong, Curry
> ; amd-gfx@lists.freedesktop.org; Deucher,
> Alexander ; Liu, Leo 
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> FWIW, it looks like all versions of VCN need the same fix.  There have been
> reports of suspend failing when VCN is in use on other newer APUs as well.
> 
> Alex
> 
> On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan  wrote:
> >
> > [AMD Official Use Only]
> >
> >
> >
> >
> >
> >
> >
> > From: Zhu, James 
> > Sent: Monday, December 13, 2021 9:39 PM
> > To: Gong, Curry ; Zhu, James
> ;
> > amd-gfx@lists.freedesktop.org
> > Cc: Liu, Leo ; Quan, Evan ;
> > Deucher, Alexander 
> > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> >
> >
> > Hi Curry, Evan,
> >
> > It seems vcn1.0 power gate sequence are special.
> >
> > if the original solution is thread safe, then the following solution will be
> thread safe also.
> >
> > static int vcn_v1_0_hw_fini(void *handle) {
> > struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> >
> > cancel_delayed_work_sync(&adev->vcn.idle_work);
> >
> > if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> > (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
> >  RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> > +if (adev->pm.dpm_enabled)
> > +amdgpu_dpm_enable_uvd(adev, false);
> > +else
> > +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
> >
> > [Quan, Evan] Considering adev->pm.dpm_enabled is always true,
> “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will
> become dead code.
> >
> > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the
> suspend/resume discussed here). So, it seems quite risky compared with
> Curry’s original patch.
> >
> > Before we can come up with a better idea, I would suggest to land Curry’s
> original patch as a quick fix.
> >
> >
> >
> > BR
> >
> > Evan
> >
> >
> > }
> >
> > Best Regards!
> >
> > James
> >
> > On 2021-12-13 3:55 a.m., Gong, Curry wrote:
> >
> > [AMD Official Use Only]
> >
> >
> >
> > Hi James:
> >
> >
> >
> > With the following patch, an error will be reported when the driver is
> > loaded
> >
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> >
> > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device
> > *adev)
> >
> > else
> >
> > r = vcn_v1_0_stop_spg_mode(adev);
> >
> >
> >
> > c
> >
> > return r;
> >
> > }
> >
> >
> >
> >
> >
> > $ dmesg
> >
> > [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120
> seconds.
> >
> > [  363.181150]   Tainted: G   OE 5.11.0-41-generic 
> > #45~20.04.1-
> Ubuntu
> >
> > [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> >
> > [  363.181266] task:kworker/3:2 state:D stack:0 pid:  223 ppid: 
> > 2
> flags:0x4000
> >
> > [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
> >
> > [  363.181612] Call Trace:
> >
> > [  363.181618]  __schedule+0x44c/0x8a0
> >
> > [  363.181627]  schedule+0x4f/0xc0
> >
> > [  363.181631]  schedule_preempt_disabled+0xe/0x10
> >
> > [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
> >
> > [  363.181643]  __mutex_lock_slowpath+0x13/0x20
> >
> > [  363.181648]  mutex_lock+0x32/0x40
> >
> > [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180
> [amdgpu]
> >
> > [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
> >
> > [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0

Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-16 Thread Alex Deucher
FWIW, it looks like all versions of VCN need the same fix.  There have
been reports of suspend failing when VCN is in use on other newer APUs
as well.

Alex

On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan  wrote:
>
> [AMD Official Use Only]
>
>
>
>
>
>
>
> From: Zhu, James 
> Sent: Monday, December 13, 2021 9:39 PM
> To: Gong, Curry ; Zhu, James ; 
> amd-gfx@lists.freedesktop.org
> Cc: Liu, Leo ; Quan, Evan ; Deucher, 
> Alexander 
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
> powergating is explicitly enabled
>
>
>
> Hi Curry, Evan,
>
> It seems vcn1.0 power gate sequence are special.
>
> if the original solution is thread safe, then the following solution will be 
> thread safe also.
>
> static int vcn_v1_0_hw_fini(void *handle)
> {
> struct amdgpu_device *adev = (struct amdgpu_device *)handle;
>
> cancel_delayed_work_sync(&adev->vcn.idle_work);
>
> if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
> (adev->vcn.cur_state != AMD_PG_STATE_GATE &&
>  RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
> +if (adev->pm.dpm_enabled)
> +amdgpu_dpm_enable_uvd(adev, false);
> +else
> +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
>
> [Quan, Evan] Considering adev->pm.dpm_enabled is always true, 
> “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will become dead 
> code.
>
> Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the 
> suspend/resume discussed here). So, it seems quite risky compared with 
> Curry’s original patch.
>
> Before we can come up with a better idea, I would suggest to land Curry’s 
> original patch as a quick fix.
>
>
>
> BR
>
> Evan
>
>
> }
>
> Best Regards!
>
> James
>
> On 2021-12-13 3:55 a.m., Gong, Curry wrote:
>
> [AMD Official Use Only]
>
>
>
> Hi James:
>
>
>
> With the following patch, an error will be reported when the driver is loaded
>
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
>
> @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
>
> else
>
> r = vcn_v1_0_stop_spg_mode(adev);
>
>
>
> c
>
> return r;
>
> }
>
>
>
>
>
> $ dmesg
>
> [  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
>
> [  363.181150]   Tainted: G   OE 5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
>
> [  363.181266] task:kworker/3:2 state:D stack:0 pid:  223 ppid: 2 
> flags:0x4000
>
> [  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
>
> [  363.181612] Call Trace:
>
> [  363.181618]  __schedule+0x44c/0x8a0
>
> [  363.181627]  schedule+0x4f/0xc0
>
> [  363.181631]  schedule_preempt_disabled+0xe/0x10
>
> [  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
>
> [  363.181643]  __mutex_lock_slowpath+0x13/0x20
>
> [  363.181648]  mutex_lock+0x32/0x40
>
> [  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
>
> [  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
>
> [  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
>
> [  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
>
> [  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
>
> [  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
>
> [  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
>
> [  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
>
> [  363.184667]  process_one_work+0x220/0x3c0
>
> [  363.184674]  worker_thread+0x4d/0x3f0
>
> [  363.184677]  ? process_one_work+0x3c0/0x3c0
>
> [  363.184680]  kthread+0x12b/0x150
>
> [  363.184685]  ? set_kthread_struct+0x40/0x40
>
> [  363.184690]  ret_from_fork+0x22/0x30
>
> [  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
>
> [  363.184739]   Tainted: G   OE 5.11.0-41-generic 
> #45~20.04.1-Ubuntu
>
> [  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
> this message.
>
> [  363.184825] task:kworker/2:2 state:D stack:0 pid:  233 ppid: 2 
> flags:0x4000
>
> [  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler 
> [amdgpu]
>
> [  363.185085] Call Trace:
>
> [  363.185087]  __schedule+0x44c/0x8a0
>
> [  363.185092]  schedule+0x4f/0xc0
>
> [  363

RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-13 Thread Quan, Evan
[AMD Official Use Only]



From: Zhu, James 
Sent: Monday, December 13, 2021 9:39 PM
To: Gong, Curry ; Zhu, James ; 
amd-gfx@lists.freedesktop.org
Cc: Liu, Leo ; Quan, Evan ; Deucher, 
Alexander 
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
powergating is explicitly enabled


Hi Curry, Evan,

It seems vcn1.0 power gate sequence are special.

if the original solution is thread safe, then the following solution will be 
thread safe also.
static int vcn_v1_0_hw_fini(void *handle)
{
struct amdgpu_device *adev = (struct amdgpu_device *)handle;

cancel_delayed_work_sync(&adev->vcn.idle_work);

if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) ||
(adev->vcn.cur_state != AMD_PG_STATE_GATE &&
 RREG32_SOC15(VCN, 0, mmUVD_STATUS))) {
+if (adev->pm.dpm_enabled)
+amdgpu_dpm_enable_uvd(adev, false);
+else
+vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE);
[Quan, Evan] Considering adev->pm.dpm_enabled is always true, 
"vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); " will become dead 
code.
Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the 
suspend/resume discussed here). So, it seems quite risky compared with Curry's 
original patch.
Before we can come up with a better idea, I would suggest to land Curry's 
original patch as a quick fix.

BR
Evan

}

Best Regards!

James
On 2021-12-13 3:55 a.m., Gong, Curry wrote:

[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
else
r = vcn_v1_0_stop_spg_mode(adev);

c
return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.181266] task:kworker/3:2 state:D stack:0 pid:  223 ppid: 2 
flags:0x4000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.184825] task:kworker/2:2 state:D stack:0 pid:  233 ppid: 2 
flags:0x4000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler 
[amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fo

RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-13 Thread Quan, Evan
[AMD Official Use Only]

Checked the log paste below with Curry. The way to add this fix in 
vcn_v1_0_stop is not workable.
As it will induce a circle calling(below) and lead to dead lock.
VCN ring begin use -> amdgpu_dpm_enable_uvd -> acquire the smu_lock -> 
smu10_powergate_vcn -> amdgpu_device_ip_set_powergating_state -> vcn_v1_0_stop 
-> amdgpu_dpm_enable_uvd -> try to acquire the smu_lock again -> dead lock

BR
Evan
From: Gong, Curry 
Sent: Monday, December 13, 2021 4:56 PM
To: Zhu, James ; amd-gfx@lists.freedesktop.org
Cc: Liu, Leo ; Quan, Evan ; Deucher, 
Alexander 
Subject: RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
powergating is explicitly enabled


[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
else
r = vcn_v1_0_stop_spg_mode(adev);

+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+
return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.181266] task:kworker/3:2 state:D stack:0 pid:  223 ppid: 2 
flags:0x4000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.184825] task:kworker/2:2 state:D stack:0 pid:  233 ppid: 2 
flags:0x4000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler 
[amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fork+0x22/0x30


BR
Curry Gong
From: Zhu, James mailto:james@amd.com>>
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry mailto:curry.g...@amd.com>>; 
amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org>
Cc: Liu, Leo mailto:leo@amd.com>>; Zhu, James 
mailto:james@amd.com>>; Quan, Evan 
mailto:evan.q...@amd.com>>; Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
powergating is explicitly enabled

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3

test. When resume, the following erro

RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-13 Thread Gong, Curry
[AMD Official Use Only]

Hi James:

With the following patch, an error will be reported when the driver is loaded
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev)
else
r = vcn_v1_0_stop_spg_mode(adev);

+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+
return r;
}


$ dmesg
[  363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds.
[  363.181150]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.181266] task:kworker/3:2 state:D stack:0 pid:  223 ppid: 2 
flags:0x4000
[  363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu]
[  363.181612] Call Trace:
[  363.181618]  __schedule+0x44c/0x8a0
[  363.181627]  schedule+0x4f/0xc0
[  363.181631]  schedule_preempt_disabled+0xe/0x10
[  363.181636]  __mutex_lock.isra.0+0x183/0x4d0
[  363.181643]  __mutex_lock_slowpath+0x13/0x20
[  363.181648]  mutex_lock+0x32/0x40
[  363.181652]  amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu]
[  363.182055]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.182454]  vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu]
[  363.182776]  amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu]
[  363.183028]  smu10_powergate_vcn+0x2a/0x80 [amdgpu]
[  363.183361]  pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu]
[  363.183699]  amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu]
[  363.184040]  amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu]
[  363.184391]  vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu]
[  363.184667]  process_one_work+0x220/0x3c0
[  363.184674]  worker_thread+0x4d/0x3f0
[  363.184677]  ? process_one_work+0x3c0/0x3c0
[  363.184680]  kthread+0x12b/0x150
[  363.184685]  ? set_kthread_struct+0x40/0x40
[  363.184690]  ret_from_fork+0x22/0x30
[  363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds.
[  363.184739]   Tainted: G   OE 5.11.0-41-generic 
#45~20.04.1-Ubuntu
[  363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this 
message.
[  363.184825] task:kworker/2:2 state:D stack:0 pid:  233 ppid: 2 
flags:0x4000
[  363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler 
[amdgpu]
[  363.185085] Call Trace:
[  363.185087]  __schedule+0x44c/0x8a0
[  363.185092]  schedule+0x4f/0xc0
[  363.185095]  schedule_timeout+0x202/0x290
[  363.185099]  ? sched_clock_cpu+0x11/0xb0
[  363.185105]  wait_for_completion+0x94/0x100
[  363.185110]  __flush_work+0x12a/0x1e0
[  363.185113]  ? worker_detach_from_pool+0xc0/0xc0
[  363.185119]  __cancel_work_timer+0x10e/0x190
[  363.185123]  cancel_delayed_work_sync+0x13/0x20
[  363.185126]  vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu]
[  363.185401]  amdgpu_ring_alloc+0x48/0x60 [amdgpu]
[  363.185640]  amdgpu_ib_schedule+0x493/0x600 [amdgpu]
[  363.185884]  amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu]
[  363.186186]  amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu]
[  363.186460]  amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu]
[  363.186734]  amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu]
[  363.186978]  amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu]
[  363.187206]  process_one_work+0x220/0x3c0
[  363.187210]  worker_thread+0x4d/0x3f0
[  363.187214]  ? process_one_work+0x3c0/0x3c0
[  363.187217]  kthread+0x12b/0x150
[  363.187221]  ? set_kthread_struct+0x40/0x40
[  363.187226]  ret_from_fork+0x22/0x30


BR
Curry Gong
From: Zhu, James 
Sent: Saturday, December 11, 2021 5:07 AM
To: Gong, Curry ; amd-gfx@lists.freedesktop.org
Cc: Liu, Leo ; Zhu, James ; Quan, Evan 
; Deucher, Alexander 
Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, 
powergating is explicitly enabled

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3

test. When resume, the following error will be reported:



amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring

vcn_dec test failed (-110)

[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block

 failed -110

amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).

PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110



[why]

When playing the video: The power state flag of the vcn block is set to

POWER_STATE_ON.



When doing suspend: There is no change to the power state flag of the

vcn block, it is still POWER_STATE_ON.



When doing resume: Need to open the power gate of the vcn block and set

the power state flag of the VCN block to POWER_STATE_ON.

But at this time, the power state flag of the vcn block is already

POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:

avoid duplicate powergate/ungate setting" patch will return the

amdgpu_dpm_set_powergating_by_smu function directly.

As 

Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread James Zhu


On 2021-12-10 11:19 a.m., Quan, Evan wrote:

[AMD Official Use Only]




-Original Message-
From: Lazar, Lijo
Sent: Friday, December 10, 2021 8:25 PM
To: Gong, Curry;amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Zhu, James
; Liu, Leo; Quan, Evan

Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
powergating is explicitly enabled



On 12/10/2021 5:11 PM, chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the
S3 test. When resume, the following error will be reported:

amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
ring vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of

IP

block  failed -110 amdgpu :02:00.0: amdgpu:
amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set
to POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and
set the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef

drm/amd/pm:

avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

Signed-off-by: chen gong
---
   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
   1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
   {
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   bool cancel_success;
+
+   cancel_success = cancel_delayed_work_sync(&adev-
vcn.idle_work);
+   if (cancel_success) {
+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+   }


Probably this is a common issue. Can you try moving this to
amdgpu_vcn_suspend?

[Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the 
vcn code is not consistent.
For vcn v2 and later, they already do the manual gate operation in their 
suspend routine(like vcn_v2_0_stop).
So, it is actually only vcn_v1_0.c suffers this issue.
[JZ] Then what is concern for vcn1 not doing the same thing as 
vcn_v2_0_stop?

if (adev->pm.dpm_enabled)
 amdgpu_device_ip_set_powergating_state(adev,
AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);

Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle
work already put it in PG state. Evan, what do you think?

[Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg 
prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info 
or dev_warn.
Hopefully that can help to capture some potential issues like this. So, better 
to keep the check for the return value of cancel_delayed_work_sync here.

BR
Evan

Thanks,
Lijo


r = vcn_v1_0_hw_fini(adev);
if (r)


Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread James Zhu

On 2021-12-10 6:41 a.m., chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
 failed -110
amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

Signed-off-by: chen gong
---
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
  {
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   bool cancel_success;
+
+   cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);


[JZ] Can you refer to vcn_v3_0_stop , and add 
amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop?


See if it also can help.


+   if (cancel_success) {
+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+   }
  
  	r = vcn_v1_0_hw_fini(adev);

if (r)

Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread Alex Deucher
On Fri, Dec 10, 2021 at 11:06 AM Quan, Evan  wrote:
>
> [AMD Official Use Only]
>
> Hi Curry,
>
> Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan 
> 
>
> @Deucher, Alexander this should be able address the issue reported by 
> https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm 
> this?

Yes, the patch works according to the reporter.

Alex

>
> BR
> Evan
> > -Original Message-
> > From: chen gong 
> > Sent: Friday, December 10, 2021 7:42 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Liu, Leo ; Zhu, James ; Quan,
> > Evan ; Deucher, Alexander
> > ; Gong, Curry 
> > Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> > powergating is explicitly enabled
> >
> > Play a video on the raven (or PCO, raven2) platform, and then do the S3
> > test. When resume, the following error will be reported:
> >
> > amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > ring
> > vcn_dec test failed (-110)
> > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> > block
> >  failed -110
> > amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> >
> > [why]
> > When playing the video: The power state flag of the vcn block is set to
> > POWER_STATE_ON.
> >
> > When doing suspend: There is no change to the power state flag of the
> > vcn block, it is still POWER_STATE_ON.
> >
> > When doing resume: Need to open the power gate of the vcn block and set
> > the power state flag of the VCN block to POWER_STATE_ON.
> > But at this time, the power state flag of the vcn block is already
> > POWER_STATE_ON. The power status flag check in the "8f2cdef
> > drm/amd/pm:
> > avoid duplicate powergate/ungate setting" patch will return the
> > amdgpu_dpm_set_powergating_by_smu function directly.
> > As a result, the gate of the power was not opened, causing the
> > subsequent ring test to fail.
> Better to update this as some descriptions  below:
> adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we 
> rely on it to do the VCN gate(power down).
> However, if the VCN ring is still using on suspend kicked, there will be no 
> chance for adev->vcn.idle_work to do the VCN gate.
> That will make driver wrongly treat VCN as ungated(power on) and prevent 
> further VCN ungate(power on) operation(which is actually needed) on resume.
> >
> > [how]
> > In the suspend function of the vcn block, explicitly change the power
> > state flag of the vcn block to POWER_STATE_OFF.
> Maybe some descriptions as below is better:
> Manually do the VCN gate(power down) in the suspend routine if 
> adev->vcn.idle_work does not(have no chance) do that.
> >
> > Signed-off-by: chen gong 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > index d54d720..d73676b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> >  {
> >   int r;
> >   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > + bool cancel_success;
> This seems not a good naming since the cancel always succeed. The difference 
> is whether the idle_work get actually executed.
>  Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"?
> > +
> > + cancel_success = cancel_delayed_work_sync(&adev-
> > >vcn.idle_work);
> > + if (cancel_success) {
> > + if (adev->pm.dpm_enabled)
> > + amdgpu_dpm_enable_uvd(adev, false);
> > + }
> >
> >   r = vcn_v1_0_hw_fini(adev);
> >   if (r)
> > --
> > 2.7.4


RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread Quan, Evan
[AMD Official Use Only]



> -Original Message-
> From: Lazar, Lijo 
> Sent: Friday, December 10, 2021 8:25 PM
> To: Gong, Curry ; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Zhu, James
> ; Liu, Leo ; Quan, Evan
> 
> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> 
> 
> On 12/10/2021 5:11 PM, chen gong wrote:
> > Play a video on the raven (or PCO, raven2) platform, and then do the
> > S3 test. When resume, the following error will be reported:
> >
> > amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> > ring vcn_dec test failed (-110)
> > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of
> IP
> > block  failed -110 amdgpu :02:00.0: amdgpu:
> > amdgpu_device_ip_resume failed (-110).
> > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> >
> > [why]
> > When playing the video: The power state flag of the vcn block is set
> > to POWER_STATE_ON.
> >
> > When doing suspend: There is no change to the power state flag of the
> > vcn block, it is still POWER_STATE_ON.
> >
> > When doing resume: Need to open the power gate of the vcn block and
> > set the power state flag of the VCN block to POWER_STATE_ON.
> > But at this time, the power state flag of the vcn block is already
> > POWER_STATE_ON. The power status flag check in the "8f2cdef
> drm/amd/pm:
> > avoid duplicate powergate/ungate setting" patch will return the
> > amdgpu_dpm_set_powergating_by_smu function directly.
> > As a result, the gate of the power was not opened, causing the
> > subsequent ring test to fail.
> >
> > [how]
> > In the suspend function of the vcn block, explicitly change the power
> > state flag of the vcn block to POWER_STATE_OFF.
> >
> > Signed-off-by: chen gong 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
> >   1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > index d54d720..d73676b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
> >   {
> > int r;
> > struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> > +   bool cancel_success;
> > +
> > +   cancel_success = cancel_delayed_work_sync(&adev-
> >vcn.idle_work);
> > +   if (cancel_success) {
> > +   if (adev->pm.dpm_enabled)
> > +   amdgpu_dpm_enable_uvd(adev, false);
> > +   }
> >
> 
> Probably this is a common issue. Can you try moving this to
> amdgpu_vcn_suspend?
[Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the 
vcn code is not consistent.
For vcn v2 and later, they already do the manual gate operation in their 
suspend routine(like vcn_v2_0_stop).
So, it is actually only vcn_v1_0.c suffers this issue.
> 
> if (adev->pm.dpm_enabled)
> amdgpu_device_ip_set_powergating_state(adev,
> AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);
> 
> Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle
> work already put it in PG state. Evan, what do you think?
[Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg 
prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info 
or dev_warn.
Hopefully that can help to capture some potential issues like this. So, better 
to keep the check for the return value of cancel_delayed_work_sync here.

BR
Evan
> 
> Thanks,
> Lijo
> 
> > r = vcn_v1_0_hw_fini(adev);
> > if (r)
> >


RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread Quan, Evan
[AMD Official Use Only]

Hi Curry,

Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan 


@Deucher, Alexander this should be able address the issue reported by 
https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm 
this?

BR
Evan
> -Original Message-
> From: chen gong 
> Sent: Friday, December 10, 2021 7:42 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Liu, Leo ; Zhu, James ; Quan,
> Evan ; Deucher, Alexander
> ; Gong, Curry 
> Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended,
> powergating is explicitly enabled
> 
> Play a video on the raven (or PCO, raven2) platform, and then do the S3
> test. When resume, the following error will be reported:
> 
> amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR*
> ring
> vcn_dec test failed (-110)
> [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP
> block
>  failed -110
> amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
> PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110
> 
> [why]
> When playing the video: The power state flag of the vcn block is set to
> POWER_STATE_ON.
> 
> When doing suspend: There is no change to the power state flag of the
> vcn block, it is still POWER_STATE_ON.
> 
> When doing resume: Need to open the power gate of the vcn block and set
> the power state flag of the VCN block to POWER_STATE_ON.
> But at this time, the power state flag of the vcn block is already
> POWER_STATE_ON. The power status flag check in the "8f2cdef
> drm/amd/pm:
> avoid duplicate powergate/ungate setting" patch will return the
> amdgpu_dpm_set_powergating_by_smu function directly.
> As a result, the gate of the power was not opened, causing the
> subsequent ring test to fail.
Better to update this as some descriptions  below:
adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we 
rely on it to do the VCN gate(power down).
However, if the VCN ring is still using on suspend kicked, there will be no 
chance for adev->vcn.idle_work to do the VCN gate.
That will make driver wrongly treat VCN as ungated(power on) and prevent 
further VCN ungate(power on) operation(which is actually needed) on resume.
> 
> [how]
> In the suspend function of the vcn block, explicitly change the power
> state flag of the vcn block to POWER_STATE_OFF.
Maybe some descriptions as below is better:
Manually do the VCN gate(power down) in the suspend routine if 
adev->vcn.idle_work does not(have no chance) do that. 
> 
> Signed-off-by: chen gong 
> ---
>  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> index d54d720..d73676b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
> @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
>  {
>   int r;
>   struct amdgpu_device *adev = (struct amdgpu_device *)handle;
> + bool cancel_success;
This seems not a good naming since the cancel always succeed. The difference is 
whether the idle_work get actually executed.
 Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"?
> +
> + cancel_success = cancel_delayed_work_sync(&adev-
> >vcn.idle_work);
> + if (cancel_success) {
> + if (adev->pm.dpm_enabled)
> + amdgpu_dpm_enable_uvd(adev, false);
> + }
> 
>   r = vcn_v1_0_hw_fini(adev);
>   if (r)
> --
> 2.7.4


Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread Lazar, Lijo




On 12/10/2021 5:11 PM, chen gong wrote:

Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
 failed -110
amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

Signed-off-by: chen gong 
---
  drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
  {
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   bool cancel_success;
+
+   cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
+   if (cancel_success) {
+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+   }
  


Probably this is a common issue. Can you try moving this to 
amdgpu_vcn_suspend?


if (adev->pm.dpm_enabled)
   amdgpu_device_ip_set_powergating_state(adev, 
AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE);


Call this after cancel_delayed_work_sync. Shouldn't have any effect if 
idle work already put it in PG state. Evan, what do you think?


Thanks,
Lijo


r = vcn_v1_0_hw_fini(adev);
if (r)



[PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled

2021-12-10 Thread chen gong
Play a video on the raven (or PCO, raven2) platform, and then do the S3
test. When resume, the following error will be reported:

amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring
vcn_dec test failed (-110)
[drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block
 failed -110
amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110).
PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110

[why]
When playing the video: The power state flag of the vcn block is set to
POWER_STATE_ON.

When doing suspend: There is no change to the power state flag of the
vcn block, it is still POWER_STATE_ON.

When doing resume: Need to open the power gate of the vcn block and set
the power state flag of the VCN block to POWER_STATE_ON.
But at this time, the power state flag of the vcn block is already
POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm:
avoid duplicate powergate/ungate setting" patch will return the
amdgpu_dpm_set_powergating_by_smu function directly.
As a result, the gate of the power was not opened, causing the
subsequent ring test to fail.

[how]
In the suspend function of the vcn block, explicitly change the power
state flag of the vcn block to POWER_STATE_OFF.

Signed-off-by: chen gong 
---
 drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
index d54d720..d73676b 100644
--- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c
@@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle)
 {
int r;
struct amdgpu_device *adev = (struct amdgpu_device *)handle;
+   bool cancel_success;
+
+   cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work);
+   if (cancel_success) {
+   if (adev->pm.dpm_enabled)
+   amdgpu_dpm_enable_uvd(adev, false);
+   }
 
r = vcn_v1_0_hw_fini(adev);
if (r)
-- 
2.7.4