Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
On Thu, Dec 16, 2021 at 8:43 PM Quan, Evan wrote: > > [AMD Official Use Only] > > Hi Alex, > > Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) > in their ->suspend routine which should prevent them from the issue here. > if (adev->pm.dpm_enabled) > amdgpu_dpm_enable_uvd(adev, false); > So, maybe it's a different story with those newer APUs. Can you forward the > bug reports to me? https://gitlab.freedesktop.org/drm/amd/-/issues/1712#note_1192187 Alex > > Thanks, > Evan > > -Original Message- > > From: Alex Deucher > > Sent: Thursday, December 16, 2021 11:38 PM > > To: Quan, Evan > > Cc: Zhu, James ; Gong, Curry > > ; amd-gfx@lists.freedesktop.org; Deucher, > > Alexander ; Liu, Leo > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > > powergating is explicitly enabled > > > > FWIW, it looks like all versions of VCN need the same fix. There have been > > reports of suspend failing when VCN is in use on other newer APUs as well. > > > > Alex > > > > On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan wrote: > > > > > > [AMD Official Use Only] > > > > > > > > > > > > > > > > > > > > > > > > From: Zhu, James > > > Sent: Monday, December 13, 2021 9:39 PM > > > To: Gong, Curry ; Zhu, James > > ; > > > amd-gfx@lists.freedesktop.org > > > Cc: Liu, Leo ; Quan, Evan ; > > > Deucher, Alexander > > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > > > powergating is explicitly enabled > > > > > > > > > > > > Hi Curry, Evan, > > > > > > It seems vcn1.0 power gate sequence are special. > > > > > > if the original solution is thread safe, then the following solution will > > > be > > thread safe also. > > > > > > static int vcn_v1_0_hw_fini(void *handle) { > > > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > > > > > cancel_delayed_work_sync(&adev->vcn.idle_work); > > > > > > if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || > > > (adev->vcn.cur_state != AMD_PG_STATE_GATE && > > > RREG32_SOC15(VCN, 0, mmUVD_STATUS))) { > > > +if (adev->pm.dpm_enabled) > > > +amdgpu_dpm_enable_uvd(adev, false); > > > +else > > > +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); > > > > > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true, > > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will > > become dead code. > > > > > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the > > suspend/resume discussed here). So, it seems quite risky compared with > > Curry’s original patch. > > > > > > Before we can come up with a better idea, I would suggest to land Curry’s > > original patch as a quick fix. > > > > > > > > > > > > BR > > > > > > Evan > > > > > > > > > } > > > > > > Best Regards! > > > > > > James > > > > > > On 2021-12-13 3:55 a.m., Gong, Curry wrote: > > > > > > [AMD Official Use Only] > > > > > > > > > > > > Hi James: > > > > > > > > > > > > With the following patch, an error will be reported when the driver is > > > loaded > > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > > > > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device > > > *adev) > > > > > > else > > > > > > r = vcn_v1_0_stop_spg_mode(adev); > > > > > > > > > > > > c > > > > > > return r; > > > > > > } > > > > > > > > > > > > > > > > > > $ dmesg > > > > > > [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 > > seconds. > > > > > > [ 363.181150] Tainted: G OE 5.11.0-41-generic > > > #45~20.04.1- > > Ubuntu > > > > > > [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > > disables this message. > > > > > > [ 363.181266] task:kworker/
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] Hi Alex, Per our checking, vcn_v2 and vcn_v3 already have the dpm disablement(below) in their ->suspend routine which should prevent them from the issue here. if (adev->pm.dpm_enabled) amdgpu_dpm_enable_uvd(adev, false); So, maybe it's a different story with those newer APUs. Can you forward the bug reports to me? Thanks, Evan > -Original Message- > From: Alex Deucher > Sent: Thursday, December 16, 2021 11:38 PM > To: Quan, Evan > Cc: Zhu, James ; Gong, Curry > ; amd-gfx@lists.freedesktop.org; Deucher, > Alexander ; Liu, Leo > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > powergating is explicitly enabled > > FWIW, it looks like all versions of VCN need the same fix. There have been > reports of suspend failing when VCN is in use on other newer APUs as well. > > Alex > > On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan wrote: > > > > [AMD Official Use Only] > > > > > > > > > > > > > > > > From: Zhu, James > > Sent: Monday, December 13, 2021 9:39 PM > > To: Gong, Curry ; Zhu, James > ; > > amd-gfx@lists.freedesktop.org > > Cc: Liu, Leo ; Quan, Evan ; > > Deucher, Alexander > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > > powergating is explicitly enabled > > > > > > > > Hi Curry, Evan, > > > > It seems vcn1.0 power gate sequence are special. > > > > if the original solution is thread safe, then the following solution will be > thread safe also. > > > > static int vcn_v1_0_hw_fini(void *handle) { > > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > > > cancel_delayed_work_sync(&adev->vcn.idle_work); > > > > if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || > > (adev->vcn.cur_state != AMD_PG_STATE_GATE && > > RREG32_SOC15(VCN, 0, mmUVD_STATUS))) { > > +if (adev->pm.dpm_enabled) > > +amdgpu_dpm_enable_uvd(adev, false); > > +else > > +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); > > > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true, > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will > become dead code. > > > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the > suspend/resume discussed here). So, it seems quite risky compared with > Curry’s original patch. > > > > Before we can come up with a better idea, I would suggest to land Curry’s > original patch as a quick fix. > > > > > > > > BR > > > > Evan > > > > > > } > > > > Best Regards! > > > > James > > > > On 2021-12-13 3:55 a.m., Gong, Curry wrote: > > > > [AMD Official Use Only] > > > > > > > > Hi James: > > > > > > > > With the following patch, an error will be reported when the driver is > > loaded > > > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device > > *adev) > > > > else > > > > r = vcn_v1_0_stop_spg_mode(adev); > > > > > > > > c > > > > return r; > > > > } > > > > > > > > > > > > $ dmesg > > > > [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 > seconds. > > > > [ 363.181150] Tainted: G OE 5.11.0-41-generic > > #45~20.04.1- > Ubuntu > > > > [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" > disables this message. > > > > [ 363.181266] task:kworker/3:2 state:D stack:0 pid: 223 ppid: > > 2 > flags:0x4000 > > > > [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu] > > > > [ 363.181612] Call Trace: > > > > [ 363.181618] __schedule+0x44c/0x8a0 > > > > [ 363.181627] schedule+0x4f/0xc0 > > > > [ 363.181631] schedule_preempt_disabled+0xe/0x10 > > > > [ 363.181636] __mutex_lock.isra.0+0x183/0x4d0 > > > > [ 363.181643] __mutex_lock_slowpath+0x13/0x20 > > > > [ 363.181648] mutex_lock+0x32/0x40 > > > > [ 363.181652] amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 > [amdgpu] > > > > [ 363.182055] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] > > > > [ 363.182454] vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0
Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
FWIW, it looks like all versions of VCN need the same fix. There have been reports of suspend failing when VCN is in use on other newer APUs as well. Alex On Tue, Dec 14, 2021 at 12:59 AM Quan, Evan wrote: > > [AMD Official Use Only] > > > > > > > > From: Zhu, James > Sent: Monday, December 13, 2021 9:39 PM > To: Gong, Curry ; Zhu, James ; > amd-gfx@lists.freedesktop.org > Cc: Liu, Leo ; Quan, Evan ; Deucher, > Alexander > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > powergating is explicitly enabled > > > > Hi Curry, Evan, > > It seems vcn1.0 power gate sequence are special. > > if the original solution is thread safe, then the following solution will be > thread safe also. > > static int vcn_v1_0_hw_fini(void *handle) > { > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > cancel_delayed_work_sync(&adev->vcn.idle_work); > > if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || > (adev->vcn.cur_state != AMD_PG_STATE_GATE && > RREG32_SOC15(VCN, 0, mmUVD_STATUS))) { > +if (adev->pm.dpm_enabled) > +amdgpu_dpm_enable_uvd(adev, false); > +else > +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); > > [Quan, Evan] Considering adev->pm.dpm_enabled is always true, > “vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); “ will become dead > code. > > Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the > suspend/resume discussed here). So, it seems quite risky compared with > Curry’s original patch. > > Before we can come up with a better idea, I would suggest to land Curry’s > original patch as a quick fix. > > > > BR > > Evan > > > } > > Best Regards! > > James > > On 2021-12-13 3:55 a.m., Gong, Curry wrote: > > [AMD Official Use Only] > > > > Hi James: > > > > With the following patch, an error will be reported when the driver is loaded > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev) > > else > > r = vcn_v1_0_stop_spg_mode(adev); > > > > c > > return r; > > } > > > > > > $ dmesg > > [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds. > > [ 363.181150] Tainted: G OE 5.11.0-41-generic > #45~20.04.1-Ubuntu > > [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > > [ 363.181266] task:kworker/3:2 state:D stack:0 pid: 223 ppid: 2 > flags:0x4000 > > [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu] > > [ 363.181612] Call Trace: > > [ 363.181618] __schedule+0x44c/0x8a0 > > [ 363.181627] schedule+0x4f/0xc0 > > [ 363.181631] schedule_preempt_disabled+0xe/0x10 > > [ 363.181636] __mutex_lock.isra.0+0x183/0x4d0 > > [ 363.181643] __mutex_lock_slowpath+0x13/0x20 > > [ 363.181648] mutex_lock+0x32/0x40 > > [ 363.181652] amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu] > > [ 363.182055] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] > > [ 363.182454] vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu] > > [ 363.182776] amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu] > > [ 363.183028] smu10_powergate_vcn+0x2a/0x80 [amdgpu] > > [ 363.183361] pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu] > > [ 363.183699] amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu] > > [ 363.184040] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] > > [ 363.184391] vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu] > > [ 363.184667] process_one_work+0x220/0x3c0 > > [ 363.184674] worker_thread+0x4d/0x3f0 > > [ 363.184677] ? process_one_work+0x3c0/0x3c0 > > [ 363.184680] kthread+0x12b/0x150 > > [ 363.184685] ? set_kthread_struct+0x40/0x40 > > [ 363.184690] ret_from_fork+0x22/0x30 > > [ 363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds. > > [ 363.184739] Tainted: G OE 5.11.0-41-generic > #45~20.04.1-Ubuntu > > [ 363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables > this message. > > [ 363.184825] task:kworker/2:2 state:D stack:0 pid: 233 ppid: 2 > flags:0x4000 > > [ 363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler > [amdgpu] > > [ 363.185085] Call Trace: > > [ 363.185087] __schedule+0x44c/0x8a0 > > [ 363.185092] schedule+0x4f/0xc0 > > [ 363
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] From: Zhu, James Sent: Monday, December 13, 2021 9:39 PM To: Gong, Curry ; Zhu, James ; amd-gfx@lists.freedesktop.org Cc: Liu, Leo ; Quan, Evan ; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled Hi Curry, Evan, It seems vcn1.0 power gate sequence are special. if the original solution is thread safe, then the following solution will be thread safe also. static int vcn_v1_0_hw_fini(void *handle) { struct amdgpu_device *adev = (struct amdgpu_device *)handle; cancel_delayed_work_sync(&adev->vcn.idle_work); if ((adev->pg_flags & AMD_PG_SUPPORT_VCN_DPG) || (adev->vcn.cur_state != AMD_PG_STATE_GATE && RREG32_SOC15(VCN, 0, mmUVD_STATUS))) { +if (adev->pm.dpm_enabled) +amdgpu_dpm_enable_uvd(adev, false); +else +vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); [Quan, Evan] Considering adev->pm.dpm_enabled is always true, "vcn_v1_0_set_powergating_state(adev, AMD_PG_STATE_GATE); " will become dead code. Also, the vcn_v1_0_hw_fini is also used in other scenarios(except the suspend/resume discussed here). So, it seems quite risky compared with Curry's original patch. Before we can come up with a better idea, I would suggest to land Curry's original patch as a quick fix. BR Evan } Best Regards! James On 2021-12-13 3:55 a.m., Gong, Curry wrote: [AMD Official Use Only] Hi James: With the following patch, an error will be reported when the driver is loaded +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev) else r = vcn_v1_0_stop_spg_mode(adev); c return r; } $ dmesg [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds. [ 363.181150] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.181266] task:kworker/3:2 state:D stack:0 pid: 223 ppid: 2 flags:0x4000 [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu] [ 363.181612] Call Trace: [ 363.181618] __schedule+0x44c/0x8a0 [ 363.181627] schedule+0x4f/0xc0 [ 363.181631] schedule_preempt_disabled+0xe/0x10 [ 363.181636] __mutex_lock.isra.0+0x183/0x4d0 [ 363.181643] __mutex_lock_slowpath+0x13/0x20 [ 363.181648] mutex_lock+0x32/0x40 [ 363.181652] amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu] [ 363.182055] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.182454] vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu] [ 363.182776] amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu] [ 363.183028] smu10_powergate_vcn+0x2a/0x80 [amdgpu] [ 363.183361] pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu] [ 363.183699] amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu] [ 363.184040] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.184391] vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu] [ 363.184667] process_one_work+0x220/0x3c0 [ 363.184674] worker_thread+0x4d/0x3f0 [ 363.184677] ? process_one_work+0x3c0/0x3c0 [ 363.184680] kthread+0x12b/0x150 [ 363.184685] ? set_kthread_struct+0x40/0x40 [ 363.184690] ret_from_fork+0x22/0x30 [ 363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds. [ 363.184739] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.184825] task:kworker/2:2 state:D stack:0 pid: 233 ppid: 2 flags:0x4000 [ 363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu] [ 363.185085] Call Trace: [ 363.185087] __schedule+0x44c/0x8a0 [ 363.185092] schedule+0x4f/0xc0 [ 363.185095] schedule_timeout+0x202/0x290 [ 363.185099] ? sched_clock_cpu+0x11/0xb0 [ 363.185105] wait_for_completion+0x94/0x100 [ 363.185110] __flush_work+0x12a/0x1e0 [ 363.185113] ? worker_detach_from_pool+0xc0/0xc0 [ 363.185119] __cancel_work_timer+0x10e/0x190 [ 363.185123] cancel_delayed_work_sync+0x13/0x20 [ 363.185126] vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu] [ 363.185401] amdgpu_ring_alloc+0x48/0x60 [amdgpu] [ 363.185640] amdgpu_ib_schedule+0x493/0x600 [amdgpu] [ 363.185884] amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu] [ 363.186186] amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu] [ 363.186460] amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu] [ 363.186734] amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu] [ 363.186978] amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu] [ 363.187206] process_one_work+0x220/0x3c0 [ 363.187210] worker_thread+0x4d/0x3f0 [ 363.187214] ? process_one_work+0x3c0/0x3c0 [ 363.187217] kthread+0x12b/0x150 [ 363.187221] ? set_kthread_struct+0x40/0x40 [ 363.187226] ret_from_fo
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] Checked the log paste below with Curry. The way to add this fix in vcn_v1_0_stop is not workable. As it will induce a circle calling(below) and lead to dead lock. VCN ring begin use -> amdgpu_dpm_enable_uvd -> acquire the smu_lock -> smu10_powergate_vcn -> amdgpu_device_ip_set_powergating_state -> vcn_v1_0_stop -> amdgpu_dpm_enable_uvd -> try to acquire the smu_lock again -> dead lock BR Evan From: Gong, Curry Sent: Monday, December 13, 2021 4:56 PM To: Zhu, James ; amd-gfx@lists.freedesktop.org Cc: Liu, Leo ; Quan, Evan ; Deucher, Alexander Subject: RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled [AMD Official Use Only] Hi James: With the following patch, an error will be reported when the driver is loaded +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev) else r = vcn_v1_0_stop_spg_mode(adev); + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + return r; } $ dmesg [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds. [ 363.181150] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.181266] task:kworker/3:2 state:D stack:0 pid: 223 ppid: 2 flags:0x4000 [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu] [ 363.181612] Call Trace: [ 363.181618] __schedule+0x44c/0x8a0 [ 363.181627] schedule+0x4f/0xc0 [ 363.181631] schedule_preempt_disabled+0xe/0x10 [ 363.181636] __mutex_lock.isra.0+0x183/0x4d0 [ 363.181643] __mutex_lock_slowpath+0x13/0x20 [ 363.181648] mutex_lock+0x32/0x40 [ 363.181652] amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu] [ 363.182055] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.182454] vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu] [ 363.182776] amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu] [ 363.183028] smu10_powergate_vcn+0x2a/0x80 [amdgpu] [ 363.183361] pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu] [ 363.183699] amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu] [ 363.184040] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.184391] vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu] [ 363.184667] process_one_work+0x220/0x3c0 [ 363.184674] worker_thread+0x4d/0x3f0 [ 363.184677] ? process_one_work+0x3c0/0x3c0 [ 363.184680] kthread+0x12b/0x150 [ 363.184685] ? set_kthread_struct+0x40/0x40 [ 363.184690] ret_from_fork+0x22/0x30 [ 363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds. [ 363.184739] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.184825] task:kworker/2:2 state:D stack:0 pid: 233 ppid: 2 flags:0x4000 [ 363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu] [ 363.185085] Call Trace: [ 363.185087] __schedule+0x44c/0x8a0 [ 363.185092] schedule+0x4f/0xc0 [ 363.185095] schedule_timeout+0x202/0x290 [ 363.185099] ? sched_clock_cpu+0x11/0xb0 [ 363.185105] wait_for_completion+0x94/0x100 [ 363.185110] __flush_work+0x12a/0x1e0 [ 363.185113] ? worker_detach_from_pool+0xc0/0xc0 [ 363.185119] __cancel_work_timer+0x10e/0x190 [ 363.185123] cancel_delayed_work_sync+0x13/0x20 [ 363.185126] vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu] [ 363.185401] amdgpu_ring_alloc+0x48/0x60 [amdgpu] [ 363.185640] amdgpu_ib_schedule+0x493/0x600 [amdgpu] [ 363.185884] amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu] [ 363.186186] amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu] [ 363.186460] amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu] [ 363.186734] amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu] [ 363.186978] amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu] [ 363.187206] process_one_work+0x220/0x3c0 [ 363.187210] worker_thread+0x4d/0x3f0 [ 363.187214] ? process_one_work+0x3c0/0x3c0 [ 363.187217] kthread+0x12b/0x150 [ 363.187221] ? set_kthread_struct+0x40/0x40 [ 363.187226] ret_from_fork+0x22/0x30 BR Curry Gong From: Zhu, James mailto:james@amd.com>> Sent: Saturday, December 11, 2021 5:07 AM To: Gong, Curry mailto:curry.g...@amd.com>>; amd-gfx@lists.freedesktop.org<mailto:amd-gfx@lists.freedesktop.org> Cc: Liu, Leo mailto:leo@amd.com>>; Zhu, James mailto:james@amd.com>>; Quan, Evan mailto:evan.q...@amd.com>>; Deucher, Alexander mailto:alexander.deuc...@amd.com>> Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled On 2021-12-10 6:41 a.m., chen gong wrote: Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following erro
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] Hi James: With the following patch, an error will be reported when the driver is loaded +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -1202,6 +1204,9 @@ static int vcn_v1_0_stop(struct amdgpu_device *adev) else r = vcn_v1_0_stop_spg_mode(adev); + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + return r; } $ dmesg [ 363.181081] INFO: task kworker/3:2:223 blocked for more than 120 seconds. [ 363.181150] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.181208] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.181266] task:kworker/3:2 state:D stack:0 pid: 223 ppid: 2 flags:0x4000 [ 363.181276] Workqueue: events vcn_v1_0_idle_work_handler [amdgpu] [ 363.181612] Call Trace: [ 363.181618] __schedule+0x44c/0x8a0 [ 363.181627] schedule+0x4f/0xc0 [ 363.181631] schedule_preempt_disabled+0xe/0x10 [ 363.181636] __mutex_lock.isra.0+0x183/0x4d0 [ 363.181643] __mutex_lock_slowpath+0x13/0x20 [ 363.181648] mutex_lock+0x32/0x40 [ 363.181652] amdgpu_dpm_set_powergating_by_smu+0x9c/0x180 [amdgpu] [ 363.182055] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.182454] vcn_v1_0_set_powergating_state+0x2e7e/0x3cf0 [amdgpu] [ 363.182776] amdgpu_device_ip_set_powergating_state+0x6c/0xc0 [amdgpu] [ 363.183028] smu10_powergate_vcn+0x2a/0x80 [amdgpu] [ 363.183361] pp_set_powergating_by_smu+0xc5/0x2b0 [amdgpu] [ 363.183699] amdgpu_dpm_set_powergating_by_smu+0xb6/0x180 [amdgpu] [ 363.184040] amdgpu_dpm_enable_uvd+0x38/0x110 [amdgpu] [ 363.184391] vcn_v1_0_idle_work_handler+0xe1/0x130 [amdgpu] [ 363.184667] process_one_work+0x220/0x3c0 [ 363.184674] worker_thread+0x4d/0x3f0 [ 363.184677] ? process_one_work+0x3c0/0x3c0 [ 363.184680] kthread+0x12b/0x150 [ 363.184685] ? set_kthread_struct+0x40/0x40 [ 363.184690] ret_from_fork+0x22/0x30 [ 363.184699] INFO: task kworker/2:2:233 blocked for more than 120 seconds. [ 363.184739] Tainted: G OE 5.11.0-41-generic #45~20.04.1-Ubuntu [ 363.184782] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. [ 363.184825] task:kworker/2:2 state:D stack:0 pid: 233 ppid: 2 flags:0x4000 [ 363.184831] Workqueue: events amdgpu_device_delayed_init_work_handler [amdgpu] [ 363.185085] Call Trace: [ 363.185087] __schedule+0x44c/0x8a0 [ 363.185092] schedule+0x4f/0xc0 [ 363.185095] schedule_timeout+0x202/0x290 [ 363.185099] ? sched_clock_cpu+0x11/0xb0 [ 363.185105] wait_for_completion+0x94/0x100 [ 363.185110] __flush_work+0x12a/0x1e0 [ 363.185113] ? worker_detach_from_pool+0xc0/0xc0 [ 363.185119] __cancel_work_timer+0x10e/0x190 [ 363.185123] cancel_delayed_work_sync+0x13/0x20 [ 363.185126] vcn_v1_0_ring_begin_use+0x20/0x70 [amdgpu] [ 363.185401] amdgpu_ring_alloc+0x48/0x60 [amdgpu] [ 363.185640] amdgpu_ib_schedule+0x493/0x600 [amdgpu] [ 363.185884] amdgpu_job_submit_direct+0x3c/0xd0 [amdgpu] [ 363.186186] amdgpu_vcn_dec_send_msg+0x105/0x210 [amdgpu] [ 363.186460] amdgpu_vcn_dec_ring_test_ib+0x69/0x110 [amdgpu] [ 363.186734] amdgpu_ib_ring_tests+0xf5/0x160 [amdgpu] [ 363.186978] amdgpu_device_delayed_init_work_handler+0x15/0x30 [amdgpu] [ 363.187206] process_one_work+0x220/0x3c0 [ 363.187210] worker_thread+0x4d/0x3f0 [ 363.187214] ? process_one_work+0x3c0/0x3c0 [ 363.187217] kthread+0x12b/0x150 [ 363.187221] ? set_kthread_struct+0x40/0x40 [ 363.187226] ret_from_fork+0x22/0x30 BR Curry Gong From: Zhu, James Sent: Saturday, December 11, 2021 5:07 AM To: Gong, Curry ; amd-gfx@lists.freedesktop.org Cc: Liu, Leo ; Zhu, James ; Quan, Evan ; Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled On 2021-12-10 6:41 a.m., chen gong wrote: Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110 amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As
Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
On 2021-12-10 11:19 a.m., Quan, Evan wrote: [AMD Official Use Only] -Original Message- From: Lazar, Lijo Sent: Friday, December 10, 2021 8:25 PM To: Gong, Curry;amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander; Zhu, James ; Liu, Leo; Quan, Evan Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled On 12/10/2021 5:11 PM, chen gong wrote: Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110 amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As a result, the gate of the power was not opened, causing the subsequent ring test to fail. [how] In the suspend function of the vcn block, explicitly change the power state flag of the vcn block to POWER_STATE_OFF. Signed-off-by: chen gong --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index d54d720..d73676b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) { int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + bool cancel_success; + + cancel_success = cancel_delayed_work_sync(&adev- vcn.idle_work); + if (cancel_success) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + } Probably this is a common issue. Can you try moving this to amdgpu_vcn_suspend? [Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the vcn code is not consistent. For vcn v2 and later, they already do the manual gate operation in their suspend routine(like vcn_v2_0_stop). So, it is actually only vcn_v1_0.c suffers this issue. [JZ] Then what is concern for vcn1 not doing the same thing as vcn_v2_0_stop? if (adev->pm.dpm_enabled) amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE); Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle work already put it in PG state. Evan, what do you think? [Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info or dev_warn. Hopefully that can help to capture some potential issues like this. So, better to keep the check for the return value of cancel_delayed_work_sync here. BR Evan Thanks, Lijo r = vcn_v1_0_hw_fini(adev); if (r)
Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
On 2021-12-10 6:41 a.m., chen gong wrote: Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110 amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As a result, the gate of the power was not opened, causing the subsequent ring test to fail. [how] In the suspend function of the vcn block, explicitly change the power state flag of the vcn block to POWER_STATE_OFF. Signed-off-by: chen gong --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index d54d720..d73676b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) { int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + bool cancel_success; + + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work); [JZ] Can you refer to vcn_v3_0_stop , and add amdgpu_dpm_enable_uvd(adev, false); to the end of vcn_v1_0_stop? See if it also can help. + if (cancel_success) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + } r = vcn_v1_0_hw_fini(adev); if (r)
Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
On Fri, Dec 10, 2021 at 11:06 AM Quan, Evan wrote: > > [AMD Official Use Only] > > Hi Curry, > > Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan > > > @Deucher, Alexander this should be able address the issue reported by > https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm > this? Yes, the patch works according to the reporter. Alex > > BR > Evan > > -Original Message- > > From: chen gong > > Sent: Friday, December 10, 2021 7:42 PM > > To: amd-gfx@lists.freedesktop.org > > Cc: Liu, Leo ; Zhu, James ; Quan, > > Evan ; Deucher, Alexander > > ; Gong, Curry > > Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > > powergating is explicitly enabled > > > > Play a video on the raven (or PCO, raven2) platform, and then do the S3 > > test. When resume, the following error will be reported: > > > > amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* > > ring > > vcn_dec test failed (-110) > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP > > block > > failed -110 > > amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). > > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 > > > > [why] > > When playing the video: The power state flag of the vcn block is set to > > POWER_STATE_ON. > > > > When doing suspend: There is no change to the power state flag of the > > vcn block, it is still POWER_STATE_ON. > > > > When doing resume: Need to open the power gate of the vcn block and set > > the power state flag of the VCN block to POWER_STATE_ON. > > But at this time, the power state flag of the vcn block is already > > POWER_STATE_ON. The power status flag check in the "8f2cdef > > drm/amd/pm: > > avoid duplicate powergate/ungate setting" patch will return the > > amdgpu_dpm_set_powergating_by_smu function directly. > > As a result, the gate of the power was not opened, causing the > > subsequent ring test to fail. > Better to update this as some descriptions below: > adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we > rely on it to do the VCN gate(power down). > However, if the VCN ring is still using on suspend kicked, there will be no > chance for adev->vcn.idle_work to do the VCN gate. > That will make driver wrongly treat VCN as ungated(power on) and prevent > further VCN ungate(power on) operation(which is actually needed) on resume. > > > > [how] > > In the suspend function of the vcn block, explicitly change the power > > state flag of the vcn block to POWER_STATE_OFF. > Maybe some descriptions as below is better: > Manually do the VCN gate(power down) in the suspend routine if > adev->vcn.idle_work does not(have no chance) do that. > > > > Signed-off-by: chen gong > > --- > > drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > index d54d720..d73676b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) > > { > > int r; > > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > + bool cancel_success; > This seems not a good naming since the cancel always succeed. The difference > is whether the idle_work get actually executed. > Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"? > > + > > + cancel_success = cancel_delayed_work_sync(&adev- > > >vcn.idle_work); > > + if (cancel_success) { > > + if (adev->pm.dpm_enabled) > > + amdgpu_dpm_enable_uvd(adev, false); > > + } > > > > r = vcn_v1_0_hw_fini(adev); > > if (r) > > -- > > 2.7.4
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] > -Original Message- > From: Lazar, Lijo > Sent: Friday, December 10, 2021 8:25 PM > To: Gong, Curry ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Zhu, James > ; Liu, Leo ; Quan, Evan > > Subject: Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > powergating is explicitly enabled > > > > On 12/10/2021 5:11 PM, chen gong wrote: > > Play a video on the raven (or PCO, raven2) platform, and then do the > > S3 test. When resume, the following error will be reported: > > > > amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* > > ring vcn_dec test failed (-110) > > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of > IP > > block failed -110 amdgpu :02:00.0: amdgpu: > > amdgpu_device_ip_resume failed (-110). > > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 > > > > [why] > > When playing the video: The power state flag of the vcn block is set > > to POWER_STATE_ON. > > > > When doing suspend: There is no change to the power state flag of the > > vcn block, it is still POWER_STATE_ON. > > > > When doing resume: Need to open the power gate of the vcn block and > > set the power state flag of the VCN block to POWER_STATE_ON. > > But at this time, the power state flag of the vcn block is already > > POWER_STATE_ON. The power status flag check in the "8f2cdef > drm/amd/pm: > > avoid duplicate powergate/ungate setting" patch will return the > > amdgpu_dpm_set_powergating_by_smu function directly. > > As a result, the gate of the power was not opened, causing the > > subsequent ring test to fail. > > > > [how] > > In the suspend function of the vcn block, explicitly change the power > > state flag of the vcn block to POWER_STATE_OFF. > > > > Signed-off-by: chen gong > > --- > > drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > index d54d720..d73676b 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) > > { > > int r; > > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > > + bool cancel_success; > > + > > + cancel_success = cancel_delayed_work_sync(&adev- > >vcn.idle_work); > > + if (cancel_success) { > > + if (adev->pm.dpm_enabled) > > + amdgpu_dpm_enable_uvd(adev, false); > > + } > > > > Probably this is a common issue. Can you try moving this to > amdgpu_vcn_suspend? [Quan, Evan] Yes, amdgpu_vcn_suspend seems a more proper place. However, the vcn code is not consistent. For vcn v2 and later, they already do the manual gate operation in their suspend routine(like vcn_v2_0_stop). So, it is actually only vcn_v1_0.c suffers this issue. > > if (adev->pm.dpm_enabled) > amdgpu_device_ip_set_powergating_state(adev, > AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE); > > Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle > work already put it in PG state. Evan, what do you think? [Quan, Evan] Should not for now. But I'm considering to raise the dev_dbg prompt in amdgpu_dpm_set_powergating_by_smu for double gate/ungate to dev_info or dev_warn. Hopefully that can help to capture some potential issues like this. So, better to keep the check for the return value of cancel_delayed_work_sync here. BR Evan > > Thanks, > Lijo > > > r = vcn_v1_0_hw_fini(adev); > > if (r) > >
RE: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
[AMD Official Use Only] Hi Curry, Some nitpicks below. With them fixed, the patch is reviewed-by: Evan Quan @Deucher, Alexander this should be able address the issue reported by https://gitlab.freedesktop.org/drm/amd/-/issues/1828. Can you help to confirm this? BR Evan > -Original Message- > From: chen gong > Sent: Friday, December 10, 2021 7:42 PM > To: amd-gfx@lists.freedesktop.org > Cc: Liu, Leo ; Zhu, James ; Quan, > Evan ; Deucher, Alexander > ; Gong, Curry > Subject: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, > powergating is explicitly enabled > > Play a video on the raven (or PCO, raven2) platform, and then do the S3 > test. When resume, the following error will be reported: > > amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* > ring > vcn_dec test failed (-110) > [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP > block > failed -110 > amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). > PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 > > [why] > When playing the video: The power state flag of the vcn block is set to > POWER_STATE_ON. > > When doing suspend: There is no change to the power state flag of the > vcn block, it is still POWER_STATE_ON. > > When doing resume: Need to open the power gate of the vcn block and set > the power state flag of the VCN block to POWER_STATE_ON. > But at this time, the power state flag of the vcn block is already > POWER_STATE_ON. The power status flag check in the "8f2cdef > drm/amd/pm: > avoid duplicate powergate/ungate setting" patch will return the > amdgpu_dpm_set_powergating_by_smu function directly. > As a result, the gate of the power was not opened, causing the > subsequent ring test to fail. Better to update this as some descriptions below: adev-> vcn.idle_work will be triggered when the VCN ring idle for 1S. And we rely on it to do the VCN gate(power down). However, if the VCN ring is still using on suspend kicked, there will be no chance for adev->vcn.idle_work to do the VCN gate. That will make driver wrongly treat VCN as ungated(power on) and prevent further VCN ungate(power on) operation(which is actually needed) on resume. > > [how] > In the suspend function of the vcn block, explicitly change the power > state flag of the vcn block to POWER_STATE_OFF. Maybe some descriptions as below is better: Manually do the VCN gate(power down) in the suspend routine if adev->vcn.idle_work does not(have no chance) do that. > > Signed-off-by: chen gong > --- > drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ > 1 file changed, 7 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > index d54d720..d73676b 100644 > --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c > @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) > { > int r; > struct amdgpu_device *adev = (struct amdgpu_device *)handle; > + bool cancel_success; This seems not a good naming since the cancel always succeed. The difference is whether the idle_work get actually executed. Better to rename it as "idle_work_unexecuted" or just "vcn_not_gated"? > + > + cancel_success = cancel_delayed_work_sync(&adev- > >vcn.idle_work); > + if (cancel_success) { > + if (adev->pm.dpm_enabled) > + amdgpu_dpm_enable_uvd(adev, false); > + } > > r = vcn_v1_0_hw_fini(adev); > if (r) > -- > 2.7.4
Re: [PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
On 12/10/2021 5:11 PM, chen gong wrote: Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110 amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As a result, the gate of the power was not opened, causing the subsequent ring test to fail. [how] In the suspend function of the vcn block, explicitly change the power state flag of the vcn block to POWER_STATE_OFF. Signed-off-by: chen gong --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index d54d720..d73676b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) { int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + bool cancel_success; + + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work); + if (cancel_success) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + } Probably this is a common issue. Can you try moving this to amdgpu_vcn_suspend? if (adev->pm.dpm_enabled) amdgpu_device_ip_set_powergating_state(adev, AMD_IP_BLOCK_TYPE_VCN,AMD_PG_STATE_GATE); Call this after cancel_delayed_work_sync. Shouldn't have any effect if idle work already put it in PG state. Evan, what do you think? Thanks, Lijo r = vcn_v1_0_hw_fini(adev); if (r)
[PATCH] drm/amdgpu: When the VCN(1.0) block is suspended, powergating is explicitly enabled
Play a video on the raven (or PCO, raven2) platform, and then do the S3 test. When resume, the following error will be reported: amdgpu :02:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring vcn_dec test failed (-110) [drm:amdgpu_device_ip_resume_phase2 [amdgpu]] *ERROR* resume of IP block failed -110 amdgpu :02:00.0: amdgpu: amdgpu_device_ip_resume failed (-110). PM: dpm_run_callback(): pci_pm_resume+0x0/0x90 returns -110 [why] When playing the video: The power state flag of the vcn block is set to POWER_STATE_ON. When doing suspend: There is no change to the power state flag of the vcn block, it is still POWER_STATE_ON. When doing resume: Need to open the power gate of the vcn block and set the power state flag of the VCN block to POWER_STATE_ON. But at this time, the power state flag of the vcn block is already POWER_STATE_ON. The power status flag check in the "8f2cdef drm/amd/pm: avoid duplicate powergate/ungate setting" patch will return the amdgpu_dpm_set_powergating_by_smu function directly. As a result, the gate of the power was not opened, causing the subsequent ring test to fail. [how] In the suspend function of the vcn block, explicitly change the power state flag of the vcn block to POWER_STATE_OFF. Signed-off-by: chen gong --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c index d54d720..d73676b 100644 --- a/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c +++ b/drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c @@ -246,6 +246,13 @@ static int vcn_v1_0_suspend(void *handle) { int r; struct amdgpu_device *adev = (struct amdgpu_device *)handle; + bool cancel_success; + + cancel_success = cancel_delayed_work_sync(&adev->vcn.idle_work); + if (cancel_success) { + if (adev->pm.dpm_enabled) + amdgpu_dpm_enable_uvd(adev, false); + } r = vcn_v1_0_hw_fini(adev); if (r) -- 2.7.4