Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset
+ Harry and Nick

On 2021-03-22 9:42 p.m., Yu, Lang wrote:
[AMD Official Use Only - Internal Distribution Only]

-----Original Message-----
From: Grodzovsky, Andrey
Sent: Monday, March 22, 2021 11:01 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
Subject: Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

On 2021-03-22 4:11 a.m., Lang Yu wrote:
In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

Deadlock log:

[snip lockdep splat]

Signed-off-by: Lang Yu
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e176ea84d75b..8727488df769 100644
--- a/driver
Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset
Typo in the title: s/dispaly/display

- Joshie 🐸✨

On 3/22/21 8:11 AM, Lang Yu wrote:
In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

Deadlock log:

[snip lockdep splat]

Signed-off-by: Lang Yu
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e176ea84d75b..8727488df769 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2657,13 +2657,15 @@ static void handle_hpd_rx_irq(void *param)
 		}
 	}
 
-	mutex_lock(&adev->dm.dc_lock);
+	if (!amdgpu_in_reset(adev))
+		mutex_lock(&adev->dm.dc_lock);
 #ifdef CONFIG_DRM_AMD_DC_HDCP
 	result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
 #else
 	result =
RE: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset
[AMD Official Use Only - Internal Distribution Only]

-----Original Message-----
From: Grodzovsky, Andrey
Sent: Monday, March 22, 2021 11:01 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray
Subject: Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

On 2021-03-22 4:11 a.m., Lang Yu wrote:
> In amdgpu reset, while dm.dc_lock is held by dm_suspend,
> handle_hpd_rx_irq tries to acquire it. Deadlock occurred!
>
> Deadlock log:
>
> [snip lockdep splat]
Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset
On 2021-03-22 4:11 a.m., Lang Yu wrote:
In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

Deadlock log:

[snip lockdep splat]

Signed-off-by: Lang Yu
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e176ea84d75b..8727488df769 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2657,13 +2657,15 @@ static void handle_hpd_rx_irq(void *param)
 		}
 	}
 
-	mutex_lock(&adev->dm.dc_lock);
+	if (!amdgpu_in_reset(adev))
+		mutex_lock(&adev->dm.dc_lock);
 #ifdef CONFIG_DRM_AMD_DC_HDCP
 	result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
 #else
 	result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
[PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset
In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

Deadlock log:

[  104.528304] amdgpu 0000:03:00.0: amdgpu: GPU reset begin!

[  104.640084] ======================================================
[  104.640092] WARNING: possible circular locking dependency detected
[  104.640099] 5.11.0-custom #1 Tainted: G        W   E
[  104.640107] ------------------------------------------------------
[  104.640114] cat/1158 is trying to acquire lock:
[  104.640120] 88810a09ce00 ((work_completion)(&lh->work)){+.+.}-{0:0}, at: __flush_work+0x2e3/0x450
[  104.640144] but task is already holding lock:
[  104.640151] 88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb2/0x1d0 [amdgpu]
[  104.640581] which lock already depends on the new lock.

[  104.640590] the existing dependency chain (in reverse order) is:
[  104.640598] -> #2 (&adev->dm.dc_lock){+.+.}-{3:3}:
[  104.640611]        lock_acquire+0xca/0x390
[  104.640623]        __mutex_lock+0x9b/0x930
[  104.640633]        mutex_lock_nested+0x1b/0x20
[  104.640640]        handle_hpd_rx_irq+0x9b/0x1c0 [amdgpu]
[  104.640959]        dm_irq_work_func+0x4e/0x60 [amdgpu]
[  104.641264]        process_one_work+0x2a7/0x5b0
[  104.641275]        worker_thread+0x4a/0x3d0
[  104.641283]        kthread+0x125/0x160
[  104.641290]        ret_from_fork+0x22/0x30
[  104.641300] -> #1 (&aconnector->hpd_lock){+.+.}-{3:3}:
[  104.641312]        lock_acquire+0xca/0x390
[  104.641321]        __mutex_lock+0x9b/0x930
[  104.641328]        mutex_lock_nested+0x1b/0x20
[  104.641336]        handle_hpd_rx_irq+0x67/0x1c0 [amdgpu]
[  104.641635]        dm_irq_work_func+0x4e/0x60 [amdgpu]
[  104.641931]        process_one_work+0x2a7/0x5b0
[  104.641940]        worker_thread+0x4a/0x3d0
[  104.641948]        kthread+0x125/0x160
[  104.641954]        ret_from_fork+0x22/0x30
[  104.641963] -> #0 ((work_completion)(&lh->work)){+.+.}-{0:0}:
[  104.641975]        check_prev_add+0x94/0xbf0
[  104.641983]        __lock_acquire+0x130d/0x1ce0
[  104.641992]        lock_acquire+0xca/0x390
[  104.642000]        __flush_work+0x303/0x450
[  104.642008]        flush_work+0x10/0x20
[  104.642016]        amdgpu_dm_irq_suspend+0x93/0x100 [amdgpu]
[  104.642312]        dm_suspend+0x181/0x1d0 [amdgpu]
[  104.642605]        amdgpu_device_ip_suspend_phase1+0x8a/0x100 [amdgpu]
[  104.642835]        amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  104.643066]        amdgpu_device_pre_asic_reset+0x1bd/0x1d2 [amdgpu]
[  104.643403]        amdgpu_device_gpu_recover.cold+0x5df/0xa9d [amdgpu]
[  104.643715]        gpu_recover_get+0x2e/0x60 [amdgpu]
[  104.643951]        simple_attr_read+0x6d/0x110
[  104.643960]        debugfs_attr_read+0x49/0x70
[  104.643970]        full_proxy_read+0x5f/0x90
[  104.643979]        vfs_read+0xa3/0x190
[  104.643986]        ksys_read+0x70/0xf0
[  104.643992]        __x64_sys_read+0x1a/0x20
[  104.643999]        do_syscall_64+0x38/0x90
[  104.644007]        entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  104.644017] other info that might help us debug this:

[  104.644026] Chain exists of:
                 (work_completion)(&lh->work) --> &aconnector->hpd_lock --> &adev->dm.dc_lock

[  104.644043] Possible unsafe locking scenario:

[  104.644049]        CPU0                    CPU1
[  104.644055]        ----                    ----
[  104.644060]   lock(&adev->dm.dc_lock);
[  104.644066]                               lock(&aconnector->hpd_lock);
[  104.644075]                               lock(&adev->dm.dc_lock);
[  104.644083]   lock((work_completion)(&lh->work));
[  104.644090]
                *** DEADLOCK ***

[  104.644096] 3 locks held by cat/1158:
[  104.644103] #0: 88810d0e4eb8 (&attr->mutex){+.+.}-{3:3}, at: simple_attr_read+0x4e/0x110
[  104.644119] #1: 88810a0a1600 (&adev->reset_sem){++++}-{3:3}, at: amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  104.644489] #2: 88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: dm_suspend+0xb2/0x1d0 [amdgpu]

Signed-off-by: Lang Yu
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e176ea84d75b..8727488df769 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2657,13 +2657,15 @@ static void handle_hpd_rx_irq(void *param)
 		}
 	}
 
-	mutex_lock(&adev->dm.dc_lock);
+	if (!amdgpu_in_reset(adev))
+		mutex_lock(&adev->dm.dc_lock);
 #ifdef CONFIG_DRM_AMD_DC_HDCP
 	result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
 #else
 	result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
 #endif
-	mutex_unlock(&adev->dm.dc_lock);
+