Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-23 Thread Andrey Grodzovsky

+ Harry and Nick

On 2021-03-22 9:42 p.m., Yu, Lang wrote:

[snip]

Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Joshua Ashton

Typo in the title:

s/dispaly/display

- Joshie 🐸✨

On 3/22/21 8:11 AM, Lang Yu wrote:

In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

[snip]

RE: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Yu, Lang
[AMD Official Use Only - Internal Distribution Only]



-Original Message-
From: Grodzovsky, Andrey  
Sent: Monday, March 22, 2021 11:01 PM
To: Yu, Lang ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Huang, Ray 

Subject: Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset



On 2021-03-22 4:11 a.m., Lang Yu wrote:
> In amdgpu reset, while dm.dc_lock is held by dm_suspend,
> handle_hpd_rx_irq tries to acquire it. Deadlock occurred!
>
> [snip]

Re: [PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Andrey Grodzovsky




On 2021-03-22 4:11 a.m., Lang Yu wrote:

In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

[snip]

[PATCH] drm/amd/dispaly: fix deadlock issue in amdgpu reset

2021-03-22 Thread Lang Yu
In amdgpu reset, while dm.dc_lock is held by dm_suspend,
handle_hpd_rx_irq tries to acquire it. Deadlock occurred!

Deadlock log:

[  104.528304] amdgpu :03:00.0: amdgpu: GPU reset begin!

[  104.640084] ==
[  104.640092] WARNING: possible circular locking dependency detected
[  104.640099] 5.11.0-custom #1 Tainted: GW   E
[  104.640107] --
[  104.640114] cat/1158 is trying to acquire lock:
[  104.640120] 88810a09ce00 ((work_completion)(&lh->work)){+.+.}-{0:0}, at: 
__flush_work+0x2e3/0x450
[  104.640144]
   but task is already holding lock:
[  104.640151] 88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: 
dm_suspend+0xb2/0x1d0 [amdgpu]
[  104.640581]
   which lock already depends on the new lock.

[  104.640590]
   the existing dependency chain (in reverse order) is:
[  104.640598]
   -> #2 (&adev->dm.dc_lock){+.+.}-{3:3}:
[  104.640611]lock_acquire+0xca/0x390
[  104.640623]__mutex_lock+0x9b/0x930
[  104.640633]mutex_lock_nested+0x1b/0x20
[  104.640640]handle_hpd_rx_irq+0x9b/0x1c0 [amdgpu]
[  104.640959]dm_irq_work_func+0x4e/0x60 [amdgpu]
[  104.641264]process_one_work+0x2a7/0x5b0
[  104.641275]worker_thread+0x4a/0x3d0
[  104.641283]kthread+0x125/0x160
[  104.641290]ret_from_fork+0x22/0x30
[  104.641300]
   -> #1 (&aconnector->hpd_lock){+.+.}-{3:3}:
[  104.641312]lock_acquire+0xca/0x390
[  104.641321]__mutex_lock+0x9b/0x930
[  104.641328]mutex_lock_nested+0x1b/0x20
[  104.641336]handle_hpd_rx_irq+0x67/0x1c0 [amdgpu]
[  104.641635]dm_irq_work_func+0x4e/0x60 [amdgpu]
[  104.641931]process_one_work+0x2a7/0x5b0
[  104.641940]worker_thread+0x4a/0x3d0
[  104.641948]kthread+0x125/0x160
[  104.641954]ret_from_fork+0x22/0x30
[  104.641963]
   -> #0 ((work_completion)(&lh->work)){+.+.}-{0:0}:
[  104.641975]check_prev_add+0x94/0xbf0
[  104.641983]__lock_acquire+0x130d/0x1ce0
[  104.641992]lock_acquire+0xca/0x390
[  104.642000]__flush_work+0x303/0x450
[  104.642008]flush_work+0x10/0x20
[  104.642016]amdgpu_dm_irq_suspend+0x93/0x100 [amdgpu]
[  104.642312]dm_suspend+0x181/0x1d0 [amdgpu]
[  104.642605]amdgpu_device_ip_suspend_phase1+0x8a/0x100 [amdgpu]
[  104.642835]amdgpu_device_ip_suspend+0x21/0x70 [amdgpu]
[  104.643066]amdgpu_device_pre_asic_reset+0x1bd/0x1d2 [amdgpu]
[  104.643403]amdgpu_device_gpu_recover.cold+0x5df/0xa9d [amdgpu]
[  104.643715]gpu_recover_get+0x2e/0x60 [amdgpu]
[  104.643951]simple_attr_read+0x6d/0x110
[  104.643960]debugfs_attr_read+0x49/0x70
[  104.643970]full_proxy_read+0x5f/0x90
[  104.643979]vfs_read+0xa3/0x190
[  104.643986]ksys_read+0x70/0xf0
[  104.643992]__x64_sys_read+0x1a/0x20
[  104.643999]do_syscall_64+0x38/0x90
[  104.644007]entry_SYSCALL_64_after_hwframe+0x44/0xa9
[  104.644017]
   other info that might help us debug this:

[  104.644026] Chain exists of:
 (work_completion)(&lh->work) --> &aconnector->hpd_lock --> 
&adev->dm.dc_lock

[  104.644043]  Possible unsafe locking scenario:

[  104.644049]CPU0CPU1
[  104.644055]
[  104.644060]   lock(&adev->dm.dc_lock);
[  104.644066]lock(&aconnector->hpd_lock);
[  104.644075]lock(&adev->dm.dc_lock);
[  104.644083]   lock((work_completion)(&lh->work));
[  104.644090]
*** DEADLOCK ***

[  104.644096] 3 locks held by cat/1158:
[  104.644103]  #0: 88810d0e4eb8 (&attr->mutex){+.+.}-{3:3}, at: 
simple_attr_read+0x4e/0x110
[  104.644119]  #1: 88810a0a1600 (&adev->reset_sem){}-{3:3}, at: 
amdgpu_device_lock_adev+0x42/0x94 [amdgpu]
[  104.644489]  #2: 88810a09cc70 (&adev->dm.dc_lock){+.+.}-{3:3}, at: 
dm_suspend+0xb2/0x1d0 [amdgpu]

Signed-off-by: Lang Yu 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index e176ea84d75b..8727488df769 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -2657,13 +2657,15 @@ static void handle_hpd_rx_irq(void *param)
 		}
 	}
 
-	mutex_lock(&adev->dm.dc_lock);
+	if (!amdgpu_in_reset(adev))
+		mutex_lock(&adev->dm.dc_lock);
 #ifdef CONFIG_DRM_AMD_DC_HDCP
 	result = dc_link_handle_hpd_rx_irq(dc_link, &hpd_irq_data, NULL);
 #else
 	result = dc_link_handle_hpd_rx_irq(dc_link, NULL, NULL);
 #endif
-	mutex_unlock(&adev->dm.dc_lock);
+
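
[Editor's note] The cycle lockdep reports above can be sketched outside the kernel. The following is a minimal Python model, not kernel code: `dc_lock`, `handle_hpd_rx_irq`, and `in_reset` are hypothetical stand-ins named after the patch, and an acquire timeout stands in for the real flush_work(), which in the driver would block forever. The guard mirrors the patch's !amdgpu_in_reset() check.

```python
import threading

dc_lock = threading.Lock()   # stands in for adev->dm.dc_lock
in_reset = False             # stands in for amdgpu_in_reset(adev)

def handle_hpd_rx_irq(result):
    # The HPD work item wants dc_lock. With the patch's guard, it
    # skips the lock during reset, when dm_suspend already holds it.
    if not in_reset:
        # The real deadlock waits forever; a timeout lets this
        # sketch report the cycle instead of hanging.
        if not dc_lock.acquire(timeout=0.2):
            result.append("deadlock")
            return
        dc_lock.release()
    result.append("ok")

def dm_suspend_and_flush():
    # dm_suspend takes dc_lock, then amdgpu_dm_irq_suspend() flushes
    # the HPD work while still holding the lock.
    result = []
    with dc_lock:
        worker = threading.Thread(target=handle_hpd_rx_irq, args=(result,))
        worker.start()
        worker.join()  # models flush_work() inside the locked region
    return result[0]

before_fix = dm_suspend_and_flush()  # the work blocks on dc_lock
in_reset = True                      # now apply the patch's condition
after_fix = dm_suspend_and_flush()   # the work skips dc_lock in reset
print(before_fix, after_fix)
```

With the guard in place, the flushed work no longer re-takes dc_lock while the reset path holds it, removing the work_completion -> dc_lock edge in the dependency chain quoted above.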