RE: [PATCH] drm/amdgpu: Fixed bug on error when uninstalling amdgpu

2022-12-16 Thread Chai, Thomas
[AMD Official Use Only - General]

OK, I will update subject line.  Thanks!


-
Best Regards,
Thomas

-----Original Message-----
From: Christian König  
Sent: Friday, December 16, 2022 4:50 PM
To: Chai, Thomas; amd-gfx@lists.freedesktop.org; Paneer Selvam, Arunpravin
Cc: Zhou1, Tao; Zhang, Hawking; Chai, Thomas
Subject: Re: [PATCH] drm/amdgpu: Fixed bug on error when uninstalling amdgpu

Am 16.12.22 um 03:56 schrieb YiPeng Chai:
> Fixed bug on error when uninstalling amdgpu.
> The error message is as follows:
> [  304.852489] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
> [  304.852503] invalid opcode:  [#1] PREEMPT SMP NOPTI
> [  304.852510] CPU: 2 PID: 4192 Comm: modprobe Tainted: GW IOE 5.19.0-thomas #1
> [  304.852519] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
> [  304.852526] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
> [  304.852535] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
> [  304.852549] RSP: 0018:9afac17bbcb8 EFLAGS: 00010287
> [  304.852556] RAX:  RBX: 8dacd37fd778 RCX: 
> [  304.852563] RDX: 8dacd37fd7a0 RSI: 8dacd37fd3b8 RDI: 8dac672a5f80
> [  304.852570] RBP: 8dacd37fd3a0 R08: 0001 R09: 
> [  304.852577] R10: 8dac68185500 R11: 9afac17bbd00 R12: 8dac672a5f80
> [  304.852584] R13: 8dac672a5fe0 R14: 8dacd37fd380 R15: 8dac672a5f80
> [  304.852590] FS:  7f0fa9b30c40() GS:8dadb648() knlGS:
> [  304.852598] CS:  0010 DS:  ES:  CR0: 80050033
> [  304.852604] CR2: 7f4bf1a1ba50 CR3: 000108c58004 CR4: 003706e0
> [  304.852611] DR0:  DR1:  DR2: 
> [  304.852618] DR3:  DR6: fffe0ff0 DR7: 0400
> [  304.852625] Call Trace:
> [  304.852629]  
> [  304.852632]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
> [  304.852639]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
> [  304.852827]  amdgpu_ttm_fini+0x1f9/0x280 [amdgpu]
> [  304.852925]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
> [  304.853022]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
> [  304.853132]  amdgpu_device_fini_sw+0xc5/0x3b0 [amdgpu]
> [  304.853229]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
> [  304.853327]  drm_dev_release+0x20/0x40 [drm]
> [  304.853352]  release_nodes+0x35/0xb0
> [  304.853359]  devres_release_all+0x8b/0xc0
> [  304.853364]  device_unbind_cleanup+0xe/0x70
> [  304.853370]  device_release_driver_internal+0xee/0x160
> [  304.853377]  driver_detach+0x44/0x90
> [  304.853382]  bus_remove_driver+0x55/0xe0
> [  304.853387]  pci_unregister_driver+0x3b/0x90
> [  304.853393]  amdgpu_exit+0x11/0x69 [amdgpu]
> [  304.853540]  __x64_sys_delete_module+0x142/0x260
> [  304.853548]  ? exit_to_user_mode_prepare+0x3e/0x190
> [  304.853555]  do_syscall_64+0x38/0x90
> [  304.853562]  entry_SYSCALL_64_after_hwframe+0x63/0xcd
>
> Signed-off-by: YiPeng Chai 

The subject line should probably read "when unloading amdgpu", but apart from 
that, good catch.

Reviewed-by: Christian König 

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> index 0b598b510bd8..eb63324c30d2 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
> @@ -829,7 +829,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
>  		kfree(rsv);
>  
>  	list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
> -		drm_buddy_free_list(&mgr->mm, &rsv->blocks);
> +		drm_buddy_free_list(&mgr->mm, &rsv->allocated);
>  		kfree(rsv);
>  	}
>  	drm_buddy_fini(&mgr->mm);


Re: [PATCH] drm/amdgpu: Fixed bug on error when uninstalling amdgpu

2022-12-16 Thread Christian König

Am 16.12.22 um 03:56 schrieb YiPeng Chai:

Fixed bug on error when uninstalling amdgpu.
The error message is as follows:
[  304.852489] kernel BUG at drivers/gpu/drm/drm_buddy.c:278!
[  304.852503] invalid opcode:  [#1] PREEMPT SMP NOPTI
[  304.852510] CPU: 2 PID: 4192 Comm: modprobe Tainted: GW IOE 5.19.0-thomas #1
[  304.852519] Hardware name: ASUS System Product Name/PRIME Z390-A, BIOS 2004 11/02/2021
[  304.852526] RIP: 0010:drm_buddy_free_block+0x26/0x30 [drm_buddy]
[  304.852535] Code: 00 00 00 90 0f 1f 44 00 00 48 8b 0e 89 c8 25 00 0c 00 00 3d 00 04 00 00 75 10 48 8b 47 18 48 d3 e0 48 01 47 28 e9 fa fe ff ff <0f> 0b 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 55 48 89 f5 53
[  304.852549] RSP: 0018:9afac17bbcb8 EFLAGS: 00010287
[  304.852556] RAX:  RBX: 8dacd37fd778 RCX: 
[  304.852563] RDX: 8dacd37fd7a0 RSI: 8dacd37fd3b8 RDI: 8dac672a5f80
[  304.852570] RBP: 8dacd37fd3a0 R08: 0001 R09: 
[  304.852577] R10: 8dac68185500 R11: 9afac17bbd00 R12: 8dac672a5f80
[  304.852584] R13: 8dac672a5fe0 R14: 8dacd37fd380 R15: 8dac672a5f80
[  304.852590] FS:  7f0fa9b30c40() GS:8dadb648() knlGS:
[  304.852598] CS:  0010 DS:  ES:  CR0: 80050033
[  304.852604] CR2: 7f4bf1a1ba50 CR3: 000108c58004 CR4: 003706e0
[  304.852611] DR0:  DR1:  DR2: 
[  304.852618] DR3:  DR6: fffe0ff0 DR7: 0400
[  304.852625] Call Trace:
[  304.852629]  
[  304.852632]  drm_buddy_free_list+0x2a/0x60 [drm_buddy]
[  304.852639]  amdgpu_vram_mgr_fini+0xea/0x180 [amdgpu]
[  304.852827]  amdgpu_ttm_fini+0x1f9/0x280 [amdgpu]
[  304.852925]  amdgpu_bo_fini+0x22/0x90 [amdgpu]
[  304.853022]  gmc_v11_0_sw_fini+0x26/0x30 [amdgpu]
[  304.853132]  amdgpu_device_fini_sw+0xc5/0x3b0 [amdgpu]
[  304.853229]  amdgpu_driver_release_kms+0x12/0x30 [amdgpu]
[  304.853327]  drm_dev_release+0x20/0x40 [drm]
[  304.853352]  release_nodes+0x35/0xb0
[  304.853359]  devres_release_all+0x8b/0xc0
[  304.853364]  device_unbind_cleanup+0xe/0x70
[  304.853370]  device_release_driver_internal+0xee/0x160
[  304.853377]  driver_detach+0x44/0x90
[  304.853382]  bus_remove_driver+0x55/0xe0
[  304.853387]  pci_unregister_driver+0x3b/0x90
[  304.853393]  amdgpu_exit+0x11/0x69 [amdgpu]
[  304.853540]  __x64_sys_delete_module+0x142/0x260
[  304.853548]  ? exit_to_user_mode_prepare+0x3e/0x190
[  304.853555]  do_syscall_64+0x38/0x90
[  304.853562]  entry_SYSCALL_64_after_hwframe+0x63/0xcd

Signed-off-by: YiPeng Chai 


The subject line should probably read "when unloading amdgpu", but apart 
from that, good catch.


Reviewed-by: Christian König 


---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 0b598b510bd8..eb63324c30d2 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -829,7 +829,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
 		kfree(rsv);
 
 	list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
-		drm_buddy_free_list(&mgr->mm, &rsv->blocks);
+		drm_buddy_free_list(&mgr->mm, &rsv->allocated);
 		kfree(rsv);
 	}
 	drm_buddy_fini(&mgr->mm);
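
A note on why the one-line change matters, as a rough userspace sketch rather than kernel code. It assumes that in the reservation struct the "blocks" member is the link that chains the reservation into the manager's reserved_pages list, while "allocated" is the head of the list of buddy blocks the reservation owns. Under that assumption, walking from &rsv->blocks visits the manager's own list head instead of any allocated block, which would be consistent with the BUG in drm_buddy_free_block. The struct block, struct reservation, struct demo_result and run_demo names below are hypothetical stand-ins, and the tiny list helpers only imitate the kernel's list_head behaviour.

```c
/* Minimal circular doubly linked list, imitating the kernel's list_head. */
struct list_head { struct list_head *next, *prev; };

static void list_init(struct list_head *h) { h->next = h; h->prev = h; }

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical stand-in for a buddy block. */
struct block { struct list_head link; };

/* Hypothetical stand-in for the reservation (layout is an assumption). */
struct reservation {
	struct list_head allocated; /* head: blocks owned by this reservation */
	struct list_head blocks;    /* link: membership in the manager's list */
};

/* Count the nodes reachable from head, excluding head itself. */
static int list_count(const struct list_head *head)
{
	int n = 0;
	for (const struct list_head *p = head->next; p != head; p = p->next)
		n++;
	return n;
}

/* Return 1 if node is visited when walking the list from head. */
static int list_contains(const struct list_head *head,
			 const struct list_head *node)
{
	for (const struct list_head *p = head->next; p != head; p = p->next)
		if (p == node)
			return 1;
	return 0;
}

struct demo_result {
	int via_allocated;     /* nodes seen walking from &rsv.allocated */
	int via_blocks;        /* nodes seen walking from &rsv.blocks */
	int hits_manager_head; /* 1 if the second walk lands on the manager head */
};

static struct demo_result run_demo(void)
{
	struct list_head reserved_pages; /* the manager's list of reservations */
	struct reservation rsv;
	struct block b0, b1;
	struct demo_result r;

	list_init(&reserved_pages);
	list_init(&rsv.allocated);
	list_add_tail(&rsv.blocks, &reserved_pages); /* reservation joins manager */
	list_add_tail(&b0.link, &rsv.allocated);     /* two blocks are owned */
	list_add_tail(&b1.link, &rsv.allocated);

	r.via_allocated = list_count(&rsv.allocated);
	r.via_blocks = list_count(&rsv.blocks);
	r.hits_manager_head = list_contains(&rsv.blocks, &reserved_pages);
	return r;
}
```

Walking from the "allocated" head visits exactly the two owned blocks. Walking from the "blocks" link visits a single node, and that node is the manager's list head, not a block, so freeing "entries" found that way would operate on memory that was never an allocated buddy block.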