[PATCH] drm/amd/powerplay: update smu11_driver_if_navi10.h
To pair with the latest SMU firmware.

Change-Id: I5262c750fa08bc6268b43e3420e110e9ee71ccf6
Signed-off-by: Evan Quan
---
 drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h | 3 ++-
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
index ac0120e384be..4b2da98afcd2 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
@@ -701,7 +701,8 @@ typedef struct {
   // APCC Settings
   uint16_t PccThresholdLow;
   uint16_t PccThresholdHigh;
-  uint32_t PaddingAPCC[6]; //FIXME pending SPEC
+  uint32_t MGpuFanBoostLimitRpm;
+  uint32_t PaddingAPCC[5];

   // Temperature Dependent Vmin
   uint16_t VDDGFX_TVmin; //Celcius
diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
index d5314d12628a..acccdf621b4e 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
@@ -28,8 +28,8 @@
 #define SMU11_DRIVER_IF_VERSION_INV 0x
 #define SMU11_DRIVER_IF_VERSION_VG20 0x13
 #define SMU11_DRIVER_IF_VERSION_ARCT 0x12
-#define SMU11_DRIVER_IF_VERSION_NV10 0x33
-#define SMU11_DRIVER_IF_VERSION_NV14 0x34
+#define SMU11_DRIVER_IF_VERSION_NV10 0x35
+#define SMU11_DRIVER_IF_VERSION_NV14 0x36

 /* MP Apertures */
 #define MP0_Public 0x0380
--
2.25.0

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx
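The header change above carves the new MGpuFanBoostLimitRpm field out of the existing padding, which presumably keeps the table layout shared with the firmware at the same size. A minimal sketch of that check follows; the two struct names are hypothetical excerpts written for illustration, not the real driver types, and only the field names come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical excerpt of the APCC block before the patch. */
struct apcc_old {
	uint16_t PccThresholdLow;
	uint16_t PccThresholdHigh;
	uint32_t PaddingAPCC[6];
};

/* Hypothetical excerpt of the APCC block after the patch. */
struct apcc_new {
	uint16_t PccThresholdLow;
	uint16_t PccThresholdHigh;
	uint32_t MGpuFanBoostLimitRpm;
	uint32_t PaddingAPCC[5];
};

/* The new field replaces one padding word, so the block size is unchanged. */
static_assert(sizeof(struct apcc_old) == sizeof(struct apcc_new),
	      "APCC block size must not change");

/* The new field sits exactly where the first padding word used to be. */
static_assert(offsetof(struct apcc_new, MGpuFanBoostLimitRpm) ==
	      offsetof(struct apcc_old, PaddingAPCC),
	      "new field must reuse the first padding slot");

/* Size in bytes of the (hypothetical) APCC excerpt. */
size_t apcc_block_size(void)
{
	return sizeof(struct apcc_new);
}
```

Compile-time assertions like these fail the build as soon as a field is added without shrinking the padding, which is cheaper than discovering a layout mismatch at runtime.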
RE: [PATCH 3/3] drm/amdgpu/powerplay: fix baco check for vega20
Thanks for the fixes. The series is reviewed-by: Evan Quan

> -----Original Message-----
> From: amd-gfx On Behalf Of Alex Deucher
> Sent: Friday, February 7, 2020 11:19 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander
> Subject: [PATCH 3/3] drm/amdgpu/powerplay: fix baco check for vega20
>
> We need to handle the runpm case as well as GPU reset.
>
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index 3b3ec5666051..08b6ba39a6d7 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -487,15 +487,16 @@ static int vega20_setup_asic_task(struct pp_hwmgr *hwmgr)
>  {
>      struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev);
>      int ret = 0;
> +    bool use_baco = (adev->in_gpu_reset &&
> +        (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) ||
> +        (adev->in_runpm && amdgpu_asic_supports_baco(adev));
>
>      ret = vega20_init_sclk_threshold(hwmgr);
>      PP_ASSERT_WITH_CODE(!ret,
>          "Failed to init sclk threshold!",
>          return ret);
>
> -    if (adev->in_gpu_reset &&
> -        (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) {
> -
> +    if (use_baco) {
>          ret = vega20_baco_apply_vdci_flush_workaround(hwmgr);
>          if (ret)
>              pr_err("Failed to apply vega20 baco workaround!\n");
> --
> 2.24.1
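The condition introduced by the quoted patch can be read as a small truth table: the workaround applies either during a BACO-based GPU reset or during runtime PM on a BACO-capable device. The following is a sketch of that predicate only; `struct dev_state`, `enum reset_method`, and `use_baco()` are hypothetical stand-ins for the driver's `adev` fields and helper calls, not the real types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the value of amdgpu_asic_reset_method(). */
enum reset_method {
	RESET_METHOD_OTHER,
	RESET_METHOD_BACO
};

/* Hypothetical snapshot of the device state the patch looks at. */
struct dev_state {
	bool in_gpu_reset;              /* mirrors adev->in_gpu_reset */
	bool in_runpm;                  /* mirrors adev->in_runpm */
	bool supports_baco;             /* stands in for amdgpu_asic_supports_baco() */
	enum reset_method reset_method; /* stands in for amdgpu_asic_reset_method() */
};

/* True when the BACO workaround should be applied, per the patch's condition. */
bool use_baco(const struct dev_state *s)
{
	return (s->in_gpu_reset && s->reset_method == RESET_METHOD_BACO) ||
	       (s->in_runpm && s->supports_baco);
}
```

Before the patch only the first half of the condition existed, so a device resuming from runtime PM without a GPU reset in progress would have skipped the workaround.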
Re: Power limit OD stopped working for navi10 - broken on previously working commit
Sorry for the followup, but I did finally manage to track this down to a firmware/driver incompatibility and bisected `linux-firmware` to find when it broke.

Since the firmware is just binaries, I can't really tell ya what is wrong, but this is the commit where writing to the sysfs interface (and in general sending the SetPptPowerLimit message to the SMC) stopped doing anything.

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=af76fd0ed266440ac406d5737218af7ac3cfc750

Let me know what I can do to help get this fixed. For now, I've just downgraded to the first-released microcode as a stop-gap.

On 2/9/20 2:13 PM, Matt Coffin wrote:
> I was doing some benchmarking, and noticed some poor performance,
> indicating that my overdrive settings were not in place, which they
> were. hwmon/power1_cap reports the correctly adjusted value after it is
> written to, and I confirmed with a quick patch that the updated power
> limit value is actually being returned from the SMU after it is set, yet
> the card refuses to go over stock settings (+/- 3% of stock power draw,
> even with a 50% increase in power limit).
>
> Since I worked on that code a while back, I went to go bisect, using
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6 as a starting location, since I
> know that commit had working power limit overdrive before.
>
> Strangely, I'm seeing the same behavior on that
> previously-known-to-be-working commit!
>
> This happens for both *increased* and *decreased* power limits. sysfs
> reflects the change, but I see no change in the actual power draw on the
> card, and for the *increased* case, performance reflects a card that is
> throttling due to power limits.
>
> Were there any firmware changes or anything that could be causing this
> since I don't know where to start since a previously-working commit is
> now somehow broken.
>
> Since the behavior seems to have changed on me, it would also be
> incredibly helpful if anyone can either confirm or deny that they can
> reproduce this problem (or not) off of the latest codebase OR
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6.
>
> Any help, testing information, or simple confirm/deny from your side
> would go a long way.
>
> Thanks in advance,
> Matt
[PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list
hit panic when we update the page tables.

<1>[ 122.103290] BUG: kernel NULL pointer dereference, address: 0008
<1>[ 122.103348] #PF: supervisor read access in kernel mode
<1>[ 122.103376] #PF: error_code(0x) - not-present page
<6>[ 122.103403] PGD 0 P4D 0
<4>[ 122.103421] Oops: [#1] SMP PTI
<4>[ 122.103442] CPU: 13 PID: 2133 Comm: kfdtest Tainted: G OE 5.4.0-rc7+ #7
<4>[ 122.103480] Hardware name: Supermicro SYS-7048GR-TR/X10DRG-Q, BIOS 3.0b 03/09/2018
<4>[ 122.103657] RIP: 0010:amdgpu_vm_update_pdes+0x140/0x330 [amdgpu]
<4>[ 122.103689] Code: 03 4c 89 73 08 49 89 9d c8 00 00 00 48 8b 7b f0 c6 43 10 00 45 31 c0 48 8b 87 28 04 00 00 48 85 c0 74 07 4c 8b 80 20 04 00 00 <4d> 8b 70 08 31 f6 49 8b 86 28 04 00 00 48 85 c0 74 0f 48 8b 80 28
<4>[ 122.103769] RSP: 0018:b49a0a6a3a98 EFLAGS: 00010246
<4>[ 122.103797] RAX: RBX: 9020f823c148 RCX: dead0122
<4>[ 122.103831] RDX: 9020ece70018 RSI: 9020f823c0c8 RDI: 9010ca31c800
<4>[ 122.103865] RBP: b49a0a6a3b38 R08: R09: 0001
<4>[ 122.103899] R10: 6044f994 R11: df57fb58 R12: 9020f823c000
<4>[ 122.103933] R13: 9020f823c000 R14: 9020f823c0c8 R15: 9010d5d2
<4>[ 122.103968] FS: 7f32c83dc780() GS:9020ff38() knlGS:
<4>[ 122.104006] CS: 0010 DS: ES: CR0: 80050033
<4>[ 122.104035] CR2: 0008 CR3: 002036bba005 CR4: 003606e0
<4>[ 122.104069] DR0: DR1: DR2:
<4>[ 122.104103] DR3: DR6: fffe0ff0 DR7: 0400
<4>[ 122.104137] Call Trace:
<4>[ 122.104241] vm_update_pds+0x31/0x50 [amdgpu]
<4>[ 122.104347] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x2ef/0x690 [amdgpu]
<4>[ 122.104466] kfd_process_alloc_gpuvm+0x98/0x190 [amdgpu]
<4>[ 122.104576] kfd_process_device_init_vm.part.8+0xf3/0x1f0 [amdgpu]
<4>[ 122.104688] kfd_process_device_init_vm+0x24/0x30 [amdgpu]
<4>[ 122.104794] kfd_ioctl_acquire_vm+0xa4/0xc0 [amdgpu]
<4>[ 122.104900] kfd_ioctl+0x277/0x500 [amdgpu]
<4>[ 122.105001] ? kfd_ioctl_free_memory_of_gpu+0xc0/0xc0 [amdgpu]
<4>[ 122.105039] ? rcu_read_lock_sched_held+0x4f/0x80
<4>[ 122.105068] ? kmem_cache_free+0x2ba/0x300
<4>[ 122.105093] ? vm_area_free+0x18/0x20
<4>[ 122.105117] ? find_held_lock+0x35/0xa0
<4>[ 122.105143] do_vfs_ioctl+0xa9/0x6f0
<4>[ 122.106001] ksys_ioctl+0x75/0x80
<4>[ 122.106802] ? do_syscall_64+0x17/0x230
<4>[ 122.107605] __x64_sys_ioctl+0x1a/0x20
<4>[ 122.108378] do_syscall_64+0x5f/0x230
<4>[ 122.109118] entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[ 122.109842] RIP: 0033:0x7f32c6b495d7

Signed-off-by: xinhui pan
---
change from v1:
   move root pt bo to idle state instead.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3195bc9..c3d1af5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2619,9 +2619,12 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
       continue;
     bo_base->moved = true;

-    if (bo->tbo.type == ttm_bo_type_kernel)
-      amdgpu_vm_bo_relocated(bo_base);
-    else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
+    if (bo->tbo.type == ttm_bo_type_kernel) {
+      if (bo->parent)
+        amdgpu_vm_bo_relocated(bo_base);
+      else
+        amdgpu_vm_bo_idle(bo_base);
+    } else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
       amdgpu_vm_bo_moved(bo_base);
     else
       amdgpu_vm_bo_invalidated(bo_base);
--
2.7.4
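The V2 hunk above amounts to a four-way decision about which state list an invalidated BO lands on, with the root page table (a kernel BO with no parent) now going to idle instead of relocated. A minimal sketch of that decision as a pure function follows; `struct bo_desc`, `enum vm_bo_state`, and `next_state()` are hypothetical names introduced for illustration and do not exist in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the state lists named in the patch. */
enum vm_bo_state {
	VM_BO_RELOCATED,
	VM_BO_IDLE,
	VM_BO_MOVED,
	VM_BO_INVALIDATED
};

/* Hypothetical summary of the properties the patch tests on a BO. */
struct bo_desc {
	bool is_kernel;         /* bo->tbo.type == ttm_bo_type_kernel */
	bool has_parent;        /* bo->parent != NULL; false for the root PT */
	bool shares_root_resv;  /* same reservation object as the VM root */
};

/* Pick the target state using the same branching as the V2 hunk. */
enum vm_bo_state next_state(const struct bo_desc *bo)
{
	if (bo->is_kernel)
		return bo->has_parent ? VM_BO_RELOCATED : VM_BO_IDLE;
	if (bo->shares_root_resv)
		return VM_BO_MOVED;
	return VM_BO_INVALIDATED;
}
```

In V1 the extra `&& bo->parent` test let the root page table fall through to the moved branch, which is the case the review asked to avoid; here the root is handled inside the kernel-BO branch and never reaches the later checks.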
Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list
[AMD Official Use Only - Internal Distribution Only]

If so the function name does not match its functionality.

From: Christian König
Sent: Sunday, February 9, 2020 4:21:13 PM
To: Pan, Xinhui; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list

On 09.02.20 at 03:52, Pan, Xinhui wrote:
> hit panic when we update the page tables.
>
> <1>[ 122.103290] BUG: kernel NULL pointer dereference, address: 0008
> <1>[ 122.103348] #PF: supervisor read access in kernel mode
> <1>[ 122.103376] #PF: error_code(0x) - not-present page
> <6>[ 122.103403] PGD 0 P4D 0
> <4>[ 122.103421] Oops: [#1] SMP PTI
> <4>[ 122.103442] CPU: 13 PID: 2133 Comm: kfdtest Tainted: G OE 5.4.0-rc7+ #7
> <4>[ 122.103480] Hardware name: Supermicro SYS-7048GR-TR/X10DRG-Q, BIOS 3.0b 03/09/2018
> <4>[ 122.103657] RIP: 0010:amdgpu_vm_update_pdes+0x140/0x330 [amdgpu]
> <4>[ 122.103689] Code: 03 4c 89 73 08 49 89 9d c8 00 00 00 48 8b 7b f0 c6 43 10 00 45 31 c0 48 8b 87 28 04 00 00 48 85 c0 74 07 4c 8b 80 20 04 00 00 <4d> 8b 70 08 31 f6 49 8b 86 28 04 00 00 48 85 c0 74 0f 48 8b 80 28
> <4>[ 122.103769] RSP: 0018:b49a0a6a3a98 EFLAGS: 00010246
> <4>[ 122.103797] RAX: RBX: 9020f823c148 RCX: dead0122
> <4>[ 122.103831] RDX: 9020ece70018 RSI: 9020f823c0c8 RDI: 9010ca31c800
> <4>[ 122.103865] RBP: b49a0a6a3b38 R08: R09: 0001
> <4>[ 122.103899] R10: 6044f994 R11: df57fb58 R12: 9020f823c000
> <4>[ 122.103933] R13: 9020f823c000 R14: 9020f823c0c8 R15: 9010d5d2
> <4>[ 122.103968] FS: 7f32c83dc780() GS:9020ff38() knlGS:
> <4>[ 122.104006] CS: 0010 DS: ES: CR0: 80050033
> <4>[ 122.104035] CR2: 0008 CR3: 002036bba005 CR4: 003606e0
> <4>[ 122.104069] DR0: DR1: DR2:
> <4>[ 122.104103] DR3: DR6: fffe0ff0 DR7: 0400
> <4>[ 122.104137] Call Trace:
> <4>[ 122.104241] vm_update_pds+0x31/0x50 [amdgpu]
> <4>[ 122.104347] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x2ef/0x690 [amdgpu]
> <4>[ 122.104466] kfd_process_alloc_gpuvm+0x98/0x190 [amdgpu]
> <4>[ 122.104576] kfd_process_device_init_vm.part.8+0xf3/0x1f0 [amdgpu]
> <4>[ 122.104688] kfd_process_device_init_vm+0x24/0x30 [amdgpu]
> <4>[ 122.104794] kfd_ioctl_acquire_vm+0xa4/0xc0 [amdgpu]
> <4>[ 122.104900] kfd_ioctl+0x277/0x500 [amdgpu]
> <4>[ 122.105001] ? kfd_ioctl_free_memory_of_gpu+0xc0/0xc0 [amdgpu]
> <4>[ 122.105039] ? rcu_read_lock_sched_held+0x4f/0x80
> <4>[ 122.105068] ? kmem_cache_free+0x2ba/0x300
> <4>[ 122.105093] ? vm_area_free+0x18/0x20
> <4>[ 122.105117] ? find_held_lock+0x35/0xa0
> <4>[ 122.105143] do_vfs_ioctl+0xa9/0x6f0
> <4>[ 122.106001] ksys_ioctl+0x75/0x80
> <4>[ 122.106802] ? do_syscall_64+0x17/0x230
> <4>[ 122.107605] __x64_sys_ioctl+0x1a/0x20
> <4>[ 122.108378] do_syscall_64+0x5f/0x230
> <4>[ 122.109118] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4>[ 122.109842] RIP: 0033:0x7f32c6b495d7
>
> Signed-off-by: xinhui pan
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3195bc90985a..3c388fdf335c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2619,7 +2619,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>        continue;
>      bo_base->moved = true;
>
> -    if (bo->tbo.type == ttm_bo_type_kernel)
> +    if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)

Good catch, but that would mean that we move the root PD to the moved state which in turn is illegal as well.

Maybe better adjust amdgpu_vm_bo_relocated() to move the root PD to the idle state instead.

Christian.

>        amdgpu_vm_bo_relocated(bo_base);
>      else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
>        amdgpu_vm_bo_moved(bo_base);
Re: [PATCH 2/2] drm/amdgpu:/navi10: use the ODCAP enum to index the caps array
On 2/6/20 12:55 PM, Alex Deucher wrote:
> Rather than the FEATURE_ID flags. Avoids a possible reading past
> the end of the array.

Just to make sure I understand, this has been broken the whole time, right, and just happened to be working because we were only using the lower-end values and happened to not read past the end of the array?

I'll do some testing for navi10, and play around with actually disabling the capabilities manually to make sure we're responding correctly.

Thanks for fixing it!
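The bug class discussed in this reply, indexing an array with flag values instead of a dense enum, can be sketched as follows. Everything here is hypothetical: the enum names, the flag values, and the three-entry table are assumptions made for illustration and are not the driver's actual ODCAP or FEATURE_ID definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical capability indices: dense, so they are safe array indices. */
enum od_cap {
	OD_CAP_A,
	OD_CAP_B,
	OD_CAP_C,
	OD_CAP_COUNT
};

/* Hypothetical feature IDs written as bit flags: sparse, not array indices. */
enum od_feature {
	OD_FEATURE_A = 1 << 0,
	OD_FEATURE_B = 1 << 1,
	OD_FEATURE_C = 1 << 2
};

/* Hypothetical capability table, one entry per od_cap value. */
static const unsigned char od_caps[OD_CAP_COUNT] = { 1, 0, 1 };

/* True when idx can be used to index od_caps without reading past its end. */
bool od_index_in_bounds(unsigned int idx)
{
	return idx < (unsigned int)OD_CAP_COUNT;
}

/* Bounds-checked lookup keyed by the capability enum. */
bool od_cap_supported(enum od_cap cap)
{
	return od_index_in_bounds((unsigned int)cap) && od_caps[cap] != 0;
}
```

With these made-up values, the two lowest flags (1 and 2) happen to be valid indices while the third (4) is not, which matches the observation that such code can appear to work as long as only the lower-end values are used.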
Power limit OD stopped working for navi10 - broken on previously working commit
I was doing some benchmarking and noticed poor performance, suggesting that my overdrive settings were not in effect, even though they were in place. hwmon/power1_cap reports the correctly adjusted value after it is written to, and I confirmed with a quick patch that the updated power limit value is actually being returned from the SMU after it is set, yet the card refuses to go over stock settings (+/- 3% of stock power draw, even with a 50% increase in power limit).

Since I worked on that code a while back, I went to bisect, using c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6 as a starting location, since I know that commit had working power limit overdrive before.

Strangely, I'm seeing the same behavior on that previously-known-to-be-working commit!

This happens for both *increased* and *decreased* power limits. sysfs reflects the change, but I see no change in the actual power draw on the card, and for the *increased* case, performance reflects a card that is throttling due to power limits.

Were there any firmware changes or anything else that could be causing this? I don't know where to start, since a previously-working commit is now somehow broken.

Since the behavior seems to have changed on me, it would also be incredibly helpful if anyone can either confirm or deny that they can reproduce this problem on the latest codebase or on c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6.

Any help, testing information, or simple confirm/deny from your side would go a long way.

Thanks in advance,
Matt
Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list
On 09.02.20 at 03:52, Pan, Xinhui wrote:
> hit panic when we update the page tables.
>
> <1>[ 122.103290] BUG: kernel NULL pointer dereference, address: 0008
> <1>[ 122.103348] #PF: supervisor read access in kernel mode
> <1>[ 122.103376] #PF: error_code(0x) - not-present page
> <6>[ 122.103403] PGD 0 P4D 0
> <4>[ 122.103421] Oops: [#1] SMP PTI
> <4>[ 122.103442] CPU: 13 PID: 2133 Comm: kfdtest Tainted: G OE 5.4.0-rc7+ #7
> <4>[ 122.103480] Hardware name: Supermicro SYS-7048GR-TR/X10DRG-Q, BIOS 3.0b 03/09/2018
> <4>[ 122.103657] RIP: 0010:amdgpu_vm_update_pdes+0x140/0x330 [amdgpu]
> <4>[ 122.103689] Code: 03 4c 89 73 08 49 89 9d c8 00 00 00 48 8b 7b f0 c6 43 10 00 45 31 c0 48 8b 87 28 04 00 00 48 85 c0 74 07 4c 8b 80 20 04 00 00 <4d> 8b 70 08 31 f6 49 8b 86 28 04 00 00 48 85 c0 74 0f 48 8b 80 28
> <4>[ 122.103769] RSP: 0018:b49a0a6a3a98 EFLAGS: 00010246
> <4>[ 122.103797] RAX: RBX: 9020f823c148 RCX: dead0122
> <4>[ 122.103831] RDX: 9020ece70018 RSI: 9020f823c0c8 RDI: 9010ca31c800
> <4>[ 122.103865] RBP: b49a0a6a3b38 R08: R09: 0001
> <4>[ 122.103899] R10: 6044f994 R11: df57fb58 R12: 9020f823c000
> <4>[ 122.103933] R13: 9020f823c000 R14: 9020f823c0c8 R15: 9010d5d2
> <4>[ 122.103968] FS: 7f32c83dc780() GS:9020ff38() knlGS:
> <4>[ 122.104006] CS: 0010 DS: ES: CR0: 80050033
> <4>[ 122.104035] CR2: 0008 CR3: 002036bba005 CR4: 003606e0
> <4>[ 122.104069] DR0: DR1: DR2:
> <4>[ 122.104103] DR3: DR6: fffe0ff0 DR7: 0400
> <4>[ 122.104137] Call Trace:
> <4>[ 122.104241] vm_update_pds+0x31/0x50 [amdgpu]
> <4>[ 122.104347] amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x2ef/0x690 [amdgpu]
> <4>[ 122.104466] kfd_process_alloc_gpuvm+0x98/0x190 [amdgpu]
> <4>[ 122.104576] kfd_process_device_init_vm.part.8+0xf3/0x1f0 [amdgpu]
> <4>[ 122.104688] kfd_process_device_init_vm+0x24/0x30 [amdgpu]
> <4>[ 122.104794] kfd_ioctl_acquire_vm+0xa4/0xc0 [amdgpu]
> <4>[ 122.104900] kfd_ioctl+0x277/0x500 [amdgpu]
> <4>[ 122.105001] ? kfd_ioctl_free_memory_of_gpu+0xc0/0xc0 [amdgpu]
> <4>[ 122.105039] ? rcu_read_lock_sched_held+0x4f/0x80
> <4>[ 122.105068] ? kmem_cache_free+0x2ba/0x300
> <4>[ 122.105093] ? vm_area_free+0x18/0x20
> <4>[ 122.105117] ? find_held_lock+0x35/0xa0
> <4>[ 122.105143] do_vfs_ioctl+0xa9/0x6f0
> <4>[ 122.106001] ksys_ioctl+0x75/0x80
> <4>[ 122.106802] ? do_syscall_64+0x17/0x230
> <4>[ 122.107605] __x64_sys_ioctl+0x1a/0x20
> <4>[ 122.108378] do_syscall_64+0x5f/0x230
> <4>[ 122.109118] entry_SYSCALL_64_after_hwframe+0x49/0xbe
> <4>[ 122.109842] RIP: 0033:0x7f32c6b495d7
>
> Signed-off-by: xinhui pan
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3195bc90985a..3c388fdf335c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2619,7 +2619,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>        continue;
>      bo_base->moved = true;
>
> -    if (bo->tbo.type == ttm_bo_type_kernel)
> +    if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)

Good catch, but that would mean that we move the root PD to the moved state which in turn is illegal as well.

Maybe better adjust amdgpu_vm_bo_relocated() to move the root PD to the idle state instead.

Christian.

>        amdgpu_vm_bo_relocated(bo_base);
>      else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
>        amdgpu_vm_bo_moved(bo_base);