[PATCH] drm/amd/powerplay: update smu11_driver_if_navi10.h

2020-02-09 Thread Evan Quan
To pair with the latest SMU firmware.

Change-Id: I5262c750fa08bc6268b43e3420e110e9ee71ccf6
Signed-off-by: Evan Quan 
---
 drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h | 3 ++-
 drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h  | 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
index ac0120e384be..4b2da98afcd2 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu11_driver_if_navi10.h
@@ -701,7 +701,8 @@ typedef struct {
   // APCC Settings
   uint16_t PccThresholdLow;
   uint16_t PccThresholdHigh;
-  uint32_t PaddingAPCC[6];  //FIXME pending SPEC
+  uint32_t MGpuFanBoostLimitRpm;
+  uint32_t PaddingAPCC[5];
 
   // Temperature Dependent Vmin
   uint16_t VDDGFX_TVmin;   //Celcius
diff --git a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
index d5314d12628a..acccdf621b4e 100644
--- a/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
+++ b/drivers/gpu/drm/amd/powerplay/inc/smu_v11_0.h
@@ -28,8 +28,8 @@
 #define SMU11_DRIVER_IF_VERSION_INV 0x
 #define SMU11_DRIVER_IF_VERSION_VG20 0x13
 #define SMU11_DRIVER_IF_VERSION_ARCT 0x12
-#define SMU11_DRIVER_IF_VERSION_NV10 0x33
-#define SMU11_DRIVER_IF_VERSION_NV14 0x34
+#define SMU11_DRIVER_IF_VERSION_NV10 0x35
+#define SMU11_DRIVER_IF_VERSION_NV14 0x36
 
 /* MP Apertures */
 #define MP0_Public 0x0380
-- 
2.25.0
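The struct change is layout-neutral: one `uint32_t` of `PaddingAPCC` becomes `MGpuFanBoostLimitRpm`, so the table keeps its size and every later field (such as `VDDGFX_TVmin`) keeps its offset, while the bumped `SMU11_DRIVER_IF_VERSION_NV10`/`NV14` values mark the new driver/firmware interface. A minimal C sketch of both points, using hypothetical cut-down structs (`apcc_old_t`, `apcc_new_t`) rather than the real navi10 table, and a hypothetical `if_version_matches()` standing in for the driver's firmware interface version check:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cut-down view of the APCC region before the patch. */
typedef struct {
	uint16_t PccThresholdLow;
	uint16_t PccThresholdHigh;
	uint32_t PaddingAPCC[6];
	uint16_t VDDGFX_TVmin;
} apcc_old_t;

/* Same region after the patch: one padding word becomes a real field. */
typedef struct {
	uint16_t PccThresholdLow;
	uint16_t PccThresholdHigh;
	uint32_t MGpuFanBoostLimitRpm;
	uint32_t PaddingAPCC[5];
	uint16_t VDDGFX_TVmin;
} apcc_new_t;

/* If either check failed, the firmware would read later fields at the
 * wrong position. */
static_assert(sizeof(apcc_old_t) == sizeof(apcc_new_t),
	      "table size changed");
static_assert(offsetof(apcc_old_t, VDDGFX_TVmin) ==
	      offsetof(apcc_new_t, VDDGFX_TVmin),
	      "fields after the APCC block moved");

#define SMU11_DRIVER_IF_VERSION_NV10 0x35
#define SMU11_DRIVER_IF_VERSION_NV14 0x36

/* Hypothetical helper: nonzero when the interface version reported by the
 * firmware equals the one this driver was built against. */
static int if_version_matches(uint32_t fw_if_version, uint32_t drv_if_version)
{
	return fw_if_version == drv_if_version;
}
```

Compile-time checks like these are a cheap way to catch an accidental layout change when a padding field is repurposed.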

___
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx


RE: [PATCH 3/3] drm/amdgpu/powerplay: fix baco check for vega20

2020-02-09 Thread Quan, Evan
Thanks for the fixes. The series is
Reviewed-by: Evan Quan 

> -----Original Message-----
> From: amd-gfx  On Behalf Of Alex Deucher
> Sent: Friday, February 7, 2020 11:19 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 3/3] drm/amdgpu/powerplay: fix baco check for vega20
> 
> We need to handle the runpm case as well as GPU reset.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> index 3b3ec5666051..08b6ba39a6d7 100644
> --- a/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> +++ b/drivers/gpu/drm/amd/powerplay/hwmgr/vega20_hwmgr.c
> @@ -487,15 +487,16 @@ static int vega20_setup_asic_task(struct pp_hwmgr *hwmgr)
>  {
>  	struct amdgpu_device *adev = (struct amdgpu_device *)(hwmgr->adev);
>  	int ret = 0;
> +	bool use_baco = (adev->in_gpu_reset &&
> +			 (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) ||
> +		(adev->in_runpm && amdgpu_asic_supports_baco(adev));
>
>  	ret = vega20_init_sclk_threshold(hwmgr);
>  	PP_ASSERT_WITH_CODE(!ret,
>  			"Failed to init sclk threshold!",
>  			return ret);
>
> -	if (adev->in_gpu_reset &&
> -	    (amdgpu_asic_reset_method(adev) == AMD_RESET_METHOD_BACO)) {
> -
> +	if (use_baco) {
>  		ret = vega20_baco_apply_vdci_flush_workaround(hwmgr);
>  		if (ret)
>  			pr_err("Failed to apply vega20 baco workaround!\n");
> --
> 2.24.1
> 
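The patched condition in `vega20_setup_asic_task()` applies the VDCI flush workaround in two cases: a GPU reset that uses BACO, or a runtime-PM transition on an ASIC that supports BACO. A small sketch of that predicate, where `struct fake_adev`, `enum reset_method` and `use_baco()` are hypothetical stand-ins for `adev->in_gpu_reset`, `adev->in_runpm`, `amdgpu_asic_reset_method()` and `amdgpu_asic_supports_baco()`, not the kernel's types:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the device state the patch consults. */
enum reset_method { RESET_METHOD_MODE1, RESET_METHOD_BACO };

struct fake_adev {
	bool in_gpu_reset;              /* a GPU reset is in progress */
	bool in_runpm;                  /* a runtime-PM transition is in progress */
	enum reset_method reset_method; /* method the reset path would use */
	bool supports_baco;             /* ASIC can enter BACO */
};

/* Mirrors the patched condition: BACO-based GPU reset, or runtime PM on a
 * BACO-capable ASIC. */
static bool use_baco(const struct fake_adev *adev)
{
	return (adev->in_gpu_reset && adev->reset_method == RESET_METHOD_BACO) ||
	       (adev->in_runpm && adev->supports_baco);
}
```

Before the patch only the first half of the condition existed, so a runtime-PM cycle never took the workaround path.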


Re: Power limit OD stopped working for navi10 - broken on previously working commit

2020-02-09 Thread Matt Coffin
Sorry for the followup, but I did finally manage to track this down to a
firmware/driver incompatibility and bisected `linux-firmware` to find
when it broke.

Since the firmware is just binaries, I can't really tell you what is
wrong, but this is the commit after which writing to the sysfs interface
(and, more generally, sending the SetPptPowerLimit message to the SMC)
stopped having any effect.

https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/commit/amdgpu?id=af76fd0ed266440ac406d5737218af7ac3cfc750

Let me know what I can do to help get this fixed. For now, I've just
downgraded to the first-released microcode as a stop-gap.

On 2/9/20 2:13 PM, Matt Coffin wrote:
> I was doing some benchmarking and noticed poor performance, indicating
> that my overdrive settings were not taking effect, even though they were
> set. hwmon/power1_cap reports the correctly adjusted value after it is
> written to, and I confirmed with a quick patch that the updated power
> limit value is actually being returned from the SMU after it is set, yet
> the card refuses to go over stock settings (+/- 3% of stock power draw,
> even with a 50% increase in power limit).
>
> Since I worked on that code a while back, I went to bisect, using
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6 as a starting point, since I
> know that commit had working power limit overdrive before.
>
> Strangely, I'm seeing the same behavior on that
> previously-known-to-be-working commit!
>
> This happens for both *increased* and *decreased* power limits. sysfs
> reflects the change, but I see no change in the actual power draw on the
> card, and for the *increased* case, performance reflects a card that is
> throttling due to power limits.
>
> Were there any firmware changes, or anything else, that could be causing
> this? I don't know where to start, since a previously working commit is
> now somehow broken.
>
> Since the behavior seems to have changed on me, it would also be
> incredibly helpful if anyone can confirm or deny that they can reproduce
> this problem on either the latest codebase OR
> c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6.
> 
> Any help, testing information, or simple confirm/deny from your side
> would go a long way.
> 
> Thanks in advance,
> Matt
> 
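Matt's report hinges on the difference between the driver accepting a power cap and the firmware enforcing it. A minimal sketch of the write-and-read-back check, assuming the usual hwmon sysfs convention that `power1_cap` is an integer in microwatts; `set_power_cap()` is a hypothetical helper, and the exact hwmon path (somewhere under `/sys/class/drm/card0/device/hwmon/`) is an assumption that varies per system:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: write a power cap in microwatts and read it back.
 * Returns 0 when the read-back value matches, -1 on I/O error or mismatch.
 * A match only shows that the value was accepted and stored; it says
 * nothing about whether the SMU firmware actually enforces it. */
static int set_power_cap(const char *path, long microwatts)
{
	long readback = -1;
	FILE *f = fopen(path, "w");

	if (!f)
		return -1;
	if (fprintf(f, "%ld\n", microwatts) < 0) {
		fclose(f);
		return -1;
	}
	/* stdio buffers the write, so a value rejected by the driver
	 * typically surfaces as an error from fclose(). */
	if (fclose(f) != 0)
		return -1;

	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &readback) != 1)
		readback = -1;
	fclose(f);

	return readback == microwatts ? 0 : -1;
}
```

Per the report above, this kind of check passes even on the affected firmware, so confirming enforcement needs an actual power-draw measurement under load.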





[PATCH V2] drm/amdgpu: Do not move root PT bo to relocated list

2020-02-09 Thread Pan, Xinhui
Hit a panic when updating the page tables.

<1>[  122.103290] BUG: kernel NULL pointer dereference, address: 0008
<1>[  122.103348] #PF: supervisor read access in kernel mode
<1>[  122.103376] #PF: error_code(0x) - not-present page
<6>[  122.103403] PGD 0 P4D 0
<4>[  122.103421] Oops:  [#1] SMP PTI
<4>[  122.103442] CPU: 13 PID: 2133 Comm: kfdtest Tainted: G   OE 5.4.0-rc7+ #7
<4>[  122.103480] Hardware name: Supermicro SYS-7048GR-TR/X10DRG-Q, BIOS 3.0b 03/09/2018
<4>[  122.103657] RIP: 0010:amdgpu_vm_update_pdes+0x140/0x330 [amdgpu]
<4>[  122.103689] Code: 03 4c 89 73 08 49 89 9d c8 00 00 00 48 8b 7b f0 c6 43 10 00 45 31 c0 48 8b 87 28 04 00 00 48 85 c0 74 07 4c 8b 80 20 04 00 00 <4d> 8b 70 08 31 f6 49 8b 86 28 04 00 00 48 85 c0 74 0f 48 8b 80 28
<4>[  122.103769] RSP: 0018:b49a0a6a3a98 EFLAGS: 00010246
<4>[  122.103797] RAX:  RBX: 9020f823c148 RCX: dead0122
<4>[  122.103831] RDX: 9020ece70018 RSI: 9020f823c0c8 RDI: 9010ca31c800
<4>[  122.103865] RBP: b49a0a6a3b38 R08:  R09: 0001
<4>[  122.103899] R10: 6044f994 R11: df57fb58 R12: 9020f823c000
<4>[  122.103933] R13: 9020f823c000 R14: 9020f823c0c8 R15: 9010d5d2
<4>[  122.103968] FS:  7f32c83dc780() GS:9020ff38() knlGS:
<4>[  122.104006] CS:  0010 DS:  ES:  CR0: 80050033
<4>[  122.104035] CR2: 0008 CR3: 002036bba005 CR4: 003606e0
<4>[  122.104069] DR0:  DR1:  DR2:
<4>[  122.104103] DR3:  DR6: fffe0ff0 DR7: 0400
<4>[  122.104137] Call Trace:
<4>[  122.104241]  vm_update_pds+0x31/0x50 [amdgpu]
<4>[  122.104347]  amdgpu_amdkfd_gpuvm_map_memory_to_gpu+0x2ef/0x690 [amdgpu]
<4>[  122.104466]  kfd_process_alloc_gpuvm+0x98/0x190 [amdgpu]
<4>[  122.104576]  kfd_process_device_init_vm.part.8+0xf3/0x1f0 [amdgpu]
<4>[  122.104688]  kfd_process_device_init_vm+0x24/0x30 [amdgpu]
<4>[  122.104794]  kfd_ioctl_acquire_vm+0xa4/0xc0 [amdgpu]
<4>[  122.104900]  kfd_ioctl+0x277/0x500 [amdgpu]
<4>[  122.105001]  ? kfd_ioctl_free_memory_of_gpu+0xc0/0xc0 [amdgpu]
<4>[  122.105039]  ? rcu_read_lock_sched_held+0x4f/0x80
<4>[  122.105068]  ? kmem_cache_free+0x2ba/0x300
<4>[  122.105093]  ? vm_area_free+0x18/0x20
<4>[  122.105117]  ? find_held_lock+0x35/0xa0
<4>[  122.105143]  do_vfs_ioctl+0xa9/0x6f0
<4>[  122.106001]  ksys_ioctl+0x75/0x80
<4>[  122.106802]  ? do_syscall_64+0x17/0x230
<4>[  122.107605]  __x64_sys_ioctl+0x1a/0x20
<4>[  122.108378]  do_syscall_64+0x5f/0x230
<4>[  122.109118]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
<4>[  122.109842] RIP: 0033:0x7f32c6b495d7

Signed-off-by: xinhui pan 
---
Changes from v1:
   Move the root PT BO to the idle state instead.
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 3195bc9..c3d1af5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -2619,9 +2619,12 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
 			continue;
 		bo_base->moved = true;
 
-		if (bo->tbo.type == ttm_bo_type_kernel)
-			amdgpu_vm_bo_relocated(bo_base);
-		else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
+		if (bo->tbo.type == ttm_bo_type_kernel) {
+			if (bo->parent)
+				amdgpu_vm_bo_relocated(bo_base);
+			else
+				amdgpu_vm_bo_idle(bo_base);
+		} else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
 			amdgpu_vm_bo_moved(bo_base);
 		else
 			amdgpu_vm_bo_invalidated(bo_base);
-- 
2.7.4
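CR2 is 0008, i.e. a read at offset 8 from a NULL pointer inside `amdgpu_vm_update_pdes()`. That is consistent with the root page directory, which has no parent, ending up on the relocated list and having its parent entry dereferenced when the PDEs are rewritten. A sketch of the state selection after the V2 patch, using a hypothetical `struct model_bo` and `enum vm_state` in place of the real amdgpu structures and lists:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the VM state lists used by amdgpu_vm_bo_invalidate(). */
enum vm_state { VM_IDLE, VM_RELOCATED, VM_MOVED, VM_INVALIDATED };

struct model_bo {
	bool is_kernel;                /* ttm_bo_type_kernel: a page table/directory */
	const struct model_bo *parent; /* NULL only for the root page directory */
	bool shares_root_resv;         /* same reservation object as the root PD */
};

/* Page tables with a parent go to "relocated" so the parent's PDE gets
 * rewritten; the root PD has no parent PDE, so it goes back to "idle". */
static enum vm_state invalidate_target(const struct model_bo *bo)
{
	if (bo->is_kernel)
		return bo->parent ? VM_RELOCATED : VM_IDLE;
	if (bo->shares_root_resv)
		return VM_MOVED;
	return VM_INVALIDATED;
}
```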


Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list

2020-02-09 Thread Pan, Xinhui

If so, the function name would no longer match what the function does.


From: Christian König 
Sent: Sunday, February 9, 2020 4:21:13 PM
To: Pan, Xinhui ; amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Koenig, Christian
Subject: Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list

On 09.02.20 at 03:52, Pan, Xinhui wrote:
> hit panic when we update the page tables.
>
>
> Signed-off-by: xinhui pan 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3195bc90985a..3c388fdf335c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2619,7 +2619,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>  			continue;
>  		bo_base->moved = true;
>
> -		if (bo->tbo.type == ttm_bo_type_kernel)
> +		if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)

Good catch, but that would mean that we move the root PD to the moved
state, which in turn is illegal as well.

It would probably be better to adjust amdgpu_vm_bo_relocated() to move
the root PD to the idle state instead.

Christian.


>  			amdgpu_vm_bo_relocated(bo_base);
>  		else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
>  			amdgpu_vm_bo_moved(bo_base);
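Christian's alternative puts the root-PD check inside `amdgpu_vm_bo_relocated()` itself, so every caller gets it; Pan's objection is that the name would then no longer describe what the function does. A sketch of that helper-level variant, with hypothetical `model_bo_relocated()`/`model_bo_idle()` functions and a toy `struct model_vm_bo` standing in for the real list handling:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model: each VM BO records which state list it is on. */
enum vm_list { LIST_IDLE, LIST_RELOCATED };

struct model_vm_bo {
	const struct model_vm_bo *parent; /* NULL only for the root PD */
	enum vm_list list;
};

static void model_bo_idle(struct model_vm_bo *b)
{
	b->list = LIST_IDLE;
}

/* Helper-level guard: the root PD has no parent PDE to rewrite, so it is
 * parked on the idle list instead of the relocated list. */
static void model_bo_relocated(struct model_vm_bo *b)
{
	if (b->parent)
		b->list = LIST_RELOCATED;
	else
		model_bo_idle(b);
}
```

The V2 patch keeps the check at the call site in `amdgpu_vm_bo_invalidate()` instead, which keeps the helper's name accurate at the cost of protecting only that caller.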



Re: [PATCH 2/2] drm/amdgpu:/navi10: use the ODCAP enum to index the caps array

2020-02-09 Thread Matt Coffin
On 2/6/20 12:55 PM, Alex Deucher wrote:
> Rather than the FEATURE_ID flags.  Avoids a possible reading past
> the end of the array.

Just to make sure I understand: this has been broken the whole time, and
it only happened to work because we were using the lower-end values and
so never read past the end of the array?

I'll do some testing for navi10, and play around with actually disabling
the capabilities manually to make sure we're responding correctly.

Thanks for fixing it!
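The bug Alex describes comes from indexing the capability array with feature-ID flag values instead of the dense ODCAP index. A toy C model with hypothetical cut-down enums and a hypothetical `od_cap_supported()`: with 4 capability slots, small flag values such as 1 and 2 land in bounds but on the wrong entry, while 4 and 8 run past the end of the array.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical dense capability index; ODCAP_COUNT sizes the array. */
enum od_cap {
	ODCAP_GFXCLK_LIMITS = 0,
	ODCAP_GFXCLK_CURVE,
	ODCAP_UCLK_MAX,
	ODCAP_POWER_LIMIT,
	ODCAP_COUNT
};

/* Hypothetical bit-flag feature IDs: NOT valid array indices. */
enum od_feature_flag {
	ODFEATURE_GFXCLK_LIMITS = 1u << ODCAP_GFXCLK_LIMITS, /* 1 */
	ODFEATURE_GFXCLK_CURVE  = 1u << ODCAP_GFXCLK_CURVE,  /* 2 */
	ODFEATURE_UCLK_MAX      = 1u << ODCAP_UCLK_MAX,      /* 4 */
	ODFEATURE_POWER_LIMIT   = 1u << ODCAP_POWER_LIMIT,   /* 8 */
};

struct od_settings {
	uint8_t cap[ODCAP_COUNT]; /* one entry per ODCAP index */
};

/* Fixed lookup: index with the dense ODCAP value. The bounds check is an
 * extra safety net so a flag passed by mistake cannot read past the end. */
static bool od_cap_supported(const struct od_settings *s, enum od_cap cap)
{
	if ((unsigned)cap >= (unsigned)ODCAP_COUNT)
		return false;
	return s->cap[cap] != 0;
}
```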





Power limit OD stopped working for navi10 - broken on previously working commit

2020-02-09 Thread Matt Coffin
I was doing some benchmarking and noticed poor performance, indicating
that my overdrive settings were not taking effect, even though they were
set. hwmon/power1_cap reports the correctly adjusted value after it is
written to, and I confirmed with a quick patch that the updated power
limit value is actually being returned from the SMU after it is set, yet
the card refuses to go over stock settings (+/- 3% of stock power draw,
even with a 50% increase in power limit).

Since I worked on that code a while back, I went to bisect, using
c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6 as a starting point, since I
know that commit had working power limit overdrive before.

Strangely, I'm seeing the same behavior on that
previously-known-to-be-working commit!

This happens for both *increased* and *decreased* power limits. sysfs
reflects the change, but I see no change in the actual power draw on the
card, and for the *increased* case, performance reflects a card that is
throttling due to power limits.

Were there any firmware changes, or anything else, that could be causing
this? I don't know where to start, since a previously working commit is
now somehow broken.

Since the behavior seems to have changed on me, it would also be
incredibly helpful if anyone can confirm or deny that they can reproduce
this problem on either the latest codebase OR
c39f062e881dcc6ab4c1c1c5835dc774be1bcfd6.

Any help, testing information, or simple confirm/deny from your side
would go a long way.

Thanks in advance,
Matt


Re: [PATCH] drm/amdgpu: Do not move root PT bo to relocated list

2020-02-09 Thread Christian König

On 09.02.20 at 03:52, Pan, Xinhui wrote:
> hit panic when we update the page tables.
>

> Signed-off-by: xinhui pan 
> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> index 3195bc90985a..3c388fdf335c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
> @@ -2619,7 +2619,7 @@ void amdgpu_vm_bo_invalidate(struct amdgpu_device *adev,
>  			continue;
>  		bo_base->moved = true;
>
> -		if (bo->tbo.type == ttm_bo_type_kernel)
> +		if (bo->tbo.type == ttm_bo_type_kernel && bo->parent)

Good catch, but that would mean that we move the root PD to the moved
state, which in turn is illegal as well.

It would probably be better to adjust amdgpu_vm_bo_relocated() to move
the root PD to the idle state instead.

Christian.

>  			amdgpu_vm_bo_relocated(bo_base);
>  		else if (bo->tbo.base.resv == vm->root.base.bo->tbo.base.resv)
>  			amdgpu_vm_bo_moved(bo_base);

