On 2025-10-11 06:30, Jesse.Zhang wrote:
The kernel OOPS occurs due to a NULL pointer dereference in
amdttm_resource_manager_usage() when accessing the ttm_resource_manager lock,
specifically on APU devices:

[  911.954646] BUG: kernel NULL pointer dereference, address: 00000000000008f8
[  911.962437] #PF: supervisor write access in kernel mode
[  912.007045] RIP: 0010:_raw_spin_lock+0x1e/0x40
[  912.105151]  amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
[  912.111579]  amdgpu_cs_parser_bos.isra.0+0x543/0x800 [amdgpu]

I just noticed another NULL pointer dereference backtrace from amdttm_resource_manager_usage when using rocm-smi on an APU. Maybe add a NULL pointer or is_app_apu check inside amdttm_resource_manager_usage?

 BUG: kernel NULL pointer dereference, address: 00000000000008f8
 Call Trace:
  amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
  amdgpu_mem_info_vram_used_show+0x1e/0x40 [amdgpu]
  dev_attr_show+0x1d/0x40
  kernfs_seq_show+0x27/0x30
  seq_read_iter+0x126/0x490
  kernfs_fop_read_iter+0x150/0x1d0
  new_sync_read+0x101/0x190
  vfs_read+0x147/0x1a0
  ksys_read+0x67/0xe0
  __x64_sys_read+0x1a/0x20
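
A rough userspace sketch of the kind of guard suggested above, not the real TTM code. The mock_* types and the function name are hypothetical, and it assumes the lock is reached through a back-pointer that stays NULL when the manager was never initialized:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock stand-ins for the real TTM structures; layout and names are illustrative only. */
struct mock_device {
	int lru_lock;              /* placeholder for the real spinlock */
};

struct mock_resource_manager {
	struct mock_device *bdev;  /* assumed to stay NULL if the manager was never initialized */
	uint64_t usage;            /* bytes currently accounted to this manager */
};

/* Hypothetical guarded variant: report 0 usage instead of taking a lock through a NULL pointer. */
static uint64_t mock_resource_manager_usage(struct mock_resource_manager *man)
{
	uint64_t usage;

	if (!man || !man->bdev)
		return 0;

	man->bdev->lru_lock = 1;   /* stands in for spin_lock() */
	usage = man->usage;
	man->bdev->lru_lock = 0;   /* stands in for spin_unlock() */

	return usage;
}
```

With a guard like this, both the CS path and the sysfs read path would see 0 usage on a device without a VRAM manager, rather than each caller needing its own is_app_apu check. Whether 0 is the right value to report there is a separate question.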

Regards,

Philip

This happens because APU devices (adev->gmc.is_app_apu) don't initialize
vram_mgr.manager (no dedicated VRAM), but the code still attempted to calculate
memory migration thresholds using VRAM-specific logic.

Fix this by:
1. Merging APU check with the existing bandwidth control support check in
    amdgpu_cs_get_threshold_for_moves()
2. Returning 0 thresholds immediately for APU devices to skip VRAM-related logic

This prevents accessing uninitialized vram_mgr.manager on APUs and avoids the
NULL pointer dereference in _raw_spin_lock().

Signed-off-by: Jesse Zhang <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5f515fdcc775..d80414b32015 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -709,7 +709,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
         */
        const s64 us_upper_bound = 200000;
-       if (!adev->mm_stats.log2_max_MBps) {
+       if ((!adev->mm_stats.log2_max_MBps) || adev->gmc.is_app_apu) {
                *max_bytes = 0;
                *max_vis_bytes = 0;
                return;
