On 2025-10-11 06:30, Jesse.Zhang wrote:
The kernel OOPS occurs due to a NULL pointer dereference in
amdttm_resource_manager_usage() when accessing the ttm_resource_manager
lock, specifically on APU devices:

[ 911.954646] BUG: kernel NULL pointer dereference, address: 00000000000008f8
[ 911.962437] #PF: supervisor write access in kernel mode
[ 912.007045] RIP: 0010:_raw_spin_lock+0x1e/0x40
[ 912.105151] amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
[ 912.111579] amdgpu_cs_parser_bos.isra.0+0x543/0x800 [amdgpu]

I just noticed another NULL pointer dereference backtrace from
amdttm_resource_manager_usage when using rocm-smi on an APU. Maybe add a
NULL pointer or is_app_apu check inside amdttm_resource_manager_usage?

BUG: kernel NULL pointer dereference, address: 00000000000008f8
Call Trace:
 amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
 amdgpu_mem_info_vram_used_show+0x1e/0x40 [amdgpu]
 dev_attr_show+0x1d/0x40
 kernfs_seq_show+0x27/0x30
 seq_read_iter+0x126/0x490
 kernfs_fop_read_iter+0x150/0x1d0
 new_sync_read+0x101/0x190
 vfs_read+0x147/0x1a0
 ksys_read+0x67/0xe0
 __x64_sys_read+0x1a/0x20

Regards,
Philip

This happens because APU devices (adev->gmc.is_app_apu) don't initialize
vram_mgr.manager (no dedicated VRAM), but the code still attempted to
calculate memory migration thresholds using VRAM-specific logic.

Fix this by:
1. Merging APU check with the existing bandwidth control support check in
   amdgpu_cs_get_threshold_for_moves()
2. Returning 0 thresholds immediately for APU devices to skip VRAM-related
   logic

This prevents accessing uninitialized vram_mgr.manager on APUs and avoids
the NULL pointer dereference in _raw_spin_lock().

Signed-off-by: Jesse Zhang <[email protected]>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
index 5f515fdcc775..d80414b32015 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
@@ -709,7 +709,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
 	 */
 	const s64 us_upper_bound = 200000;

-	if (!adev->mm_stats.log2_max_MBps) {
+	if ((!adev->mm_stats.log2_max_MBps) || adev->gmc.is_app_apu) {
 		*max_bytes = 0;
 		*max_vis_bytes = 0;
 		return;
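
For comparison, the two guards discussed in this thread (the early return in the caller from the patch, and the NULL check inside the usage helper suggested in the reply) can be modelled in plain userspace C. This is only a rough sketch under stated assumptions: every mock_* name and struct below is hypothetical, the layouts are not the real amdgpu/TTM ones, locking is omitted, and the non-APU branch is a placeholder rather than the driver's actual threshold calculation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a resource manager; not the real TTM layout. */
struct mock_manager {
	uint64_t usage; /* bytes currently in use */
};

/* Hypothetical stand-in for the device state the patch looks at. */
struct mock_device {
	bool is_app_apu;               /* models adev->gmc.is_app_apu */
	struct mock_manager *vram_mgr; /* NULL models an uninitialized manager */
	unsigned int log2_max_MBps;    /* 0 models "no bandwidth control" */
};

/*
 * Guard inside the helper (the idea raised in the reply): report zero
 * usage instead of dereferencing a manager that was never set up.
 */
static uint64_t mock_manager_usage(const struct mock_manager *man)
{
	if (!man)
		return 0;
	return man->usage;
}

/*
 * Guard in the caller (the shape of the patch): return zero thresholds
 * early when bandwidth control is unsupported or the device is an APU.
 */
static void mock_get_threshold_for_moves(const struct mock_device *dev,
					 uint64_t *max_bytes,
					 uint64_t *max_vis_bytes)
{
	if (!dev->log2_max_MBps || dev->is_app_apu) {
		*max_bytes = 0;
		*max_vis_bytes = 0;
		return;
	}

	/* Placeholder only: the real driver derives these from bandwidth. */
	*max_bytes = mock_manager_usage(dev->vram_mgr);
	*max_vis_bytes = *max_bytes;
}
```

In this model the caller-side check covers only the command-submission path, while the helper-side check also covers any other caller, such as a sysfs read, that reaches the helper without going through the threshold function.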
