[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Yang, Philip <[email protected]>
> Sent: Monday, October 13, 2025 7:55 AM
> To: Zhang, Jesse(Jie) <[email protected]>; [email protected]
> Cc: Deucher, Alexander <[email protected]>; Koenig, Christian
> <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: Fix NULL pointer dereference in APU threshold
> calculation
>
>
> On 2025-10-11 06:30, Jesse.Zhang wrote:
> > A kernel oops occurs due to a NULL pointer dereference in
> > amdttm_resource_manager_usage() when it takes the
> > ttm_resource_manager lock, specifically on APU devices:
> >
> > [  911.954646] BUG: kernel NULL pointer dereference, address: 00000000000008f8
> > [  911.962437] #PF: supervisor write access in kernel mode
> > [  912.007045] RIP: 0010:_raw_spin_lock+0x1e/0x40
> > [  912.105151] amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
> > [  912.111579] amdgpu_cs_parser_bos.isra.0+0x543/0x800 [amdgpu]
>
> I just noticed another NULL pointer dereference backtrace from
> amdttm_resource_manager_usage() when using rocm-smi on an APU. Maybe add
> a NULL pointer check or an is_app_apu check inside
> amdttm_resource_manager_usage()?
[Zhang, Jesse(Jie)] Yes, we need it; I will update the patch. Thanks, Philip.

>
>   BUG: kernel NULL pointer dereference, address: 00000000000008f8
>   Call Trace:
>    amdttm_resource_manager_usage+0x1f/0x40 [amdttm]
>    amdgpu_mem_info_vram_used_show+0x1e/0x40 [amdgpu]
>    dev_attr_show+0x1d/0x40
>    kernfs_seq_show+0x27/0x30
>    seq_read_iter+0x126/0x490
>    kernfs_fop_read_iter+0x150/0x1d0
>    new_sync_read+0x101/0x190
>    vfs_read+0x147/0x1a0
>    ksys_read+0x67/0xe0
>    __x64_sys_read+0x1a/0x20
>
> Regards,
>
> Philip
>
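Philip's suggestion can be sketched as a guarded usage helper. This is a minimal, self-contained model, not the amdttm code: `fake_ttm_device`, `fake_resource_manager`, `fake_spin_lock()` and `resource_manager_usage_guarded()` are hypothetical names. The sketch assumes that amdttm_resource_manager_usage() mirrors upstream ttm_resource_manager_usage(), which takes `man->bdev->lru_lock`. Under that assumption, a manager that was never initialized has a NULL `bdev`, and the small faulting address (0x8f8) is consistent with a NULL base pointer plus the offset of the lock field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct ttm_device; an int replaces spinlock_t. */
struct fake_ttm_device {
	int lru_lock; /* 0 = unlocked, 1 = locked */
};

/* Hypothetical stand-in for struct ttm_resource_manager. */
struct fake_resource_manager {
	struct fake_ttm_device *bdev; /* assumed NULL until the manager is initialized */
	uint64_t usage;               /* bytes currently accounted to this manager */
};

static void fake_spin_lock(int *lock)
{
	assert(*lock == 0); /* toy lock: catch double-locking */
	*lock = 1;
}

static void fake_spin_unlock(int *lock)
{
	assert(*lock == 1); /* toy lock: catch unlocking an unheld lock */
	*lock = 0;
}

/*
 * Guarded variant of the usage query: if the manager or its device
 * back-pointer was never set up (assumed here for VRAM on APUs),
 * report zero usage instead of dereferencing NULL.
 */
static uint64_t resource_manager_usage_guarded(struct fake_resource_manager *man)
{
	uint64_t usage;

	if (!man || !man->bdev)
		return 0;

	fake_spin_lock(&man->bdev->lru_lock);
	usage = man->usage;
	fake_spin_unlock(&man->bdev->lru_lock);
	return usage;
}
```

Returning 0 would make a sysfs read such as amdgpu_mem_info_vram_used_show() report no VRAM usage on such devices instead of crashing; whether 0 or an error code is the better answer is a design choice the updated patch would still need to settle.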
> > This happens because APU devices (adev->gmc.is_app_apu) do not
> > initialize vram_mgr.manager (they have no dedicated VRAM), yet the
> > code still attempts to calculate memory migration thresholds using
> > VRAM-specific logic.
> >
> > Fix this by:
> > 1. Merging the APU check into the existing bandwidth-control support
> >    check in amdgpu_cs_get_threshold_for_moves().
> > 2. Returning zero thresholds immediately on APU devices so that the
> >    VRAM-related logic is skipped.
> >
> > This prevents accessing the uninitialized vram_mgr.manager on APUs and
> > avoids the NULL pointer dereference in _raw_spin_lock().
> >
> > Signed-off-by: Jesse Zhang <[email protected]>
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > index 5f515fdcc775..d80414b32015 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
> > @@ -709,7 +709,7 @@ static void amdgpu_cs_get_threshold_for_moves(struct amdgpu_device *adev,
> >      */
> >     const s64 us_upper_bound = 200000;
> >
> > -   if (!adev->mm_stats.log2_max_MBps) {
> > +   if ((!adev->mm_stats.log2_max_MBps) || adev->gmc.is_app_apu) {
> >             *max_bytes = 0;
> >             *max_vis_bytes = 0;
> >             return;
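
The early return added by the patch can be modelled in isolation. The sketch below is illustrative only: `struct fake_adev` flattens `adev->mm_stats.log2_max_MBps` and `adev->gmc.is_app_apu` into hypothetical plain fields, and the non-APU branch uses a made-up `vram_budget` placeholder instead of the real VRAM-based calculation. It shows that either condition yields zero thresholds before any VRAM manager state is touched.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for the fields the check reads. */
struct fake_adev {
	uint32_t log2_max_MBps; /* 0 when bandwidth control is unsupported */
	bool is_app_apu;        /* true on APUs without dedicated VRAM */
	uint64_t vram_budget;   /* placeholder for the VRAM-based calculation */
};

static void get_threshold_for_moves(const struct fake_adev *adev,
				    uint64_t *max_bytes, uint64_t *max_vis_bytes)
{
	/* Same shape as the patched check: no bandwidth control, or an APU. */
	if (!adev->log2_max_MBps || adev->is_app_apu) {
		*max_bytes = 0;
		*max_vis_bytes = 0;
		return; /* nothing below (the VRAM path) runs */
	}

	/* Placeholder for the real VRAM-based computation; illustration only. */
	*max_bytes = adev->vram_budget;
	*max_vis_bytes = adev->vram_budget / 2;
}
```

Folding the APU test into the existing early return keeps a single exit path for "no migration budget", rather than adding a second check further down next to the VRAM accounting.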
