On Wednesday, January 21, 2026 7:24:47 PM Central European Standard Time Alex Deucher wrote:
> From: Jon Doron <[email protected]>
> 
> On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
> ih2 interrupt ring buffers are not initialized. This is by design, as
> these secondary IH rings are only available on discrete GPUs. See
> vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
> AMD_IS_APU is set.
> 
> However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
> get the timestamp of the last interrupt entry. When retry faults are
> enabled on APUs (noretry=0), this function is called from the SVM page
> fault recovery path, resulting in a NULL pointer dereference when
> amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
> 
> The crash manifests as:
> 
>   BUG: kernel NULL pointer dereference, address: 0000000000000004
>   RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
>   Call Trace:
>    amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
>    svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
>    amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
>    gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
>    amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
>    amdgpu_ih_process+0x84/0x100 [amdgpu]
> 
> This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
> IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
> noretry=1 to noretry=0, enabling retry fault handling and thus
> exercising the buggy code path.
> 
> Fix this by adding a check for ih1.ring_size before attempting to use
> it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
> Rework retry fault removal").  This is needed if the hardware doesn't
> support secondary HW IH rings.
> 
> v2: additional updates (Alex)
> 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
> Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
> Cc: [email protected]
> Signed-off-by: Jon Doron <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>

Reviewed-by: Timur Kristóf <[email protected]>

Thank you for taking care of this!

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 8e65fec9f534e..243d75917458a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -498,8 +498,13 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
> 
>       if (adev->irq.retry_cam_enabled)
>               return;
> +     else if (adev->irq.ih1.ring_size)
> +             ih = &adev->irq.ih1;
> +     else if (adev->irq.ih_soft.enabled)
> +             ih = &adev->irq.ih_soft;
> +     else
> +             return;
> 
> -     ih = &adev->irq.ih1;
>       /* Get the WPTR of the last entry in IH ring */
>       last_wptr = amdgpu_ih_get_wptr(adev, ih);
>       /* Order wptr with ring data. */



