On Wednesday, January 21, 2026 7:24:47 PM Central European Standard Time Alex Deucher wrote:
> From: Jon Doron <[email protected]>
>
> On APUs such as Raven and Renoir (GC 9.1.0, 9.2.2, 9.3.0), the ih1 and
> ih2 interrupt ring buffers are not initialized. This is by design, as
> these secondary IH rings are only available on discrete GPUs. See
> vega10_ih_sw_init() which explicitly skips ih1/ih2 initialization when
> AMD_IS_APU is set.
>
> However, amdgpu_gmc_filter_faults_remove() unconditionally uses ih1 to
> get the timestamp of the last interrupt entry. When retry faults are
> enabled on APUs (noretry=0), this function is called from the SVM page
> fault recovery path, resulting in a NULL pointer dereference when
> amdgpu_ih_decode_iv_ts_helper() attempts to access ih->ring[].
>
> The crash manifests as:
>
>   BUG: kernel NULL pointer dereference, address: 0000000000000004
>   RIP: 0010:amdgpu_ih_decode_iv_ts_helper+0x22/0x40 [amdgpu]
>   Call Trace:
>    amdgpu_gmc_filter_faults_remove+0x60/0x130 [amdgpu]
>    svm_range_restore_pages+0xae5/0x11c0 [amdgpu]
>    amdgpu_vm_handle_fault+0xc8/0x340 [amdgpu]
>    gmc_v9_0_process_interrupt+0x191/0x220 [amdgpu]
>    amdgpu_irq_dispatch+0xed/0x2c0 [amdgpu]
>    amdgpu_ih_process+0x84/0x100 [amdgpu]
>
> This issue was exposed by commit 1446226d32a4 ("drm/amdgpu: Remove GC HW
> IP 9.3.0 from noretry=1") which changed the default for Renoir APU from
> noretry=1 to noretry=0, enabling retry fault handling and thus
> exercising the buggy code path.
>
> Fix this by adding a check for ih1.ring_size before attempting to use
> it. Also restore the soft_ih support from commit dd299441654f ("drm/amdgpu:
> Rework retry fault removal"). This is needed if the hardware doesn't
> support secondary HW IH rings.
>
> v2: additional updates (Alex)
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3814
> Fixes: dd299441654f ("drm/amdgpu: Rework retry fault removal")
> Cc: [email protected]
> Signed-off-by: Jon Doron <[email protected]>
> Signed-off-by: Alex Deucher <[email protected]>

Reviewed-by: Timur Kristóf <[email protected]>

Thank you for taking care of this!

> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> index 8e65fec9f534e..243d75917458a 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c
> @@ -498,8 +498,13 @@ void amdgpu_gmc_filter_faults_remove(struct amdgpu_device *adev, uint64_t addr,
>
>  	if (adev->irq.retry_cam_enabled)
>  		return;
> +	else if (adev->irq.ih1.ring_size)
> +		ih = &adev->irq.ih1;
> +	else if (adev->irq.ih_soft.enabled)
> +		ih = &adev->irq.ih_soft;
> +	else
> +		return;
>
> -	ih = &adev->irq.ih1;
>  	/* Get the WPTR of the last entry in IH ring */
>  	last_wptr = amdgpu_ih_get_wptr(adev, ih);
>  	/* Order wptr with ring data. */
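
For anyone following the thread without the driver source at hand, the ring selection in the hunk above can be modelled in isolation. This is only a rough userspace sketch, not driver code: the mock_* types and mock_pick_fault_ring() are hypothetical stand-ins invented for illustration, and they keep only the three fields the patch tests (retry_cam_enabled, ih1.ring_size, ih_soft.enabled). It assumes, as the commit message describes, that an uninitialized ih1 has a ring_size of zero and a NULL ring pointer.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for an IH ring; NOT the real amdgpu type. */
struct mock_ih_ring {
	uint32_t *ring;         /* NULL when the ring was never initialized */
	unsigned int ring_size; /* 0 when the ring was never initialized */
	bool enabled;
};

/* Hypothetical stand-in for the interrupt state the patch inspects. */
struct mock_irq {
	bool retry_cam_enabled;
	struct mock_ih_ring ih1;
	struct mock_ih_ring ih_soft;
};

/*
 * Mirrors the order of checks in the patch:
 *   1. retry CAM enabled     -> nothing to do
 *   2. ih1 was initialized   -> use the secondary hardware ring
 *   3. soft IH ring enabled  -> fall back to the software ring
 *   4. otherwise             -> nothing to do
 * Returns NULL in the "nothing to do" cases, so the caller never
 * dereferences the ring buffer of an uninitialized ring.
 */
static struct mock_ih_ring *mock_pick_fault_ring(struct mock_irq *irq)
{
	if (irq->retry_cam_enabled)
		return NULL;
	if (irq->ih1.ring_size)
		return &irq->ih1;
	if (irq->ih_soft.enabled)
		return &irq->ih_soft;
	return NULL;
}
```

In this model an APU-like state (zeroed ih1, ih_soft enabled) yields the soft ring, and a state with neither ring available yields NULL, which corresponds to the early return the patch adds.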
