On 1/13/26 22:17, Alex Deucher wrote:
> On Tue, Jan 13, 2026 at 8:57 AM Christian König
> <[email protected]> wrote:
>>
>> Patches #1-#3: Reviewed-by: Christian König <[email protected]>
>>
>> Comment on patch #4 which also affects patches #5-#26.
>
> What was your comment on patch 4?  I don't see that reply on the mailing list.

That we didn't use the job because we couldn't allocate memory while in a GPU reset.

We could use GFP_ATOMIC when allocating from the GPU reset IB pool to solve this.

Christian.

>
> Alex
>
>>
>> Comment on patch #27 and #28. When #28 comes before #27 then that would
>> potentially solve the issue with #27.
>>
>> Patch #31: Reviewed-by: Christian König <[email protected]>
>>
>> Patches #32-#40: those look extremely questionable to me. I've intentionally
>> removed that state from the job because it isn't job dependent and sometimes
>> has inter-job meaning.
>>
>> Patch #41: Absolutely clear NAK! We have exercised that nonsense to the max
>> and I'm clearly against doing that over and over again. Saving the ring
>> content clearly seems to be the safer approach.
>>
>> Regards,
>> Christian.
>>
>> On 1/8/26 15:48, Alex Deucher wrote:
>>> This set contains a number of bug fixes and cleanups for
>>> IB handling that I worked on over the holidays.
>>>
>>> Patches 1-2:
>>> Simple bug fixes.
>>>
>>> Patches 3-26:
>>> Removes the direct submit path for IBs and requires
>>> that all IB submissions use a job structure.  This
>>> greatly simplifies the IB submission code.
>>>
>>> Patches 27-42:
>>> Split IB state setup and ring emission.  This keeps all
>>> of the IB state in the job.  This greatly simplifies
>>> re-emission of non-timed-out jobs after a ring reset and
>>> allows for re-emission multiple times if multiple resets
>>> happen in a row.  It also properly handles the dma fence
>>> error handling for timedout jobs with adapter resets.
>>>
>>> Alex Deucher (42):
>>>   drm/amdgpu/jpeg4.0.3: remove redundant sr-iov check
>>>   drm/amdgpu: fix error handling in ib_schedule()
>>>   drm/amdgpu: add new job ids
>>>   drm/amdgpu/vpe: switch to using job for IBs
>>>   drm/amdgpu/gfx6: switch to using job for IBs
>>>   drm/amdgpu/gfx7: switch to using job for IBs
>>>   drm/amdgpu/gfx8: switch to using job for IBs
>>>   drm/amdgpu/gfx9: switch to using job for IBs
>>>   drm/amdgpu/gfx9.4.2: switch to using job for IBs
>>>   drm/amdgpu/gfx9.4.3: switch to using job for IBs
>>>   drm/amdgpu/gfx10: switch to using job for IBs
>>>   drm/amdgpu/gfx11: switch to using job for IBs
>>>   drm/amdgpu/gfx12: switch to using job for IBs
>>>   drm/amdgpu/gfx12.1: switch to using job for IBs
>>>   drm/amdgpu/si_dma: switch to using job for IBs
>>>   drm/amdgpu/cik_sdma: switch to using job for IBs
>>>   drm/amdgpu/sdma2.4: switch to using job for IBs
>>>   drm/amdgpu/sdma3: switch to using job for IBs
>>>   drm/amdgpu/sdma4: switch to using job for IBs
>>>   drm/amdgpu/sdma4.4.2: switch to using job for IBs
>>>   drm/amdgpu/sdma5: switch to using job for IBs
>>>   drm/amdgpu/sdma5.2: switch to using job for IBs
>>>   drm/amdgpu/sdma6: switch to using job for IBs
>>>   drm/amdgpu/sdma7: switch to using job for IBs
>>>   drm/amdgpu/sdma7.1: switch to using job for IBs
>>>   drm/amdgpu: require a job to schedule an IB
>>>   drm/amdgpu: mark fences with errors before ring reset
>>>   drm/amdgpu: rename amdgpu_fence_driver_guilty_force_completion()
>>>   drm/amdgpu: don't call drm_sched_stop/start() in asic reset
>>>   drm/amdgpu: drop drm_sched_increase_karma()
>>>   drm/amdgpu: plumb timedout fence through to force completion
>>>   drm/amdgpu: change function signature for emit_pipeline_sync()
>>>   drm/amdgpu: drop extra parameter for vm_flush
>>>   drm/amdgpu: move need_ctx_switch into amdgpu_job
>>>   drm/amdgpu: store vm flush state in amdgpu_job
>>>   drm/amdgpu: split fence init and emit logic
>>>   drm/amdgpu: split vm flush and vm flush emit logic
>>>   drm/amdgpu: split ib schedule and ib emit logic
>>>   drm/amdgpu: move drm sched stop/start into amdgpu_job_timedout()
>>>   drm/amdgpu: add an all_instance_rings_reset ring flag
>>>   drm/amdgpu: rework reset reemit handling
>>>   drm/amdgpu: simplify per queue reset code
>>>
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 136 +++------
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 289 ++++++++++----------
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 40 ++-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.h | 13 +
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 67 -----
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 37 +--
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 4 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 2 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 21 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 141 +++++-----
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vpe.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/cik_sdma.c | 36 ++-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 41 ++-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 41 ++-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 41 ++-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v12_1.c | 33 ++-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 28 +-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 30 +-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 143 +++++-----
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 149 +++++-----
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 26 +-
>>>  drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 +--
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v2_5.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v3_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 6 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_0_1.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/jpeg_v5_3_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v2_4.c | 43 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v3_0.c | 43 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 43 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 46 ++--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/sdma_v7_1.c | 45 +--
>>>  drivers/gpu/drm/amd/amdgpu/si_dma.c | 34 ++-
>>>  drivers/gpu/drm/amd/amdgpu/uvd_v6_0.c | 8 +-
>>>  drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 4 +-
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 2 +
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 2 +
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 4 +-
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 3 +-
>>>  drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 4 +-
>>>  54 files changed, 952 insertions(+), 966 deletions(-)
>>>
>>
