On 16.07.25 11:43, cao, lin wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > > Hi Philipp, > > Thank you for the review. I found that this optimization was introduced 9 > years ago in commit 777dbd458c89d4ca74a659f85ffb5bc817f29a35 ("drm/amdgpu: > drop a dummy wakeup scheduler"). > > Given that the codebase has undergone significant changes over these 9 years. > May I ask if I still need to include the Fixes: tag?
Most likely not, a CC: stable tag should be sufficient. Regards, Christian. > > Thanks, > Lin > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > *From:* Philipp Stanner <pha...@mailbox.org> > *Sent:* Wednesday, July 16, 2025 16:33 > *To:* cao, lin <lin....@amd.com>; dri-devel@lists.freedesktop.org > <dri-devel@lists.freedesktop.org> > *Cc:* Yin, ZhenGuo (Chris) <zhenguo....@amd.com>; Deng, Emily > <emily.d...@amd.com>; Koenig, Christian <christian.koe...@amd.com>; > pha...@kernel.org <pha...@kernel.org>; d...@kernel.org <d...@kernel.org>; > matthew.br...@intel.com <matthew.br...@intel.com> > *Subject:* Re: [PATCH] drm/sched: Remove optimization that causes hang when > killing dependent jobs > > On Tue, 2025-07-15 at 21:50 +0800, Lin.Cao wrote: >> When application A submits jobs and application B submits a job with >> a >> dependency on A's fence, the normal flow wakes up the scheduler after >> processing each job. However, the optimization in >> drm_sched_entity_add_dependency_cb() uses a callback that only clears >> dependencies without waking up the scheduler. >> >> When application A is killed before its jobs can run, the callback >> gets >> triggered but only clears the dependency without waking up the >> scheduler, >> causing the scheduler to enter sleep state and application B to hang. >> >> Remove the optimization by deleting drm_sched_entity_clear_dep() and >> its >> usage, ensuring the scheduler is always woken up when dependencies >> are >> cleared. >> >> Signed-off-by: Lin.Cao <linca...@amd.com> > > This is, still, a bug fix, so it needs Fixes: and Cc: stable :) > > Could also include a Suggested-by: Christian > > P. > >> --- >> drivers/gpu/drm/scheduler/sched_entity.c | 21 ++------------------- >> 1 file changed, 2 insertions(+), 19 deletions(-) >> >> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c >> b/drivers/gpu/drm/scheduler/sched_entity.c >> index e671aa241720..ac678de7fe5e 100644 >> --- a/drivers/gpu/drm/scheduler/sched_entity.c >> +++ b/drivers/gpu/drm/scheduler/sched_entity.c >> @@ -355,17 +355,6 @@ void drm_sched_entity_destroy(struct >> drm_sched_entity *entity) >> } >> EXPORT_SYMBOL(drm_sched_entity_destroy); >> >> -/* drm_sched_entity_clear_dep - callback to clear the entities >> dependency */ >> -static void drm_sched_entity_clear_dep(struct dma_fence *f, >> - struct dma_fence_cb *cb) >> -{ >> - struct drm_sched_entity *entity = >> - container_of(cb, struct drm_sched_entity, cb); >> - >> - entity->dependency = NULL; >> - dma_fence_put(f); >> -} >> - >> /* >> * drm_sched_entity_wakeup - callback to clear the entity's >> dependency and >> * wake up the scheduler >> @@ -376,7 +365,8 @@ static void drm_sched_entity_wakeup(struct >> dma_fence *f, >> struct drm_sched_entity *entity = >> container_of(cb, struct drm_sched_entity, cb); >> >> - drm_sched_entity_clear_dep(f, cb); >> + entity->dependency = NULL; >> + dma_fence_put(f); >> drm_sched_wakeup(entity->rq->sched); >> } >> >> @@ -429,13 +419,6 @@ static bool >> drm_sched_entity_add_dependency_cb(struct drm_sched_entity *entity) >> fence = dma_fence_get(&s_fence->scheduled); >> dma_fence_put(entity->dependency); >> entity->dependency = fence; >> - if (!dma_fence_add_callback(fence, &entity->cb, >> - >> drm_sched_entity_clear_dep)) >> - return true; >> - >> - /* Ignore it when it is already scheduled */ >> - dma_fence_put(fence); >> - return false; >> } >> >> if (!dma_fence_add_callback(entity->dependency, &entity->cb, >