Happy 2026, Tvrtko!
On Wed, 2026-01-07 at 13:44 +0000, Tvrtko Ursulin wrote:
> drm_sched_entity_is_idle() is called when flushing the entity before
> destroying it and currently decides on the idle status based either on
> number of jobs in its queue being zero, or whether the entity is not part
> of any run-queue.
>
> If entity is not part of a run-queue it is implied it can have no jobs in
> its queue, from which it follows it is redundant to look at the both and
> we can simplify to only look at the queue.
>
> The list_empty() check was added in
> a875f58e237a ("drm/scheduler: stop setting rq to NULL")
> where it replaced the entity->rq check which was added in
> 741f01e636b7 ("drm/scheduler: Avoid using wait_event_killable for dying
> process (V4)").
>
> Since for the submit race involving exiting entities, explicit handling
> via entity->stopped was added soon after in
> 62347a33001c ("drm/scheduler: Add stopped flag to drm_sched_entity")
> it indeed looks we are safe to remove the list_empty() check.
>
> This mean we can remove the memory barrier as well and fewer memory
> barriers the better.
I am not convinced that this change is worth it.
Not to long ago we discussed that the spsc_queue should be removed and
replaced by some sort of list, with proper locks. Christian has agreed
that this should fly.
The spsc queue has only 1 user in the kernel and it's super hard to
understand how it's supposed to work and when any why barriers and
READ_ONCE's are necessary (that's, of course, also not documented in
the queue).
Until that is done I don't really want to touch any of those memory
barriers..
>
> While at it, we add READ_ONCE annotation on the entity->stopped check to
> mark the unlocked read.
This would effectively legalize the lockless access. But it is illegal
and undefined behavior. See drm_sched_fini().
P.
>
> Signed-off-by: Tvrtko Ursulin <[email protected]>
> Cc: Christian König <[email protected]>
> Cc: Danilo Krummrich <[email protected]>
> Cc: Matthew Brost <[email protected]>
> Cc: Philipp Stanner <[email protected]>
> ---
> drivers/gpu/drm/scheduler/sched_entity.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index bb7e5fc47f99..2ed84504cfd6 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -137,14 +137,8 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>
> static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
> {
> - rmb(); /* for list_empty to work without lock */
> -
> - if (list_empty(&entity->list) ||
> - spsc_queue_count(&entity->job_queue) == 0 ||
> - entity->stopped)
> - return true;
> -
> - return false;
> + return spsc_queue_count(&entity->job_queue) == 0 ||
> + READ_ONCE(entity->stopped);
> }
>
> /**