On 1/9/26 15:06, Tvrtko Ursulin wrote:
> On 07/01/2026 14:11, Philipp Stanner wrote:
>> Happy 2026, Tvrtko!
>>
>>
>> On Wed, 2026-01-07 at 13:44 +0000, Tvrtko Ursulin wrote:
>>> drm_sched_entity_is_idle() is called when flushing the entity before
>>> destroying it and currently decides on the idle status based either on
>>> number of jobs in its queue being zero, or whether the entity is not part
>>> of any run-queue.
>>>
>>> If an entity is not part of a run-queue it is implied it can have no jobs
>>> in its queue, from which it follows it is redundant to look at both and
>>> we can simplify to only look at the queue.
>>>
>>> The list_empty() check was added in
>>> a875f58e237a ("drm/scheduler: stop setting rq to NULL")
>>> where it replaced the entity->rq check which was added in
>>> 741f01e636b7 ("drm/scheduler: Avoid using wait_event_killable for dying
>>> process (V4)").
>>>
>>> Since for the submit race involving exiting entities, explicit handling
>>> via entity->stopped was added soon after in
>>> 62347a33001c ("drm/scheduler: Add stopped flag to drm_sched_entity")
>>> it indeed looks we are safe to remove the list_empty() check.
>>>
>>> This means we can remove the memory barrier as well, and the fewer
>>> memory barriers we have the better.
>>
>> I am not convinced that this change is worth it.
>>
>> Not too long ago we discussed that the spsc_queue should be removed and
>> replaced by some sort of list, with proper locks. Christian has agreed
>> that this should fly.
>>
>> The spsc queue has only 1 user in the kernel and it's super hard to
>> understand how it's supposed to work and when and why barriers and
>> READ_ONCE's are necessary (that's, of course, also not documented in
>> the queue).
>>
>> Until that is done I don't really want to touch any of those memory
>> barriers.
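For what it's worth, the kind of replacement being discussed could look roughly like the sketch below: a plain list protected by a lock, where the count is read under the same lock so no barriers or READ_ONCE annotations are needed at all. This is only a userspace approximation with a pthread mutex standing in for a kernel spinlock; none of these names exist in the tree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of a lock-protected job queue that could
 * replace spsc_queue; in-kernel the mutex would be a spinlock. */
struct job_node {
	struct job_node *next;
};

struct job_queue {
	pthread_mutex_t lock;
	struct job_node *head;
	struct job_node **tail;
	unsigned int count;
};

static void job_queue_init(struct job_queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->head = NULL;
	q->tail = &q->head;
	q->count = 0;
}

static void job_queue_push(struct job_queue *q, struct job_node *n)
{
	pthread_mutex_lock(&q->lock);
	n->next = NULL;
	*q->tail = n;
	q->tail = &n->next;
	q->count++;
	pthread_mutex_unlock(&q->lock);
}

static struct job_node *job_queue_pop(struct job_queue *q)
{
	struct job_node *n;

	pthread_mutex_lock(&q->lock);
	n = q->head;
	if (n) {
		q->head = n->next;
		if (!q->head)
			q->tail = &q->head;
		q->count--;
	}
	pthread_mutex_unlock(&q->lock);

	return n;
}

/* The count is read under the same lock, so no rmb() is needed. */
static bool job_queue_empty(struct job_queue *q)
{
	bool empty;

	pthread_mutex_lock(&q->lock);
	empty = q->count == 0;
	pthread_mutex_unlock(&q->lock);

	return empty;
}
```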
>
> (I had a branch with spsc gone more than a year ago but I abandoned it for
> now since it contained some other too ambitious changes. So no complaints
> from me. Who is doing it BTW?)
>
> Back to the point - this patch can wait, no problem. To explain the context
> though.
>
> I wanted to get rid of looking at the list_empty here because I have a branch
> which improves the flow for the 1:1 sched:entity drivers.
>
> Why are the two related? If you remember in the fair scheduler series all the
> run-queue stuff is nicely grouped in sched_rq.c and encapsulated in the rq
> API, which made it possible to follow up with virtualizing the rq operations.
>
> Yet another relevant thing is the patch I sent this week which removes
> the special case where entity can be initialized with no schedulers.
If my memory doesn't completely fail me, the patch to reject initializing
entities with no scheduler is actually a prerequisite of this patch here.
The list_empty() is most likely only there because we had entities initialized
without a scheduler.
>
> If we combine all three of these prerequisites, my branch allows the fully
> invariant sched:entity and 1:1:1 sched:rq:entity. Run-queue vfuncs for the
> 1:1 drivers become mostly no-ops (remove and pop entity are not needed at
> all, while add just checks for entity->stopped). So all the list and tree
> management needed by M:N drivers simply does not happen.
That sounds sane to me, but I would do one step at a time.
Regards,
Christian.
>
> In that branch 1:1 entities do not take part in the rq->entities_list so,
> going back to this patch, the list_empty check will be in the way.
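To make the 1:1 idea above concrete for readers following along, here is a rough userspace sketch of virtualized run-queue operations where the 1:1 flavour is mostly no-ops. All names are hypothetical illustrations, not the actual branch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch: virtualized run-queue ops, 1:1 flavour. */
struct entity {
	bool stopped;
};

struct rq;

struct rq_ops {
	/* Returns false if the entity may not be added (e.g. stopped). */
	bool (*add_entity)(struct rq *rq, struct entity *e);
	void (*remove_entity)(struct rq *rq, struct entity *e);
	struct entity *(*pop_entity)(struct rq *rq);
};

struct rq {
	const struct rq_ops *ops;
	struct entity *only;	/* the single entity in the 1:1 case */
};

/* 1:1 sched:rq:entity: add only checks entity->stopped; remove and
 * pop are not needed because the one entity never leaves the rq, so
 * no list or tree management happens on the submission path. */
static bool rq_1to1_add(struct rq *rq, struct entity *e)
{
	if (e->stopped)
		return false;
	rq->only = e;
	return true;
}

static const struct rq_ops rq_1to1_ops = {
	.add_entity = rq_1to1_add,
	.remove_entity = NULL,	/* no list management to undo */
	.pop_entity = NULL,	/* scheduler already knows the entity */
};
```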
>
> Anyway, just for context, I will park this cleanup for now but you can mull
> it over whether the bigger picture sounds interesting to you.
>
> Regards,
>
> Tvrtko
>
>>>
>>> While at it, we add a READ_ONCE() annotation on the entity->stopped check
>>> to mark the unlocked read.
>>
>> This would effectively legalize the lockless access. But it is illegal
>> and undefined behavior. See drm_sched_fini().
>>
>> P.
>>
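As an aside for readers: READ_ONCE() only constrains the compiler (no load tearing, no refetching across the expression); it does not by itself make a racy access pattern safe, which is the point about drm_sched_fini() above. A rough userspace approximation of the shape of the proposed check, with a hypothetical struct standing in for the kernel types:

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace approximation of the kernel's READ_ONCE()/WRITE_ONCE():
 * the volatile cast forces a single, untorn access and stops the
 * compiler from caching or refetching the value. */
#define READ_ONCE(x)		(*(const volatile __typeof__(x) *)&(x))
#define WRITE_ONCE(x, val)	(*(volatile __typeof__(x) *)&(x) = (val))

/* Hypothetical stand-in for struct drm_sched_entity. */
struct entity {
	bool stopped;
	int job_count;
};

/* Shape of the proposed idle check: unlocked but annotated reads. */
static bool entity_is_idle(struct entity *e)
{
	return READ_ONCE(e->job_count) == 0 || READ_ONCE(e->stopped);
}
```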
>>
>>>
>>> Signed-off-by: Tvrtko Ursulin <[email protected]>
>>> Cc: Christian König <[email protected]>
>>> Cc: Danilo Krummrich <[email protected]>
>>> Cc: Matthew Brost <[email protected]>
>>> Cc: Philipp Stanner <[email protected]>
>>> ---
>>> drivers/gpu/drm/scheduler/sched_entity.c | 10 ++--------
>>> 1 file changed, 2 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index bb7e5fc47f99..2ed84504cfd6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -137,14 +137,8 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>>>  static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
>>>  {
>>> -	rmb(); /* for list_empty to work without lock */
>>> -
>>> -	if (list_empty(&entity->list) ||
>>> -	    spsc_queue_count(&entity->job_queue) == 0 ||
>>> -	    entity->stopped)
>>> -		return true;
>>> -
>>> -	return false;
>>> +	return spsc_queue_count(&entity->job_queue) == 0 ||
>>> +	       READ_ONCE(entity->stopped);
>>>  }
>>>  /**
>>
>