On 1/9/26 15:06, Tvrtko Ursulin wrote:
> \
> On 07/01/2026 14:11, Philipp Stanner wrote:
>> Happy 2026, Tvrtko!
>>
>>
>> On Wed, 2026-01-07 at 13:44 +0000, Tvrtko Ursulin wrote:
>>> drm_sched_entity_is_idle() is called when flushing the entity before
>>> destroying it and currently decides on the idle status based either on
>>> number of jobs in its queue being zero, or whether the entity is not part
>>> of any run-queue.
>>>
>>> If entity is not part of a run-queue it is implied it can have no jobs in
>>> its queue, from which it follows it is redundant to look at the both and
>>> we can simplify to only look at the queue.
>>>
>>> The list_empty() check was added in
>>> a875f58e237a ("drm/scheduler: stop setting rq to NULL")
>>> where it replaced the entity->rq check which was added in
>>> 741f01e636b7 ("drm/scheduler: Avoid using wait_event_killable for dying 
>>> process (V4)").
>>>
>>> Since for the submit race involving exiting entities, explicit handling
>>> via entity->stopped was added soon after in
>>> 62347a33001c ("drm/scheduler: Add stopped flag to drm_sched_entity")
>>> it indeed looks we are safe to remove the list_empty() check.
>>>
>>> This mean we can remove the memory barrier as well and fewer memory
>>> barriers the better.
>>
>> I am not convinced that this change is worth it.
>>
>> Not to long ago we discussed that the spsc_queue should be removed and
>> replaced by some sort of list, with proper locks. Christian has agreed
>> that this should fly.
>>
>> The spsc queue has only 1 user in the kernel and it's super hard to
>> understand how it's supposed to work and when any why barriers and
>> READ_ONCE's are necessary (that's, of course, also not documented in
>> the queue).
>>
>> Until that is done I don't really want to touch any of those memory
>> barriers..
> 
> (I had a branch with spsc gone more than a year ago but I abandoned it for 
> now since it contained some other too ambitious changes. So no complaints 
> from me. Who is doing it BTW?)
> 
> Back to the point - this patch can wait, no problem. To explain the context 
> though.
> 
> I wanted to get rid of looking at the list_empty here because I have a branch 
> which improves the flow for the 1:1 sched:entity drivers.
> 
> Why are the two related? If you remember in the fair scheduler series all the 
> run-queue stuff is nicely grouped in sched_rq.c and encapsulated in the rq 
> API, which made it possible to follow up with virtualizing the rq operations.
> 
> The yet another relevant thing is the patch I sent this week which removes 
> the special case where entity can be initialized with no schedulers.

If my memory not completely fails me the patch to reject initializing entities 
with no scheduler is actually a pre requisite of this patch here.

The list_empty() is most likely only there because we had entities initialized 
without a scheduler.

> 
> If we combined all these three pre-requisites, my branch allows the fully 
> invariant sched:entity and 1:1:1 sched:rq:entity. Run-queue vfuncs for the 
> 1:1 drivers become mostly no-ops (remove and pop entity are not needed at 
> all, while add just checks for entity->stopped). So all the list and tree 
> management needed by M:N drivers simply does not happen.

That sounds sane to me, but I would do one step at a time.

Regards,
Christian.

> 
> In that branch 1:1 entities do not take part in the rq->entities_list so, 
> going back to this patch, the list_empty check will be in the way.
> 
> Anyway, just for context, I will park this cleanup for now but you can mull 
> it over whether the bigger picture sounds interesting to you.
> 
> Regards,
> 
> Tvrtko
> 
>>>
>>> While at it, we add READ_ONCE annotation on the entity->stopped check to
>>> mark the unlocked read.
>>
>> This would effectively legalize the lockless access. But it is illegal
>> and undefined behavior. See drm_sched_fini().
>>
>> P.
>>
>>
>>>
>>> Signed-off-by: Tvrtko Ursulin <[email protected]>
>>> Cc: Christian König <[email protected]>
>>> Cc: Danilo Krummrich <[email protected]>
>>> Cc: Matthew Brost <[email protected]>
>>> Cc: Philipp Stanner <[email protected]>
>>> ---
>>>   drivers/gpu/drm/scheduler/sched_entity.c | 10 ++--------
>>>   1 file changed, 2 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
>>> b/drivers/gpu/drm/scheduler/sched_entity.c
>>> index bb7e5fc47f99..2ed84504cfd6 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_entity.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
>>> @@ -137,14 +137,8 @@ EXPORT_SYMBOL(drm_sched_entity_modify_sched);
>>>     static bool drm_sched_entity_is_idle(struct drm_sched_entity *entity)
>>>   {
>>> -    rmb(); /* for list_empty to work without lock */
>>> -
>>> -    if (list_empty(&entity->list) ||
>>> -        spsc_queue_count(&entity->job_queue) == 0 ||
>>> -        entity->stopped)
>>> -        return true;
>>> -
>>> -    return false;
>>> +    return spsc_queue_count(&entity->job_queue) == 0 ||
>>> +           READ_ONCE(entity->stopped);
>>>   }
>>>     /**
>>
> 

Reply via email to