On Wed, Jan 11, 2023 at 10:52:54AM -0800, John Harrison wrote:
> On 1/11/2023 10:07, Matthew Brost wrote:
> > On Wed, Jan 11, 2023 at 09:17:01AM +0000, Tvrtko Ursulin wrote:
> > > On 10/01/2023 19:01, Matthew Brost wrote:
> > > > On Tue, Jan 10, 2023 at 04:50:55PM +0000, Tvrtko Ursulin wrote:
> > > > > On 10/01/2023 15:55, Matthew Brost wrote:
> > > > > > On Tue, Jan 10, 2023 at 12:19:35PM +0000, Tvrtko Ursulin wrote:
> > > > > > > On 10/01/2023 11:28, Tvrtko Ursulin wrote:
> > > > > > > > On 09/01/2023 17:27, Jason Ekstrand wrote:
> > > > > > > > 
> > > > > > > > [snip]
> > > > > > > > 
> > > > > > > > >         >>> AFAICT it proposes to have 1:1 between
> > > > > > > > >         >>> *userspace* created contexts (per context _and_
> > > > > > > > >         >>> engine) and drm_sched. I am not sure avoiding
> > > > > > > > >         >>> invasive changes to the shared code is in the
> > > > > > > > >         >>> spirit of the overall idea and instead the
> > > > > > > > >         >>> opportunity should be used to look at ways to
> > > > > > > > >         >>> refactor/improve drm_sched.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > Maybe?  I'm not convinced that what Xe is doing is an abuse
> > > > > > > > > at all or really needs to drive a re-factor.  (More on that
> > > > > > > > > later.)  There's only one real issue which is that it fires
> > > > > > > > > off potentially a lot of kthreads. Even that's not that bad
> > > > > > > > > given that kthreads are pretty light and you're not likely to
> > > > > > > > > have more kthreads than userspace threads which are much
> > > > > > > > > heavier.  Not ideal, but not the end of the world either.
> > > > > > > > > Definitely something we can/should optimize, but if we went
> > > > > > > > > through with Xe without this patch, it would probably be
> > > > > > > > > mostly ok.
> > > > > > > > > 
> > > > > > > > >         >> Yes, it is 1:1 *userspace* engines and drm_sched.
> > > > > > > > >         >>
> > > > > > > > >         >> I'm not really prepared to make large changes to
> > > > > > > > >         >> the DRM scheduler at the moment for Xe as they are
> > > > > > > > >         >> not really required, nor does it seem they will be
> > > > > > > > >         >> required for Boris's work either. I am interested
> > > > > > > > >         >> to see what Boris comes up with.
> > > > > > > > >         >>
> > > > > > > > >         >>> Even on the low level, the idea to replace
> > > > > > > > >         >>> drm_sched threads with workers has a few
> > > > > > > > >         >>> problems.
> > > > > > > > >         >>>
> > > > > > > > >         >>> To start with, the pattern of:
> > > > > > > > >         >>>
> > > > > > > > >         >>>    while (not_stopped) {
> > > > > > > > >         >>>     keep picking jobs
> > > > > > > > >         >>>    }
> > > > > > > > >         >>>
> > > > > > > > >         >>> Feels fundamentally in disagreement with workers
> > > > > > > > >         >>> (while obviously fits perfectly with the current
> > > > > > > > >         >>> kthread design).
> > > > > > > > >         >>
> > > > > > > > >         >> The while loop breaks and the worker exits if no
> > > > > > > > >         >> jobs are ready.
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > > I'm not very familiar with workqueues. What are you saying
> > > > > > > > > would fit better? One scheduling job per work item rather
> > > > > > > > > than one big work item which handles all available jobs?
> > > > > > > > Yes and no. It indeed IMO does not fit to have a work item
> > > > > > > > which is potentially unbounded in runtime. But it is a bit of
> > > > > > > > a moot conceptual mismatch, because that is a worst case /
> > > > > > > > theoretical concern, and I think the real issues are more
> > > > > > > > fundamental.
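
For reference, here is a minimal sketch of the shape being discussed.
All example_* names are made up for illustration; this is not the actual
drm_sched or Xe code, and the job list / locking helpers are only
declared as placeholders:

  #include <linux/workqueue.h>

  struct example_job;

  struct example_sched {
          struct workqueue_struct *submit_wq;
          struct work_struct submit_work;
          /* job list and locking omitted for brevity */
  };

  /* Placeholder helpers assumed to be provided elsewhere. */
  struct example_job *example_pop_ready_job(struct example_sched *sched);
  void example_run_job(struct example_sched *sched, struct example_job *job);
  void example_push_job(struct example_sched *sched, struct example_job *job);

  /* Runs once per queue_work(); drains only the jobs that are ready
   * right now and then returns, so each invocation is bounded.
   */
  static void example_submit_work_fn(struct work_struct *w)
  {
          struct example_sched *sched =
                  container_of(w, struct example_sched, submit_work);
          struct example_job *job;

          while ((job = example_pop_ready_job(sched)))
                  example_run_job(sched, job);
          /* The worker exits here; a new job re-queues the work below. */
  }

  static void example_queue_job(struct example_sched *sched,
                                struct example_job *job)
  {
          example_push_job(sched, job);
          queue_work(sched->submit_wq, &sched->submit_work);
  }

So it is not one work item per job; it is one work item per entity which
handles whatever is ready at the time and then terminates.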
> > > > > > > > 
> > > > > > > > If we have to go back to the low level side of things, I've
> > > > > > > > picked this random spot to consolidate what I have already
> > > > > > > > mentioned and perhaps expand.
> > > > > > > > 
> > > > > > > > To start with, let me pull out some thoughts from workqueue.rst:
> > > > > > > > 
> > > > > > > > """
> > > > > > > > Generally, work items are not expected to hog a CPU and
> > > > > > > > consume many cycles. That means maintaining just enough
> > > > > > > > concurrency to prevent work processing from stalling should be
> > > > > > > > optimal.
> > > > > > > > """
> > > > > > > > 
> > > > > > > > For unbound queues:
> > > > > > > > """
> > > > > > > > The responsibility of regulating concurrency level is on
> > > > > > > > the users.
> > > > > > > > """
> > > > > > > > 
> > > > > > > > Given that the unbound queues will be spawned on demand to
> > > > > > > > service all queued work items (more interesting when mixed
> > > > > > > > with the system_unbound_wq), in the proposed design the number
> > > > > > > > of instantiated worker threads does not correspond to the
> > > > > > > > number of user threads (as you have elsewhere stated), but
> > > > > > > > pessimistically to the number of active user contexts. That is
> > > > > > > > the number which drives the maximum number of not-runnable
> > > > > > > > jobs that can become runnable at once, and hence spawn that
> > > > > > > > many work items, and in turn unbound worker threads.
> > > > > > > > 
> > > > > > > > Several problems there.
> > > > > > > > 
> > > > > > > > It is fundamentally pointless to have potentially that many
> > > > > > > > more threads than the number of CPU cores - it simply creates
> > > > > > > > a scheduling storm.
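
As an aside, and only as a sketch of what "regulating concurrency is on
the users" can look like with the stock workqueue API: a driver can
allocate one dedicated unbound workqueue with a bounded max_active
instead of pushing everything onto system_unbound_wq. The queue name
and the num_possible_cpus() cap below are illustrative assumptions, not
what this series currently does:

  #include <linux/cpumask.h>
  #include <linux/errno.h>
  #include <linux/workqueue.h>

  static struct workqueue_struct *example_submit_wq;

  static int example_submit_wq_init(void)
  {
          /* max_active bounds how many work items from this queue run
           * concurrently, independent of how many user contexts exist.
           */
          example_submit_wq = alloc_workqueue("example-submit", WQ_UNBOUND,
                                              num_possible_cpus());
          if (!example_submit_wq)
                  return -ENOMEM;
          return 0;
  }

The per-entity work items would then all be queued to this one
workqueue with queue_work(example_submit_wq, ...).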
> > > > > > > To make matters worse, if I follow the code correctly, all these
> > > > > > > per-user-context worker threads / work items end up contending on
> > > > > > > the same lock or circular buffer, both of which are one instance
> > > > > > > per GPU:
> > > > > > > 
> > > > > > > guc_engine_run_job
> > > > > > >     -> submit_engine
> > > > > > >        a) wq_item_append
> > > > > > >            -> wq_wait_for_space
> > > > > > >              -> msleep
> > > > > > a) is dedicated per xe_engine
> > > > > Hah, true. What is it for then? I thought throttling the LRCA ring is
> > > > > done via:
> > > > > 
> > > > This is a per guc_id 'work queue' which is used for parallel submission
> > > > (e.g. multiple LRC tail values need to be written atomically by the GuC).
> > > > Again in practice there should always be space.
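
For anyone following along, the shape of that path is essentially a
bounded ring with a sleep-and-retry fallback. The sketch below uses
made-up example_* names and an arbitrary ring size and is not the
actual Xe code; note that the GuC work queue uses 'tail' as the
driver's write pointer and 'head' as the GuC's read pointer:

  #include <linux/circ_buf.h>
  #include <linux/delay.h>
  #include <linux/errno.h>
  #include <linux/types.h>

  #define EXAMPLE_WQ_SIZE 4096 /* illustrative ring size in bytes */

  struct example_guc_wq {
          u32 head; /* advanced by the GuC as it consumes items */
          u32 tail; /* advanced by the driver as it appends items */
  };

  /* Poll for free space in the per-guc_id work queue; the ring is
   * sized so that in practice this should never need to sleep.
   */
  static int example_wq_wait_for_space(struct example_guc_wq *wq, u32 size)
  {
          int sleeps = 0;

          /* CIRC_SPACE(producer index, consumer index, ring size) */
          while (CIRC_SPACE(wq->tail, READ_ONCE(wq->head),
                            EXAMPLE_WQ_SIZE) < size) {
                  if (sleeps++ > 100)
                          return -ETIMEDOUT;
                  msleep(1);
          }
          return 0;
  }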
> > > Speaking of guc_ids, where does the blocking happen when none are
> > > available in the non-parallel case?
> > > 
> > We have 64k guc_ids on native and 1k guc_ids with 64 VFs. Either way we
> > think that is more than enough and can just reject xe_engine creation if
> > we run out of guc_ids. If this proves to be false, we can fix it, but the
> > guc_id stealing in the i915 is rather complicated and hopefully not
> > needed.
> > 
> > We will limit the number of guc_ids allowed per user pid to a reasonable
> > number to prevent a DoS. Elevated pids (e.g. IGTs) will be able to do
> > whatever they want.
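
To sketch what that per-pid cap could look like: everything below is
hypothetical - the names, the 1024 budget and using CAP_SYS_ADMIN as
the "elevated" check are illustrative assumptions, not code from this
series:

  #include <linux/capability.h>
  #include <linux/errno.h>
  #include <linux/gfp.h>
  #include <linux/sched.h>
  #include <linux/xarray.h>

  #define EXAMPLE_GUC_IDS_PER_PID 1024 /* made-up budget */

  static DEFINE_XARRAY(example_guc_id_counts); /* pid -> ids in use */

  /* Called on xe_engine creation; rejects the request once a pid has
   * exhausted its guc_id budget, unless the caller is privileged.
   */
  static int example_guc_id_quota_charge(void)
  {
          unsigned long pid = task_pid_nr(current);
          void *entry = xa_load(&example_guc_id_counts, pid);
          unsigned long count = entry ? xa_to_value(entry) : 0;

          if (capable(CAP_SYS_ADMIN))
                  return 0;

          if (count >= EXAMPLE_GUC_IDS_PER_PID)
                  return -ENOSPC;

          /* Locking and uncharging on engine destruction omitted. */
          xa_store(&example_guc_id_counts, pid, xa_mk_value(count + 1),
                   GFP_KERNEL);
          return 0;
  }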
> What about doorbells? At some point we will have to start using those, and
> they are a much more limited resource - 256 total and far fewer with VFs.
> 

We haven't thought about that one yet; we will figure it out when we
implement doorbells.

Matt

> John.
> 
