Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On 2023-10-06 11:14, Matthew Brost wrote: > On Fri, Oct 06, 2023 at 08:59:15AM +0100, Tvrtko Ursulin wrote: >> >> On 05/10/2023 05:13, Luben Tuikov wrote: >>> On 2023-10-04 23:33, Matthew Brost wrote: On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: > Hi, > > On 2023-09-19 01:01, Matthew Brost wrote: >> In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 >> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this >> seems a bit odd but let us explain the reasoning below. >> >> 1. In XE the submission order from multiple drm_sched_entity is not >> guaranteed to be the same completion even if targeting the same hardware >> engine. This is because in XE we have a firmware scheduler, the GuC, >> which allowed to reorder, timeslice, and preempt submissions. If a using >> shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls >> apart as the TDR expects submission order == completion order. Using a >> dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. >> >> 2. In XE submissions are done via programming a ring buffer (circular >> buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the >> limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow >> control on the ring for free. >> >> A problem with this design is currently a drm_gpu_scheduler uses a >> kthread for submission / job cleanup. This doesn't scale if a large >> number of drm_gpu_scheduler are used. To work around the scaling issue, >> use a worker rather than kthread for submission / job cleanup. >> >> v2: >>- (Rob Clark) Fix msm build >>- Pass in run work queue >> v3: >>- (Boris) don't have loop in worker >> v4: >>- (Tvrtko) break out submit ready, stop, start helpers into own patch >> v5: >>- (Boris) default to ordered work queue >> >> Signed-off-by: Matthew Brost >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >> drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- >> drivers/gpu/drm/lima/lima_sched.c | 2 +- >> drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- >> drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- >> drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- >> drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- >> drivers/gpu/drm/v3d/v3d_sched.c| 10 +- >> include/drm/gpu_scheduler.h| 14 ++- >> 9 files changed, 79 insertions(+), 75 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> index e366f61c3aed..16f3cfe1574a 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >> @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct >> amdgpu_device *adev) >> break; >> } >> -r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, >> +r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, >> NULL, >> ring->num_hw_submission, 0, >> timeout, adev->reset_domain->wq, >> ring->sched_score, ring->name, >> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> index 345fec6cb1a4..618a804ddc34 100644 >> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >> @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) >> { >> int ret; >> -ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, >> +ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, >> etnaviv_hw_jobs_limit, >> etnaviv_job_hang_limit, >> msecs_to_jiffies(500), NULL, NULL, >> dev_name(gpu->dev), gpu->dev); >> diff --git a/drivers/gpu/drm/lima/lima_sched.c >> b/drivers/gpu/drm/lima/lima_sched.c >> index ffd91a5ee299..8d858aed0e56 100644 >> --- a/drivers/gpu/drm/lima/lima_sched.c >> +++ b/drivers/gpu/drm/lima/lima_sched.c >> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe >> *pipe, const char *name) >> INIT_WORK(&pipe->recover_work, lima_sched_recover_work); >> -return drm_sched_init(&pipe->base, &lima_sched_ops, 1, >> +return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, >>lima_job_hang_limit, >>msecs_to_jiffies(timeout), NULL, >>NU
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On 2023-10-06 03:59, Tvrtko Ursulin wrote: > > On 05/10/2023 05:13, Luben Tuikov wrote: >> On 2023-10-04 23:33, Matthew Brost wrote: >>> On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: Hi, On 2023-09-19 01:01, Matthew Brost wrote: > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this > seems a bit odd but let us explain the reasoning below. > > 1. In XE the submission order from multiple drm_sched_entity is not > guaranteed to be the same completion even if targeting the same hardware > engine. This is because in XE we have a firmware scheduler, the GuC, > which allowed to reorder, timeslice, and preempt submissions. If a using > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls > apart as the TDR expects submission order == completion order. Using a > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. > > 2. In XE submissions are done via programming a ring buffer (circular > buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the > limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow > control on the ring for free. > > A problem with this design is currently a drm_gpu_scheduler uses a > kthread for submission / job cleanup. This doesn't scale if a large > number of drm_gpu_scheduler are used. To work around the scaling issue, > use a worker rather than kthread for submission / job cleanup. > > v2: >- (Rob Clark) Fix msm build >- Pass in run work queue > v3: >- (Boris) don't have loop in worker > v4: >- (Tvrtko) break out submit ready, stop, start helpers into own patch > v5: >- (Boris) default to ordered work queue > > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- > drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > include/drm/gpu_scheduler.h| 14 ++- > 9 files changed, 79 insertions(+), 75 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index e366f61c3aed..16f3cfe1574a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct > amdgpu_device *adev) > break; > } > > - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, > ring->num_hw_submission, 0, > timeout, adev->reset_domain->wq, > ring->sched_score, ring->name, > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 345fec6cb1a4..618a804ddc34 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > { > int ret; > > - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, > + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, >etnaviv_hw_jobs_limit, > etnaviv_job_hang_limit, >msecs_to_jiffies(500), NULL, NULL, >dev_name(gpu->dev), gpu->dev); > diff --git a/drivers/gpu/drm/lima/lima_sched.c > b/drivers/gpu/drm/lima/lima_sched.c > index ffd91a5ee299..8d858aed0e56 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe > *pipe, const char *name) > > INIT_WORK(&pipe->recover_work, lima_sched_recover_work); > > - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, > + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, > lima_job_hang_limit, > msecs_to_jiffies(timeout), NULL, > NULL, name, pipe->ldev->dev); > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > b/drivers/gpu/drm/msm/msm_ringbuffer.c > index 40c0bc35a44c..b8
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On Fri, Oct 06, 2023 at 08:59:15AM +0100, Tvrtko Ursulin wrote: > > On 05/10/2023 05:13, Luben Tuikov wrote: > > On 2023-10-04 23:33, Matthew Brost wrote: > > > On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: > > > > Hi, > > > > > > > > On 2023-09-19 01:01, Matthew Brost wrote: > > > > > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 > > > > > mapping between a drm_gpu_scheduler and drm_sched_entity. At first > > > > > this > > > > > seems a bit odd but let us explain the reasoning below. > > > > > > > > > > 1. In XE the submission order from multiple drm_sched_entity is not > > > > > guaranteed to be the same completion even if targeting the same > > > > > hardware > > > > > engine. This is because in XE we have a firmware scheduler, the GuC, > > > > > which allowed to reorder, timeslice, and preempt submissions. If a > > > > > using > > > > > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR > > > > > falls > > > > > apart as the TDR expects submission order == completion order. Using a > > > > > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. > > > > > > > > > > 2. In XE submissions are done via programming a ring buffer (circular > > > > > buffer), a drm_gpu_scheduler provides a limit on number of jobs, if > > > > > the > > > > > limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get > > > > > flow > > > > > control on the ring for free. > > > > > > > > > > A problem with this design is currently a drm_gpu_scheduler uses a > > > > > kthread for submission / job cleanup. This doesn't scale if a large > > > > > number of drm_gpu_scheduler are used. To work around the scaling > > > > > issue, > > > > > use a worker rather than kthread for submission / job cleanup. > > > > > > > > > > v2: > > > > >- (Rob Clark) Fix msm build > > > > >- Pass in run work queue > > > > > v3: > > > > >- (Boris) don't have loop in worker > > > > > v4: > > > > >- (Tvrtko) break out submit ready, stop, start helpers into own > > > > > patch > > > > > v5: > > > > >- (Boris) default to ordered work queue > > > > > > > > > > Signed-off-by: Matthew Brost > > > > > --- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > > > > > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > > > > > drivers/gpu/drm/lima/lima_sched.c | 2 +- > > > > > drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- > > > > > drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- > > > > > drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > > > > > drivers/gpu/drm/scheduler/sched_main.c | 118 > > > > > ++--- > > > > > drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > > > > > include/drm/gpu_scheduler.h| 14 ++- > > > > > 9 files changed, 79 insertions(+), 75 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > index e366f61c3aed..16f3cfe1574a 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > > > > @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct > > > > > amdgpu_device *adev) > > > > > break; > > > > > } > > > > > - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > > > > > + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > > > > > NULL, > > > > > ring->num_hw_submission, 0, > > > > > timeout, adev->reset_domain->wq, > > > > > ring->sched_score, ring->name, > > > > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > > > > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > > > > index 345fec6cb1a4..618a804ddc34 100644 > > > > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > > > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > > > > @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > > > > > { > > > > > int ret; > > > > > - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, > > > > > + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, > > > > >etnaviv_hw_jobs_limit, > > > > > etnaviv_job_hang_limit, > > > > >msecs_to_jiffies(500), NULL, NULL, > > > > >dev_name(gpu->dev), gpu->dev); > > > > > diff --git a/drivers/gpu/drm/lima/lima_sched.c > > > > > b/drivers/gpu/drm/lima/lima_sched.c > > > > > index ffd91a5ee299..8d858aed0e56 100644 > > > > > --- a/drivers/gpu/drm/lima/lima_sched.c > > > > > +++ b/drivers/gpu/drm/lima/lima_sched.c > > > > > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe > > > > > *pipe, const char *name) > > > > > INIT_WORK(&pipe->recover_work, lima_sched_recover_work); > > > > > - return drm_sched_init(&pipe->b
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On 05/10/2023 05:13, Luben Tuikov wrote: On 2023-10-04 23:33, Matthew Brost wrote: On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: Hi, On 2023-09-19 01:01, Matthew Brost wrote: In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 mapping between a drm_gpu_scheduler and drm_sched_entity. At first this seems a bit odd but let us explain the reasoning below. 1. In XE the submission order from multiple drm_sched_entity is not guaranteed to be the same completion even if targeting the same hardware engine. This is because in XE we have a firmware scheduler, the GuC, which allowed to reorder, timeslice, and preempt submissions. If a using shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls apart as the TDR expects submission order == completion order. Using a dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. 2. In XE submissions are done via programming a ring buffer (circular buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow control on the ring for free. A problem with this design is currently a drm_gpu_scheduler uses a kthread for submission / job cleanup. This doesn't scale if a large number of drm_gpu_scheduler are used. To work around the scaling issue, use a worker rather than kthread for submission / job cleanup. v2: - (Rob Clark) Fix msm build - Pass in run work queue v3: - (Boris) don't have loop in worker v4: - (Tvrtko) break out submit ready, stop, start helpers into own patch v5: - (Boris) default to ordered work queue Signed-off-by: Matthew Brost --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- drivers/gpu/drm/lima/lima_sched.c | 2 +- drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- drivers/gpu/drm/v3d/v3d_sched.c| 10 +- include/drm/gpu_scheduler.h| 14 ++- 9 files changed, 79 insertions(+), 75 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c index e366f61c3aed..16f3cfe1574a 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct amdgpu_device *adev) break; } - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, ring->num_hw_submission, 0, timeout, adev->reset_domain->wq, ring->sched_score, ring->name, diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c b/drivers/gpu/drm/etnaviv/etnaviv_sched.c index 345fec6cb1a4..618a804ddc34 100644 --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) { int ret; - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, msecs_to_jiffies(500), NULL, NULL, dev_name(gpu->dev), gpu->dev); diff --git a/drivers/gpu/drm/lima/lima_sched.c b/drivers/gpu/drm/lima/lima_sched.c index ffd91a5ee299..8d858aed0e56 100644 --- a/drivers/gpu/drm/lima/lima_sched.c +++ b/drivers/gpu/drm/lima/lima_sched.c @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, const char *name) INIT_WORK(&pipe->recover_work, lima_sched_recover_work); - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, lima_job_hang_limit, msecs_to_jiffies(timeout), NULL, NULL, name, pipe->ldev->dev); diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c b/drivers/gpu/drm/msm/msm_ringbuffer.c index 40c0bc35a44c..b8865e61b40f 100644 --- a/drivers/gpu/drm/msm/msm_ringbuffer.c +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu *gpu, int id, /* currently managing hangcheck ourselves: */ sched_timeout = MAX_SCHEDULE_TIMEOUT; - ret = drm_sched_init(&ring->sched, &msm_sched_ops, + ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL, num_hw_submissions, 0, sched_timeout, NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev); checkpatch.pl complains here about unmatched open parens. Will fix an
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On Thu, Oct 05, 2023 at 12:13:01AM -0400, Luben Tuikov wrote: > On 2023-10-04 23:33, Matthew Brost wrote: > > On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: > >> Hi, > >> > >> On 2023-09-19 01:01, Matthew Brost wrote: > >>> In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 > >>> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this > >>> seems a bit odd but let us explain the reasoning below. > >>> > >>> 1. In XE the submission order from multiple drm_sched_entity is not > >>> guaranteed to be the same completion even if targeting the same hardware > >>> engine. This is because in XE we have a firmware scheduler, the GuC, > >>> which allowed to reorder, timeslice, and preempt submissions. If a using > >>> shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls > >>> apart as the TDR expects submission order == completion order. Using a > >>> dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. > >>> > >>> 2. In XE submissions are done via programming a ring buffer (circular > >>> buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the > >>> limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow > >>> control on the ring for free. > >>> > >>> A problem with this design is currently a drm_gpu_scheduler uses a > >>> kthread for submission / job cleanup. This doesn't scale if a large > >>> number of drm_gpu_scheduler are used. To work around the scaling issue, > >>> use a worker rather than kthread for submission / job cleanup. > >>> > >>> v2: > >>> - (Rob Clark) Fix msm build > >>> - Pass in run work queue > >>> v3: > >>> - (Boris) don't have loop in worker > >>> v4: > >>> - (Tvrtko) break out submit ready, stop, start helpers into own patch > >>> v5: > >>> - (Boris) default to ordered work queue > >>> > >>> Signed-off-by: Matthew Brost > >>> --- > >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > >>> drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > >>> drivers/gpu/drm/lima/lima_sched.c | 2 +- > >>> drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- > >>> drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- > >>> drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > >>> drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- > >>> drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > >>> include/drm/gpu_scheduler.h| 14 ++- > >>> 9 files changed, 79 insertions(+), 75 deletions(-) > >>> > >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>> index e366f61c3aed..16f3cfe1574a 100644 > >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > >>> @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct > >>> amdgpu_device *adev) > >>> break; > >>> } > >>> > >>> - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > >>> + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, > >>> ring->num_hw_submission, 0, > >>> timeout, adev->reset_domain->wq, > >>> ring->sched_score, ring->name, > >>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > >>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > >>> index 345fec6cb1a4..618a804ddc34 100644 > >>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > >>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > >>> @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > >>> { > >>> int ret; > >>> > >>> - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, > >>> + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, > >>>etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, > >>>msecs_to_jiffies(500), NULL, NULL, > >>>dev_name(gpu->dev), gpu->dev); > >>> diff --git a/drivers/gpu/drm/lima/lima_sched.c > >>> b/drivers/gpu/drm/lima/lima_sched.c > >>> index ffd91a5ee299..8d858aed0e56 100644 > >>> --- a/drivers/gpu/drm/lima/lima_sched.c > >>> +++ b/drivers/gpu/drm/lima/lima_sched.c > >>> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe > >>> *pipe, const char *name) > >>> > >>> INIT_WORK(&pipe->recover_work, lima_sched_recover_work); > >>> > >>> - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, > >>> + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, > >>> lima_job_hang_limit, > >>> msecs_to_jiffies(timeout), NULL, > >>> NULL, name, pipe->ldev->dev); > >>> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > >>> b/drivers/gpu/drm/msm/msm_ringbuffer.c > >>> index 40c0bc35a44c..b8865e61b40f 100644 > >>> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > >>> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > >>> @@ -94,7 +94,7 @@ struct msm_ringbuff
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On 2023-10-04 23:33, Matthew Brost wrote: > On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: >> Hi, >> >> On 2023-09-19 01:01, Matthew Brost wrote: >>> In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 >>> mapping between a drm_gpu_scheduler and drm_sched_entity. At first this >>> seems a bit odd but let us explain the reasoning below. >>> >>> 1. In XE the submission order from multiple drm_sched_entity is not >>> guaranteed to be the same completion even if targeting the same hardware >>> engine. This is because in XE we have a firmware scheduler, the GuC, >>> which allowed to reorder, timeslice, and preempt submissions. If a using >>> shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls >>> apart as the TDR expects submission order == completion order. Using a >>> dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. >>> >>> 2. In XE submissions are done via programming a ring buffer (circular >>> buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the >>> limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow >>> control on the ring for free. >>> >>> A problem with this design is currently a drm_gpu_scheduler uses a >>> kthread for submission / job cleanup. This doesn't scale if a large >>> number of drm_gpu_scheduler are used. To work around the scaling issue, >>> use a worker rather than kthread for submission / job cleanup. >>> >>> v2: >>> - (Rob Clark) Fix msm build >>> - Pass in run work queue >>> v3: >>> - (Boris) don't have loop in worker >>> v4: >>> - (Tvrtko) break out submit ready, stop, start helpers into own patch >>> v5: >>> - (Boris) default to ordered work queue >>> >>> Signed-off-by: Matthew Brost >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >>> drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- >>> drivers/gpu/drm/lima/lima_sched.c | 2 +- >>> drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- >>> drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- >>> drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- >>> drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- >>> drivers/gpu/drm/v3d/v3d_sched.c| 10 +- >>> include/drm/gpu_scheduler.h| 14 ++- >>> 9 files changed, 79 insertions(+), 75 deletions(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index e366f61c3aed..16f3cfe1574a 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct >>> amdgpu_device *adev) >>> break; >>> } >>> >>> - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, >>> + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, >>>ring->num_hw_submission, 0, >>>timeout, adev->reset_domain->wq, >>>ring->sched_score, ring->name, >>> diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >>> b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >>> index 345fec6cb1a4..618a804ddc34 100644 >>> --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c >>> +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c >>> @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) >>> { >>> int ret; >>> >>> - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, >>> + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, >>> etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, >>> msecs_to_jiffies(500), NULL, NULL, >>> dev_name(gpu->dev), gpu->dev); >>> diff --git a/drivers/gpu/drm/lima/lima_sched.c >>> b/drivers/gpu/drm/lima/lima_sched.c >>> index ffd91a5ee299..8d858aed0e56 100644 >>> --- a/drivers/gpu/drm/lima/lima_sched.c >>> +++ b/drivers/gpu/drm/lima/lima_sched.c >>> @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, >>> const char *name) >>> >>> INIT_WORK(&pipe->recover_work, lima_sched_recover_work); >>> >>> - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, >>> + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, >>> lima_job_hang_limit, >>> msecs_to_jiffies(timeout), NULL, >>> NULL, name, pipe->ldev->dev); >>> diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c >>> b/drivers/gpu/drm/msm/msm_ringbuffer.c >>> index 40c0bc35a44c..b8865e61b40f 100644 >>> --- a/drivers/gpu/drm/msm/msm_ringbuffer.c >>> +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c >>> @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu >>> *gpu, int id, >>> /* currently managing hangcheck ourselves: */ >>> sched_timeout = MAX_SCHEDULE_TIMEOUT; >>> >>> - ret = drm_sched_init(&ring->sched, &msm_sched_ops, >
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
On Tue, Sep 26, 2023 at 11:32:10PM -0400, Luben Tuikov wrote: > Hi, > > On 2023-09-19 01:01, Matthew Brost wrote: > > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 > > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this > > seems a bit odd but let us explain the reasoning below. > > > > 1. In XE the submission order from multiple drm_sched_entity is not > > guaranteed to be the same completion even if targeting the same hardware > > engine. This is because in XE we have a firmware scheduler, the GuC, > > which allowed to reorder, timeslice, and preempt submissions. If a using > > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls > > apart as the TDR expects submission order == completion order. Using a > > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. > > > > 2. In XE submissions are done via programming a ring buffer (circular > > buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the > > limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow > > control on the ring for free. > > > > A problem with this design is currently a drm_gpu_scheduler uses a > > kthread for submission / job cleanup. This doesn't scale if a large > > number of drm_gpu_scheduler are used. To work around the scaling issue, > > use a worker rather than kthread for submission / job cleanup. > > > > v2: > > - (Rob Clark) Fix msm build > > - Pass in run work queue > > v3: > > - (Boris) don't have loop in worker > > v4: > > - (Tvrtko) break out submit ready, stop, start helpers into own patch > > v5: > > - (Boris) default to ordered work queue > > > > Signed-off-by: Matthew Brost > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > > drivers/gpu/drm/lima/lima_sched.c | 2 +- > > drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- > > drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- > > drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > > drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- > > drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > > include/drm/gpu_scheduler.h| 14 ++- > > 9 files changed, 79 insertions(+), 75 deletions(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > index e366f61c3aed..16f3cfe1574a 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > > @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct > > amdgpu_device *adev) > > break; > > } > > > > - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > > + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, > >ring->num_hw_submission, 0, > >timeout, adev->reset_domain->wq, > >ring->sched_score, ring->name, > > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > index 345fec6cb1a4..618a804ddc34 100644 > > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > > @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > > { > > int ret; > > > > - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, > > + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, > > etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, > > msecs_to_jiffies(500), NULL, NULL, > > dev_name(gpu->dev), gpu->dev); > > diff --git a/drivers/gpu/drm/lima/lima_sched.c > > b/drivers/gpu/drm/lima/lima_sched.c > > index ffd91a5ee299..8d858aed0e56 100644 > > --- a/drivers/gpu/drm/lima/lima_sched.c > > +++ b/drivers/gpu/drm/lima/lima_sched.c > > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, > > const char *name) > > > > INIT_WORK(&pipe->recover_work, lima_sched_recover_work); > > > > - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, > > + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, > > lima_job_hang_limit, > > msecs_to_jiffies(timeout), NULL, > > NULL, name, pipe->ldev->dev); > > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > > b/drivers/gpu/drm/msm/msm_ringbuffer.c > > index 40c0bc35a44c..b8865e61b40f 100644 > > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > > @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu > > *gpu, int id, > > /* currently managing hangcheck ourselves: */ > > sched_timeout = MAX_SCHEDULE_TIMEOUT; > > > > - ret = drm_sched_init(&ring->sched, &msm_sched_ops, > > + ret = drm_sched_init(&ring->sched
Re: [PATCH v4 02/10] drm/sched: Convert drm scheduler to use a work queue rather than kthread
Hi, On 2023-09-19 01:01, Matthew Brost wrote: > In XE, the new Intel GPU driver, a choice has made to have a 1 to 1 > mapping between a drm_gpu_scheduler and drm_sched_entity. At first this > seems a bit odd but let us explain the reasoning below. > > 1. In XE the submission order from multiple drm_sched_entity is not > guaranteed to be the same completion even if targeting the same hardware > engine. This is because in XE we have a firmware scheduler, the GuC, > which allowed to reorder, timeslice, and preempt submissions. If a using > shared drm_gpu_scheduler across multiple drm_sched_entity, the TDR falls > apart as the TDR expects submission order == completion order. Using a > dedicated drm_gpu_scheduler per drm_sched_entity solve this problem. > > 2. In XE submissions are done via programming a ring buffer (circular > buffer), a drm_gpu_scheduler provides a limit on number of jobs, if the > limit of number jobs is set to RING_SIZE / MAX_SIZE_PER_JOB we get flow > control on the ring for free. > > A problem with this design is currently a drm_gpu_scheduler uses a > kthread for submission / job cleanup. This doesn't scale if a large > number of drm_gpu_scheduler are used. To work around the scaling issue, > use a worker rather than kthread for submission / job cleanup. > > v2: > - (Rob Clark) Fix msm build > - Pass in run work queue > v3: > - (Boris) don't have loop in worker > v4: > - (Tvrtko) break out submit ready, stop, start helpers into own patch > v5: > - (Boris) default to ordered work queue > > Signed-off-by: Matthew Brost > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- > drivers/gpu/drm/etnaviv/etnaviv_sched.c| 2 +- > drivers/gpu/drm/lima/lima_sched.c | 2 +- > drivers/gpu/drm/msm/msm_ringbuffer.c | 2 +- > drivers/gpu/drm/nouveau/nouveau_sched.c| 2 +- > drivers/gpu/drm/panfrost/panfrost_job.c| 2 +- > drivers/gpu/drm/scheduler/sched_main.c | 118 ++--- > drivers/gpu/drm/v3d/v3d_sched.c| 10 +- > include/drm/gpu_scheduler.h| 14 ++- > 9 files changed, 79 insertions(+), 75 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > index e366f61c3aed..16f3cfe1574a 100644 > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c > @@ -2279,7 +2279,7 @@ static int amdgpu_device_init_schedulers(struct > amdgpu_device *adev) > break; > } > > - r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, > + r = drm_sched_init(&ring->sched, &amdgpu_sched_ops, NULL, > ring->num_hw_submission, 0, > timeout, adev->reset_domain->wq, > ring->sched_score, ring->name, > diff --git a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > index 345fec6cb1a4..618a804ddc34 100644 > --- a/drivers/gpu/drm/etnaviv/etnaviv_sched.c > +++ b/drivers/gpu/drm/etnaviv/etnaviv_sched.c > @@ -134,7 +134,7 @@ int etnaviv_sched_init(struct etnaviv_gpu *gpu) > { > int ret; > > - ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, > + ret = drm_sched_init(&gpu->sched, &etnaviv_sched_ops, NULL, >etnaviv_hw_jobs_limit, etnaviv_job_hang_limit, >msecs_to_jiffies(500), NULL, NULL, >dev_name(gpu->dev), gpu->dev); > diff --git a/drivers/gpu/drm/lima/lima_sched.c > b/drivers/gpu/drm/lima/lima_sched.c > index ffd91a5ee299..8d858aed0e56 100644 > --- a/drivers/gpu/drm/lima/lima_sched.c > +++ b/drivers/gpu/drm/lima/lima_sched.c > @@ -488,7 +488,7 @@ int lima_sched_pipe_init(struct lima_sched_pipe *pipe, > const char *name) > > INIT_WORK(&pipe->recover_work, lima_sched_recover_work); > > - return drm_sched_init(&pipe->base, &lima_sched_ops, 1, > + return drm_sched_init(&pipe->base, &lima_sched_ops, NULL, 1, > lima_job_hang_limit, > msecs_to_jiffies(timeout), NULL, > NULL, name, pipe->ldev->dev); > diff --git a/drivers/gpu/drm/msm/msm_ringbuffer.c > b/drivers/gpu/drm/msm/msm_ringbuffer.c > index 40c0bc35a44c..b8865e61b40f 100644 > --- a/drivers/gpu/drm/msm/msm_ringbuffer.c > +++ b/drivers/gpu/drm/msm/msm_ringbuffer.c > @@ -94,7 +94,7 @@ struct msm_ringbuffer *msm_ringbuffer_new(struct msm_gpu > *gpu, int id, >/* currently managing hangcheck ourselves: */ > sched_timeout = MAX_SCHEDULE_TIMEOUT; > > - ret = drm_sched_init(&ring->sched, &msm_sched_ops, > + ret = drm_sched_init(&ring->sched, &msm_sched_ops, NULL, > num_hw_submissions, 0, sched_timeout, > NULL, NULL, to_msm_bo(ring->bo)->name, gpu->dev->dev); checkpatch.pl complains here about unmatched open parens.