[Public]

> -----Original Message-----
> From: amd-gfx <amd-gfx-boun...@lists.freedesktop.org> On Behalf Of Lazar,
> Lijo
> Sent: Wednesday, July 19, 2023 12:13 AM
> To: Zhu, James <james....@amd.com>; amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander <alexander.deuc...@amd.com>; Zhu, James
> <james....@amd.com>; Kamal, Asad <asad.ka...@amd.com>; Zhang,
> Hawking <hawking.zh...@amd.com>
> Subject: Re: [PATCH] drm/amdgpu: Update ring scheduler info as needed
>
>
>
> On 7/18/2023 9:39 PM, James Zhu wrote:
> >
> > On 2023-07-18 11:54, Lazar, Lijo wrote:
> >>
> >>
> >> On 7/18/2023 8:39 PM, James Zhu wrote:
> >>>
> >>> On 2023-07-18 10:19, Lazar, Lijo wrote:
> >>>>
> >>>>
> >>>> On 7/18/2023 7:30 PM, James Zhu wrote:
> >>>>>
> >>>>> On 2023-07-18 08:21, Lijo Lazar wrote:
> >>>>>> Not all rings have scheduler associated. Only update scheduler
> >>>>>> data for rings with scheduler. It could result in out of bound
> >>>>>> access as total rings are more than those associated with
> >>>>>> particular IPs.
> >>>>>>
> >>>>>> Signed-off-by: Lijo Lazar <lijo.la...@amd.com>
> >>>>>> ---
> >>>>>>   drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c | 2 +-
> >>>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>> b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>> index 72b629a78c62..d0fc62784e82 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c
> >>>>>> @@ -134,7 +134,7 @@ static int
> >>>>>> aqua_vanjaram_xcp_sched_list_update(
> >>>>>>       for (i = 0; i < AMDGPU_MAX_RINGS; i++) {
> >>>>>>           ring = adev->rings[i];
> >>>>>> -        if (!ring || !ring->sched.ready)
> >>>>>> +        if (!ring || !ring->sched.ready || ring->no_scheduler)
> >>>>> [JZ] any case for ring->no_scheduler = true, but ring->sched.ready
> >>>>> = true?
> >>>>
> >>>> amdgpu_gfx_kiq_init_ring sets no_scheduler flag true for kiq rings.
> >>>> amdgpu_device_init_schedulers inits schedulers only for rings which
> >>>> doesn't have no_scheduler. However in gfx_v9_4_3_xcc_kiq_resume,
> >>>> the ring is marked with ring->sched.ready = true;
> >>>>
> >>>> I'm not sure why it's done this way, but this is the case even for
> >>>> gfxv9.

Driver so far still has some concept abuse on sched.ready. In 
amdgpu_device_init_schedulers , it's set to be true once setting up sw 
scheduler for the ring requirement, however, after driver is up, this flag 
works like a hint to tell the corresponding ring is ready for HW submission 
after ring test succeeds.

For this case, it's probably caused by amdgpu_gfx_enable_kcq  calling 
amdgpu_ring_test_helper, which sets sched.ready unconditionally based on ring 
test result. This will lead value mismatch between ring->no_scheduler and 
ring->sched.ready. If my understanding is correct, I think adding a check in 
this helper function which only sets sched.ready when no_scheduler is false 
will help.  So I will provide a patch soon to prevent this mismatch in future.

+if (!ring->no_scheduler)
+       ring->sched.ready != r;

Regards,
Guchun

> >>> [JZ] I think the sched.ready confuses people. here just means ring
> >>> is ready be used.
> >>
> >> Thanks for the clarification. One question is then do we need to
> >> check ring ready status for assigning xcp ids or just check if the
> >> ring is associated with a scheduler?
> >>
> >> Keep only this - if (!ring || ring->no_scheduler) to assign xcp ids
> >> and leave the ring ready status to the logic which really accepts
> >> jobs on the ring?
> > [JZ] I think keeping ring->sched.ready will reduce schedule list which
> > will save a little scheduling time.
>
> Fine, will just push this one.
>
> Thanks,
> Lijo
>
> >>
> >> Thanks,
> >> Lijo
> >>
> >>>> This patch is Reviewed-by: James Zhu <james....@amd.com>
> >>>
> >>>> Thanks,
> >>>> Lijo
> >>>>
> >>>>>>               continue;
> >>>>>>           aqua_vanjaram_xcp_gpu_sched_update(adev, ring,
> >>>>>> ring->xcp_id);

Reply via email to