[Public]

> -----Original Message-----
> From: Alex Deucher <[email protected]>
> Sent: Wednesday, October 8, 2025 1:12 PM
> To: Kim, Jonathan <[email protected]>
> Cc: [email protected]; Liu, Shaoyun <[email protected]>
> Subject: Re: [PATCH] drm/amdgpu: fix gfx12 mes packet submission status check
>
> On Wed, Oct 8, 2025 at 12:51 PM Jonathan Kim <[email protected]>
> wrote:
> >
> > The driver currently only checks that the MES packet submission fence
> > did not timeout but does not actually check if the fence return status
> > matches the expected completion value it passed to MES prior to
> > submission.
> >
> > For example, this can result in REMOVE_QUEUE requests returning success
> > to the driver when the queue actually failed to preempt.
> >
> > Fix this by having the driver actually compare the completion status
> > value to the expected success value.
>
> This should be correct as is:
>
> *status_ptr = 0;

That's not what I'm observing at the moment.
amdgpu_fence_wait_polling can still return fine where status_ptr != 0 nor 1.
From what I've been told, only 0x1 == success (the completion fence value 
passed to MES).
Shaoyun can probably elaborate or correct me if I'm wrong.
It's pretty easy for me to put the HW in a bad state (dangling queues post 
remove) without the driver knowing in its current state.

Jon

> ...
> api_status->api_completion_fence_value = 1;
> ...
> if (r < 1 || !*status_ptr) {
>
> Alex
>
> >
> > Signed-off-by: Jonathan Kim <[email protected]>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> > index aff06f06aeee..58f61170cf85 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> > @@ -228,8 +228,7 @@ static int
> mes_v12_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
> >                         pipe, x_pkt->header.opcode);
> >
> >         r = amdgpu_fence_wait_polling(ring, seq, timeout);
> > -       if (r < 1 || !*status_ptr) {
> > -
> > +       if (r < 1 || *status_ptr != api_status->api_completion_fence_value) 
> > {
> >                 if (misc_op_str)
> >                         dev_err(adev->dev, "MES(%d) failed to respond to 
> > msg=%s
> (%s)\n",
> >                                 pipe, op_str, misc_op_str);
> > --
> > 2.34.1
> >

Reply via email to