On Wed, Oct 8, 2025 at 12:51 PM Jonathan Kim <[email protected]> wrote:
>
> The driver currently only checks that the MES packet submission fence
> did not time out but does not actually check if the fence return status
> matches the expected completion value it passed to MES prior to
> submission.
>
> For example, this can result in REMOVE_QUEUE requests returning success
> to the driver when the queue actually failed to preempt.
>
> Fix this by having the driver actually compare the completion status
> value to the expected success value.

This should be correct as is:

*status_ptr = 0;
...
api_status->api_completion_fence_value = 1;
...
if (r < 1 || !*status_ptr) {

Alex
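
To make the comparison concrete, here is a minimal standalone sketch of the two checks (not driver code). The helper names, the macro and the plain integer types are assumptions made for illustration. Only the initial status of 0 and the expected completion value of 1 are taken from the snippet above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed stand-in for api_status->api_completion_fence_value (set to 1 above). */
#define EXPECTED_FENCE_VALUE 1ULL

/* Existing check: fail on a fence wait timeout or a status word that is still 0. */
static bool failed_nonzero_check(long r, uint64_t status)
{
	return r < 1 || !status;
}

/* Proposed check: fail on a timeout or any status other than the expected value. */
static bool failed_exact_check(long r, uint64_t status)
{
	return r < 1 || status != EXPECTED_FENCE_VALUE;
}
```

The two helpers agree whenever the status word is 0 or 1, and on any timeout. They differ only if a nonzero value other than 1 is written back. Whether MES ever does that is not established in this thread.
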
>
> Signed-off-by: Jonathan Kim <[email protected]>
> ---
> drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 3 +--
> 1 file changed, 1 insertion(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> index aff06f06aeee..58f61170cf85 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
> @@ -228,8 +228,7 @@ static int mes_v12_0_submit_pkt_and_poll_completion(struct amdgpu_mes *mes,
> pipe, x_pkt->header.opcode);
>
> r = amdgpu_fence_wait_polling(ring, seq, timeout);
> - if (r < 1 || !*status_ptr) {
> -
> + if (r < 1 || *status_ptr != api_status->api_completion_fence_value) {
> if (misc_op_str)
> dev_err(adev->dev, "MES(%d) failed to respond to msg=%s (%s)\n",
> pipe, op_str, misc_op_str);
> --
> 2.34.1
>