> Not sure if I'm being nit-picky, but if restore_id is negative, then 
> idr_alloc will return -EINVAL. That -EINVAL will go back up to the 
> create_signal_event call, which gives > a pr_warn of "Signal event wasn't 
> created because out of kernel memory" , even though the issue was bad input. 
> Mostly, I'm wondering if we need to add something > in dmesg if idr_alloc 
> fails for that reason, since someone could end up spending time debugging OOM 
> when the issue is an invalid param.
> Or is a negative restore_id indicative of OOM?

Not a nit pick; nice catch.

I think "Signal event wasn't created because out of kernel memory" is just 
straight wrong - idr_alloc can fail with -ENOSPC if there isn't an available 
ID,,
which is different from being out of memory.

David Francis

________________________________________
From: Russell, Kent <[email protected]>
Sent: Friday, June 5, 2026 12:16 PM
To: Francis, David; [email protected]
Cc: Francis, David
Subject: RE: [PATCH V2] drm/amdkfd: Check bounds on CRIU restore event id

AMD General

> -----Original Message-----
> From: amd-gfx <[email protected]> On Behalf Of David
> Francis
> Sent: June 2, 2026 2:47 PM
> To: [email protected]
> Cc: Francis, David <[email protected]>
> Subject: [PATCH V2] drm/amdkfd: Check bounds on CRIU restore event id
>
> The valid amdkfd event ids go from 0 to KFD_SIGNAL_EVENT_LIMIT - 1.
>
> During CRIU restore, ensure that the provided event ids are
> in that range.
>
> v2: No need for lower bound check since idr_alloc rejects negative
> inputs
>
> Signed-off-by: David Francis <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_events.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> index e9be798c0a2b..850d6befeb6d 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_events.c
> @@ -107,6 +107,9 @@ static int allocate_event_notification_slot(struct
> kfd_process *p,
>       }
>
>       if (restore_id) {
> +             if (*restore_id >= KFD_SIGNAL_EVENT_LIMIT)
> +                     return -EINVAL;
> +
>               id = idr_alloc(&p->event_idr, ev, *restore_id, *restore_id + 1,
>                               GFP_KERNEL);
Not sure if I'm being nit-picky, but if restore_id is negative, then idr_alloc 
will return -EINVAL. That -EINVAL will go back up to the create_signal_event 
call, which gives a pr_warn of "Signal event wasn't created because out of 
kernel memory" , even though the issue was bad input. Mostly, I'm wondering if 
we need to add something in dmesg if idr_alloc fails for that reason, since 
someone could end up spending time debugging OOM when the issue is an invalid 
param.
Or is a negative restore_id indicative of OOM?

 Kent


>       } else {
> --
> 2.34.1

Reply via email to