On Wed, May 13, 2026 at 11:49 AM Sunday Clement <[email protected]> wrote:
>
> The get_wave_state() function for v9 trusts cp_hqd_cntl_stack_size and
> cp_hqd_cntl_stack_offset values read directly from the MQD, which are
> written by GPU microcode and fully attacker-controlled on the
> CRIU-restore path (via AMDKFD_IOC_RESTORE_PROCESS with H3).
>
> this leads to an unbounded copy_to_user() that can leak adjacent
> GTT/kernel memory. If offset > size, integer underflow produces a ~4 GiB
> read length, if size is set to 1 MiB against a 4 KiB allocation, we leak
> 1 MiB of adjacent kernel memory (other queues' MQDs, ring buffers, KASLR
> pointers).
>
> Fix by clamping both cp_hqd_cntl_stack_size to the actual allocated
> buffer size (q->ctl_stack_size) and cp_hqd_cntl_stack_offset to the
> clamped size before performing arithmetic and copy_to_user().
>
> This ensures we never read beyond the allocated kernel BO regardless of
> attacker-supplied MQD field values.
>
> Signed-off-by: Sunday Clement <[email protected]>
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 10 +++++++---
>  1 file changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> index 7232a0117a00..b311e3918eb9 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c
> @@ -395,9 +395,13 @@ static int get_wave_state(struct mqd_manager *mm, void 
> *mqd,
>         if (copy_to_user(ctl_stack, &header, sizeof(header.wave_state)))
>                 return -EFAULT;
>
> -       if (copy_to_user(ctl_stack + m->cp_hqd_cntl_stack_offset,
> -                               mqd_ctl_stack + m->cp_hqd_cntl_stack_offset,
> -                               *ctl_stack_used_size))
> +       u32 cntl_stack_size  = min_t(u32, m->cp_hqd_cntl_stack_size,   
> q->ctl_stack_size);
> +       u32 cntl_stack_offset  = min_t(u32, m->cp_hqd_cntl_stack_offset, 
> cntl_stack_size);

Some compilers will complain if you mix code and declarations.  With that fixed,
Acked-by: Alex Deucher <[email protected]>

> +
> +       *ctl_stack_used_size = cntl_stack_size - cntl_stack_offset;
> +
> +       if (copy_to_user(ctl_stack + cntl_stack_offset, mqd_ctl_stack + 
> cntl_stack_offset,
> +                                       *ctl_stack_used_size))
>                 return -EFAULT;
>
>         return 0;
> --
> 2.43.0
>

Reply via email to