From: David Francis <[email protected]>

[ Upstream commit 85705b18ae7674347f8675f64b2b3115fb1d5629 ]

The kfd CRIU checkpoint ioctl would return an error if trying
to checkpoint a process with no kfd buffer objects.

This is a normal case and should not be an error.

Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: David Francis <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---

LLM Generated explanations, may be completely bogus:

YES

- What it fixes: Previously, the CRIU path rejected processes with no
  KFD buffer objects by requiring both a non-NULL `bos` pointer and a
  non-zero `num_bos`. The commit relaxes this so that a process with
  zero BOs is treated as a normal case instead of an error.

- Precise change: In `criu_restore`, the validation changes from
  rejecting zero BOs to only requiring `args->bos` when there actually
  are BOs:
  - New check only requires `bos` if `num_bos > 0`:
    `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`
    - `(args->num_bos > 0 && !args->bos) || !args->devices ||
      !args->priv_data || !args->priv_data_size || !args->num_devices`
  - This removes the old unconditional `!args->bos` and `!args->num_bos`
    rejection.

- Why it’s correct and safe:
  - Downstream restore code already handles zero BOs correctly:
    - Size checks scale with `num_bos`:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2439-2440`
    - Zero-length allocations are fine; `kvmalloc_array(args->num_bos,
      ...)` and `kvzalloc(...)` safely handle `num_bos == 0` and
      `kvfree` is safe:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2445-2453, 2463-2467,
      2495-2499`
    - `copy_from_user` and `copy_to_user` with size 0 are no-ops and
      safe even if `bos` is NULL:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2455-2461, 2487-2492`
    - The loop over BOs naturally skips when `num_bos == 0`:
      `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2479-2485`
  - For `num_bos > 0`, the new check still requires a valid `bos`
    pointer, preserving existing behavior where needed.

- Scope and risk:
  - Small, localized input validation fix in KFD CRIU restore path only;
    no architectural changes.
  - No impact on other subsystems; error handling paths remain
    unchanged.
  - Regression risk is minimal because it only relaxes a reject
    condition for a valid scenario and downstream code already supports
    zero BOs.

- User impact:
  - Fixes spurious `-EINVAL` on CRIU operations for processes without
    KFD BOs, which is a normal scenario per the commit message.
  - Improves reliability of CRIU-based workflows for AMD GPU compute
    processes.

- Stable backport criteria:
  - Important bugfix affecting real users.
  - Minimal, contained change with low risk.
  - No new features or API changes; aligns behavior with existing code
    expectations.

Note: While the commit message mentions the checkpoint ioctl, this
change updates the restore validation
(`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`). It still
satisfies stable criteria by correcting CRIU handling for the no-BO case
on restore, with the downstream code already safely handling `num_bos ==
0`.

 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index 43115a3744694..8535a52a62cab 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -2571,8 +2571,8 @@ static int criu_restore(struct file *filep,
        pr_debug("CRIU restore (num_devices:%u num_bos:%u num_objects:%u 
priv_data_size:%llu)\n",
                 args->num_devices, args->num_bos, args->num_objects, 
args->priv_data_size);
 
-       if (!args->bos || !args->devices || !args->priv_data || 
!args->priv_data_size ||
-           !args->num_devices || !args->num_bos)
+       if ((args->num_bos > 0 && !args->bos) || !args->devices || 
!args->priv_data ||
+           !args->priv_data_size || !args->num_devices)
                return -EINVAL;
 
        mutex_lock(&p->mutex);
-- 
2.51.0

Reply via email to