From: David Francis <[email protected]> [ Upstream commit 85705b18ae7674347f8675f64b2b3115fb1d5629 ]
The kfd CRIU checkpoint ioctl would return an error if trying to checkpoint a process with no kfd buffer objects. This is a normal case and should not be an error. Reviewed-by: Felix Kuehling <[email protected]> Signed-off-by: David Francis <[email protected]> Signed-off-by: Alex Deucher <[email protected]> Signed-off-by: Sasha Levin <[email protected]> --- LLM Generated explanations, may be completely bogus: YES - What it fixes: Previously, the CRIU path rejected processes with no KFD buffer objects by requiring both a non-NULL `bos` pointer and a non-zero `num_bos`. The commit relaxes this so that a process with zero BOs is treated as a normal case instead of an error. - Precise change: In `criu_restore`, the validation changes from rejecting zero BOs to only requiring `args->bos` when there actually are BOs: - New check only requires `bos` if `num_bos > 0`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570` - `(args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data || !args->priv_data_size || !args->num_devices` - This removes the old unconditional `!args->bos` and `!args->num_bos` rejection. - Why it’s correct and safe: - Downstream restore code already handles zero BOs correctly: - Size checks scale with `num_bos`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2439-2440` - Zero-length allocations are fine; `kvmalloc_array(args->num_bos, ...)` and `kvzalloc(...)` safely handle `num_bos == 0` and `kvfree` is safe: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2445-2453, 2463-2467, 2495-2499` - `copy_from_user` and `copy_to_user` with size 0 are no-ops and safe even if `bos` is NULL: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2455-2461, 2487-2492` - The loop over BOs naturally skips when `num_bos == 0`: `drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2479-2485` - For `num_bos > 0`, the new check still requires a valid `bos` pointer, preserving existing behavior where needed. - Scope and risk: - Small, localized input validation fix in KFD CRIU restore path only; no architectural changes. - No impact on other subsystems; error handling paths remain unchanged. - Regression risk is minimal because it only relaxes a reject condition for a valid scenario and downstream code already supports zero BOs. - User impact: - Fixes spurious `-EINVAL` on CRIU operations for processes without KFD BOs, which is a normal scenario per the commit message. - Improves reliability of CRIU-based workflows for AMD GPU compute processes. - Stable backport criteria: - Important bugfix affecting real users. - Minimal, contained change with low risk. - No new features or API changes; aligns behavior with existing code expectations. Note: While the commit message mentions the checkpoint ioctl, this change updates the restore validation (`drivers/gpu/drm/amd/amdkfd/kfd_chardev.c:2568-2570`). It still satisfies stable criteria by correcting CRIU handling for the no-BO case on restore, with the downstream code already safely handling `num_bos == 0`. drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c index 43115a3744694..8535a52a62cab 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c @@ -2571,8 +2571,8 @@ static int criu_restore(struct file *filep, pr_debug("CRIU restore (num_devices:%u num_bos:%u num_objects:%u priv_data_size:%llu)\n", args->num_devices, args->num_bos, args->num_objects, args->priv_data_size); - if (!args->bos || !args->devices || !args->priv_data || !args->priv_data_size || - !args->num_devices || !args->num_bos) + if ((args->num_bos > 0 && !args->bos) || !args->devices || !args->priv_data || + !args->priv_data_size || !args->num_devices) return -EINVAL; mutex_lock(&p->mutex); -- 2.51.0
