On 2/5/2026 10:28 AM, [email protected] wrote:
Replace non-atomic vm->process_info assignment with cmpxchg()
to prevent race when parent/child processes sharing a drm_file
both try to acquire the same VM after fork().
I wonder how parent/child processes can share same drm file? The child
process should close render node after fork/exec, then create its own
gpu vm. Thunk open render node with O_CLOEXEC.
Regards
Xiaogang
Signed-off-by: [email protected] <[email protected]>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 00ea69baa126..f7b2358a0303 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1432,7 +1432,10 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
**process_info,
*process_info = info;
}
- vm->process_info = *process_info;
+ if (cmpxchg(&vm->process_info, NULL, *process_info) != NULL) {
+ ret = -EINVAL;
+ goto already_acquired;
+ }
/* Validate page directory and attach eviction fence */
ret = amdgpu_bo_reserve(vm->root.bo, true);
@@ -1472,6 +1475,7 @@ static int init_kfd_vm(struct amdgpu_vm *vm, void
**process_info,
amdgpu_bo_unreserve(vm->root.bo);
reserve_pd_fail:
vm->process_info = NULL;
+already_acquired:
if (info) {
dma_fence_put(&info->eviction_fence->base);
*process_info = NULL;