From: Donet Tom <[email protected]>

[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ]
In svm_migrate_gart_map(), while migrating GART mappings, the number of
bytes copied for the GART table only accounts for CPU pages. On non-4K
systems, each CPU page can contain multiple GPU pages, and the GART
requires one 8-byte PTE per GPU page. As a result, an incorrect size was
passed to the DMA, causing only a partial update of the GART table.

Fix this function to work correctly on non-4K page-size systems by
accounting for the number of GPU pages per CPU page when calculating the
number of bytes to be copied.

Acked-by: Christian König <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Donet Tom <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---
LLM Generated explanations, may be completely bogus:

This confirms the issue. At line 362, `amdgpu_gart_map` already correctly
iterates `AMDGPU_GPU_PAGES_IN_CPU_PAGE` times per CPU page (inner loop
`j`), so it generates `npages * AMDGPU_GPU_PAGES_IN_CPU_PAGE` PTEs (each
8 bytes), totaling `npages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE` bytes. But
the DMA copy was only copying `npages * 8` bytes: a mismatch that results
in partial GART updates on non-4K systems.

### 3. Classification

This is a clear **bug fix** for an incorrect DMA size calculation. On
non-4K page systems (common on aarch64 and POWER architectures), only a
fraction of the GART PTEs would be DMA-copied to the GPU page table,
leading to:

- **GPU page faults** when accessing unmapped entries
- **Potential data corruption** if stale GART entries point to wrong memory
- **GPU hangs** or crashes during SVM (Shared Virtual Memory) migration

### 4. Scope and Risk Assessment

- **Lines changed**: 1 line (minimal)
- **Files changed**: 1 file
- **Risk**: Very low.
  On 4K page systems (x86), `AMDGPU_GPU_PAGES_IN_CPU_PAGE` = 1, so the
  behavior is unchanged; the fix only affects non-4K systems, where it
  was broken.
- **Subsystem**: AMD KFD (Kernel Fusion Driver), the GPU compute path
  used for HPC and AI workloads on AMD GPUs

### 5. User Impact

This affects users running:

- AMD GPUs with KFD/ROCm on **aarch64 or POWER** systems (which commonly
  use 64K pages)
- SVM (Shared Virtual Memory) migration operations

Without this fix, GPU compute workloads would be broken or unreliable on
non-4K page systems.

### 6. Stability Indicators

- Reviewed and acked by 3 AMD DRM maintainers/developers
- The fix is mathematically obvious: the existing `amdgpu_gart_map`
  function already generates `AMDGPU_GPU_PAGES_IN_CPU_PAGE` PTEs per CPU
  page, so the copy size must match
- No-op on 4K page systems, so zero risk of regression for x86

### 7. Dependency Check

The fix uses `AMDGPU_GPU_PAGES_IN_CPU_PAGE`, which has been defined in
`amdgpu_gart.h` for a long time. The `svm_migrate_gart_map` function and
the surrounding code exist in all recent stable trees. No additional
dependencies are needed.
### Summary

This is a textbook stable backport candidate:

- **Obviously correct**: One-line fix that aligns the DMA copy size with
  the actual number of PTEs generated
- **Fixes a real bug**: Partial GART table update on non-4K page systems
- **Small and contained**: Single-line change in a single file
- **No new features**: Pure bug fix
- **Low risk**: No-op on 4K (x86) systems; fixes broken behavior on
  non-4K systems
- **Well-reviewed**: Multiple AMD maintainer acks/reviews

**YES**

 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 6ada7b4af7c68..5086caac3fd06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -61,7 +61,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
 	*gart_addr = adev->gmc.gart_start;
 
 	num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
-	num_bytes = npages * 8;
+	num_bytes = npages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 
 	r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
 				     AMDGPU_FENCE_OWNER_UNDEFINED,
-- 
2.51.0
