From: Donet Tom <[email protected]>

[ Upstream commit 6c160001661b6c4e20f5c31909c722741e14c2d8 ]

In svm_migrate_gart_map(), while migrating GART mapping, the number of
bytes copied for the GART table only accounts for CPU pages. On non-4K
systems, each CPU page can contain multiple GPU pages, and the GART
requires one 8-byte PTE per GPU page. As a result, an incorrect size is
passed to the DMA copy, causing only a partial update of the GART table.

Fix this function to work correctly on non-4K page-size systems by
accounting for the number of GPU pages per CPU page when calculating the
number of bytes to be copied.

Acked-by: Christian König <[email protected]>
Reviewed-by: Philip Yang <[email protected]>
Signed-off-by: Ritesh Harjani (IBM) <[email protected]>
Signed-off-by: Donet Tom <[email protected]>
Signed-off-by: Felix Kuehling <[email protected]>
Reviewed-by: Felix Kuehling <[email protected]>
Signed-off-by: Alex Deucher <[email protected]>
Signed-off-by: Sasha Levin <[email protected]>
---

LLM-generated explanation, may be completely bogus:

This confirms the issue. At line 362, `amdgpu_gart_map` already
correctly iterates `AMDGPU_GPU_PAGES_IN_CPU_PAGE` times per CPU page
(inner loop `j`), so it generates `npages *
AMDGPU_GPU_PAGES_IN_CPU_PAGE` PTEs of 8 bytes each, totaling `npages * 8
* AMDGPU_GPU_PAGES_IN_CPU_PAGE` bytes. The DMA copy, however,
transferred only `npages * 8` bytes; that mismatch results in partial
GART updates on non-4K systems.
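
To make the mismatch concrete, here is a minimal sketch of the loop
structure described above, assuming 64K CPU pages; `gart_map_sketch`,
`PTE_FLAGS`, and the fixed macro values are illustrative stand-ins, not
the driver's actual code:

```c
#include <stdint.h>

/* Illustrative stand-ins for the kernel macros (values assume 64K CPU pages). */
#define AMDGPU_GPU_PAGE_SIZE         4096u
#define AMDGPU_GPU_PAGES_IN_CPU_PAGE 16u   /* PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE */
#define PTE_FLAGS                    0x1u  /* placeholder for the real flag bits */

/* Paraphrase of the amdgpu_gart_map() loop structure: one 8-byte PTE is
 * emitted per GPU page, i.e. AMDGPU_GPU_PAGES_IN_CPU_PAGE PTEs per CPU page.
 */
static void gart_map_sketch(unsigned int npages, const uint64_t *dma_addr,
			    uint64_t *ptes)
{
	unsigned int i, j, t = 0;

	for (i = 0; i < npages; i++) {
		uint64_t addr = dma_addr[i];

		for (j = 0; j < AMDGPU_GPU_PAGES_IN_CPU_PAGE; j++, t++) {
			ptes[t] = addr | PTE_FLAGS;   /* one 8-byte PTE per GPU page */
			addr += AMDGPU_GPU_PAGE_SIZE;
		}
	}
	/* ptes[] now holds npages * AMDGPU_GPU_PAGES_IN_CPU_PAGE entries, i.e.
	 * npages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE bytes must be DMA-copied.
	 */
}
```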

### 3. Classification

This is a clear **bug fix** for an incorrect DMA size calculation. On
non-4K page systems (common on aarch64 and POWER architectures), only a
fraction of the GART PTEs would be DMA-copied to the GPU page table
(see the worked numbers after this list), leading to:
- **GPU page faults** when accessing unmapped entries
- **Potential data corruption** if stale GART entries point to wrong
  memory
- **GPU hangs** or crashes during SVM (Shared Virtual Memory) migration
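
To put numbers on the shortfall, a minimal userspace sketch of the
arithmetic, assuming a 64K CPU page size (the constants and the
`npages` value are illustrative, not taken from the driver):

```c
#include <stdio.h>

/* Illustrative constants: 64K CPU pages, 4K GPU pages. */
#define CPU_PAGE_SIZE         65536ULL
#define GPU_PAGE_SIZE         4096ULL
#define GPU_PAGES_IN_CPU_PAGE (CPU_PAGE_SIZE / GPU_PAGE_SIZE)  /* 16 */

int main(void)
{
	unsigned long long npages = 512;  /* CPU pages in one migration */
	unsigned long long needed = npages * 8 * GPU_PAGES_IN_CPU_PAGE;
	unsigned long long copied = npages * 8;  /* size used before the fix */

	/* Prints: needed=65536 copied=4096 (1/16 of the table updated) */
	printf("needed=%llu copied=%llu (1/%llu of the table updated)\n",
	       needed, copied, needed / copied);
	return 0;
}
```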

### 4. Scope and Risk Assessment

- **Lines changed**: 1 line (minimal)
- **Files changed**: 1 file
- **Risk**: Very low. On 4K page systems (x86),
  `AMDGPU_GPU_PAGES_IN_CPU_PAGE` = 1, so the behavior is unchanged. The
  fix only affects non-4K systems where it was broken.
- **Subsystem**: AMD KFD (Kernel Fusion Driver) — GPU compute path, used
  for HPC and AI workloads on AMD GPUs

### 5. User Impact

This affects users running:
- AMD GPUs with KFD/ROCm on **aarch64 or POWER** systems (which commonly
  use 64K pages)
- SVM (Shared Virtual Memory) migration operations
- Without this fix, GPU compute workloads would be broken or unreliable
  on non-4K page systems

### 6. Stability Indicators

- Reviewed and acked by 3 AMD DRM maintainers/developers
- The fix is mathematically obvious — the existing `amdgpu_gart_map`
  function already generates `AMDGPU_GPU_PAGES_IN_CPU_PAGE` PTEs per CPU
  page, so the copy size must match
- No-op on 4K page systems, so zero risk of regression for x86

### 7. Dependency Check

The fix uses `AMDGPU_GPU_PAGES_IN_CPU_PAGE` which has been defined in
`amdgpu_gart.h` for a long time. The `svm_migrate_gart_map` function and
the surrounding code exist in all recent stable trees. No additional
dependencies needed.
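
For reference, the macro boils down to a page-size ratio; the following
is a paraphrase of its shape (see `amdgpu_gart.h` and related headers
for the authoritative definitions):

```c
/* GPU pages are always 4K; the macro counts how many fit in one CPU page. */
#define AMDGPU_GPU_PAGE_SIZE         4096
#define AMDGPU_GPU_PAGES_IN_CPU_PAGE (PAGE_SIZE / AMDGPU_GPU_PAGE_SIZE)

/* PAGE_SIZE = 4K  -> AMDGPU_GPU_PAGES_IN_CPU_PAGE = 1  (x86: the fix is a no-op)
 * PAGE_SIZE = 64K -> AMDGPU_GPU_PAGES_IN_CPU_PAGE = 16 (64K configs on POWER/aarch64)
 */
```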

### Summary

This is a textbook stable backport candidate:
- **Obviously correct**: One-line fix that aligns DMA copy size with the
  actual number of PTEs generated
- **Fixes a real bug**: Partial GART table update on non-4K page systems
- **Small and contained**: Single line change in a single file
- **No new features**: Pure bug fix
- **Low risk**: No-op on 4K (x86) systems; fixes broken behavior on
  non-4K systems
- **Well-reviewed**: Multiple AMD maintainer acks/reviews

**YES**

 drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
index 6ada7b4af7c68..5086caac3fd06 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c
@@ -61,7 +61,7 @@ svm_migrate_gart_map(struct amdgpu_ring *ring, u64 npages,
        *gart_addr = adev->gmc.gart_start;
 
        num_dw = ALIGN(adev->mman.buffer_funcs->copy_num_dw, 8);
-       num_bytes = npages * 8;
+       num_bytes = npages * 8 * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
 
        r = amdgpu_job_alloc_with_ib(adev, &adev->mman.high_pr,
                                     AMDGPU_FENCE_OWNER_UNDEFINED,
-- 
2.51.0
