On 2026-01-12 09:06, Donet Tom wrote:
RFC -> v2
=========

RFC v1 [1] contained 8 patches. Of those, patches 1–3 are required to enable
minimal support for 64K pages in AMDGPU, so I have included those 3 patches in
this series.

With these three patches applied, all RCCL tests and the rocr-debug-agent tests
pass on a ppc64le system with 64K page size and 2 GPUs. However, on systems with
more than 2 GPUs and with XNACK enabled, we also require patches 4-8, which were
posted earlier as part of the RFC [1]. Since those require a bit of additional
work and discussion, we will post v2 of them later as Part-2.

1. Patch 1 was updated to only relax the EOP buffer size check, based on
    Philip Yang’s comment.

2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags
    were added to Patch 2 and Patch 3.

[1] https://lore.kernel.org/all/[email protected]/

If this looks good, could we pull these changes into v6.20?

The series looks good to me.

Reviewed-by: Felix Kuehling <[email protected]>

Alex, what does it take to get this into 6.20? I guess you'll want to include this in a pull-request for drm-fixes ASAP?

Regards,
  Felix



This patch series addresses a few issues which we encountered while running the
rocr-debug-agent and RCCL unit tests with an AMD GPU on Power10 (ppc64le), using
a 64K system page size.

Note that we don't observe any of these issues when booting with a 4K system
page size on Power. With the 64K system page size, what we have observed so far
is that in a few places the conversion between GPU PFN and CPU PFN (or vice
versa) may not be done correctly, because the AMD GPU page size (4K) differs
from the CPU page size (64K). This causes issues such as GPU page faults or GPU
hangs while running these tests.
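
To make the page-number mismatch concrete, here is a minimal userspace sketch of
the arithmetic involved. It is not code from the patches: the macro and function
names are hypothetical, and it assumes page ranges are given as inclusive
first/last page numbers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: GPU pages are 4K, so the shift is 12. */
#define SKETCH_GPU_PAGE_SHIFT 12u

/* First GPU page number covered by a CPU page number.
 * cpu_page_shift is 16 for a 64K kernel and 12 for a 4K kernel. */
static inline uint64_t sketch_cpu_pfn_to_gpu_first(uint64_t cpu_pfn,
                                                   unsigned int cpu_page_shift)
{
    assert(cpu_page_shift >= SKETCH_GPU_PAGE_SHIFT);
    return cpu_pfn << (cpu_page_shift - SKETCH_GPU_PAGE_SHIFT);
}

/* Last GPU page number covered by an inclusive last CPU page number. */
static inline uint64_t sketch_cpu_pfn_to_gpu_last(uint64_t cpu_last_pfn,
                                                  unsigned int cpu_page_shift)
{
    assert(cpu_page_shift >= SKETCH_GPU_PAGE_SHIFT);
    return ((cpu_last_pfn + 1) << (cpu_page_shift - SKETCH_GPU_PAGE_SHIFT)) - 1;
}
```

With a 4K kernel both helpers return their input unchanged, which would be
consistent with the issues only showing up with 64K pages.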

Changes so far in this series:
=============================
1. For now, during KFD queue creation, lift the restriction that the EOP buffer
    size must be the same as the buffer object mapping size.

2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU page
    numbers before calling amdgpu_vm_update_range(), which expects 4K GPU pages.
    Without this, the rocr-debug-agent tests and RCCL unit tests were failing.

3. Fix GART PTE allocation in migration code to account for multiple GPU pages
    per CPU page. The current code only allocates PTEs based on number of CPU
    pages, but GART may need one PTE per 4K GPU page.
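
The PTE count in item 3 can be illustrated with a small standalone sketch. This
is only the arithmetic, under the assumption of one PTE per 4K GPU page; the
names are hypothetical and are not taken from svm_migrate_gart_map().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name: size of one GPU page in bytes (4K). */
#define SKETCH_GPU_PAGE_SIZE 4096u

/* Number of GART PTEs needed to map cpu_npages CPU pages, assuming one PTE
 * per 4K GPU page. cpu_page_size is in bytes (65536 on a 64K kernel). */
static inline uint64_t sketch_gart_ptes_needed(uint64_t cpu_npages,
                                               uint64_t cpu_page_size)
{
    /* The CPU page size must be a whole multiple of the GPU page size. */
    assert(cpu_page_size >= SKETCH_GPU_PAGE_SIZE);
    assert(cpu_page_size % SKETCH_GPU_PAGE_SIZE == 0);
    return cpu_npages * (cpu_page_size / SKETCH_GPU_PAGE_SIZE);
}
```

For 8 CPU pages this gives 8 PTEs on a 4K kernel but 128 on a 64K kernel, so
sizing the allocation by CPU page count alone would fall short by a factor of 16.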

Setup details:
============
System details: Power10 LPAR using 64K pagesize.
AMD GPU:
   Name:                    gfx90a
   Marketing Name:          AMD Instinct MI210

Donet Tom (3):
   drm/amdkfd: Relax size checking during queue buffer get
   drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
   drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()

  drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_queue.c   |  6 ++---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 29 +++++++++++++++++-------
  3 files changed, 25 insertions(+), 12 deletions(-)
