On 1/12/26 21:39, Alex Deucher wrote:
> On Mon, Jan 12, 2026 at 3:28 PM Felix Kuehling <[email protected]> wrote:
>>
>>
>> On 2026-01-12 09:06, Donet Tom wrote:
>>> RFC -> v2
>>> =========
>>>
>>> In RFC patch v1 [1], there were 8 patches. From that series, patches 1–3 are
>>> required to enable minimal support for 64K pages in AMDGPU. I have added
>>> those 3 patches in this series.
>>>
>>> With these three patches applied, all RCCL tests and the rocr-debug-agent
>>> tests pass on a ppc64le system with 64K page size on 2 GPUs. However, on
>>> systems with more than 2 GPUs and with XNACK enabled, we require the
>>> additional patches [4-8], which were posted earlier as part of the RFC [1].
>>> Since those require a bit of additional work and discussion, we will post
>>> v2 of them later as Part-2.
>>>
>>> 1. Patch 1 was updated to only relax the EOP buffer size check, based on
>>> Philip Yang’s comment.
>>>
>>> 2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags
>>> were added to Patch 2 and Patch 3.
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>>
>>> If this looks good, could we pull these changes into v6.20?
>>
>> The series looks good to me.
>>
>> Reviewed-by: Felix Kuehling <[email protected]>
>>
>> Alex, what does it take to get this into 6.20? I guess you'll want to
>> include this in a pull-request for drm-fixes ASAP?
>
> Yes, if you can land it in amd-staging-drm-next ASAP, I'll include it
> in this week's PR.

If possible, feel free to add an Acked-by: Christian König
<[email protected]>.

I will try to work with Pierre-Eric to get the DMA window patches upstream so
that it is possible to base the rest of the work on top of that.

Regards,
Christian.

>
> Alex
>
>>
>> Regards,
>> Felix
>>
>>
>>>
>>> This patch series addresses a few issues which we encountered while running
>>> the rocr-debug-agent and RCCL unit tests with an AMD GPU on Power10
>>> (ppc64le), using a 64K system page size.
>>>
>>> Note that we don't observe any of these issues while booting with a 4K system
>>> page size on Power. With the 64K system page size, what we have observed so
>>> far is that, in a few places, the conversion between GPU PFN and CPU PFN (or
>>> vice versa) may not be done correctly (due to the different page sizes of the
>>> AMD GPU (4K) and the CPU (64K)), which causes issues like GPU page faults or
>>> GPU hangs while running these tests.
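
To make the mismatch described above concrete: with 64K CPU pages and 4K GPU
pages, one CPU page frame spans 16 GPU page frames, so a CPU PFN has to be
scaled before it is used as a GPU page number. The following is only a minimal
user-space sketch under those two assumed page sizes. The macro and helper
names are hypothetical and are not taken from the amdgpu/amdkfd code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed page sizes for illustration only: 64K CPU pages, 4K GPU pages. */
#define CPU_PAGE_SHIFT 16u
#define GPU_PAGE_SHIFT 12u
#define PFN_SHIFT_DELTA (CPU_PAGE_SHIFT - GPU_PAGE_SHIFT)

static_assert(CPU_PAGE_SHIFT >= GPU_PAGE_SHIFT,
              "sketch assumes CPU pages are at least as large as GPU pages");

/* First GPU page frame number covered by the given CPU page frame. */
static inline uint64_t cpu_pfn_to_gpu_pfn_first(uint64_t cpu_pfn)
{
    return cpu_pfn << PFN_SHIFT_DELTA;
}

/* Last GPU page frame number covered by the given CPU page frame,
 * i.e. the inclusive end of a range that ends at cpu_pfn. */
static inline uint64_t cpu_pfn_to_gpu_pfn_last(uint64_t cpu_pfn)
{
    return ((cpu_pfn + 1) << PFN_SHIFT_DELTA) - 1;
}
```

With these assumptions, CPU page frame 3 maps to GPU page frames 48 through 63.
Passing the unscaled CPU numbers to an interface that counts in 4K units would
cover only a sixteenth of the intended range, and at the wrong offset.
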
>>>
>>> Changes so far in this series:
>>> =============================
>>> 1. For now, during kfd queue creation, this patch lifts the restriction
>>> that the EOP buffer size must be the same as the buffer object mapping size.
>>>
>>> 2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU
>>> page numbers before calling amdgpu_vm_update_range(), which expects 4K GPU
>>> pages. Without this, the rocr-debug-agent tests and RCCL unit tests were
>>> failing.
>>>
>>> 3. Fix GART PTE allocation in migration code to account for multiple GPU
>>> pages per CPU page. The current code only allocates PTEs based on the number
>>> of CPU pages, but GART may need one PTE per 4K GPU page.
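
As a rough illustration of item 3 in the quoted list, the PTE count can be
derived from the CPU page count by multiplying by the number of GPU pages per
CPU page. This is a sketch only, assuming 64K CPU pages, 4K GPU pages, and one
PTE per GPU page. The names below are hypothetical and do not reflect the
actual change to svm_migrate_gart_map().

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes for illustration only. */
#define SKETCH_CPU_PAGE_SIZE 65536u
#define SKETCH_GPU_PAGE_SIZE 4096u

static_assert(SKETCH_CPU_PAGE_SIZE % SKETCH_GPU_PAGE_SIZE == 0,
              "sketch assumes the CPU page size is a multiple of the GPU page size");

/* Number of PTEs needed to map cpu_npages CPU pages when each PTE
 * covers exactly one GPU page. */
static inline uint64_t gart_ptes_needed(uint64_t cpu_npages)
{
    return cpu_npages * (SKETCH_CPU_PAGE_SIZE / SKETCH_GPU_PAGE_SIZE);
}
```

Under these assumptions, 2 CPU pages need 32 PTEs. Sizing by the CPU page
count alone would reserve only 2.
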
>>>
>>> Setup details:
>>> ============
>>> System details: Power10 LPAR using 64K pagesize.
>>> AMD GPU:
>>> Name: gfx90a
>>> Marketing Name: AMD Instinct MI210
>>>
>>> Donet Tom (3):
>>> drm/amdkfd: Relax size checking during queue buffer get
>>> drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
>>> drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
>>>
>>> drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
>>> drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 6 ++---
>>> drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 29 +++++++++++++++++-------
>>> 3 files changed, 25 insertions(+), 12 deletions(-)
>>>