On 1/12/26 21:39, Alex Deucher wrote:
> On Mon, Jan 12, 2026 at 3:28 PM Felix Kuehling <[email protected]> wrote:
>>
>>
>> On 2026-01-12 09:06, Donet Tom wrote:
>>> RFC -> v2
>>> =========
>>>
>>> In RFC patch v1 [1], there were 8 patches. From that series, patches 1–3 are
>>> required to enable minimal support for 64K pages in AMDGPU. I have included
>>> those 3 patches in this series.
>>>
>>> With these three patches applied, all RCCL tests and the rocr-debug-agent
>>> tests pass on a ppc64le system with a 64K page size on 2 GPUs. However, on
>>> systems with more than 2 GPUs and with XNACK enabled, we require additional
>>> patches 4–8, which were posted earlier as part of RFC [1]. Since those
>>> require a bit of additional work and discussion, we will post v2 of them
>>> later as Part-2.
>>>
>>> 1. Patch 1 was updated to only relax the EOP buffer size check, based on
>>>     Philip Yang’s comment.
>>>
>>> 2. Philip’s review comments on Patch 2 were addressed, and Reviewed-by tags
>>>     were added to Patch 2 and Patch 3.
>>>
>>> [1] https://lore.kernel.org/all/[email protected]/
>>>
>>> If this looks good, could we pull these changes into v6.20?
>>
>> The series looks good to me.
>>
>> Reviewed-by: Felix Kuehling <[email protected]>
>>
>> Alex, what does it take to get this into 6.20? I guess you'll want to
>> include this in a pull-request for drm-fixes ASAP?
> 
> Yes, if you can land it in amd-staging-drm-next ASAP, I'll include it
> in this week's PR.

If possible, feel free to add an Acked-by: Christian König
<[email protected]>.

I will try to work with Pierre-Eric to get the DMA window patches upstream so 
that it is possible to base the rest of the work on top of that.

Regards,
Christian.

> 
> Alex
> 
>>
>> Regards,
>>    Felix
>>
>>
>>>
>>> This patch series addresses a few issues that we encountered while running
>>> the rocr-debug-agent and RCCL unit tests with an AMD GPU on Power10
>>> (ppc64le) using a 64K system page size.
>>>
>>> Note that we don't observe any of these issues when booting with a 4K system
>>> page size on Power. With the 64K system page size, what we have observed so
>>> far is that in a few places the conversion between GPU PFN and CPU PFN (or
>>> vice versa) may not be done correctly, because the AMD GPU page size (4K)
>>> differs from the CPU page size (64K). This causes issues such as GPU page
>>> faults or GPU hangs while running these tests.
>>>
>>> Changes so far in this series:
>>> ==============================
>>> 1. For now, during KFD queue creation, this patch lifts the restriction that
>>>     the EOP buffer size must be the same as the buffer object mapping size.
>>>
>>> 2. Fix SVM range map/unmap operations to convert CPU page numbers to GPU
>>>     page numbers before calling amdgpu_vm_update_range(), which expects 4K
>>>     GPU pages. Without this, the rocr-debug-agent tests and RCCL unit tests
>>>     were failing.
>>>
>>> 3. Fix GART PTE allocation in the migration code to account for multiple
>>>     GPU pages per CPU page. The current code only allocates PTEs based on
>>>     the number of CPU pages, but GART may need one PTE per 4K GPU page.
>>>
>>> Setup details:
>>> ==============
>>> System details: Power10 LPAR using a 64K page size.
>>> AMD GPU:
>>>    Name:                    gfx90a
>>>    Marketing Name:          AMD Instinct MI210
>>>
>>> Donet Tom (3):
>>>    drm/amdkfd: Relax size checking during queue buffer get
>>>    drm/amdkfd: Fix SVM map/unmap address conversion for non-4k page sizes
>>>    drm/amdkfd: Fix GART PTE for non-4K pagesize in svm_migrate_gart_map()
>>>
>>>   drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |  2 +-
>>>   drivers/gpu/drm/amd/amdkfd/kfd_queue.c   |  6 ++---
>>>   drivers/gpu/drm/amd/amdkfd/kfd_svm.c     | 29 +++++++++++++++++-------
>>>   3 files changed, 25 insertions(+), 12 deletions(-)
>>>
