On 2026-04-07 09:38, Philip Yang wrote:
On multi-socket MI300A APU systems, system memory pages mapped to the
closest GPU must use MTYPE_RW instead of MTYPE_NC to maintain correct
cache coherence. The existing mtype override in amdgpu_vm_pte_update_flags()
skipped non-contiguous page mappings, so scattered local pages were
incorrectly mapped with MTYPE_NC, leading to cache coherence issues.

Apply the override to both contiguous and non-contiguous mappings.
When pages_addr is set, resolve the physical address via
pages_addr[addr >> PAGE_SHIFT] before passing it to the override
callback for NUMA node lookup.

Introduce an amdgpu_vm_addr_contiguous() helper that, on MI300A, treats
pages on different NUMA nodes as non-contiguous even if their DMA
addresses are adjacent. This ensures amdgpu_vm_update_range() splits
page table updates at NUMA node boundaries so each batch gets the
correct mtype override.

Signed-off-by: Philip Yang <[email protected]>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c    | 48 +++++++++++++++++++----
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c | 14 +++++--
  2 files changed, 50 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
index 63156289ae7f..f8fcbf079bf4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
@@ -1099,6 +1099,34 @@ amdgpu_vm_tlb_flush(struct amdgpu_vm_update_params *params,
        }
  }
+/**
+ * amdgpu_vm_addr_contiguous - check if two DMA addresses are contiguous

The function name is a bit misleading. It doesn't return whether the pages are contiguous, but whether they are in the same "group" (same amdgpu_vm_ptes_update call) by some pretty arbitrary criteria. Maybe call it amdgpu_vm_addr_same_group.


+ *
+ * @adev: amdgpu_device pointer
+ * @addr: current DMA address
+ * @addr_next: next DMA address to check against
+ * @contiguous: current contiguity state of the range being built
+ *
+ * Check whether @addr and @addr_next are physically contiguous. On APU
+ * platforms with multiple NUMA nodes (e.g. MI300A), a NUMA node boundary
+ * also breaks contiguity so that each contiguous batch stays within a
+ * single NUMA node for correct MTYPE override selection.
+ *
+ * Returns:
+ * true if @addr_next continues the current contiguous range, false otherwise.
+ */
+static inline bool amdgpu_vm_addr_contiguous(struct amdgpu_device *adev, dma_addr_t addr,
+                                            dma_addr_t addr_next, bool contiguous)
+{
+       if (!adev->gmc.is_app_apu || !page_is_ram(addr >> PAGE_SHIFT))

Isn't page_is_ram always true on APP APUs?

On the other hand, I think you need a check here that IOMMU device isolation is off (adev->ram_is_direct_mapped). Otherwise you cannot infer the NUMA node from the DMA address. I'd put the condition for that in the caller where you only need to check it once. Then replace the adev parameter with a bool same_nid. And while you're at it, you can add some more conditions to make the fast-path more likely:

        same_nid = adev->gmc.is_app_apu && adev->ram_is_direct_mapped &&
                   adev->gmc.gmc_funcs->override_vm_pte_flags &&
                   num_possible_nodes() > 1 && params->allow_override;

Regards,
  Felix


+               return (addr + PAGE_SIZE) == addr_next;
+
+       if (pfn_to_nid(addr >> PAGE_SHIFT) != pfn_to_nid(addr_next >> PAGE_SHIFT))
+               return !contiguous;
+
+       return (addr + PAGE_SIZE) == addr_next;
+}
+
  /**
   * amdgpu_vm_update_range - update a range in the vm page table
   *
@@ -1198,22 +1226,26 @@ int amdgpu_vm_update_range(struct amdgpu_device *adev, struct amdgpu_vm *vm,
                                uint64_t pfn = cursor.start >> PAGE_SHIFT;
                                uint64_t count;
-                               contiguous = pages_addr[pfn + 1] ==
-                                       pages_addr[pfn] + PAGE_SIZE;
+                               contiguous = amdgpu_vm_addr_contiguous(adev,
+                                                                      pages_addr[pfn],
+                                                                      pages_addr[pfn + 1],
+                                                                      contiguous);
-                               tmp = num_entries /
-                                       AMDGPU_GPU_PAGES_IN_CPU_PAGE;
+                               tmp = num_entries / AMDGPU_GPU_PAGES_IN_CPU_PAGE;
                                for (count = 2; count < tmp; ++count) {
                                        uint64_t idx = pfn + count;
-                                       if (contiguous != (pages_addr[idx] ==
-                                           pages_addr[idx - 1] + PAGE_SIZE))
+                                       if (contiguous != amdgpu_vm_addr_contiguous(adev,
+                                                                       pages_addr[idx - 1],
+                                                                       pages_addr[idx],
+                                                                       contiguous))
                                                break;
                                }
+
                                if (!contiguous)
                                        count--;
-                               num_entries = count *
-                                       AMDGPU_GPU_PAGES_IN_CPU_PAGE;
+
+                               num_entries = count * AMDGPU_GPU_PAGES_IN_CPU_PAGE;
                        }
if (!contiguous) {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
index 31a437ce9570..9e1607fb3b2e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm_pt.c
@@ -708,13 +708,19 @@ static void amdgpu_vm_pte_update_flags(struct amdgpu_vm_update_params *params,
                amdgpu_vm_pte_update_noretry_flags(adev, &flags);
/* APUs mapping system memory may need different MTYPEs on different
-        * NUMA nodes. Only do this for contiguous ranges that can be assumed
-        * to be on the same NUMA node.
+        * NUMA nodes. Both contiguous and non-contiguous ranges are handled
+        * since amdgpu_vm_update_range ensures updates don't span NUMA
+        * node boundaries.
         */
        if ((flags & AMDGPU_PTE_SYSTEM) && (adev->flags & AMD_IS_APU) &&
            adev->gmc.gmc_funcs->override_vm_pte_flags &&
-           num_possible_nodes() > 1 && !params->pages_addr && params->allow_override)
-               amdgpu_gmc_override_vm_pte_flags(adev, params->vm, addr, &flags);
+           num_possible_nodes() > 1 && params->allow_override) {
+               if (params->pages_addr)
+                       amdgpu_gmc_override_vm_pte_flags(adev, params->vm,
+                                       params->pages_addr[addr >> PAGE_SHIFT], &flags);
+               else
+                       amdgpu_gmc_override_vm_pte_flags(adev, params->vm, addr, &flags);
+       }
params->vm->update_funcs->update(params, pt, pe, addr, count, incr,
                                         flags);
