Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK

Timur Kristóf Tue, 19 May 2026 02:16:40 -0700

On Tuesday, May 19, 2026 11:01:48 AM Central European Summer Time Christian 
König wrote:
> On 5/19/26 10:59, Timur Kristóf wrote:
> > On Tuesday, May 19, 2026 10:54:10 AM Central European Summer Time
> > Christian
> > 
> > König wrote:
> >> On 5/19/26 10:22, Timur Kristóf wrote:
> >>> UVD 4.x and older require that BOs don't cross 256M segments.
> >>> We need to respect that in amdgpu_ttm_alloc_gart().
> >>> We can't move the BOs later because GTT->GTT moves are
> >>> not implemented. We also can't force all BOs to VRAM
> >>> because that becomes very problematic in low VRAM scenarios.
> >>> 
> >>> This fixes UVD CS BOs crossing 256M segments
> >>> when they are placed in the GART.
> >> 
> >> Clear NAK for that approach.
> >> 
> >> This is the general TTM interface function and shouldn't have any HW
> >> generation dependent code in it.
> > 
> > I don't see how else to solve this, since GTT->GTT moves are not
> > implemented, so we can't move the BO to a suitable address later. We also
> > can't move it to VRAM.
> 
> GTT to GTT moves should be relatively easy to implement.
> 
> We just need to wait for the BO to be idle, unbind, move and bind again.


Implementing GTT->GTT moves sounds like a bigger task and cannot be backported 
as that would be a new feature not a bug fix. So I strongly prefer to solve 
this problem with the tools we already have available.

Also, I would prefer to not have to move the BO at all and give it a suitable 
address from the beginning, to avoid the overhead of the move.

As far as I understand you take issue with checking adev->family in 
amdgpu_ttm_alloc_gart(), right? So, how about one of these alternatives:

- add a bool argument so the caller can request 256M segments, then the caller 
can check the GPU generation
- add an optional argument so the caller can just pass in a placements array

Or, if you have a different suggestion, let me know.

Thanks,
Timur

> >> 
> >>> Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4799
> >>> Signed-off-by: Timur Kristóf <[email protected]>
> >>> ---
> >>> 
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 56 ++++++++++++++++++++++---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h |  3 ++
> >>>  2 files changed, 53 insertions(+), 6 deletions(-)
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index
> >>> 6c6ab4dd6ea9..a106c7e77e26 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> >>> @@ -959,6 +959,40 @@ static int amdgpu_ttm_backend_bind(struct
> >>> ttm_device
> >>> *bdev,>
> >>> 
> >>>   return 0;
> >>>  
> >>>  }
> >>> 
> >>> +/**
> >>> + * amdgpu_ttm_fill_gart_256M_placements() - Fill placements array with
> >>> 256M GART segments + *
> >>> + * @bo: TTM buffer objects whose placements should be filled
> >>> + * @placements: Pointer to an array of placements
> >>> + * @max_placements: Size of the placements array
> >>> + *
> >>> + * Fill the specified placements array with 256M GART segments,
> >>> + * starting from the highest address in order to reduce the
> >>> + * contention of the lowest segment.
> >>> + *
> >>> + * Returns the number of placements filled.
> >>> + */
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +                                  struct ttm_place
> > 
> > *placements,
> > 
> >>> +                                  u32 max_placements)
> >>> +{
> >>> + struct amdgpu_device *adev = amdgpu_ttm_adev(bo->bdev);
> >>> + u32 i;
> >>> +
> >>> + /* Fill the placements array with 256M segments, starting from
> > 
> > highest.
> > 
> >>> */ +      for (i = 0; i < max_placements; ++i) {
> >>> +         if (i * SZ_256M >= adev->gmc.gart_size)
> >>> +                 break;
> >>> +
> >>> +         placements[i].lpfn = (adev->gmc.gart_size - i *
> > 
> > SZ_256M) >> PAGE_SHIFT;
> > 
> >>> +         placements[i].fpfn = ALIGN_DOWN(placements[i].lpfn - 1,
> > 
> > SZ_256M >>
> > 
> >>> PAGE_SHIFT); +            placements[i].mem_type = TTM_PL_TT;
> >>> +         placements[i].flags = bo->resource->placement;
> >>> + }
> >>> +
> >>> + return i;
> >>> +}
> >>> +
> >>> 
> >>>  /*
> >>>  
> >>>   * amdgpu_ttm_alloc_gart - Make sure buffer object is accessible either
> >>>   * through AGP or GART aperture.
> >>> 
> >>> @@ -973,7 +1007,7 @@ int amdgpu_ttm_alloc_gart(struct ttm_buffer_object
> >>> *bo)>
> >>> 
> >>>   struct ttm_operation_ctx ctx = { false, false };
> >>>   struct amdgpu_ttm_tt *gtt = ttm_to_amdgpu_ttm_tt(bo->ttm);
> >>>   struct ttm_placement placement;
> >>> 
> >>> - struct ttm_place placements;
> >>> + struct ttm_place placements[AMDGPU_BO_MAX_PLACEMENTS];
> >>> 
> >>>   struct ttm_resource *tmp;
> >>>   uint64_t addr, flags;
> >>>   int r;
> >>> 
> >>> @@ -987,11 +1021,21 @@ int amdgpu_ttm_alloc_gart(struct
> >>> ttm_buffer_object
> >>> *bo)>
> >>> 
> >>>   /* allocate GART space */
> >>>   placement.num_placement = 1;
> >>> 
> >>> - placement.placement = &placements;
> >>> - placements.fpfn = 0;
> >>> - placements.lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> - placements.mem_type = TTM_PL_TT;
> >>> - placements.flags = bo->resource->placement;
> >>> + placement.placement = &placements[0];
> >>> + placements[0].fpfn = 0;
> >>> + placements[0].lpfn = adev->gmc.gart_size >> PAGE_SHIFT;
> >>> + placements[0].mem_type = TTM_PL_TT;
> >>> + placements[0].flags = bo->resource->placement;
> >>> +
> >>> + /*
> >>> +  * UVD 4.x and older require that BOs don't cross 256M segments.
> >>> +  * We need to respect that here. We can't move the BO later
> >>> +  * because GTT->GTT moves are not implemented.
> >>> +  */
> >>> + if (bo->base.size < SZ_256M && adev->family <= AMDGPU_FAMILY_KV)
> >>> +         placement.num_placement =
> >>> +                 amdgpu_ttm_fill_gart_256M_placements(bo,
> > 
> > placements,
> > 
> >>> +
> > 
> > ARRAY_SIZE(placements));
> > 
> >>>   r = ttm_bo_mem_space(bo, &placement, &tmp, &ctx);
> >>>   if (unlikely(r))
> >>> 
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h index
> >>> 2d72fa217274..e9de628c8d2d 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h
> >>> @@ -202,6 +202,9 @@ int amdgpu_ttm_clear_buffer(struct
> >>> amdgpu_ttm_buffer_entity *entity,>
> >>> 
> >>>                       u64 k_job_id);
> >>>  
> >>>  struct amdgpu_ttm_buffer_entity *amdgpu_ttm_next_clear_entity(struct
> >>>  amdgpu_device *adev);>
> >>> 
> >>> +u32 amdgpu_ttm_fill_gart_256M_placements(struct ttm_buffer_object *bo,
> >>> +                                  struct ttm_place
> > 
> > *placements,
> > 
> >>> +                                  u32 max_placements);
> >>> 
> >>>  int amdgpu_ttm_alloc_gart(struct ttm_buffer_object *bo);
> >>>  void amdgpu_ttm_recover_gart(struct ttm_buffer_object *tbo);
> >>>  uint64_t amdgpu_ttm_domain_start(struct amdgpu_device *adev, uint32_t
> >>>  type);

Re: [PATCH 2/5] drm/amdgpu: Use placements of 256M GART segments for SI/CIK

Reply via email to