On 15.09.25 14:36, Natalie Vock wrote:
> Hi all,
> 
> I've been looking into some cases where dmem protection fails to prevent
> allocations from ending up in GTT when VRAM gets scarce and apps start
> competing hard.
> 
> In short, this is because other (unprotected) applications end up
> filling VRAM before protected applications do. This causes TTM to back
> off and try allocating in GTT before anything else, and that is where
> the allocation is placed in the end. The existing eviction protection
> cannot prevent this, because no attempt at evicting is ever made
> (although you could consider the backing-off as an immediate eviction to
> GTT).

Well, depending on which GEM flags userspace passed, that is expected
behavior.

For applications using RADV we usually give GTT|VRAM as placement, which
basically tells the kernel that it shouldn't evict at all and should
immediately fall back to GTT.
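
To illustrate that fallback, here is a minimal sketch in plain C. It is a
simplified model and not the real TTM code: the names enum domain, struct pool
and place_without_eviction() are hypothetical, and the only assumption carried
over is that the placement list is walked in order and the first domain with
room wins, without any eviction attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model; these are not the real TTM structures. */
enum domain { DOMAIN_VRAM, DOMAIN_GTT, DOMAIN_COUNT };

struct pool {
	size_t free[DOMAIN_COUNT]; /* free bytes per domain */
};

/*
 * Walk the placement list in order and take the first domain that has room.
 * No eviction is attempted. Returns the chosen domain, or -1 if nothing fits.
 */
static int place_without_eviction(struct pool *p,
				  const enum domain *placements,
				  size_t n, size_t size)
{
	assert(p && placements);

	for (size_t i = 0; i < n; i++) {
		enum domain d = placements[i];

		if (p->free[d] >= size) {
			p->free[d] -= size;
			return (int)d;
		}
	}
	return -1;
}
```

With a { VRAM, GTT } placement list and VRAM nearly full, the request lands in
GTT even though evicting something from VRAM would have made room.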

Regards,
Christian.

> 
> This series tries to alleviate this by adding a special case when the
> allocation is protected by cgroups: Instead of backing off immediately,
> TTM will try evicting unprotected buffers from the domain to make space
> for the protected one. This ensures that applications can actually use
> all the memory protection awarded to them by the system, without being
> prone to ping-ponging (only protected allocations can evict unprotected
> ones, never the other way around).
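
The one-directional rule described above can be sketched as a small predicate.
This is a hypothetical illustration, not code from the series: struct group,
is_protected() and may_evict() are made-up names, and treating "usage at or
below the low limit" as "protected" is an assumption made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup accounting for one memory region. */
struct group {
	size_t usage; /* bytes currently charged */
	size_t low;   /* protection limit (assumed semantics) */
};

/* Assumption: a group is protected while its usage is within its limit. */
static bool is_protected(const struct group *g)
{
	assert(g);
	return g->usage <= g->low;
}

/*
 * Only a protected evictor may push out an unprotected evictee. The reverse
 * direction is never allowed, so two groups cannot keep evicting each other.
 */
static bool may_evict(const struct group *evictor, const struct group *evictee)
{
	return is_protected(evictor) && !is_protected(evictee);
}
```

Because may_evict(a, b) and may_evict(b, a) can never both be true, the rule
cannot produce ping-ponging between two groups.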
> 
> The first two patches just add a few small utilities needed to implement
> this to the dmem controller. The last two patches are the TTM
> implementation:
> 
> "drm/ttm: Be more aggressive..." decouples cgroup charging from resource
> allocation to allow us to hold on to the charge even if allocation fails
> on first try, and adds a path to call ttm_bo_evict_alloc when the
> charged allocation falls within min/low protection limits.
> 
> "drm/ttm: Use common ancestor..." is a more general improvement in
> correctly implementing cgroup protection semantics. With recursive
> protection rules, unused memory protection afforded to a parent node is
> transferred to children recursively, which helps protect entire
> subtrees from stealing each other's memory without needing to protect
> each cgroup individually. This doesn't apply when considering direct
> siblings inside the same subtree, so in order to not break
> prioritization between these siblings, we need to consider the
> relationship of evictor and evictee when calculating protection.
> In practice, this fixes cases where a protected cgroup cannot steal
> memory from unprotected siblings (which, in turn, leads to eviction
> failures and new allocations being placed in GTT).
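
Finding the common ancestor of evictor and evictee could look roughly like the
sketch below. It is written against a hypothetical struct node with only a
parent pointer. The actual dmem_cgroup_common_ancestor helper in the series
may have a different signature and implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cgroup-like tree node; the root has parent == NULL. */
struct node {
	struct node *parent;
};

/* Number of parent links between n and the root. */
static size_t depth(const struct node *n)
{
	size_t d = 0;

	while (n->parent) {
		n = n->parent;
		d++;
	}
	return d;
}

/*
 * Lowest common ancestor of a and b: lift the deeper node to the same depth,
 * then step both upwards until they meet. Returns NULL if the two nodes do
 * not share a root.
 */
static struct node *common_ancestor(struct node *a, struct node *b)
{
	assert(a && b);

	size_t da = depth(a), db = depth(b);

	while (da > db) {
		a = a->parent;
		da--;
	}
	while (db > da) {
		b = b->parent;
		db--;
	}
	while (a != b) {
		a = a->parent;
		b = b->parent;
		if (!a || !b)
			return NULL;
	}
	return a;
}
```

For two direct siblings the result is their shared parent, so protection would
be evaluated relative to that parent rather than to the root of the hierarchy.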
> 
> Thanks,
> Natalie
> 
> Signed-off-by: Natalie Vock <[email protected]>
> ---
> Natalie Vock (4):
>       cgroup/dmem: Add queries for protection values
>       cgroup/dmem: Add dmem_cgroup_common_ancestor helper
>       drm/ttm: Be more aggressive when allocating below protection limit
>       drm/ttm: Use common ancestor of evictor and evictee as limit pool
> 
>  drivers/gpu/drm/ttm/ttm_bo.c       | 79 ++++++++++++++++++++++++++++++++------
>  drivers/gpu/drm/ttm/ttm_resource.c | 48 ++++++++++++++++-------
>  include/drm/ttm/ttm_resource.h     |  6 ++-
>  include/linux/cgroup_dmem.h        | 25 ++++++++++++
>  kernel/cgroup/dmem.c               | 73 +++++++++++++++++++++++++++++++++++
>  5 files changed, 205 insertions(+), 26 deletions(-)
> ---
> base-commit: f3e82936857b3bd77b824ecd2fa7839dd99ec0c6
> change-id: 20250915-dmemcg-aggressive-protect-5cf37f717cdb
> 
> Best regards,
