On 15.09.25 14:36, Natalie Vock wrote:
> Hi all,
>
> I've been looking into some cases where dmem protection fails to prevent
> allocations from ending up in GTT when VRAM gets scarce and apps start
> competing hard.
>
> In short, this is because other (unprotected) applications end up
> filling VRAM before protected applications do. This causes TTM to back
> off and try allocating in GTT before anything else, and that is where
> the allocation is placed in the end. The existing eviction protection
> cannot prevent this, because no attempt at evicting is ever made
> (although you could consider the backing-off as an immediate eviction to
> GTT).

Well, depending on which GEM flags you passed from userspace, that is
expected behavior.

For applications using RADV we usually give GTT|VRAM as placement, which
basically tells the kernel that it shouldn't evict at all and should
immediately fall back to GTT.

Regards,
Christian.

>
> This series tries to alleviate this by adding a special case when the
> allocation is protected by cgroups: Instead of backing off immediately,
> TTM will try evicting unprotected buffers from the domain to make space
> for the protected one. This ensures that applications can actually use
> all the memory protection awarded to them by the system, without being
> prone to ping-ponging (only protected allocations can evict unprotected
> ones, never the other way around).
>
> The first two patches just add a few small utilities needed to implement
> this to the dmem controller. The second two patches are the TTM
> implementation:
>
> "drm/ttm: Be more aggressive..." decouples cgroup charging from resource
> allocation to allow us to hold on to the charge even if allocation fails
> on first try, and adds a path to call ttm_bo_evict_alloc when the
> charged allocation falls within min/low protection limits.
>
> "drm/ttm: Use common ancestor..." is a more general improvement in
> correctly implementing cgroup protection semantics. With recursive
> protection rules, unused memory protection afforded to a parent node is
> transferred to children recursively, which helps protect entire
> subtrees from stealing each others' memory without needing to protect
> each cgroup individually. This doesn't apply when considering direct
> siblings inside the same subtree, so in order to not break
> prioritization between these siblings, we need to consider the
> relationship of evictor and evictee when calculating protection.
> In practice, this fixes cases where a protected cgroup cannot steal
> memory from unprotected siblings (which, in turn, leads to eviction
> failures and new allocations being placed in GTT).
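
To make the sibling case in the quoted paragraph concrete, here is a
minimal userspace sketch. It is not the code from the series: struct
toy_pool and every toy_* function are hypothetical names, and the
protection model is deliberately crude (all of a parent's protection is
assumed to float down to each child, while the kernel distributes it
among children). It only shows why the node used as the limit pool
changes the answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a dmem cgroup pool; not the kernel's struct. */
struct toy_pool {
	const struct toy_pool *parent;	/* NULL for the root */
	uint64_t min;			/* protection configured on this node */
	uint64_t usage;			/* bytes currently charged */
};

static unsigned int toy_depth(const struct toy_pool *p)
{
	unsigned int depth = 0;

	for (; p->parent; p = p->parent)
		depth++;
	return depth;
}

/* Lowest node that is an ancestor of (or equal to) both a and b. */
const struct toy_pool *toy_common_ancestor(const struct toy_pool *a,
					   const struct toy_pool *b)
{
	unsigned int da = toy_depth(a), db = toy_depth(b);

	/* Bring both nodes to the same depth, then climb in lockstep. */
	for (; da > db; da--)
		a = a->parent;
	for (; db > da; db--)
		b = b->parent;
	while (a != b) {
		a = a->parent;
		b = b->parent;
	}
	return a;
}

/*
 * Toy effective protection of p as seen from limit_root: the min value
 * of p's ancestor (or p itself) sitting directly below limit_root.
 * Protection configured at or above limit_root is ignored.
 */
uint64_t toy_effective_min(const struct toy_pool *p,
			   const struct toy_pool *limit_root)
{
	if (p == limit_root)
		return 0;
	while (p->parent != limit_root) {
		p = p->parent;
		assert(p != NULL);	/* limit_root must be an ancestor of p */
	}
	return p->min;
}

/* The evictee is fair game once it exceeds its protection in the shared pool. */
bool toy_may_evict(const struct toy_pool *evictor,
		   const struct toy_pool *evictee)
{
	const struct toy_pool *root = toy_common_ancestor(evictor, evictee);

	return evictee->usage > toy_effective_min(evictee, root);
}
```

Take a parent with min = 1 GiB and two children: A (min = 512 MiB, using
256 MiB) and B (no min, using 700 MiB). Measured from the root, B
inherits the parent's 1 GiB in this model and looks protected. Measured
from the common ancestor of A and B, which is the parent, B has no
protection of its own, so A may evict from B while B may not evict from
A.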
>
> Thanks,
> Natalie
>
> Signed-off-by: Natalie Vock <[email protected]>
> ---
> Natalie Vock (4):
>       cgroup/dmem: Add queries for protection values
>       cgroup/dmem: Add dmem_cgroup_common_ancestor helper
>       drm/ttm: Be more aggressive when allocating below protection limit
>       drm/ttm: Use common ancestor of evictor and evictee as limit pool
>
>  drivers/gpu/drm/ttm/ttm_bo.c       | 79 ++++++++++++++++++++++++++++++++------
>  drivers/gpu/drm/ttm/ttm_resource.c | 48 ++++++++++++++++-------
>  include/drm/ttm/ttm_resource.h     |  6 ++-
>  include/linux/cgroup_dmem.h        | 25 ++++++++++++
>  kernel/cgroup/dmem.c               | 73 +++++++++++++++++++++++++++++++++++
>  5 files changed, 205 insertions(+), 26 deletions(-)
> ---
> base-commit: f3e82936857b3bd77b824ecd2fa7839dd99ec0c6
> change-id: 20250915-dmemcg-aggressive-protect-5cf37f717cdb
>
> Best regards,
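
For reference, the two behaviors discussed in this thread (backing off
to the next placement versus evicting unprotected buffers first) can be
modelled in a few lines of userspace C. This is only a sketch under
stated assumptions: struct toy_mgr, toy_alloc() and the flat byte
counters are invented for illustration and are not TTM interfaces, the
placement order (VRAM first, GTT as fallback) is an assumption, and
evicted bytes are simply dropped from the domain's accounting instead of
being moved elsewhere.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical domain indices; -1 is returned when nothing fits. */
enum toy_domain { TOY_VRAM, TOY_GTT, TOY_NUM_DOMAINS };

/* Hypothetical per-domain accounting, split by cgroup protection. */
struct toy_mgr {
	uint64_t size;
	uint64_t used_protected;
	uint64_t used_unprotected;
};

static uint64_t toy_free(const struct toy_mgr *m)
{
	return m->size - m->used_protected - m->used_unprotected;
}

/*
 * Walk the placement list in order. An unprotected request backs off to
 * the next placement as soon as a domain is full. A protected request
 * first tries to make room by evicting unprotected usage from that
 * domain, and only backs off if even that would not be enough.
 */
int toy_alloc(struct toy_mgr *mgrs, const int *placements, size_t n,
	      uint64_t bytes, bool is_protected)
{
	for (size_t i = 0; i < n; i++) {
		struct toy_mgr *m = &mgrs[placements[i]];
		uint64_t avail = toy_free(m);

		if (avail < bytes && is_protected &&
		    avail + m->used_unprotected >= bytes) {
			/* Evict only as much unprotected usage as needed. */
			m->used_unprotected -= bytes - avail;
			avail = bytes;
		}
		if (avail < bytes)
			continue;	/* back off to the next placement */

		if (is_protected)
			m->used_protected += bytes;
		else
			m->used_unprotected += bytes;
		return placements[i];
	}
	return -1;
}
```

With VRAM nearly filled by unprotected usage, an unprotected request
that does not fit falls through to GTT, which is the behavior described
at the top of the thread. The same request flagged as protected pushes
out just enough unprotected usage and stays in VRAM. Since unprotected
requests never evict in this model, the ping-ponging mentioned in the
cover letter cannot occur.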
