On 19.09.25 13:13, Thadeu Lima de Souza Cascardo wrote:
>>>
>>>> The alternative I can offer is to disable the fallback which in your case
>>>> would trigger the OOM killer.
>>>>
>
> Warning could be as simple as removing __GFP_NOWARN. But I don't think we
> want either a warning or to trigger the OOM killer when allocating lower
> order pages is still possible. That will already happen when we get to 0
> order pages, where there is no fallback available anymore, and, then, it
> makes sense to try harder and warn if no page can be allocated.

I don't think you understand the problem.

Allocating lower order pages is not really an alternative. You run into a
lot of technical issues with that.

The reason we have it is to prevent crashes in OOM situations. In other
words, to still allow displaying warning messages, for example.

> Under my current workload, the balance skews towards 0-order pages,
> reducing the amount of 10 and 9 order pages to half, when comparing runs
> with direct reclaim and without direct reclaim.

That pretty much completely disqualifies this approach.

This is a clear indicator that your system simply doesn't have enough
memory for the workload you are trying to run.

> So, I understand your
> concern in respect to the impact on the GPU TLB and potential flickering.
> Is there a way we can measure it on the devices we are using? And, then, if
> it does not show to be a problem on those devices, would making this be a
> setting per-device be acceptable to you? In a way that we could have in
> userspace a list of devices where it is okay to prefer not to reclaim over
> getting huge pages and that could be set if the workload prefers lower
> latency in those allocations?

No, you are clearly trying to run a use case which as far as I can see we
can't really support without running into a lot of trouble sooner or later.

Regards,
Christian.

>
> Thanks.
> Cascardo.
>
>>>> Regards,
>>>> Christian.
>>>>
>>>>>
>>>>> Other drivers can later opt to use this mechanism too.
>>>>>
>>>>> Signed-off-by: Thadeu Lima de Souza Cascardo <[email protected]>
>>>>> ---
>>>>> Changes in v2:
>>>>> - Make disabling direct reclaim an option.
>>>>> - Link to v1:
>>>>> https://lore.kernel.org/r/[email protected]
>>>>>
>>>>> ---
>>>>> Thadeu Lima de Souza Cascardo (3):
>>>>>       ttm: pool: allow requests to prefer latency over throughput
>>>>>       ttm: pool: add a module parameter to set latency preference
>>>>>       drm/amdgpu: allow allocation preferences when creating GEM object
>>>>>
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c    |  3 ++-
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c |  3 ++-
>>>>>  drivers/gpu/drm/ttm/ttm_pool.c             | 23 +++++++++++++++++------
>>>>>  drivers/gpu/drm/ttm/ttm_tt.c               |  2 +-
>>>>>  include/drm/ttm/ttm_bo.h                   |  5 +++++
>>>>>  include/drm/ttm/ttm_pool.h                 |  2 +-
>>>>>  include/drm/ttm/ttm_tt.h                   |  2 +-
>>>>>  include/uapi/drm/amdgpu_drm.h              |  9 +++++++++
>>>>>  8 files changed, 38 insertions(+), 11 deletions(-)
>>>>> ---
>>>>> base-commit: f83ec76bf285bea5727f478a68b894f5543ca76e
>>>>> change-id: 20250909-ttm_pool_no_direct_reclaim-ee0807a2d3fe
>>>>>
>>>>> Best regards,
>>>>
>>>
>>
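
For anyone reading the thread without the TTM pool code at hand, the
fallback being argued about can be pictured with a small userspace toy
model. This is only a sketch under stated assumptions, not the kernel
implementation: struct toy_zone, toy_alloc_order(), toy_alloc_pages() and
TOY_MAX_ORDER are hypothetical names invented for illustration, there is
no reclaim at all, and the "warning" is just a counter. It shows a request
starting at a high order, silently stepping down when that order is
unavailable, and only warning once order 0 fails too.

```c
#include <stdbool.h>

#define TOY_MAX_ORDER 10 /* hypothetical: highest order in this toy model */

/* Toy free lists: free_blocks[o] counts free blocks of 2^o pages. */
struct toy_zone {
    unsigned int free_blocks[TOY_MAX_ORDER + 1];
    unsigned int warnings; /* allocation-failure warnings emitted */
};

/*
 * Take one block of the given order, splitting a larger free block if
 * needed (buddy style). No reclaim is attempted; returns false on failure.
 */
static bool toy_alloc_order(struct toy_zone *z, unsigned int order)
{
    unsigned int o = order;

    /* Find the smallest order >= the requested one with a free block. */
    while (o <= TOY_MAX_ORDER && z->free_blocks[o] == 0)
        o++;
    if (o > TOY_MAX_ORDER)
        return false;

    z->free_blocks[o]--;
    /* Split down: each split leaves one free half at the lower order. */
    while (o > order) {
        o--;
        z->free_blocks[o]++;
    }
    return true;
}

/*
 * Allocate num_pages pages, starting at start_order and falling back to
 * lower orders. Failures above order 0 are silent; only an order-0
 * failure counts as a warning. Returns the number of pages obtained.
 */
static unsigned long toy_alloc_pages(struct toy_zone *z,
                                     unsigned long num_pages,
                                     unsigned int start_order)
{
    unsigned long allocated = 0;
    unsigned int order = start_order;

    if (order > TOY_MAX_ORDER)
        order = TOY_MAX_ORDER;

    while (allocated < num_pages) {
        /* Never take a block larger than what is still needed. */
        while (order > 0 && (1UL << order) > num_pages - allocated)
            order--;

        if (toy_alloc_order(z, order)) {
            allocated += 1UL << order;
            continue;
        }
        if (order == 0) {
            z->warnings++; /* no fallback left: warn and give up */
            break;
        }
        order--; /* silent fallback to the next lower order */
    }
    return allocated;
}
```

With a fragmented toy zone (one free order-2 block and three free order-0
pages), an 8-page request starting at order 3 ends up backed by one
order-2 block plus order-0 pages. That kind of shift towards small pages
is what the quoted measurement describes.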
