Hi,
sorry to jump in this late but the timing of previous versions didn't
really work well for me.

On Sun 11-01-26 14:49:57, Mathieu Desnoyers wrote:
[...]
> Here is a (possibly incomplete) list of the prior approaches that were
> used or proposed, along with their downside:
> 
> 1) Per-thread rss tracking: large error on many-thread processes.
> 
> 2) Per-CPU counters: up to 12% slower for short-lived processes and 9%
>    increased system time in make test workloads [1]. Moreover, the
>    inaccuracy increases with O(n^2) with the number of CPUs.
> 
> 3) Per-NUMA-node counters: requires atomics on fast-path (overhead),
>    error is high with systems that have lots of NUMA nodes (32 times
>    the number of NUMA nodes).
> 
> The approach proposed here is to replace this by the hierarchical
> per-cpu counters, which bounds the inaccuracy based on the system
> topology with O(N*logN).

The concept of hierarchical pcp counters is interesting and I am
definitely not opposed if there are more users that would benefit.

From the OOM POV, IIUC the primary problem is that get_mm_counter
(percpu_counter_read_positive) is too imprecise on systems where the
task migrates across a large number of CPUs. In the list of alternative
solutions I do not see percpu_counter_sum_positive mentioned.
oom_badness() is a really slow path and taking the slow path to
calculate a much more precise value seems acceptable. Have you
considered that option?
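
To make the difference concrete, here is a rough userspace model of the
two read paths. This is a sketch and not kernel code: the pcp_model_*
names, NR_CPUS and BATCH are made up for illustration, and locking and
CPU hotplug are ignored. It only assumes the usual batching scheme where
each CPU keeps a local delta and folds it into the shared count once the
delta reaches the batch size.

```c
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS 64	/* hypothetical CPU count for this model */
#define BATCH   32	/* hypothetical per-CPU batch size */

/* Simplified stand-in for a batched per-CPU counter. */
struct pcp_model {
	int64_t count;			/* shared, approximate value */
	int32_t delta[NR_CPUS];		/* per-CPU deltas not yet folded in */
};

/* Fast-path update: touch only the local delta until it reaches BATCH. */
static void pcp_model_add(struct pcp_model *c, int cpu, int32_t amount)
{
	int32_t d = c->delta[cpu] + amount;

	if (abs(d) >= BATCH) {
		c->count += d;		/* fold the local delta into the shared count */
		c->delta[cpu] = 0;
	} else {
		c->delta[cpu] = d;
	}
}

/* Cheap read: shared count only, so it can miss every unfolded delta. */
static int64_t pcp_model_read_positive(const struct pcp_model *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Slow read: walk all CPUs and add their pending deltas. */
static int64_t pcp_model_sum_positive(const struct pcp_model *c)
{
	int64_t sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS; cpu++)
		sum += c->delta[cpu];

	return sum > 0 ? sum : 0;
}
```

In this model, if a task charges BATCH - 1 pages on each of the 64 CPUs,
the cheap read still reports 0 while the slow read reports 1984. The
slow read costs one pass over all CPUs, which is the trade-off I am
asking about for oom_badness().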

-- 
Michal Hocko
SUSE Labs
