On Tue, 8 Apr 2025, Mathieu Desnoyers wrote:

> - Minimize contention when incrementing and decrementing counters,
> - Provide fast access to a sum approximation,

In general I like this as an abstraction of the zoned VM counters in
vmstat.c that will make the scalable counters there useful elsewhere.

> It aims at fixing the per-mm RSS tracking which has become too
> inaccurate for OOM killer purposes on large many-core systems [1].

There are numerous cases where these issues occur. I know of a few
where I could use something like this.

> The hierarchical per-CPU counters propagate a sum approximation through
> a binary tree. When reaching the batch size, the carry is propagated
> through a binary tree which consists of log2(nr_cpu_ids) levels. The
> batch size for each level is twice the batch size of the prior level.

A binary tree? Could we do this N-way? Otherwise the tree will be 9
levels on a 512 cpu machine. Given the inflation of the number of cpus,
this scheme had better work up to 8K cpus.
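To put numbers on the fanout question, a quick sketch (my arithmetic,
not code from the patch; tree_levels() is a name I made up):

/*
 * Levels of carry propagation for a given tree fanout. With a
 * binary tree (fanout 2) a 512 cpu machine needs 9 levels and an
 * 8K cpu machine 13; with fanout 8 those drop to 3 and 5 levels.
 */
static unsigned int tree_levels(unsigned int nr_cpus, unsigned int fanout)
{
	unsigned int levels = 0;

	while (nr_cpus > 1) {
		nr_cpus = DIV_ROUND_UP(nr_cpus, fanout);
		levels++;
	}
	return levels;
}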
> +int percpu_counter_tree_precise_sum(struct percpu_counter_tree *counter);
> +int percpu_counter_tree_precise_compare(struct percpu_counter_tree *a,
> +					 struct percpu_counter_tree *b);
> +int percpu_counter_tree_precise_compare_value(struct percpu_counter_tree *counter,
> +					      int v);

"Precise"? Concurrent counter updates can occur while determining the
global value, so the result may already be stale when it is returned.
People may get confused (see the sketch below).

Also, there may be a need for a function to collapse the per-cpu values
into the global counter, e.g. when a cpu goes offline or in order to
switch off OS activities on a cpu.
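On the naming concern: any such sum has to walk per-cpu state that is
still being updated, much like __percpu_counter_sum() does for the flat
counters. A sketch of my reading, not the patch itself; the field names
are assumptions, and intermediate tree levels are omitted for brevity:

/*
 * Fold every cpu's unpropagated delta into the approximate value.
 * An update on a cpu that was already visited is missed, so this
 * is "precise" only for a counter that is quiescent while the
 * walk runs; otherwise it is just a snapshot.
 */
static int counter_tree_sum_snapshot(struct percpu_counter_tree *counter)
{
	int sum = atomic_read(&counter->approx_sum);	/* assumed field */
	int cpu;

	for_each_possible_cpu(cpu)
		sum += READ_ONCE(*per_cpu_ptr(counter->level0_items, cpu)); /* assumed field */
	return sum;
}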
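For the offline case, something along these lines is what I have in
mind, similar in spirit to what percpu_counter_cpu_dead() does for the
flat percpu counters (entirely hypothetical, names and fields made up):

/*
 * Fold one cpu's pending delta into the global sum, e.g. from a
 * cpuhp teardown callback, so an offlined or isolated cpu no
 * longer contributes inaccuracy.
 */
static void percpu_counter_tree_fold_cpu(struct percpu_counter_tree *counter,
					 unsigned int cpu)
{
	int *pcount = per_cpu_ptr(counter->level0_items, cpu);	/* assumed field */

	/* Move the residue up and zero the per-cpu slot. */
	atomic_add(xchg(pcount, 0), &counter->approx_sum);	/* assumed field */
}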