On Mon, Apr 07, 2025 at 12:23:16PM -0400, Waiman Long <long...@redhat.com> wrote:
> Child  Actual usage  Expected usage  %err
> -----  ------------  --------------  ----
>   1      16990208       22020096    -12.9%
>   1      17252352       22020096    -12.1%
>   0      37699584       30408704    +10.7%
>   1      14368768       22020096    -21.0%
>   1      16871424       22020096    -13.2%
>
> The current 10% error tolerance might have been right at the time
> test_memcontrol.c was first introduced in the v4.18 kernel, but memory
> reclaim has certainly evolved quite a bit since then, which may result
> in a bit more run-to-run variation than previously expected.
I like Roman's suggestion of an nr_cpus dependence, but I assume your
variations were still on the same system, weren't they? Is it fair to
say that reclaim is chaotic [1]? I wonder what may cause variations
between separate runs of the test.

Would it help to `echo 3 >drop_caches` before each run to have more
stable initial conditions? (Not sure if that is OK in selftests.)

Or sleep 0.5s to settle rstat flushing? No, page_counters don't suffer
from that, but the per-CPU stocks hold up to MEMCG_CHARGE_BATCH pages.
So maybe drain the stocks so that the counters are precise after the
test? (Either by running a dummy memcg on each CPU or via some
debugging API.)

Michal

[1] https://en.wikipedia.org/wiki/Chaos_theory#Chaotic_dynamics