Dear reviewers,
This is my first attempt at contributing to the Linux kernel. I am doing
an internship at Meta on the Linux team, and have recently been learning
the basics of the memory controller (cgroup v2) and BPF. I find these
topics really interesting; to help other beginners like me understand
how BPF is used, and to make a small contribution to this great
community, I wrote a few self-tests that compare two ways of reading
memory-cgroup statistics for a whole cgroup subtree:
(A) the traditional path: open, read and parse memory.stat (plus
memory.current / memory.max) for every cgroup from user space; and
(B) a BPF path: a single SEC("iter.s/cgroup") program, invoked once per
cgroup as the iterator walks the subtree, that calls the memcg kfuncs
(bpf_get_mem_cgroup, bpf_mem_cgroup_flush_stats,
bpf_mem_cgroup_page_state, bpf_mem_cgroup_vm_events,
bpf_put_mem_cgroup) and stores the results in a hash map, which user
space drains once after the walk.
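To make the parsing cost in path (A) concrete, here is a minimal sketch of the per-key lookup that user space has to do on the text of memory.stat, which is a flat list of "key value" lines. It is only an illustration: memstat_lookup() is a hypothetical name and is not the helper used in the patches, and it works on a buffer that has already been read from the file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one "key value" line in a memory.stat-style buffer.
 * Returns 0 and stores the value in *val on success, -1 if the key is
 * absent. Hypothetical helper for illustration only.
 */
static int memstat_lookup(const char *buf, const char *key,
			  unsigned long long *val)
{
	const char *line = buf;
	size_t klen;

	assert(buf && key && val);
	klen = strlen(key);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/* Require the full key followed by one space, so that
		 * "anon" does not match "anon_thp". */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%llu", val) == 1)
			return 0;
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

A whole-tree scan on path (A) repeats an open, a read and a lookup like this for every cgroup, which is the per-cgroup overhead that path (B) avoids.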
The series builds on the memcg BPF kfuncs (mm/bpf_memcontrol.c). When those
kfuncs are unavailable (for example, with CONFIG_MEMCG=n), the tests skip
cleanly rather than failing to load.
These tests may also be useful as a small, self-contained comparison of the
BPF cgroup iterator against the file-based interface across cgroup trees of
different sizes and under different load. The pass/fail result of every test
depends only on the correctness / structural checks; the timing tables are
informational and are printed only under -v (or when a test fails), never on
a normal PASS.
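For readers unfamiliar with how such wall-clock numbers are usually taken, the following is a minimal sketch of an elapsed-time helper. It is an assumption about the general approach, not the code in the patches, and elapsed_ns() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Nanoseconds elapsed from *start to *end; end must not precede start. */
static uint64_t elapsed_ns(const struct timespec *start,
			   const struct timespec *end)
{
	int64_t sec = (int64_t)end->tv_sec - (int64_t)start->tv_sec;
	int64_t nsec = (int64_t)end->tv_nsec - (int64_t)start->tv_nsec;

	assert(sec > 0 || (sec == 0 && nsec >= 0));
	return (uint64_t)(sec * 1000000000LL + nsec);
}
```

A reader would be bracketed by two clock_gettime(CLOCK_MONOTONIC, ...) calls and the two timestamps passed to this helper; the result feeds only the informational table, never the pass/fail decision.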
The patches are:
1/3 memcg_stat_reader - reads a quiescent (charged once) subtree both
ways, asserts that the BPF snapshot agrees with memory.stat for the
anon counter (which is rstat-flushed and deterministic), and reports
the wall-clock cost of each path. It also adds a small
read_cgroup_file() helper to cgroup_helpers (the read counterpart of
write_cgroup_file) and selects CONFIG_MEMCG=y in the base selftest
config.
2/3 memcg_stat_churn - runs the same comparison while the tree is under
continuous allocation churn (one busy mmap()/memset()/munmap() process
per selected leaf), so each read pays a realistic rstat flush. It
reuses the BPF program and map from patch 1 verbatim; only the
user-space load model and sampling loop are new. Pass/fail is
structural only. This is a closer simulation of real-world
workloads than the first test.
3/3 memcg_stat_churn_percpu - extends the churn test to make the
per-cgroup cross-CPU rstat flush fan-out an explicit knob: each
churner migrates across K CPUs, so a cgroup's statistics become dirty
on K CPUs and a reader's flush must visit K per-cpu trees for it. This
shows how the cost of the two readers changes as that fan-out grows.
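The allocation churn used in patches 2 and 3 can be pictured with the sketch below: a loop of mmap()/memset()/munmap() on an anonymous mapping, so that pages are repeatedly charged to and uncharged from the churner's cgroup. This is a simplified illustration under my own assumptions; churn_rounds() is a hypothetical name, the loop is bounded rather than continuous, and the CPU migration of patch 3 is not shown.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Run up to `iters` rounds of mmap()/memset()/munmap() on an anonymous
 * mapping of `len` bytes. Returns the number of completed rounds.
 */
static long churn_rounds(size_t len, long iters)
{
	long done = 0;

	assert(len > 0);
	while (done < iters) {
		void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

		if (p == MAP_FAILED)
			break;
		/* Touch every byte so the pages are actually faulted in. */
		memset(p, 0xa5, len);
		if (munmap(p, len))
			break;
		done++;
	}
	return done;
}
```

Each round dirties the cgroup's statistics again, which is why a reader that runs during the churn has to pay for a non-trivial rstat flush.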
In my testing (a 60-CPU VM) the BPF path is roughly an order of magnitude
faster than the per-cgroup memory.stat parse for a whole-tree scan, mainly
because it avoids the per-cgroup open/read and string parsing. The gap
narrows as the rstat flush that both paths share grows larger, for example
when a cgroup's statistics are dirty on many CPUs at once. The exact numbers
are included in each patch's changelog.
I used AI tools in part to help me understand these subsystems and to help
write the code. I have reviewed all of the code myself.
I would be very grateful for any feedback, and I apologise in advance for
anything I have gotten wrong. Thank you for taking the time to look at this.
Have a good day!
Suggested-by: Shakeel Butt <[email protected]>
Signed-off-by: Ziyang Men <[email protected]>
Ziyang Men (3):
selftests/bpf: add memcg_stat_reader BPF-vs-memory.stat benchmark
selftests/bpf: add memcg_stat_churn BPF-vs-memory.stat benchmark under churn
selftests/bpf: add memcg_stat_churn_percpu BPF-vs-memory.stat benchmark under cross-CPU churn
tools/testing/selftests/bpf/cgroup_helpers.c | 46 +
tools/testing/selftests/bpf/cgroup_helpers.h | 2 +
tools/testing/selftests/bpf/config | 1 +
.../testing/selftests/bpf/memcg_stat_reader.h | 35 +
.../bpf/prog_tests/memcg_stat_churn.c | 716 ++++++++++++++
.../bpf/prog_tests/memcg_stat_churn_percpu.c | 902 ++++++++++++++++++
.../bpf/prog_tests/memcg_stat_reader.c | 617 ++++++++++++
.../selftests/bpf/progs/memcg_stat_reader.c | 181 ++++
8 files changed, 2500 insertions(+)
create mode 100644 tools/testing/selftests/bpf/memcg_stat_reader.h
create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_churn.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_churn_percpu.c
create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_reader.c
create mode 100644 tools/testing/selftests/bpf/progs/memcg_stat_reader.c
--
2.53.0-Meta