Dear reviewers,

This is my first attempt at contributing to the Linux kernel. I am doing
an internship at Meta on the Linux team, and have recently been learning
the basics of the memory controller (cgroup v2) and BPF. I find these
topics really interesting; to help other beginners like me understand
how BPF is used, and to make a small contribution to this great
community, I wrote a few self-tests that compare two ways of reading
memory-cgroup statistics for a whole cgroup subtree:

  (A) the traditional path: open, read and parse memory.stat (plus
      memory.current / memory.max) for every cgroup from user space; and

  (B) a BPF path: a single SEC("iter.s/cgroup") program that walks the
      subtree, calls the memcg kfuncs (bpf_get_mem_cgroup,
      bpf_mem_cgroup_flush_stats, bpf_mem_cgroup_page_state,
      bpf_mem_cgroup_vm_events, bpf_put_mem_cgroup) for each cgroup, and
      stores the results in a hash map that is drained once afterwards.
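
As a rough illustration of the per-cgroup parsing work in path (A), the
sketch below looks up one counter in a buffer that has already been read
from memory.stat. It is not the code from the patches: the function name
memcg_stat_lookup() is hypothetical, and it assumes the usual
"key value" one-per-line layout of memory.stat.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not taken from the series): find "key value" in a
 * memory.stat-style buffer. Returns 0 and stores the value in *out on
 * success, or -1 if the key is not present.
 */
static int memcg_stat_lookup(const char *buf, const char *key,
			     unsigned long long *out)
{
	const char *line = buf;
	size_t klen;

	assert(buf && key && out);
	klen = strlen(key);

	while (*line) {
		const char *eol = strchr(line, '\n');

		/*
		 * Require a space right after the key so that "anon" does
		 * not match a longer name such as "anon_thp".
		 */
		if (!strncmp(line, key, klen) && line[klen] == ' ' &&
		    sscanf(line + klen + 1, "%llu", out) == 1)
			return 0;
		if (!eol)
			break;
		line = eol + 1;
	}
	return -1;
}
```

In path (A) this scan is repeated for every cgroup after an open() and
read() of its memory.stat, which is the per-cgroup cost that path (B)
avoids.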

The series builds on the memcg BPF kfuncs (mm/bpf_memcontrol.c). When those
kfuncs are unavailable (for example CONFIG_MEMCG=n) the tests skip cleanly
rather than failing to load.

These tests may also be useful as a small, self-contained comparison of the
BPF cgroup iterator against the file-based interface across cgroup trees of
different sizes and under different load. The pass/fail result of every test
depends only on the correctness / structural checks; the timing tables are
informational and are printed only under -v (or when a test fails), never on
a normal PASS.

The patches are:

  1/3 memcg_stat_reader - reads a quiescent (charged once) subtree both
      ways, asserts that the BPF snapshot agrees with memory.stat for the
      anon counter (which is rstat-flushed and deterministic), and reports
      the wall-clock cost of each path. It also adds a small
      read_cgroup_file() helper to cgroup_helpers (the read counterpart of
      write_cgroup_file) and selects CONFIG_MEMCG=y in the base selftest
      config.

  2/3 memcg_stat_churn - runs the same comparison while the tree is under
      continuous allocation churn (one busy mmap()/memset()/munmap() process
      per selected leaf), so each read pays a realistic rstat flush. It
      reuses the BPF program and map from patch 1 verbatim; only the
      user-space load model and sampling loop are new. Pass/fail is
      structural only. This is a closer simulation of real-world
      workloads than the first test.

  3/3 memcg_stat_churn_percpu - extends the churn test to make the
      per-cgroup cross-CPU rstat flush fan-out an explicit knob: each
      churner migrates across K CPUs, so a cgroup's statistics become dirty
      on K CPUs and a reader's flush must visit K per-cpu trees for it. This
      shows how the cost of the two readers changes as that fan-out grows.

In my testing (a 60-CPU VM) the BPF path is roughly an order of magnitude
faster than the per-cgroup memory.stat parse for a whole-tree scan, mainly
because it avoids the per-cgroup open/read and string parsing. The gap
narrows as the rstat flush that both paths share grows larger, for example
when a cgroup's statistics are dirty on many CPUs at once. The exact numbers
are included in each patch's changelog.

I used AI tools in part to help me understand these subsystems and to help
write the code. I have reviewed all of the code myself.

I would be very grateful for any feedback, and I apologise in advance for
anything I have gotten wrong. Thank you for taking the time to look at this.

Have a good day!

Suggested-by: Shakeel Butt <[email protected]>
Signed-off-by: Ziyang Men <[email protected]>

Ziyang Men (3):
  selftests/bpf: add memcg_stat_reader BPF-vs-memory.stat benchmark
  selftests/bpf: add memcg_stat_churn BPF-vs-memory.stat benchmark under
    churn
  selftests/bpf: add memcg_stat_churn_percpu BPF-vs-memory.stat
    benchmark under cross-CPU churn

 tools/testing/selftests/bpf/cgroup_helpers.c  |  46 +
 tools/testing/selftests/bpf/cgroup_helpers.h  |   2 +
 tools/testing/selftests/bpf/config            |   1 +
 .../testing/selftests/bpf/memcg_stat_reader.h |  35 +
 .../bpf/prog_tests/memcg_stat_churn.c         | 716 ++++++++++++++
 .../bpf/prog_tests/memcg_stat_churn_percpu.c  | 902 ++++++++++++++++++
 .../bpf/prog_tests/memcg_stat_reader.c        | 617 ++++++++++++
 .../selftests/bpf/progs/memcg_stat_reader.c   | 181 ++++
 8 files changed, 2500 insertions(+)
 create mode 100644 tools/testing/selftests/bpf/memcg_stat_reader.h
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_churn.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_churn_percpu.c
 create mode 100644 tools/testing/selftests/bpf/prog_tests/memcg_stat_reader.c
 create mode 100644 tools/testing/selftests/bpf/progs/memcg_stat_reader.c

-- 
2.53.0-Meta

