When libbpf attaches kprobe.session programs with exact function names
(the common case: SEC("kprobe.session/vfs_read")), the current code path
has two independent performance bottlenecks:
1. Userspace (libbpf): attach_kprobe_session() always parses
/proc/kallsyms to resolve function names, even when the name is exact
(no wildcards).
2. Kernel (ftrace): ftrace_lookup_symbols() does a full O(N) linear scan
over all kernel symbols (worst case ~200K) via kallsyms_on_each_symbol(),
decompressing every symbol name, even when resolving a single symbol
(cnt == 1).
This series optimizes both layers:
Patch 1 adds a dual-path optimization to libbpf's attach_kprobe_session().
When the section name contains no wildcards (* or ?), it passes the
function name via opts.syms[] directly to the kernel, completely skipping
the /proc/kallsyms parse. When wildcards are present, it falls back to
the existing pattern matching path. Error codes are normalized (ESRCH →
ENOENT) so both paths present identical errors for "symbol not found".
Patch 2 adds a cnt == 1 fast path inside ftrace_lookup_symbols(). For a
single symbol, it uses kallsyms_lookup_name() which performs an O(log N)
binary search via the sorted kallsyms index, needing only ~17 symbol
decompressions instead of ~200K. If the binary lookup fails (duplicate
symbol names where the first match is not ftrace-instrumented, or module
symbols), it falls through to the existing linear scan.
The optimization is placed inside ftrace_lookup_symbols() rather than in
its callers because:
- It benefits all callers (bpf_kprobe_multi_link_attach,
register_fprobe_syms) without duplicating logic.
- The cnt == 1 binary search with fallback is purely an internal
optimization detail of ftrace_lookup_symbols()'s contract.
For batch lookups (cnt > 1), the existing single-pass O(N) linear scan
is retained. Empirical profiling with perf and bpftrace on both QEMU
and real hardware showed that the linear scan beats per-symbol
binary search for batch resolution at every measured scale (500, 10K,
41K symbols).
Patch 3 adds selftests covering the optimization: test_session_syms
validates that exact function name attachment works correctly through
the fast path, and test_session_errors verifies that both the wildcard
(slow) and exact (fast) paths return identical -ENOENT errors for
non-existent functions.
Example (50 kprobe.session programs, each attaching to one exact
function name via a separate BPF_LINK_CREATE syscall, 50 distinct
functions):
  Configuration                                   | Attach Time
  ------------------------------------------------+------------
  Before (unpatched libbpf + kernel)              |    7,488 ms
  Patched libbpf only                             |      858 ms
  Both patches (libbpf + ftrace)                  |       52 ms
  Traditional kprobe pairs (100 progs, reference) |      132 ms
Combined improvement: 144x faster. kprobe.session is now 2.5x faster
than the equivalent traditional kprobe entry+return pair.
Background: ftrace_lookup_symbols() was added by "ftrace: Add
ftrace_lookup_symbols function" to batch-resolve thousands of
wildcard-matched symbols in a single linear pass. At the time,
kallsyms_lookup_name() was also a linear scan, so the batch approach
was strictly better. "kallsyms: Improve the performance of
kallsyms_lookup_name()" later added a sorted index making
kallsyms_lookup_name() O(log N), but ftrace_lookup_symbols() was
never updated to take advantage of this for the single-symbol case.
Andrey Grodzovsky (3):
libbpf: Optimize kprobe.session attachment for exact function names
ftrace: Use kallsyms binary search for single-symbol lookup
  selftests/bpf: Add tests for kprobe.session optimization
kernel/trace/ftrace.c | 28 +++++++
tools/lib/bpf/libbpf.c | 32 ++++++--
.../bpf/prog_tests/kprobe_multi_test.c | 76 +++++++++++++++++++
.../bpf/progs/kprobe_multi_session_errors.c | 27 +++++++
.../bpf/progs/kprobe_multi_session_syms.c | 45 +++++++++++
5 files changed, 203 insertions(+), 5 deletions(-)
create mode 100644
tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
create mode 100644
tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c
--
2.34.1