When libbpf attaches kprobe.session programs with exact function names
(the common case: SEC("kprobe.session/vfs_read")), the current code path
has two independent performance bottlenecks:

1. Userspace (libbpf): attach_kprobe_session() always parses
   /proc/kallsyms to resolve function names, even when the name is exact
   (no wildcards).

2. Kernel (ftrace): ftrace_lookup_symbols() does a full O(N) linear scan
   over all kernel symbols (~200K in the worst case) via
   kallsyms_on_each_symbol(), decompressing every symbol name, even when
   resolving a single symbol (cnt == 1).

This series optimizes both layers:

Patch 1 adds a dual-path optimization to libbpf's attach_kprobe_session().
When the section name contains no wildcards (* or ?), it passes the
function name via opts.syms[] directly to the kernel, completely skipping
the /proc/kallsyms parse.  When wildcards are present, it falls back to
the existing pattern matching path.  Error codes are normalized (ESRCH →
ENOENT) so both paths present identical errors for "symbol not found".
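
A minimal sketch of the path selection and error normalization, assuming
only plain C string helpers and that the kernel reports an unresolvable
exact name as -ESRCH, as described above. has_wildcard(), choose_path()
and normalize_not_found() are hypothetical names for illustration, not
the functions the patch adds to libbpf.c:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: true if the section's function name is a glob. */
static bool has_wildcard(const char *func)
{
	return strpbrk(func, "*?") != NULL;
}

enum attach_path {
	PATH_EXACT_SYMS,	/* pass the name via opts.syms[], no kallsyms parse */
	PATH_KALLSYMS_PATTERN,	/* existing /proc/kallsyms pattern matching */
};

/* Hypothetical helper: pick the attach path for one section name. */
static enum attach_path choose_path(const char *func)
{
	return has_wildcard(func) ? PATH_KALLSYMS_PATTERN : PATH_EXACT_SYMS;
}

/*
 * Hypothetical helper: report "symbol not found" from the exact-name
 * path (assumed -ESRCH from the kernel) as -ENOENT, matching the
 * wildcard path. Any other error is passed through unchanged.
 */
static int normalize_not_found(int err)
{
	return err == -ESRCH ? -ENOENT : err;
}
```

With this split, SEC("kprobe.session/vfs_read") takes the opts.syms[]
path, while SEC("kprobe.session/vfs_*") still goes through the kallsyms
pattern match, and both report a missing function as -ENOENT.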

Patch 2 adds a cnt == 1 fast path inside ftrace_lookup_symbols().  For a
single symbol, it uses kallsyms_lookup_name() which performs an O(log N)
binary search via the sorted kallsyms index, needing only ~17 symbol
decompressions instead of ~200K.  If the binary lookup fails (duplicate
symbol names where the first match is not ftrace-instrumented, or module
symbols), it falls through to the existing linear scan.
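
A user-space model of the cnt == 1 strategy, assuming a small name-sorted
table in place of the real kallsyms index and a boolean in place of the
ftrace_location() check; struct sym, lookup_one() and linear_lookup() are
hypothetical. It illustrates why the fallback is needed: a binary search
can land on a duplicate name that is not ftrace-instrumented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for a kallsyms entry (hypothetical layout). */
struct sym {
	const char *name;
	unsigned long addr;
	bool traceable;		/* stands in for ftrace_location(addr) != 0 */
};

/* Name-sorted table, playing the role of the sorted kallsyms index. */
static const struct sym table[] = {
	{ "do_sys_open", 0x1000, true  },
	{ "dup_name",    0x2000, false },	/* first copy not instrumented */
	{ "dup_name",    0x3000, true  },
	{ "vfs_read",    0x4000, true  },
	{ "vfs_write",   0x5000, true  },
};
#define NSYMS (sizeof(table) / sizeof(table[0]))

static int cmp_name(const void *key, const void *elem)
{
	return strcmp(key, ((const struct sym *)elem)->name);
}

/* Slow path: O(N) scan for the first traceable symbol with this name. */
static unsigned long linear_lookup(const char *name)
{
	for (size_t i = 0; i < NSYMS; i++)
		if (table[i].traceable && strcmp(table[i].name, name) == 0)
			return table[i].addr;
	return 0;
}

/* cnt == 1: try an O(log N) binary search first, fall back on failure. */
static unsigned long lookup_one(const char *name)
{
	const struct sym *s = bsearch(name, table, NSYMS, sizeof(*table),
				      cmp_name);

	if (s && s->traceable)
		return s->addr;
	return linear_lookup(name);
}
```

lookup_one("dup_name") resolves to the traceable copy (0x3000) whichever
duplicate bsearch() lands on, and an unknown name returns 0 only after the
fallback scan has also failed.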

The optimization is placed inside ftrace_lookup_symbols() rather than in
its callers because:
  - It benefits all callers (bpf_kprobe_multi_link_attach,
    register_fprobe_syms) without duplicating logic.
  - The cnt == 1 binary search with fallback is purely an internal
    optimization detail of ftrace_lookup_symbols()'s contract.

For batch lookups (cnt > 1), the existing single-pass O(N) linear scan
is retained.  Empirical profiling with perf and bpftrace on both QEMU
and real hardware showed that the linear scan beats per-symbol
binary search for batch resolution at every measured scale (500, 10K,
41K symbols).
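
For comparison, a simplified model of single-pass batch resolution,
assuming the requested names are sorted once and every visited kernel
symbol is checked against them with bsearch(). struct ksym and
batch_lookup() are hypothetical; the ftrace_location() check and the
symbol-name decompression are omitted. Each kernel symbol is visited at
most once, so the pass is bounded by N and the per-symbol check grows
only logarithmically with cnt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Simplified model of a kernel symbol (hypothetical). */
struct ksym {
	const char *name;
	unsigned long addr;
};

static int cmp_str(const void *a, const void *b)
{
	return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/*
 * Resolve cnt names against nsyms kernel symbols in ONE pass over the
 * symbol list. syms[] is sorted in place, so addrs[i] belongs to the
 * sorted syms[i]. Returns 0 if every name was found, -1 otherwise.
 */
static int batch_lookup(const char **syms, size_t cnt,
			const struct ksym *ksyms, size_t nsyms,
			unsigned long *addrs)
{
	size_t found = 0;

	qsort(syms, cnt, sizeof(*syms), cmp_str);
	memset(addrs, 0, cnt * sizeof(*addrs));

	for (size_t i = 0; i < nsyms && found < cnt; i++) {
		const char *name = ksyms[i].name;
		const char **hit = bsearch(&name, syms, cnt, sizeof(*syms),
					   cmp_str);

		if (!hit)
			continue;
		size_t idx = hit - syms;
		if (addrs[idx])		/* keep the first match only */
			continue;
		addrs[idx] = ksyms[i].addr;
		found++;
	}
	return found == cnt ? 0 : -1;
}
```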

Patch 3 adds selftests covering the optimization: test_session_syms
validates that exact function name attachment works correctly through
the fast path, and test_session_errors verifies that both the wildcard
(slow) and exact (fast) paths return identical -ENOENT errors for
non-existent functions.

Example (50 kprobe.session programs, each attaching to one exact
function name via a separate BPF_LINK_CREATE syscall, 50 distinct
functions):

  Configuration                                  Attach Time
  -----------------------------------------------+-----------
  Before (unpatched libbpf + kernel)              7,488 ms
  Patched libbpf only                               858 ms
  Both patches (libbpf + ftrace)                      52 ms
  Traditional kprobe pairs (100 progs, reference)    132 ms

Combined improvement: 144x faster.  kprobe.session is now 2.5x faster
than the equivalent traditional kprobe entry+return pair.
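
The ratios above follow directly from the table. A small check (the
variable names are ours) also shows how the gain splits between the two
layers: roughly 8.7x from the libbpf change alone and a further 16.5x
from the ftrace change on top of it.

```c
#include <assert.h>

/* Attach times in ms, taken from the table above. */
static const double before_ms = 7488.0;
static const double libbpf_only_ms = 858.0;
static const double both_ms = 52.0;
static const double kprobe_pairs_ms = 132.0;

/* Speedup of the faster configuration relative to the slower one. */
static double speedup(double slow_ms, double fast_ms)
{
	return slow_ms / fast_ms;
}
```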

Background: ftrace_lookup_symbols() was added by "ftrace: Add
ftrace_lookup_symbols function" to batch-resolve thousands of
wildcard-matched symbols in a single linear pass.  At the time,
kallsyms_lookup_name() was also a linear scan, so the batch approach
was strictly better.  "kallsyms: Improve the performance of
kallsyms_lookup_name()" later added a sorted index making
kallsyms_lookup_name() O(log N), but ftrace_lookup_symbols() was
never updated to take advantage of this for the single-symbol case.

Andrey Grodzovsky (3):
  libbpf: Optimize kprobe.session attachment for exact function names
  ftrace: Use kallsyms binary search for single-symbol lookup
  selftests/bpf: add tests for kprobe.session optimization

 kernel/trace/ftrace.c                         | 28 +++++++
 tools/lib/bpf/libbpf.c                        | 32 ++++++--
 .../bpf/prog_tests/kprobe_multi_test.c        | 76 +++++++++++++++++++
 .../bpf/progs/kprobe_multi_session_errors.c   | 27 +++++++
 .../bpf/progs/kprobe_multi_session_syms.c     | 45 +++++++++++
 5 files changed, 203 insertions(+), 5 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_errors.c
 create mode 100644 tools/testing/selftests/bpf/progs/kprobe_multi_session_syms.c

-- 
2.34.1

