When ftrace_lookup_symbols() is called with a single symbol (cnt == 1),
use kallsyms_lookup_name() for O(log N) binary search instead of the
full linear scan via kallsyms_on_each_symbol().

ftrace_lookup_symbols() was designed for batch resolution of many
symbols in a single pass.  For large cnt this is efficient: a single
O(N) walk over all symbols with O(log cnt) binary search into the
sorted input array.  But for cnt == 1 it still decompresses all ~200K
kernel symbols only to match one.

kallsyms_lookup_name() uses the sorted kallsyms index and needs only
~17 decompressions for a single lookup.

This is the common path for kprobe.session with exact function names,
where libbpf sends one symbol per BPF_LINK_CREATE syscall.

If binary lookup fails (duplicate symbol names where the first match
is not ftrace-instrumented, or module symbols), the function falls
through to the existing linear scan path.

Before (cnt=1, 50 kprobe.session programs):
  Attach: 858 ms  (kallsyms_expand_symbol 25% of CPU)

After:
  Attach:  52 ms  (16x faster)

Signed-off-by: Andrey Grodzovsky <[email protected]>
---
 kernel/trace/ftrace.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 827fb9a0bf0d..bfd7670669c2 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -9263,6 +9263,19 @@ static int kallsyms_callback(void *data, const char 
*name, unsigned long addr)
  * @addrs array, which needs to be big enough to store at least @cnt
  * addresses.
  *
+ * For a single symbol (cnt == 1), uses kallsyms_lookup_name() which
+ * performs an O(log N) binary search via the sorted kallsyms index.
+ * This avoids the full O(N) linear scan over all kernel symbols that
+ * the multi-symbol path requires.
+ *
+ * For multiple symbols, uses a single-pass linear scan via
+ * kallsyms_on_each_symbol() with binary search into the sorted input
+ * array.  While individual lookups are O(log N), doing K lookups
+ * totals O(K * log N) which loses to a single sequential O(N) pass
+ * at scale due to cache-friendly memory access patterns of the linear
+ * walk.  Empirical testing shows the linear scan is faster for batch
+ * lookups even well below 10K symbols.
+ *
  * Returns: 0 if all provided symbols are found, -ESRCH otherwise.
  */
 int ftrace_lookup_symbols(const char **sorted_syms, size_t cnt, unsigned long 
*addrs)
@@ -9270,6 +9283,21 @@ int ftrace_lookup_symbols(const char **sorted_syms, 
size_t cnt, unsigned long *a
        struct kallsyms_data args;
        int found_all;
 
+       /* Fast path: single symbol uses O(log N) binary search */
+       if (cnt == 1) {
+               addrs[0] = kallsyms_lookup_name(sorted_syms[0]);
+               if (addrs[0])
+                       addrs[0] = ftrace_location(addrs[0]);
+               if (addrs[0])
+                       return 0;
+               /*
+                * Binary lookup can fail for duplicate symbol names
+                * where the first match is not ftrace-instrumented,
+                * or for module symbols.  Retry with linear scan.
+                */
+       }
+
+       /* Batch path: single-pass O(N) linear scan */
        memset(addrs, 0, sizeof(*addrs) * cnt);
        args.addrs = addrs;
        args.syms = sorted_syms;
-- 
2.34.1


Reply via email to