On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <[email protected]> wrote:
>
> Hi!
>
> These patches implement the (S)RCU based proposal to optimize uprobes.
>
> On my c^Htrusty old IVB-EP -- where each (of the 40) CPU calls 'func' in a
> tight loop:
>
>   perf probe -x ./uprobes test=func
>   perf stat -ae probe_uprobe:test -- sleep 1
>
>   perf probe -x ./uprobes test=func%return
>   perf stat -ae probe_uprobe:test__return -- sleep 1
>
> PRE:
>
>   4,038,804      probe_uprobe:test
>   2,356,275      probe_uprobe:test__return
>
> POST:
>
>   7,216,579      probe_uprobe:test
>   6,744,786      probe_uprobe:test__return
>
> (copy-paste FTW, I didn't do new numbers because the fast paths didn't change
> -- and quick test run shows similar numbers)
>
> Patches also available here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes
>
>
> Changes since last time:
>  - better split with intermediate inc_not_zero()
>  - fix UPROBE_HANDLER_REMOVE
>  - restored the lost rcu_assign_pointer()
>  - avoid lockdep for uretprobe_srcu
>  - add missing put_uprobe() -> srcu_read_unlock() conversion
>  - actually initialize return_instance::has_ref
>  - a few comments
>  - things I don't remember
>

Hey Peter!

Thanks for the v2, I plan to look at it more thoroughly tomorrow. But
meanwhile I spent a good chunk of today to write an uprobes
stress-test, so we can validate that we are not regressing anything
(yes, I don't trust lockless code and people in general ;)

Anyways, if you'd like to use it, it's at [0]. All you should need to
build and run it is:

  $ cd examples/c
  $ make -j$(nproc) uprobe-stress
  $ sudo ./uprobe-stress -tN -aM -mP -fR

N, M, P, R are number of threads dedicated to one of four functions of
the stress test: triggering user space functions (N),
attaching/detaching various random subsets of uprobes (M), mmap()ing
parts of executable with uprobes (P), and forking the process and
triggering uprobes for a little bit (R). The idea is to test various
timings and interleavings of uprobe-related logic.

You should only need not-too-old Clang to build everything (Clang 12+
should work, I believe). But do let me know if you run into troubles.

I did run this stress test for a little while on current
bpf-next/master with no issues detected (yay!).

But then I also ran it on Linux built from perf/uprobes branch (these
patches), and after a few seconds I see that there is no more
attachment/detachment happening. Eventually I got splats, which you
can see in [1]. I used `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
to run it inside my QEMU image.

So there is still something off, hopefully this will help to debug and
hammer out any remaining kinks. Thanks!

  [0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3
  [1] https://gist.github.com/anakryiko/f761690addf7aa5f08caec95fda9ef1a

