On Thu, Jul 11, 2024 at 4:07 AM Peter Zijlstra <[email protected]> wrote:
>
> Hi!
>
> These patches implement the (S)RCU based proposal to optimize uprobes.
>
> On my c^Htrusty old IVB-EP -- where each (of the 40) CPU calls 'func' in a
> tight loop:
>
>   perf probe -x ./uprobes test=func
>   perf stat -ae probe_uprobe:test  -- sleep 1
>
>   perf probe -x ./uprobes test=func%return
>   perf stat -ae probe_uprobe:test__return -- sleep 1
>
> PRE:
>
>   4,038,804      probe_uprobe:test
>   2,356,275      probe_uprobe:test__return
>
> POST:
>
>   7,216,579      probe_uprobe:test
>   6,744,786      probe_uprobe:test__return
>
> (copy-paste FTW, I didn't do new numbers because the fast paths didn't change --
>  and quick test run shows similar numbers)
>
> Patches also available here:
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git perf/uprobes
>
>
> Changes since last time:
>  - better split with intermediate inc_not_zero()
>  - fix UPROBE_HANDLER_REMOVE
>  - restored the lost rcu_assign_pointer()
>  - avoid lockdep for uretprobe_srcu
>  - add missing put_uprobe() -> srcu_read_unlock() conversion
>  - actually initialize return_instance::has_ref
>  - a few comments
>  - things I don't remember
>
>

Hey Peter!

Thanks for the v2, I plan to look at it more thoroughly tomorrow. But
meanwhile I spent a good chunk of today writing a uprobes
stress-test, so we can validate that we are not regressing anything
(yes, I don't trust lockless code and people in general ;)

Anyways, if you'd like to use it, it's at [0]. All you should need to
build and run it is:

  $ cd examples/c
  $ make -j$(nproc) uprobe-stress
  $ sudo ./uprobe-stress -tN -aM -mP -fR


N, M, P, R are the numbers of threads dedicated to each of the four
functions of the stress test: triggering user space functions (N),
attaching/detaching various random subsets of uprobes (M), mmap()ing
parts of the executable with uprobes (P), and forking the process and
triggering uprobes for a little bit (R). The idea is to test various
timings and interleavings of uprobe-related logic.

You should only need a not-too-old Clang to build everything (Clang 12+
should work, I believe). But do let me know if you run into any trouble.

I did run this stress test for a little while on current
bpf-next/master with no issues detected (yay!).

But then I also ran it on Linux built from the perf/uprobes branch (these
patches), and after a few seconds I see that there is no more
attachment/detachment happening. Eventually I got splats, which you
can see in [1]. I used the `sudo ./uprobe-stress -a10 -t5 -m5 -f3` command
to run it inside my QEMU image.

So there is still something off; hopefully this will help to debug and
hammer out any remaining kinks. Thanks!

  [0] https://github.com/libbpf/libbpf-bootstrap/commit/2f88cef90f9728ec8c7bee7bd48fdbcf197806c3
  [1] https://gist.github.com/anakryiko/f761690addf7aa5f08caec95fda9ef1a
