On Wed, Apr 14, 2021 at 03:46:49PM -0700, Andrii Nakryiko wrote:
> On Wed, Apr 14, 2021 at 5:19 AM Jiri Olsa <jo...@redhat.com> wrote:
> >
> > On Tue, Apr 13, 2021 at 06:04:05PM -0700, Andrii Nakryiko wrote:
> > > On Tue, Apr 13, 2021 at 7:57 AM Jiri Olsa <jo...@kernel.org> wrote:
> > > >
> > > > hi,
> > > > sending another attempt on speeding up load of multiple probes
> > > > for bpftrace and possibly other tools (first post in [1]).
> > > >
> > > > This patchset adds support to attach bpf program directly to
> > > > ftrace probe as suggested by Steven and it speeds up loading
> > > > for bpftrace commands like:
> > > >
> > > >    # bpftrace -e 'kfunc:_raw_spin* { @[probe] = count(); }'
> > > >    # bpftrace -e 'kfunc:ksys_* { @[probe] = count(); }'
> > > >
> > > > Using ftrace with single bpf program for attachment to multiple
> > > > functions is much faster than current approach, where we need to
> > > > load and attach program for each probe function.
> > > >
> > >
> > > Ok, so first of all, I think it's super important to allow fast
> > > attachment of a single BPF program to multiple kernel functions (I
> > > call it mass-attachment). I've been recently prototyping a tool
> > > (retsnoop, [0]) that allows attaching fentry/fexit to multiple
> > > functions, and not having this feature turned into lots of extra code
> > > and slow startup/teardown speeds. So we should definitely fix that.
> > >
> > > But I think the approach you've taken is not the best one, even though
> > > it's a good starting point for discussion.
> > >
> > > First, you are saying function return attachment support is missing,
> > > but is not needed so far. I actually think that without func return
> > > the whole feature is extremely limiting. Not being able to measure
> > > function latency  by tracking enter/exit events is crippling for tons
> > > of useful applications. So I think this should go with both at the
> > > same time.
> > >
> > > But guess what, we already have a good BPF infra (BPF trampoline and
> > > fexit programs) that supports func exit tracing. Additionally, it
> > > supports the ability to read input arguments *on function exit*, which
> > > is something that kretprobe doesn't support and which is often a very
> > > limiting restriction, necessitating complicated logic to trace
> > > function entry just to store input arguments. It's a killer feature
> > > and one that makes fexit so much more useful than kretprobe.
> > >
> > > The only problem is that currently we have a 1:1:1 relationship
> > > between BPF trampoline, BPF program, and kernel function. I think we
> > > should allow to have a single BPF program, using a single BPF
> > > trampoline, but being able to attach to multiple kernel functions
> > > (1:1:N). This will allow to validate BPF program once, allocate only
> > > one dedicated BPF trampoline, and then (with appropriate attach API)
> > > attach them in a batch mode.
> >
> > heya,
> > I had some initial prototypes trying this way, but always ended up
> > in complicated code, that's why I turned to ftrace_ops.
> >
> > let's see if it'll make any sense to you ;-)
> >
> > 1) so let's say we have extra trampoline for the program (which
> > also seems a bit of waste since there will be just single record
> 
> BPF trampoline does more than just calls BPF program. At the very
> least it saves input arguments for fexit program to be able to access
> it. But given it's one BPF trampoline attached to thousands of
> functions, I don't see any problem there.
> 
> > in it, but sure) - this single trampoline can be easily attached
> > to multiple functions, but what about other trampolines/tools,
> > that want to trace the same function? we'd need some way for a
> > function to share/call multiple trampolines - I did not see easy
> > solution in here so I moved to another way..
> 
> The easiest would be to make the existing BPF trampoline to co-exist
> with this new multi-attach one. As to how, I don't know the code well
> enough yet to answer specifically.

I did not explore this possibility, because it seemed too
complicated ;-) I'll see if I can come up with something,
that we could start discussion for, so:

  - new trampoline type that would attach single program
    to multiple functions
  - it needs to 'co-exist' with current trampolines so
    both types could be attached to same function

> 
> >
> >
> > 2) we keep the trampoline:function relationship to 1:1 and allow
> > 'mass-attachment' program to register in multiple trampolines.
> > (it needs special hlist node for each attachment, but that's ok)
> >
> > the problem was that to make this fast, you don't want to attach/detach
> > program to trampolines one by one, you need to do it in batch,
> > so you can call ftrace API just once (ftrace API is another problem below)
> > and doing this in batch mode means, that you need to lock all the
> > related trampolines and not allow any change in them by another tools,
> > and that's where I couldn't find any easy solution.. you can't take
> > a lock for 100 trampolines.. and having some 'master' lock is tricky
> 
> So this generic fentry would have its own BPF trampoline. Now you need
> to attach it to 1000s of places with a single batch API call. We won't
> have to modify 100s of other BPF trampolines, if we can find a good
> way to let them co-exist.
> 
> 
> >
> > another problem is the ftrace API.. to make it fast we either
> > need to use ftrace_ops or create fast API to ftrace's direct
> > functions.. and that was rejected last time [1]
> 
> I don't read it as a rejection, just that ftrace infra needs to be
> improved to support. In any case, I haven't spent enough time thinking
> and digging through code, but I know that without fexit support this
> feature is useless in a lot of cases. And input argument reading in
> fexit is too good to give up at this point either.
> 
> >
> >
> > 3) bpf has support for batch interface already, but only if ftrace
> 
> It does? What is it? Last time I looked I didn't find anything like that.

trampolines uses text_poke_bp function (when ftrace is not compiled in
or the function is not ftrace-managed)

text_poke_bp is wrapper for text_poke_bp_batch to change 1 location,
text_poke_bp_batch allows to change more than one place with:

   text_poke_queue
   text_poke_queue
   ...
   text_poke_finish -> text_poke_flush -> text_poke_bp_batch

thanks,
jirka

Reply via email to