Re: [PATCH] kretprobe: produce sane stack traces

Masami Hiramatsu Thu, 01 Nov 2018 01:06:51 -0700

On Tue, 30 Oct 2018 14:19:53 +1100
Aleksa Sarai <[email protected]> wrote:


> On 2018-10-30, Masami Hiramatsu <[email protected]> wrote:
> > > Historically, kretprobe has always produced unusable stack traces
> > > (kretprobe_trampoline is the only entry in most cases, because of the
> > > funky return-address overwriting). This has caused quite a few annoyances
> > > when using tracing to debug problems[1] -- since return values are only
> > > available with kretprobes but stack traces were only usable for kprobes,
> > > users had to probe both and then manually associate them.
> > 
> > Yes, this unfortunately still happens. I once tried to fix it by
> > replacing the current "kretprobe instance" with the graph tracer's
> > per-thread return stack. (https://lkml.org/lkml/2017/8/21/553)
> 
> I played with the graph tracer a while ago and it didn't appear to have
> associated return values. Are they hidden somewhere, or did I just miss
> them?

The graph tracer just doesn't record it, but we can still access it.
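Below is a minimal userspace model of the per-thread return stack idea. It is for illustration only: `struct ret_entry`, `ret_push()` and `ret_pop()` are hypothetical names and do not reflect ftrace's actual data structures. The entry hook saves the real return address plus some entry-time data, such as an id for the stack trace captured at entry. When the function returns through the trampoline, the innermost entry is popped. At that point the return value is still in the return register (e.g. %rax on x86_64). That is why it stays accessible even though the graph tracer does not record it.

```c
#include <assert.h>

/* Hypothetical per-thread shadow stack entry (not ftrace's real layout). */
struct ret_entry {
	unsigned long ret_addr;   /* real return address replaced by the trampoline */
	unsigned long entry_data; /* e.g. an id for the stack trace taken at entry */
};

#define RET_STACK_DEPTH 50

/* In the kernel this would hang off each task; _Thread_local models that. */
static _Thread_local struct ret_entry ret_stack[RET_STACK_DEPTH];
static _Thread_local int ret_depth;

/* Entry hook: stash the real return address; fail if the stack is full. */
static int ret_push(unsigned long ret_addr, unsigned long entry_data)
{
	if (ret_depth >= RET_STACK_DEPTH)
		return -1; /* caller would count this as a missed probe */
	ret_stack[ret_depth].ret_addr = ret_addr;
	ret_stack[ret_depth].entry_data = entry_data;
	ret_depth++;
	return 0;
}

/*
 * Exit path: the trampoline pops the innermost entry, giving the handler
 * the entry-time data alongside the return value; execution then resumes
 * at out->ret_addr.
 */
static int ret_pop(struct ret_entry *out)
{
	if (ret_depth == 0)
		return -1;
	*out = ret_stack[--ret_depth];
	return 0;
}
```

Because entries are popped in LIFO order, nested probed calls each get back their own entry-time data.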

> 
> > I still believe that direction is the best solution to this kind of
> > issue; otherwise, we have to have two different stack fixups for
> > kretprobe and the ftrace graph tracer. (I will have a talk with Steve
> > at Plumbers next month.)
> 
> I'm definitely :+1: on removing the duplication of the stack fixups, my
> first instinct was to try to refactor all of the stack_trace code so
> that we didn't have multiple arch-specific "get the stack trace" paths
> (and so we could generically add current_kretprobe_instance() to one
> codepath). But after looking into it, I was convinced this would be more
> than a little ugly to do.

Yes, it would take time to fix it all up, but it should be done.

> > > With the advent of bpf_trace, users would have been able to do this
> > > association in bpf, but this was less than ideal (because
> > > bpf_get_stackid would still produce rubbish and programs that didn't
> > > know better would get silly results). The main use case for stack traces
> > > (at least with bpf_trace) is for DTrace-style aggregation on stack
> > > traces (both entry and exit). Therefore we cannot simply correct the
> > > stack trace on exit -- we must stash away the stack trace and return the
> > > entry stack trace when it is requested.
> > > 
> > > In theory, patches like commit 76094a2cf46e ("ftrace: distinguish
> > > kretprobe'd functions in trace logs") are no longer necessary *for
> > > tracing* because now all kretprobe traces should produce sane stack
> > > traces. However, it's not clear whether removing them completely is
> > > reasonable.
> > 
> > Then, let's try to revert it :)
> 
> Sure. :P
> 
> > BTW, could you also add a test case for ftrace?
> > Also, I have some comments below.
> 
> Yup, will do.
> 
> > > +#define KRETPROBE_TRACE_SIZE 1024
> > > +struct kretprobe_trace {
> > > + int nr_entries;
> > > + unsigned long entries[KRETPROBE_TRACE_SIZE];
> > > +};
> > 
> > Hmm, do we really need all entries? It takes 8KB for each instance.
> > Note that the number of instances can be large on systems with many
> > cores.
> 
> Yeah, you're right, this is too large for a default.
> 
> But the problem is that we need it to be large enough for any of the
> tracers to be happy -- otherwise we'd have to dynamically allocate it
> and I had a feeling this would be seen as a Bad Idea™ in the kprobe
> paths.

Note that if it is not enough, we can just skip it and increment nmissed.
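The sketch below shows one reading of this suggestion in userspace C, under stated assumptions. Each instance carries a fixed-size buffer of 127 entries, the size proposed later in this thread. The entry handler stashes the unwound trace, and the return handler replays that stashed trace instead of walking the live stack. If the entry-time trace does not fit, the hit is skipped and `nmissed` is incremented rather than allocating memory. `struct fake_kretprobe`, `stash_entry_trace()` and `replay_entry_trace()` are hypothetical names. The `walk` array stands in for the arch unwinder.

```c
#include <assert.h>
#include <string.h>

#define KRETPROBE_TRACE_SIZE 127 /* assumed size, per the discussion below */

/* Per-instance storage for the trace captured at function entry. */
struct kretprobe_trace {
	int nr_entries;
	unsigned long entries[KRETPROBE_TRACE_SIZE];
};

/* Hypothetical stand-in for struct kretprobe: only the miss counter. */
struct fake_kretprobe {
	unsigned long nmissed;
};

/*
 * Entry handler: copy the unwound trace into the instance. If it does not
 * fit, record nothing and count a miss instead of allocating memory.
 */
static int stash_entry_trace(struct fake_kretprobe *rp,
			     struct kretprobe_trace *krt,
			     const unsigned long *walk, int depth)
{
	if (depth > KRETPROBE_TRACE_SIZE) {
		rp->nmissed++;
		return -1;
	}
	memcpy(krt->entries, walk, depth * sizeof(*walk));
	krt->nr_entries = depth;
	return 0;
}

/* Return handler: hand back the entry-time trace, not the live stack. */
static int replay_entry_trace(const struct kretprobe_trace *krt,
			      unsigned long *out, int max)
{
	int n = krt->nr_entries < max ? krt->nr_entries : max;

	memcpy(out, krt->entries, n * sizeof(*out));
	return n;
}
```

Counting an oversized trace as a miss keeps the probe path allocation-free, at the cost of losing that hit entirely rather than truncating it.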

> 
>   * ftrace uses PAGE_SIZE/sizeof(u64) == 512 (on x86_64).
>   * perf_events (and thus BPF) uses 127 as the default but can be
>     configured via sysctl -- and thus can be unbounded.
>   * show_stack(...) doesn't appear to have a limit, but I might just be
>     misreading the x86-specific code.
> 
> As mentioned above, the lack of consensus on a single structure for
> storing stack traces also means that there is a lack of consensus on
> what the largest reasonable stack is.
> 
> But maybe just doing 127 would be "reasonable"?

Yeah, I think that is a reasonable size.
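The following C sketch works through the back-of-the-envelope numbers. It assumes an 8-byte `unsigned long` (LP64). It also assumes the preallocated instance count follows the preemptible-kernel default of max(10, 2 * num_possible_cpus()); non-preemptible kernels default to num_possible_cpus(). `default_maxactive()` and `trace_bytes()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the patch's struct kretprobe_trace, sized by entry count. */
#define DEFINE_TRACE_STRUCT(name, n) \
	struct name { int nr_entries; unsigned long entries[n]; }

DEFINE_TRACE_STRUCT(trace_1024, 1024);
DEFINE_TRACE_STRUCT(trace_127, 127);

/* Assumed default instance count on preemptible kernels. */
static size_t default_maxactive(size_t ncpus)
{
	return 2 * ncpus > 10 ? 2 * ncpus : 10;
}

/* Bytes of trace storage preallocated for one kretprobe. */
static size_t trace_bytes(size_t per_instance, size_t ncpus)
{
	return per_instance * default_maxactive(ncpus);
}
```

On x86_64 this gives 8200 bytes per instance for 1024 entries, which is the "8KB" above. For 127 entries it gives exactly 1024 bytes, because the int plus padding occupies 8 bytes. With 64 possible CPUs there are 128 instances. That comes to roughly 1 MiB versus 128 KiB of trace storage per kretprobe.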

> 
> (Although, dynamically allocating would allow us to just use 'struct
> stack_trace' directly without needing to embed a different structure.)
> 
> > > + hlist_for_each_entry_safe(iter, next, head, hlist) {
> > 
> > Why would you use the "_safe" variant here? If you don't modify the
> > hlist, you don't need it.
> 
> Yup, my mistake.
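This sketch illustrates the point about `_safe`. The safe variant differs only in caching the next pointer before the loop body runs, mirroring the structure of hlist_for_each_entry_safe(). That matters only when the body unlinks or frees the current node. The code uses a simplified singly linked list and hypothetical `list_for_each()`/`list_for_each_safe()` macros, not the kernel's `hlist` API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct node {
	int val;
	struct node *next;
};

/* Plain walk: reads pos->next after the body, so the body must not free pos. */
#define list_for_each(pos, head) \
	for ((pos) = (head); (pos); (pos) = (pos)->next)

/* Safe walk: saves the successor first, so the body may free pos. */
#define list_for_each_safe(pos, n, head) \
	for ((pos) = (head); (pos) && ((n) = (pos)->next, 1); (pos) = (n))

/* Read-only lookup, like the search in the patch: plain variant suffices. */
static struct node *find(struct node *head, int val)
{
	struct node *pos;

	list_for_each(pos, head)
		if (pos->val == val)
			return pos;
	return NULL;
}

/* Freeing nodes while walking is what actually needs the _safe form. */
static void free_all(struct node *head)
{
	struct node *pos, *n;

	list_for_each_safe(pos, n, head)
		free(pos);
}
```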
> 
> > > +void kretprobe_save_stack_trace(struct kretprobe_instance *ri,
> > > +                         struct stack_trace *trace)
> > > +{
> > > + int i;
> > > + struct kretprobe_trace *krt = &ri->entry;
> > > +
> > > + for (i = trace->skip; i < krt->nr_entries; i++) {
> > > +         if (trace->nr_entries >= trace->max_entries)
> > > +                 break;
> > > +         trace->entries[trace->nr_entries++] = krt->entries[i];
> > > + }
> > > +}
> > > +EXPORT_SYMBOL_GPL(kretprobe_save_stack_trace);
> > > +
> > > +void kretprobe_perf_callchain_kernel(struct kretprobe_instance *ri,
> > > +                              struct perf_callchain_entry_ctx *ctx)
> > > +{
> > > + int i;
> > > + struct kretprobe_trace *krt = &ri->entry;
> > > +
> > > + for (i = 0; i < krt->nr_entries; i++) {
> > > +         if (krt->entries[i] == ULONG_MAX)
> > > +                 break;
> > > +         perf_callchain_store(ctx, (u64) krt->entries[i]);
> > > + }
> > > +}
> > > +EXPORT_SYMBOL_GPL(kretprobe_perf_callchain_kernel);
> > 
> > 
> > Why do we need to export these functions?
> 
> That's a good question -- I must've just banged out the EXPORT
> statements without thinking. I'll remove them in v2.

OK.
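The copy semantics of the two helpers quoted above can be exercised in userspace. The `struct stack_trace` below is a local copy of the layout it had in kernels of this era (4.19): nr_entries, max_entries, entries, skip. `save_stashed_trace()` mirrors the loop in kretprobe_save_stack_trace(), and `count_until_marker()` mirrors the `ULONG_MAX` check in kretprobe_perf_callchain_kernel(). Both names are hypothetical. Note that `skip` drops the innermost stashed entries and `max_entries` caps the output.

```c
#include <assert.h>
#include <limits.h>

/* Local copy of the 4.19-era struct stack_trace layout (assumption). */
struct stack_trace {
	unsigned int nr_entries, max_entries;
	unsigned long *entries;
	int skip;
};

#define KRETPROBE_TRACE_SIZE 127

struct kretprobe_trace {
	int nr_entries;
	unsigned long entries[KRETPROBE_TRACE_SIZE];
};

/* Same loop as the patch, taking the stashed trace directly. */
static void save_stashed_trace(const struct kretprobe_trace *krt,
			       struct stack_trace *trace)
{
	int i;

	for (i = trace->skip; i < krt->nr_entries; i++) {
		if (trace->nr_entries >= trace->max_entries)
			break;
		trace->entries[trace->nr_entries++] = krt->entries[i];
	}
}

/* Perf-style walk: stops at a ULONG_MAX end marker; returns entries used. */
static int count_until_marker(const struct kretprobe_trace *krt)
{
	int i;

	for (i = 0; i < krt->nr_entries; i++)
		if (krt->entries[i] == ULONG_MAX)
			break;
	return i;
}
```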

Thank you,

-- 
Masami Hiramatsu <[email protected]>
