On Tue, 5 Apr 2016 14:06:26 +0200
Peter Zijlstra <pet...@infradead.org> wrote:

> On Mon, Apr 04, 2016 at 09:52:47PM -0700, Alexei Starovoitov wrote:
> > avoid memset in perf_fetch_caller_regs, since it's the critical path of all 
> > tracepoints.
> > It's called from perf_sw_event_sched, perf_event_task_sched_in and all of 
> > perf_trace_##call
> > with this_cpu_ptr(&__perf_regs[..]) which are zero-initialized by
> > percpu_alloc
> 
> It's not actually allocated; but because it's a static uninitialized
> variable we get .bss-like behaviour, and the initial value is copied to
> all CPUs when the per-cpu allocator thingy bootstraps SMP, IIRC.
> 
> > and
> > subsequent call to perf_arch_fetch_caller_regs initializes the same fields 
> > on all archs,
> > so we can safely drop memset from all of the above cases and   
> 
> Indeed.
> 
> > move it into
> > perf_ftrace_function_call that calls it with stack allocated pt_regs.  
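[To make the two cases concrete, here is a minimal sketch of the pattern
being discussed; the names (example_regs, example_percpu_path,
example_ftrace_path) and the array size are placeholders, not the exact
kernel code.]

/*
 * Illustrative sketch only -- simplified names, not the actual kernel
 * code.  The per-cpu array below has no initializer, so it gets
 * .bss-like treatment and every CPU's copy starts out all-zero once
 * the per-cpu areas are set up.
 */
#include <linux/percpu.h>
#include <linux/perf_event.h>
#include <linux/string.h>

static DEFINE_PER_CPU(struct pt_regs, example_regs[4]);	/* stands in for __perf_regs[] */

static void example_percpu_path(int rctx)
{
	struct pt_regs *regs = this_cpu_ptr(&example_regs[rctx]);

	/*
	 * No memset() here: the storage is already zero, and
	 * perf_arch_fetch_caller_regs() overwrites the same few fields
	 * (ip, sp, flags, ...) on every architecture.
	 */
	perf_fetch_caller_regs(regs);
}

static void example_ftrace_path(void)
{
	struct pt_regs regs;

	/*
	 * An on-stack pt_regs starts out as garbage, so this path keeps
	 * the memset() instead of paying for it on every tracepoint hit.
	 */
	memset(&regs, 0, sizeof(regs));
	perf_fetch_caller_regs(&regs);
}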
> 
> Hmm, is there a reason that's still on-stack instead of using the
> per-cpu thing, Steve?

Well, what do you do when you are tracing with regs in an interrupt and
the per-cpu regs field was already set by the context that got
interrupted? We could create our own per-cpu one as well, but then that
would require checking which level we are in, as we can have one for
normal context, one for softirq context, one for irq context and one
for nmi context.

-- Steve



> 
> > Signed-off-by: Alexei Starovoitov <a...@kernel.org>  
> 
> In any case,
> 
> Acked-by: Peter Zijlstra (Intel) <pet...@infradead.org>
