On Sun, May 20, 2018 at 11:28:43AM -0400, Steven Rostedt wrote:
> 
> [ Steve interrupts his time off ]

Hope you're enjoying your vacation :)

> On Sat, 19 May 2018 17:49:38 -0700
> "Paul E. McKenney" <paul...@linux.vnet.ibm.com> wrote:
> 
> > I suggested to Steven that the rcu_read_lock() and rcu_read_unlock() might
> > be outside of the trampoline, but this turned out to be infeasible.  Not
> > that I remember why!  ;-)
> 
> Because the trampoline itself is what needs to be freed. The trampoline
> is what mcount/fentry or an optimized kprobe jumps to.
> 
> 
> <func>:
>       nop
> 
> [ enable function tracing ]
> 
> <func>:
>       call func_tramp --> set up stack
>                           call function_tracer()
>                           pop stack
>                           ret
> 
>                           ^^^^^
>                           This is the trampoline
> 
> There's no way to know when a task will be on the trampoline or not.
> The trampoline is allocated, and we need RCU_tasks to know when we can
> free it. The only way to make a "wrapper" is to modify more of the code
> text to do whatever before calling the trampoline, which is
> impractical.
> 
> The allocated trampolines were added as an optimization, where two
> registered callback functions from ftrace that are attached to two
> different functions don't call the same trampoline which would have to
> do a loop and a hash lookup to know what callback to call per function.
> If a callback is the only one attached to a specific function, then a
> trampoline is allocated and will call that callback directly, keeping
> the overhead down.

Right, I saw your trampoline prototype tree. I understand how it works now,
thanks.

> There is no feasible way to know when a task is on a trampoline
> without adding overhead that negates the speed up we receive by making
> individual trampolines to begin with.

Are you speaking of time overhead or space overhead, or both?

Just thinking out loud and probably some food for thought..

The rcu_read_lock()/rcu_read_unlock() primitives are extremely fast, so I
don't personally think there's a time hit.

Could we get around the trampoline code == data issue by, say, using a
multi-stage trampoline like so?

        call func_tramp --> (static
                            trampoline)               (dynamic trampoline)
                            rcu_read_lock()  -------> set up stack
                                                      call function_tracer()
                                                      pop stack
                            rcu_read_unlock() <------ ret
 
I know there's probably more to it than this, but conceptually at least, it
feels like all the RCU infrastructure is already there to handle preemption
within a trampoline, and it would be cool if the trampoline were as shown
above for the dynamically allocated trampolines. At least I feel it will be
faster than the pre-trampoline code that did the hash lookups / matching to
call the right function callbacks, and it could help eliminate the need for
the RCU-tasks subsystem and its kthread.
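
To make the two-stage idea a bit more concrete, here is a rough user-space
sketch in C. It is only a model, not kernel code: sim_rcu_read_lock() and
sim_rcu_read_unlock() are hypothetical stand-ins that just count nesting,
struct dyn_tramp and active_tramp are made-up names for the dynamically
allocated stage, and a real implementation would be arch-specific assembly
using rcu_dereference() and friends.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the kernel's RCU read-side primitives.
 * They only track nesting depth; there is no grace-period machinery.
 */
static int rcu_nesting;

static void sim_rcu_read_lock(void)
{
	rcu_nesting++;
}

static void sim_rcu_read_unlock(void)
{
	assert(rcu_nesting > 0);
	rcu_nesting--;
}

typedef void (*tracer_fn)(unsigned long ip);

/* Made-up model of the dynamically allocated (freeable) stage. */
struct dyn_tramp {
	tracer_fn callback;
};

/* Pointer the static stage follows; NULL models "tracing disabled". */
static struct dyn_tramp *active_tramp;

/* Bookkeeping so the behaviour can be checked from a test. */
static int calls_seen;
static int nesting_seen;

/* Example callback: records that it ran and the nesting depth it saw. */
static void function_tracer(unsigned long ip)
{
	(void)ip;
	calls_seen++;
	nesting_seen = rcu_nesting;
}

/*
 * Stage 1: the static trampoline. It is never freed, so it is always
 * safe for a task to be executing here. Stage 2 runs entirely inside
 * the read-side critical section.
 */
static void static_tramp(unsigned long ip)
{
	struct dyn_tramp *t;

	sim_rcu_read_lock();
	t = active_tramp;
	if (t)
		t->callback(ip);	/* stage 2: dynamic trampoline */
	sim_rcu_read_unlock();
}
```

In this model, freeing a dynamic trampoline would amount to clearing
active_tramp, waiting for a normal RCU grace period, and then freeing it. The
sketch ignores the cost of the extra indirect call, which may well be the
overhead you are worried about.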

If you still feel it's not worth it, then that's okay too, and clearly
RCU-tasks has benefits such as a simpler trampoline implementation.

thanks!

- Joel
