(2014/08/08 23:28), Paul E. McKenney wrote:
> On Fri, Aug 08, 2014 at 10:12:21AM -0400, Steven Rostedt wrote:
>> On Fri, 8 Aug 2014 08:40:20 +0200
>> Peter Zijlstra <pet...@infradead.org> wrote:
>>
>>> On Thu, Aug 07, 2014 at 05:18:23PM -0400, Steven Rostedt wrote:
>>>> On Thu, 7 Aug 2014 22:08:13 +0200
>>>> Peter Zijlstra <pet...@infradead.org> wrote:
>>>>
>>>>> OK, you've got to start over and start at the beginning, because I'm
>>>>> really not understanding this..
>>>>>
>>>>> What is a 'trampoline' and what are you going to use them for.
>>>>
>>>> Great question! :-)
>>>>
>>>> The trampoline is some code that is used to jump to and then jump
>>>> someplace else. Currently, we use this for kprobes and ftrace. For
>>>> ftrace we have the ftrace_caller trampoline, which is static. When
>>>> booting, most functions in the kernel call the mcount code which
>>>> simply returns without doing anything. This too is a "trampoline". At
>>>> boot, we convert these calls to nops (as you already know). When we
>>>> enable callbacks from functions, we convert those calls to call
>>>> "ftrace_caller" which is a small assembly trampoline that will call
>>>> some function that registered with ftrace.
>>>>
>>>> Now why do we need the call_rcu_task() routine?
>>>>
>>>> Right now, if you register multiple callbacks to ftrace, even if they
>>>> are not tracing the same routine, ftrace has to change ftrace_caller to
>>>> call another trampoline (in C), that does a loop of all ops registered
>>>> with ftrace, and compares the function to the ops hash tables to see if
>>>> the ops function should be called for that function.
>>>>
>>>> What we want to do is to create a dynamic trampoline that is a copy of
>>>> the ftrace_caller code, but instead of calling this list trampoline, it
>>>> calls the ops function directly. This way, each ops registered with
>>>> ftrace can have its own custom trampoline that when called will only
>>>> call the ops function and not have to iterate over a list. This only
>>>> happens if the function being traced only has this one ops registered.
>>>> For functions with multiple ops attached to it, we need to call the
>>>> list anyway. But for the majority of the cases, this is not the case.
>>>>
>>>> The one caveat for this is, how do we free this custom trampoline when
>>>> the ops is done with it? Especially for users of ftrace that
>>>> dynamically create their own ops (like perf, and ftrace instances).
>>>>
>>>> We need to find a way to free it, but unfortunately, there's no way to
>>>> know when it is safe to free it. There's no way to disable preemption
>>>> or have some other notifier to let us know if a task has jumped to this
>>>> trampoline and has been preempted (sleeping). The only safe way to know
>>>> that no task is on the trampoline is to remove the calls to it,
>>>> synchronize the CPUs (so the trampolines are not even in the caches),
>>>> and then wait for all tasks to go through some quiescent state. This
>>>> state happens to be either not running, in userspace, or when it
>>>> voluntarily calls schedule. Nothing that uses this trampoline should
>>>> call schedule voluntarily, so if a task does, we know it's not on the
>>>> trampoline.
>>>>
>>>> Make sense?
>>>
>>> Ok, so they're purely used in the function prologue/epilogue callchain.
>>
>> No, they are also used by optimized kprobes. This is why optimized
>> kprobes depend on !CONFIG_PREEMPT. [ added Masami to the discussion ].
>>
>> Which reminds me. On !CONFIG_PREEMPT, call_rcu_task() should be
>> equivalent to call_rcu_sched().
> 
> Almost.  One difference is that call_rcu_sched() won't wait for
> idle-task execution.  So presumably you are currently prohibited from
> putting kprobes in idle tasks.

There is no need to prohibit all kprobes; it is enough to prohibit
optimizing a kprobe if it is in the idle context (if I can detect that).
Since I've already replaced the text-area based __kprobes with the
list-based NOKPROBE_SYMBOL in the core kernel, I think adding a
NOOPTPROBE_SYMBOL for that purpose is an option.
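
As a rough illustration only, a list-based check of that kind might look
something like the userspace sketch below. Everything in it is hypothetical:
the struct noopt_range type, the noopt_list table, the helper names, and the
addresses are made up for the example, and NOOPTPROBE_SYMBOL itself does not
exist yet. The idea is just that a probe whose address falls in a listed
range would stay a normal breakpoint-based kprobe instead of being optimized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical entry: one address range that must not be optimized. */
struct noopt_range {
	uintptr_t start;	/* first address of the range */
	uintptr_t end;		/* one past the last address */
};

/*
 * Hypothetical list. In the kernel, a NOOPTPROBE_SYMBOL()-style macro
 * would presumably populate such a list; here it is filled by hand
 * with made-up addresses.
 */
static const struct noopt_range noopt_list[] = {
	{ 0x1000, 0x1040 },
	{ 0x2000, 0x2100 },
};

/* Return true if addr lies inside any listed range. */
static bool within_noopt_list(uintptr_t addr)
{
	size_t i;

	for (i = 0; i < sizeof(noopt_list) / sizeof(noopt_list[0]); i++) {
		const struct noopt_range *r = &noopt_list[i];

		assert(r->start < r->end);	/* ranges must be well formed */
		if (addr >= r->start && addr < r->end)
			return true;
	}
	return false;
}

enum probe_kind { PROBE_BREAKPOINT, PROBE_OPTIMIZED };

/* Listed addresses keep the plain breakpoint probe; others may be optimized. */
static enum probe_kind choose_probe_kind(uintptr_t addr)
{
	return within_noopt_list(addr) ? PROBE_BREAKPOINT : PROBE_OPTIMIZED;
}
```

With this, a probe is never rejected outright; only the optimization step is
skipped for the listed ranges.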

Thank you,

> Oleg slipped this one past me, and for more than a full hour,
> (https://lkml.org/lkml/2014/8/2/18), but this time I remembered.  ;-)
> 
>                                                       Thanx, Paul



-- 
Masami HIRAMATSU
Software Platform Research Dept. Linux Technology Research Center
Hitachi, Ltd., Yokohama Research Laboratory
E-mail: masami.hiramatsu...@hitachi.com

