On Tue, 10 Dec 2013 16:42:16 +0100 Petr Mladek <pmla...@suse.cz> wrote:
> When trying to use text_poke_bp_iter in ftrace, the result was slower than
> the original implementation.
>
> It turned out that one difference was in text_poke vs. ftrace_write.
> text_poke did many extra operations to make sure that code was read-write
> and the change was atomic.
>
> In fact, we do not need the atomic operation inside text_poke_bp_iter because
> the modified code is guarded by the int3 handler. The int3 interrupt itself
> is added and removed by an atomic operation because we modify only one byte.
>
> Also if we patch many functions, it is more effective to set the whole text
> code read-write instead of remapping each address into a helper address space.
>
> This patch adds text_poke_part which is inspired by ftrace_write.
> Then it is used in text_poke_bp_iter instead of the slower text_poke.
> It adds the limitation that text_poke_bp_iter user is responsible for
> setting the patched code read-write.
>
> Note that we can't use it in text_poke because it is not effective
> to set the page or two read-write just because patching one piece.
>
> For example, I tried to switch between 7 tracers: blk, branch, function_graph,
> wakeup_rt, irqsoff, function, and nop. Every tracer has also been enabled and
> disabled. With 100 cycles, I got these times with text_poke:
>
> real    25m7.380s    25m13.44s    25m9.072s
> user    0m0.004s     0m0.004s     0m0.004s
> sys     0m3.480s     0m3.508s     0m3.420s
>
> and with text_poke_part:
>
> real    3m23.335s    3m24.422s    3m26.686s
> user    0m0.004s     0m0.004s     0m0.004s
> sys     0m3.724s     0m3.772s     0m3.588s
>
> Signed-off-by: Petr Mladek <pmla...@suse.cz>
> Reviewed-by: Masami Hiramatsu <masami.hiramatsu...@hitachi.com>
> ---
>  arch/x86/kernel/alternative.c | 67 ++++++++++++++++++++++++++++++++++++-------
>  1 file changed, 57 insertions(+), 10 deletions(-)
>
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 8e57ac03a0e8..03abf9abf681 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -595,6 +595,51 @@ void __kprobes text_poke_or_die(void *addr, const void *opcode, size_t len)
>  	BUG_ON(err);
>  }
>
> +/**
> + * text_poke_part - Update part of the instruction on a live kernel when using
> + *		    int3 breakpoint
> + * @addr:	address to modify
> + * @opcode:	source of the copy
> + * @len:	length to copy
> + *
> + * If we patch instructions using int3 interrupt, we do not need to be so
> + * careful about an atomic write. Adding and removing the interrupt is atomic
> + * because we modify only one byte. The rest of the instruction could be
> + * modified in several steps because it is guarded by the interrupt handler.

Another nit, because I'm anal. I know Intel confuses things by calling
int3 an interrupt, when in reality it is a trap. There's nothing
asynchronous about it and nothing is being interrupted.

Can we use the term "trap" or "breakpoint" instead of "interrupt"?

> + * Hence we could use faster code here. See also text_poke_bp_iter below
> + * for more details.
> + *
> + * Note: This function must be called under text_mutex. Also the caller is
> + * responsible for setting the patched code read-write. This is more effective
> + * only when you patch many addresses at the same time.
s/many/several/

> + */
> +static int text_poke_part(void *addr, const void *opcode, size_t len)
> +{
> +	int ret;
> +
> +	/* The patching makes sense only for a text code */

s/for a/for/

-- Steve

> +	if (unlikely((unsigned long)addr < (unsigned long)_text) ||
> +	    unlikely((unsigned long)addr >= (unsigned long)_etext))
> +		return -EFAULT;
> +
> +#if defined(CONFIG_X86_64) && defined(CONFIG_DEBUG_RODATA)
> +	/*
> +	 * On x86_64, the kernel text mapping is always mapped read-only with
> +	 * CONFIG_DEBUG_RODATA. Instead, we need to use the kernel identity
> +	 * mapping that can be set read-write, for example using
> +	 * set_kernel_text_rw().
> +	 */
> +	addr = __va(__pa_symbol(addr));
> +#endif
> +
> +	ret = probe_kernel_write(addr, opcode, len);
> +	if (unlikely(ret))
> +		return -EPERM;
> +
> +	sync_core();
> +	return 0;
> +}
> +
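[Editor's note: for readers following the discussion, below is a minimal sketch
of the breakpoint-guarded update sequence the commit message describes. It is
purely illustrative and not part of the patch; example_poke_bp() is a
hypothetical wrapper written as if it lived next to text_poke_part() in
arch/x86/kernel/alternative.c, and it assumes the do_sync_core() helper and
on_each_cpu() core-sync pattern already used by text_poke_bp() in that file.]

/*
 * Illustrative sketch only -- not part of the patch.  It shows why the
 * middle of the instruction does not need an atomic write: the int3 byte
 * is armed first (a one-byte, hence atomic, store), so any CPU that hits
 * the instruction meanwhile lands in the breakpoint handler while the
 * remaining bytes are rewritten in several steps.
 */
static void example_poke_bp(void *addr, const void *opcode, size_t len)
{
	unsigned char int3 = 0xcc;	/* breakpoint opcode */

	/* 1) Arm the breakpoint: a single-byte write, atomic by itself. */
	text_poke_part(addr, &int3, sizeof(int3));
	on_each_cpu(do_sync_core, NULL, 1);

	/* 2) Patch everything behind the first byte; int3 guards it. */
	if (len > 1) {
		text_poke_part((char *)addr + 1,
			       (const char *)opcode + 1, len - 1);
		on_each_cpu(do_sync_core, NULL, 1);
	}

	/* 3) Disarm: write the final first byte, again a one-byte store. */
	text_poke_part(addr, opcode, sizeof(int3));
	on_each_cpu(do_sync_core, NULL, 1);
}

[A real implementation also needs the int3 handler to know where to redirect a
CPU that traps on the half-patched instruction; text_poke_bp() and the proposed
text_poke_bp_iter() take care of that part, which the sketch above omits.]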