Re: [PATCH v8 2/2] ARM: kprobes: enable OPTPROBES for ARM 32
On 2014/11/18 14:32, Wang Nan wrote:
> This patch introduces kprobeopt for ARM 32.
>
> Limitations:
> - Currently only kernels compiled with the ARM ISA are supported.
>
> - The offset between the probe point and the optinsn slot must not be
>   larger than 32MiB. Masami Hiramatsu suggests replacing 2 words, but
>   that would make things complex; a further patch can make such an
>   optimization.
>
> Kprobe opt on ARM is relatively simpler than kprobe opt on x86 because
> an ARM instruction is always 4 bytes long and 4 bytes aligned. This
> patch replaces the probed instruction with a 'b' branch to trampoline
> code, which then calls optimized_callback(). optimized_callback()
> calls opt_pre_handler() to execute the kprobe handler. It also
> emulates/simulates the replaced instruction.
>
> When unregistering a kprobe, the deferred manner of the unoptimizer
> may leave the branch instruction in place before the optimizer is
> called. Unlike x86_64, which copies the probed insn after
> optprobe_template_end and re-executes it, this patch calls singlestep
> to emulate/simulate the insn directly. A further patch can optimize
> this behavior.
>
> v1 -> v2:
>
> - Improvement: if the replaced instruction is conditional, generate a
>   conditional branch instruction for it;
>
> - Introduce RELATIVEJUMP_OPCODES because ARM kprobe_opcode_t is 4
>   bytes;
>
> - Remove the size field in struct arch_optimized_insn;
>
> - Use arm_gen_branch() to generate the branch instruction;
>
> - Remove all recover logic: ARM doesn't use a tail buffer, so there is
>   no need to recover replaced instructions as on x86;
>
> - Remove incorrect CONFIG_THUMB checking;
>
> - can_optimize() always returns true if the address is well aligned;
>
> - Improve optimized_callback: use opt_pre_handler();
>
> - Bugfix: correct the range checking code and improve comments;
>
> - Fix the commit message.
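The 32MiB limit quoted above follows directly from the A32 `B` encoding: a 24-bit signed word offset relative to pc+8. As a minimal sketch (this is not the kernel's `arm_gen_branch()`, just an illustration of the range check and encoding):

```c
#include <stdint.h>

/*
 * Illustrative sketch: encode an unconditional ARM 'b' at address
 * 'pc' targeting 'target'.  The 24-bit signed immediate counts
 * words and is relative to pc+8, giving a reach of +/-32MiB --
 * which is where the offset limit above comes from.
 * Returns 0 when the target is unreachable or misaligned.
 */
static uint32_t arm_branch_sketch(uint32_t pc, uint32_t target)
{
	int32_t offset = (int32_t)(target - pc - 8);

	/* imm24 range: [-0x02000000, 0x01fffffc], word-aligned */
	if (offset < -0x02000000 || offset > 0x01fffffc || (offset & 3))
		return 0;

	/* 0xea000000 = B (cond=AL); imm24 = offset >> 2 */
	return 0xea000000u | (((uint32_t)offset >> 2) & 0x00ffffffu);
}
```

For example, a branch to pc+8 encodes a zero immediate, and a target 64MiB away is rejected, which is exactly why arch_prepare_optimized_kprobe() must verify the optinsn slot lies within range of the probe point.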
>
> v2 -> v3:
>
> - Rename RELATIVEJUMP_OPCODES to MAX_COPIED_INSNS;
>
> - Remove unneeded checking:
>     arch_check_optimized_kprobe(), can_optimize();
>
> - Add missing flush_icache_range() in arch_prepare_optimized_kprobe();
>
> - Remove an unneeded 'return;'.
>
> v3 -> v4:
>
> - Use __mem_to_opcode_arm() to translate copied_insn to ensure it
>   works on a big endian kernel;
>
> - Replace the 'nop' placeholder in the trampoline code template with
>   '.long 0' to avoid confusion: a reader may regard 'nop' as an
>   instruction, but it is in fact a value.
>
> v4 -> v5:
>
> - Don't optimize stack store operations.
>
> - Introduce a prepared field in arch_optimized_insn to indicate
>   whether it is prepared. Similar to the size field on x86. See
>   v1 -> v2.
>
> v5 -> v6:
>
> - Dynamically reserve stack according to the instruction.
>
> - Rename: kprobes-opt.c -> kprobes-opt-arm.c.
>
> - Set op->optinsn.insn after all work is done.
>
> v6 -> v7:
>
> - Use a checker to check stack consumption.
>
> v7 -> v8:
>
> - Small code adjustments.
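The `__mem_to_opcode_arm()` item in v3 -> v4 matters because ARM instructions are stored little-endian in memory even under a BE8 big-endian kernel, so a raw u32 load of the probed instruction would be byte-swapped on such kernels. A hypothetical stand-in (the kernel macro itself reduces to `le32_to_cpu()`) that assembles the opcode byte-by-byte, which is endian-neutral:

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for what __mem_to_opcode_arm() achieves:
 * interpret 4 bytes of instruction memory as a little-endian
 * word, regardless of the CPU's data endianness.  ARM (BE8)
 * instruction memory is little-endian even on big-endian kernels,
 * so decoding copied_insn without this conversion would fail there.
 */
static uint32_t mem_to_opcode_arm_sketch(const uint8_t *mem)
{
	return (uint32_t)mem[0] |
	       ((uint32_t)mem[1] << 8) |
	       ((uint32_t)mem[2] << 16) |
	       ((uint32_t)mem[3] << 24);
}
```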
>
> Signed-off-by: Wang Nan
> Acked-by: Masami Hiramatsu
> Cc: Jon Medhurst (Tixy)
> Cc: Russell King - ARM Linux
> Cc: Will Deacon
> ---
>  arch/arm/Kconfig                  |   1 +
>  arch/arm/include/asm/kprobes.h    |  26 +++
>  arch/arm/kernel/Makefile          |   3 +-
>  arch/arm/kernel/kprobes-opt-arm.c | 285 ++++++++++++++++++++++++++++++++++
>  4 files changed, 314 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/kernel/kprobes-opt-arm.c
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 89c4b5c..8281cea 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>  	select HAVE_MEMBLOCK
>  	select HAVE_MOD_ARCH_SPECIFIC if ARM_UNWIND
>  	select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
> +	select HAVE_OPTPROBES if (!THUMB2_KERNEL)
>  	select HAVE_PERF_EVENTS
>  	select HAVE_PERF_REGS
>  	select HAVE_PERF_USER_STACK_DUMP
> diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h
> index 56f9ac6..c1016cb 100644
> --- a/arch/arm/include/asm/kprobes.h
> +++ b/arch/arm/include/asm/kprobes.h
> @@ -50,5 +50,31 @@ int kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr);
>  int kprobe_exceptions_notify(struct notifier_block *self,
>  			     unsigned long val, void *data);
>
> +/* optinsn template addresses */
> +extern __visible kprobe_opcode_t optprobe_template_entry;
> +extern __visible kprobe_opcode_t optprobe_template_val;
> +extern __visible kprobe_opcode_t optprobe_template_call;
> +extern __visible kprobe_opcode_t optprobe_template_end;
> +
> +#define MAX_OPTIMIZED_LENGTH	(4)
> +#define MAX_OPTINSN_SIZE				\
> +	(((unsigned long)&optprobe_template_end -	\
> +	  (unsigned long)&optprobe_template_entry))
> +#define RELATIVEJUMP_SIZE	(4)
> +
> +struct arch_optimized_insn {
> +	/*
> +	 * copy of the original instructions.
> +	 * Different from x86, ARM kprobe_opcode_t is u32.
> +	 */
> +#define MAX_COPIED_INSN	((RELATIVEJUMP_SIZE) / sizeof(kprobe_opcode_t))
> +	kprobe_opcode_t copied_insn[MAX_COPIED_INSN];
> +	/* detour code buffer */
> +	kprobe_opcode_t *insn;
> +	/*
> +	 * we always copy one instruction on arm32,
> +	 * its size is always 4, so no size field is needed.
> +	 */
> +};
> +
>  #endif /* _ARM_KPROBES_H */
> diff --git a/arch/arm/kernel/Makefile
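The v1 -> v2 changelog item about conditional instructions can be sketched as follows: if the probed ARM instruction carries a condition (bits 31:28 other than the "unconditional space" 0xf), the replacement branch is given the same condition so the trampoline is entered only when the original instruction would have executed. This is an illustrative reconstruction, not the patch's actual code:

```c
#include <stdint.h>

/*
 * Illustrative sketch: transplant the condition field (bits 31:28)
 * of the probed instruction onto an already-encoded unconditional
 * branch.  Instructions in the 0xf "unconditional" space have no
 * condition to copy, so the AL branch is kept as-is.
 */
static uint32_t make_cond_branch_sketch(uint32_t probed_insn,
					uint32_t uncond_branch)
{
	uint32_t cond = probed_insn & 0xf0000000u;

	if (cond == 0xf0000000u)	/* unconditional-space insn */
		return uncond_branch;

	return (uncond_branch & 0x0fffffffu) | cond;
}
```

So a probed `BEQ`-like instruction (cond 0x0) turns the `B` into a `BEQ` to the trampoline, while an AL instruction leaves it unconditional.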