Re: [PATCH v8 2/2] ARM: kprobes: enable OPTPROBES for ARM 32

2014-11-18 Thread Wang Nan
On 2014/11/18 14:32, Wang Nan wrote:
> This patch introduces kprobeopt for ARM 32.
> 
> Limitations:
>  - Currently only kernel compiled with ARM ISA is supported.
> 
>  - Offset between probe point and optinsn slot must not be larger than
>    32MiB. Masami Hiramatsu suggested replacing 2 words, but that would
>    make things complex. A further patch can make such an optimization.
> 
> Kprobe opt on ARM is relatively simpler than kprobe opt on x86 because
> ARM instructions are always 4 bytes long and 4-byte aligned. This patch
> replaces the probed instruction with a 'b' that branches to trampoline
> code and then calls optimized_callback(). optimized_callback() calls
> opt_pre_handler() to execute the kprobe handler, and also
> emulates/simulates the replaced instruction.
> 
> When unregistering a kprobe, the deferred manner of the unoptimizer may
> leave the branch instruction in place before the optimizer is called.
> Unlike x86_64, which copies the probed insn after optprobe_template_end
> and re-executes it, this patch calls singlestep to emulate/simulate the
> insn directly. A further patch can optimize this behavior.
> 
> v1 -> v2:
> 
>  - Improvement: if replaced instruction is conditional, generate a
>conditional branch instruction for it;
> 
>  - Introduces RELATIVEJUMP_OPCODES because ARM's kprobe_opcode_t is 4
>    bytes;
> 
>  - Removes size field in struct arch_optimized_insn;
> 
>  - Use arm_gen_branch() to generate branch instruction;
> 
>  - Remove all recover logic: ARM doesn't use a tail buffer, so there is
>    no need to recover replaced instructions as on x86;
> 
>  - Remove incorrect CONFIG_THUMB checking;
> 
>  - can_optimize() always returns true if address is well aligned;
> 
>  - Improve optimized_callback: using opt_pre_handler();
> 
>  - Bugfix: correct range checking code and improve comments;
> 
>  - Fix commit message.
> 
> v2 -> v3:
> 
>  - Rename RELATIVEJUMP_OPCODES to MAX_COPIED_INSNS;
> 
>  - Remove unneeded checking:
>   arch_check_optimized_kprobe(), can_optimize();
> 
>  - Add missing flush_icache_range() in arch_prepare_optimized_kprobe();
> 
>  - Remove unneeded 'return;'.
> 
> v3 -> v4:
> 
>  - Use __mem_to_opcode_arm() to translate copied_insn to ensure it
>    works on a big-endian kernel;
> 
>  - Replace the 'nop' placeholder in the trampoline code template with
>    '.long 0' to avoid confusion: a reader may regard 'nop' as an
>    instruction, but it is in fact a value.
> 
> v4 -> v5:
> 
>  - Don't optimize stack store operations.
> 
>  - Introduce a prepared field in arch_optimized_insn to indicate whether
>    it is prepared. Similar to the size field on x86. See v1 -> v2.
> 
> v5 -> v6:
> 
>  - Dynamically reserve stack according to instruction.
> 
>  - Rename: kprobes-opt.c -> kprobes-opt-arm.c.
> 
>  - Set op->optinsn.insn after all work is done.
> 
> v6 -> v7:
> 
>  - Use the checker to check stack consumption.
> 
> v7 -> v8:
> 
>  - Small code adjustments.
> 
> Signed-off-by: Wang Nan 
> Acked-by: Masami Hiramatsu 
> Cc: Jon Medhurst (Tixy) 
> Cc: Russell King - ARM Linux 
> Cc: Will Deacon 
> ---
>  arch/arm/Kconfig  |   1 +
>  arch/arm/include/asm/kprobes.h|  26 
>  arch/arm/kernel/Makefile  |   3 +-
>  arch/arm/kernel/kprobes-opt-arm.c | 285 ++
>  4 files changed, 314 insertions(+), 1 deletion(-)
>  create mode 100644 arch/arm/kernel/kprobes-opt-arm.c
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 89c4b5c..8281cea 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -59,6 +59,7 @@ config ARM
>   select HAVE_MEMBLOCK
>   select HAVE_MOD_ARCH_SPECIFIC if ARM_UNWIND
>   select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
> + select HAVE_OPTPROBES if (!THUMB2_KERNEL)
>   select HAVE_PERF_EVENTS
>   select HAVE_PERF_REGS
>   select HAVE_PERF_USER_STACK_DUMP
> diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h
> index 56f9ac6..c1016cb 100644
> --- a/arch/arm/include/asm/kprobes.h
> +++ b/arch/arm/include/asm/kprobes.h
> @@ -50,5 +50,31 @@ int kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr);
>  int kprobe_exceptions_notify(struct notifier_block *self,
>unsigned long val, void *data);
>  
> +/* optinsn template addresses */
> +extern __visible kprobe_opcode_t optprobe_template_entry;
> +extern __visible kprobe_opcode_t optprobe_template_val;
> +extern __visible kprobe_opcode_t optprobe_template_call;
> +extern __visible kprobe_opcode_t optprobe_template_end;
> +
> +#define MAX_OPTIMIZED_LENGTH (4)
> +#define MAX_OPTINSN_SIZE \
> + (((unsigned long)optprobe_template_end -   \
> +   (unsigned long)optprobe_template_entry))
> +#define RELATIVEJUMP_SIZE(4)
> +
> +struct arch_optimized_insn {
> + /*
> +  * copy of the original instructions.
> +  * Different from x86, ARM kprobe_opcode_t is u32.
> +  */
> +#define MAX_COPIED_INSN  ((RELATIVEJUMP_SIZE) / sizeof(kprobe_opcode_t))
> kprobe_opcode_t copied_insn[MAX_COPIED_INSN];

[PATCH v8 2/2] ARM: kprobes: enable OPTPROBES for ARM 32

2014-11-17 Thread Wang Nan
This patch introduces kprobeopt for ARM 32.

Limitations:
 - Currently only kernel compiled with ARM ISA is supported.

 - Offset between probe point and optinsn slot must not be larger than
   32MiB. Masami Hiramatsu suggested replacing 2 words, but that would
   make things complex. A further patch can make such an optimization.

Kprobe opt on ARM is relatively simpler than kprobe opt on x86 because
ARM instructions are always 4 bytes long and 4-byte aligned. This patch
replaces the probed instruction with a 'b' that branches to trampoline
code and then calls optimized_callback(). optimized_callback() calls
opt_pre_handler() to execute the kprobe handler, and also
emulates/simulates the replaced instruction.

When unregistering a kprobe, the deferred manner of the unoptimizer may
leave the branch instruction in place before the optimizer is called.
Unlike x86_64, which copies the probed insn after optprobe_template_end
and re-executes it, this patch calls singlestep to emulate/simulate the
insn directly. A further patch can optimize this behavior.

v1 -> v2:

 - Improvement: if replaced instruction is conditional, generate a
   conditional branch instruction for it;

 - Introduces RELATIVEJUMP_OPCODES because ARM's kprobe_opcode_t is 4
   bytes;

 - Removes size field in struct arch_optimized_insn;

 - Use arm_gen_branch() to generate branch instruction;

 - Remove all recover logic: ARM doesn't use a tail buffer, so there is
   no need to recover replaced instructions as on x86;

 - Remove incorrect CONFIG_THUMB checking;

 - can_optimize() always returns true if address is well aligned;

 - Improve optimized_callback: using opt_pre_handler();

 - Bugfix: correct range checking code and improve comments;

 - Fix commit message.

v2 -> v3:

 - Rename RELATIVEJUMP_OPCODES to MAX_COPIED_INSNS;

 - Remove unneeded checking:
  arch_check_optimized_kprobe(), can_optimize();

 - Add missing flush_icache_range() in arch_prepare_optimized_kprobe();

 - Remove unneeded 'return;'.

v3 -> v4:

 - Use __mem_to_opcode_arm() to translate copied_insn to ensure it
   works on a big-endian kernel;

 - Replace the 'nop' placeholder in the trampoline code template with
   '.long 0' to avoid confusion: a reader may regard 'nop' as an
   instruction, but it is in fact a value.

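The big-endian fix in v3 -> v4 exists because a BE8 ARM kernel stores
instructions little-endian while data loads are big-endian, so the copied
instruction must be byte-swapped before the decoder examines it. A hedged
stand-in for what __mem_to_opcode_arm() accomplishes (not the kernel
macro itself):

```c
#include <stdint.h>

/* Illustrative stand-in: under a big-endian (BE8) kernel, swap the four
 * bytes of a loaded instruction word so the decoder sees the canonical
 * instruction layout; on a little-endian kernel it is a no-op. */
static uint32_t mem_to_opcode_arm(uint32_t x, int big_endian_kernel)
{
    if (!big_endian_kernel)
        return x;
    return ((x & 0x000000FFu) << 24) | ((x & 0x0000FF00u) << 8) |
           ((x & 0x00FF0000u) >> 8)  | ((x & 0xFF000000u) >> 24);
}
```
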
v4 -> v5:

 - Don't optimize stack store operations.

 - Introduce a prepared field in arch_optimized_insn to indicate whether
   it is prepared. Similar to the size field on x86. See v1 -> v2.

v5 -> v6:

 - Dynamically reserve stack according to instruction.

 - Rename: kprobes-opt.c -> kprobes-opt-arm.c.

 - Set op->optinsn.insn after all work is done.

v6 -> v7:

 - Use the checker to check stack consumption.

v7 -> v8:

 - Small code adjustments.

Signed-off-by: Wang Nan 
Acked-by: Masami Hiramatsu 
Cc: Jon Medhurst (Tixy) 
Cc: Russell King - ARM Linux 
Cc: Will Deacon 
---
 arch/arm/Kconfig  |   1 +
 arch/arm/include/asm/kprobes.h|  26 
 arch/arm/kernel/Makefile  |   3 +-
 arch/arm/kernel/kprobes-opt-arm.c | 285 ++
 4 files changed, 314 insertions(+), 1 deletion(-)
 create mode 100644 arch/arm/kernel/kprobes-opt-arm.c

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 89c4b5c..8281cea 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -59,6 +59,7 @@ config ARM
select HAVE_MEMBLOCK
select HAVE_MOD_ARCH_SPECIFIC if ARM_UNWIND
select HAVE_OPROFILE if (HAVE_PERF_EVENTS)
+   select HAVE_OPTPROBES if (!THUMB2_KERNEL)
select HAVE_PERF_EVENTS
select HAVE_PERF_REGS
select HAVE_PERF_USER_STACK_DUMP
diff --git a/arch/arm/include/asm/kprobes.h b/arch/arm/include/asm/kprobes.h
index 56f9ac6..c1016cb 100644
--- a/arch/arm/include/asm/kprobes.h
+++ b/arch/arm/include/asm/kprobes.h
@@ -50,5 +50,31 @@ int kprobe_fault_handler(struct pt_regs *regs, unsigned int fsr);
 int kprobe_exceptions_notify(struct notifier_block *self,
 unsigned long val, void *data);
 
+/* optinsn template addresses */
+extern __visible kprobe_opcode_t optprobe_template_entry;
+extern __visible kprobe_opcode_t optprobe_template_val;
+extern __visible kprobe_opcode_t optprobe_template_call;
+extern __visible kprobe_opcode_t optprobe_template_end;
+
+#define MAX_OPTIMIZED_LENGTH   (4)
+#define MAX_OPTINSN_SIZE   \
+   (((unsigned long)optprobe_template_end -   \
+ (unsigned long)optprobe_template_entry))
+#define RELATIVEJUMP_SIZE  (4)
+
+struct arch_optimized_insn {
+   /*
+* copy of the original instructions.
+* Different from x86, ARM kprobe_opcode_t is u32.
+*/
+#define MAX_COPIED_INSN((RELATIVEJUMP_SIZE) / sizeof(kprobe_opcode_t))
+   kprobe_opcode_t copied_insn[MAX_COPIED_INSN];
+   /* detour code buffer */
+   kprobe_opcode_t *insn;
+   /*
+*  we always copy one instruction on arm32; its size is
+*  always 4 bytes, so there is no size field.
+*/
+};
 
 #endif /* _ARM_KPROBES_H */
diff --git a/arch/arm/kernel/Makefile 
