Re: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

2017-09-12 Thread Pratyush Anand



On Wednesday 13 September 2017 08:12 AM, Will Deacon wrote:

On Tue, Sep 12, 2017 at 10:54:28AM +0100, James Morse wrote:

Hi Pratyush,

On 01/09/17 06:48, Pratyush Anand wrote:

do_task_stat() calls get_wchan(), which in turn calls unwind_frame().
unwind_frame() restores frame->pc to its original value in case the function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if the function graph tracer has hit a filtered
function, then we can't unwind it, as ftrace_push_return_trace() has
biased the index (frame->graph) with a 'huge negative'
offset (-FTRACE_NOTRACE_DEPTH).

Moreover, the arm64 stack walker defines the index (frame->graph) as an
unsigned int, which cannot represent a negative number.

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for a negative value and
restore the index accordingly before restoring frame->pc.


I've just spotted arm64's profile_pc, which does this:
 From arch/arm64/kernel/time.c:profile_pc():

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = -1; /* no task info */
#endif


Is this another elaborate way of hitting this problem?

I guess the options are skip any return-address restore in the unwinder if
frame.graph is -1. (and profile_pc may have a bug here). Or, put
current->curr_ret_stack in there.


I think we should go with the latter, i.e. assign frame.graph = 
current->curr_ret_stack in profile_pc().




profile_pc() always passes tsk=NULL, so the unwinder assumes it's current...
kernel/profile.c pulls the pt_regs from a per-cpu irq_regs variable that is
updated by handle_IPI ... so it looks like this should always be current...


Hmmm... is profile_pc the *only* case where frame->graph isn't equal to
tsk->curr_ret_stack in unwind_frame? If so, maybe unwind_frame should just


Yes, it is the only place.


use that, and we could kill the graph member of struct stackframe completely?



Hmm, not sure: we initialize frame->graph outside the while loop in 
unwind_frame()'s caller, and then keep decrementing it inside the loop.


--
Regards
Pratyush


[PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

2017-08-31 Thread Pratyush Anand
do_task_stat() calls get_wchan(), which in turn calls unwind_frame().
unwind_frame() restores frame->pc to its original value in case the function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if the function graph tracer has hit a filtered
function, then we can't unwind it, as ftrace_push_return_trace() has
biased the index (frame->graph) with a 'huge negative'
offset (-FTRACE_NOTRACE_DEPTH).

Moreover, the arm64 stack walker defines the index (frame->graph) as an
unsigned int, which cannot represent a negative number.

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for a negative value and
restore the index accordingly before restoring frame->pc.

Reproducer:

cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in:
Unable to handle kernel paging request at virtual address 801bd3d1e000
pgd = 8003cbe97c00
[801bd3d1e000] *pgd=, *pud=
Internal error: Oops: 9606 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: 8003c21ba000 task.stack: 8003cc6c
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [] lr : [] pstate: 6145
sp : 8003cc6c3ab0
x29: 8003cc6c3ab0 x28: 0001
x27: 0026 x26: 0026
x25: 12d8 x24: 
x23: 8003c1c04000 x22: 08c83000
x21: 8003c1c0 x20: 000f
x19: 8003c1bc x18: fc593690
x17:  x16: 0001
x15: b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0001 x12: 
x11: e8f4883e x10: 000154f47ec8
x9 : 70f367c0 x8 : 
x7 : 8003f729 x6 : 0018
x5 :  x4 : 8003c1c03cb0
x3 : 8003c1c03ca0 x2 : 0017ffe8
x1 : 8003cc6c3af8 x0 : 8003d3e9e000

Process ps (pid: 11696, stack limit = 0x8003cc6c)
Stack: (0x8003cc6c3ab0 to 0x8003cc6c4000)
[...]
[] unwind_frame+0x12c/0x180
[] do_task_stat+0x864/0x870
[] proc_tgid_stat+0x3c/0x48
[] proc_single_show+0x5c/0xb8
[] seq_read+0x160/0x414
[] __vfs_read+0x58/0x164
[] vfs_read+0x88/0x144
[] SyS_read+0x60/0xc0
[] __sys_trace_return+0x0/0x4

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under function graph tracer")
Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
v1 -> v2:
- improved commit log
- the index is now restored first, and frame->pc thereafter.

 arch/arm64/include/asm/stacktrace.h | 2 +-
 arch/arm64/kernel/stacktrace.c  | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/stacktrace.h 
b/arch/arm64/include/asm/stacktrace.h
index 5b6eafccc5d8..816db5b9b874 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -23,7 +23,7 @@ struct stackframe {
unsigned long sp;
unsigned long pc;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   unsigned int graph;
+   int graph;
 #endif
 };
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..4c47147d0554 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -75,6 +75,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
(frame->pc == (unsigned long)return_to_handler)) {
+   if (frame->graph < 0)
+   frame->graph += FTRACE_NOTRACE_DEPTH;
+
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



Re: [PATCH 03/14] resource: add walk_system_ram_res_rev()

2017-08-30 Thread Pratyush Anand



On Thursday 24 August 2017 01:48 PM, AKASHI Takahiro wrote:

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file implementation on arm64.

Signed-off-by: AKASHI Takahiro 
Cc: Vivek Goyal 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
  include/linux/ioport.h |  3 +++
  kernel/resource.c  | 48 
  2 files changed, 51 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6230064d7f95..9a212266299f 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -271,6 +271,9 @@ extern int
  walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *));
  extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(u64, u64, void *));
+extern int
  walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 
end,
void *arg, int (*func)(u64, u64, void *));
  
diff --git a/kernel/resource.c b/kernel/resource.c

index 9b5f04404152..1d6d734c75ac 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  
@@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,

return ret;
  }
  
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,

+   int (*func)(u64, u64, void *))
+{
+   struct resource res, *rams;
+   u64 orig_end;
+   int count, i;
+   int ret = -1;
+
+   count = 16; /* initial */
+again:
+   /* create a list */
+   rams = vmalloc(sizeof(struct resource) * count);
+   if (!rams)
+   return ret;
+
+   res.start = start;
+   res.end = end;
+   res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+   orig_end = res.end;
+   i = 0;
+   while ((res.start < res.end) &&
+   (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
+   if (i >= count) {
+   /* unlikely but */
+   vfree(rams);
+   count += 16;
+   goto again;


Wouldn't it be better to allocate a bigger buffer, copy the previous values and 
free the previous pointer, instead of going *again*?



+   }
+
+   rams[i].start = res.start;
+   rams[i++].end = res.end;
+
+   res.start = res.end + 1;
+   res.end = orig_end;
+   }
+
+   /* go reverse */
+   for (i--; i >= 0; i--) {
+   ret = (*func)(rams[i].start, rams[i].end, arg);
+   if (ret)
+   break;
+   }
+
+   vfree(rams);
+   return ret;
+}
+
  #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
  
  /*




--
Regards
Pratyush


Re: [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-30 Thread Pratyush Anand



On Tuesday 29 August 2017 10:33 PM, James Morse wrote:

Hi Pratyush,

(Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
on the maintainers being eagle-eyed to spot this!)


I noticed it after sending. I do use it, but there was a bug in my cccmd 
script. I have fixed it, and I hope it won't be missed next time.




On 28/08/17 13:53, Pratyush Anand wrote:

Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"


This didn't panic for me, it just killed the running task. (I guess you have
panic-on-oops set)



yes..




vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
  #0 [8003c4ff77b0] crash_kexec at 0816b9b8
  #1 [8003c4ff77e0] die at 08088b34
  #2 [8003c4ff7820] __do_kernel_fault at 0809b830
  #3 [8003c4ff7850] do_bad_area at 08098b90
  #4 [8003c4ff7880] do_translation_fault at 087c6cdc
  #5 [8003c4ff78b0] do_mem_abort at 08081334
  #6 [8003c4ff7ab0] el1_ia at 08082cc0
  PC: 0808811c  [unwind_frame+300]
  LR: 080858a8  [get_wchan+212]
  SP: 8003c4ff7ab0  PSTATE: 6145
 X29: 8003c4ff7ab0  X28: 0001  X27: 
 X26:   X25:   X24: 
 X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
 X20: 000f  X19: 8003c1bc7000  X18: 0010
 X17:   X16: 0001  X15: ffed
 X14: 0010  X13:   X12: 0004
 X11:   X10: 02dd14c0   X9: 1999
  X8: 003f   X7: 8003f71b   X6: 0018
  X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
  X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
  #7 [8003c4ff7b20] do_task_stat at 08304f0c
  #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
  #9 [8003c4ff7ca0] proc_single_show at 082fdd10
  #10 [8003c4ff7ce0] seq_read at 082b27bc
  #11 [8003c4ff7d70] __vfs_read at 08289e54
  #12 [8003c4ff7e30] vfs_read at 0828b14c
  #13 [8003c4ff7e70] sys_read at 0828c2d0
  #14 [8003c4ff7ed0] __sys_trace at 0808349c
  PC: 0006  LR:   SP: ffed  PSTATE: 003f
 X12: 0010 X11:  X10: 0004  X9: 7febe8d0
  X8:   X7: 1999  X6: 003f  X5: 000c
  X4: 7fce9c78  X3: 000c  X2:   X1: 
  X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 <unwind_frame+280>:  ldr w2, [x1,#24]
0x0808810c <unwind_frame+284>:  sub w6, w2, #0x1
0x08088110 <unwind_frame+288>:  str w6, [x1,#24]
0x08088114 <unwind_frame+292>:  mov w6, #0x18 // #24
0x08088118 <unwind_frame+296>:  umull   x2, w2, w6
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

The corresponding C statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

From the above data, frame->graph = task->curr_ret_stack, which should be
x2 / 0x18, i.e. -FTRACE_NOTRACE_DEPTH.


(0x18 is the size of struct ftrace_ret_stack for your config?)


yes.





OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case.



It means that when do_task_stat()
was called for task A (ps in this case) on CPUm, task A was in the middle
of execution on CPUn,


get_wchan() on a running process can't work: the stack may be modified while
we walk it.
 From arch/arm64/kernel/process.c::get_wchan():

if (!p || p == current || p->state == TASK_RUNNING)
return 0;


As far as I can see, if task A is running on CPU-N then CPU-M's attempt to
get_wchan() it will return 0.


You seem to be right.





and was in the middle of mcount() execution, where
curr_ret_stack had been decremented in ftrace_push_return_trace() after
hitting the schedule() function, but had not yet been incremented in
ftrace_return_to_handler().


So if the function-graph-tracer has hooked the return values on a stack and hit
a filtered function (schedule() in your example), we can't unwind it, as
ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
flag it as 'this should be filtered out'.

The arm64 stack walker isn't awa

[PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-28 Thread Pratyush Anand
Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [8003c4ff77b0] crash_kexec at 0816b9b8
 #1 [8003c4ff77e0] die at 08088b34
 #2 [8003c4ff7820] __do_kernel_fault at 0809b830
 #3 [8003c4ff7850] do_bad_area at 08098b90
 #4 [8003c4ff7880] do_translation_fault at 087c6cdc
 #5 [8003c4ff78b0] do_mem_abort at 08081334
 #6 [8003c4ff7ab0] el1_ia at 08082cc0
 PC: 0808811c  [unwind_frame+300]
 LR: 080858a8  [get_wchan+212]
 SP: 8003c4ff7ab0  PSTATE: 6145
X29: 8003c4ff7ab0  X28: 0001  X27: 
X26:   X25:   X24: 
X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
X20: 000f  X19: 8003c1bc7000  X18: 0010
X17:   X16: 0001  X15: ffed
X14: 0010  X13:   X12: 0004
X11:   X10: 02dd14c0   X9: 1999
 X8: 003f   X7: 8003f71b   X6: 0018
 X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
 X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
 #7 [8003c4ff7b20] do_task_stat at 08304f0c
 #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
 #9 [8003c4ff7ca0] proc_single_show at 082fdd10
 #10 [8003c4ff7ce0] seq_read at 082b27bc
 #11 [8003c4ff7d70] __vfs_read at 08289e54
 #12 [8003c4ff7e30] vfs_read at 0828b14c
 #13 [8003c4ff7e70] sys_read at 0828c2d0
 #14 [8003c4ff7ed0] __sys_trace at 0808349c
 PC: 0006  LR:   SP: ffed  PSTATE: 003f
X12: 0010 X11:  X10: 0004  X9: 7febe8d0
 X8:   X7: 1999  X6: 003f  X5: 000c
 X4: 7fce9c78  X3: 000c  X2:   X1: 
 X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 <unwind_frame+280>:  ldr w2, [x1,#24]
0x0808810c <unwind_frame+284>:  sub w6, w2, #0x1
0x08088110 <unwind_frame+288>:  str w6, [x1,#24]
0x08088114 <unwind_frame+292>:  mov w6, #0x18 // #24
0x08088118 <unwind_frame+296>:  umull   x2, w2, w6
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

The corresponding C statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

From the above data, frame->graph = task->curr_ret_stack, which should be
x2 / 0x18, i.e. -FTRACE_NOTRACE_DEPTH.

OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case. It means, when do_task_stat()
has been called for task A (ps in this case) on CPUm,task A was in mid
of execution on CPUn, and was in the mid of mcount() execution where
curr_ret_stack had been decremented in ftrace_push_return_trace() for
hitting schedule() function, but it was yet to be incremented in
ftrace_return_to_handler().

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() so that it does not manipulate frame->pc
for the function graph tracer if the function has been marked in
set_graph_notrace.

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under function graph tracer")

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
-   (frame->pc == (unsigned long)return_to_handler)) {
+   (frame->pc == (unsigned long)return_to_handler) &&
+   (frame->graph > -1)) {
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



[PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-28 Thread Pratyush Anand
Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [8003c4ff77b0] crash_kexec at 0816b9b8
 #1 [8003c4ff77e0] die at 08088b34
 #2 [8003c4ff7820] __do_kernel_fault at 0809b830
 #3 [8003c4ff7850] do_bad_area at 08098b90
 #4 [8003c4ff7880] do_translation_fault at 087c6cdc
 #5 [8003c4ff78b0] do_mem_abort at 08081334
 #6 [8003c4ff7ab0] el1_ia at 08082cc0
 PC: 0808811c  [unwind_frame+300]
 LR: 080858a8  [get_wchan+212]
 SP: 8003c4ff7ab0  PSTATE: 6145
X29: 8003c4ff7ab0  X28: 0001  X27: 
X26:   X25:   X24: 
X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
X20: 000f  X19: 8003c1bc7000  X18: 0010
X17:   X16: 0001  X15: ffed
X14: 0010  X13:   X12: 0004
X11:   X10: 02dd14c0   X9: 1999
 X8: 003f   X7: 8003f71b   X6: 0018
 X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
 X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
 #7 [8003c4ff7b20] do_task_stat at 08304f0c
 #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
 #9 [8003c4ff7ca0] proc_single_show at 082fdd10
 #10 [8003c4ff7ce0] seq_read at 082b27bc
 #11 [8003c4ff7d70] __vfs_read at 08289e54
 #12 [8003c4ff7e30] vfs_read at 0828b14c
 #13 [8003c4ff7e70] sys_read at 0828c2d0
 #14 [8003c4ff7ed0] __sys_trace at 0808349c
 PC: 0006  LR:   SP: ffed  PSTATE: 003f
X12: 0010 X11:  X10: 0004  X9: 7febe8d0
 X8:   X7: 1999  X6: 003f  X5: 000c
 X4: 7fce9c78  X3: 000c  X2:   X1: 
 X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 :  ldr w2, [x1,#24]
0x0808810c :  sub w6, w2, #0x1
0x08088110 :  str w6, [x1,#24]
0x08088114 :  mov w6, #0x18 // #24
0x08088118 :  umull   x2, w2, w6
0x0808811c :  ldr x0, [x0,x2]

Corresponding c statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c :  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

from above data: frame->graph = task->curr_ret_stack which  should be,
x2 / 0x18 = , which is -FTRACE_NOTRACE_DEPTH.

OK, so the problem is here:
do_task_stat() calls get_wchan(). In the failing case, p->curr_ret_stack is
-FTRACE_NOTRACE_DEPTH. This means that when do_task_stat() was called for
task A (ps in this case) on CPUm, task A was in the middle of execution on
CPUn, inside mcount(): curr_ret_stack had been decremented in
ftrace_push_return_trace() on hitting the filtered schedule() function, but
had not yet been incremented back in ftrace_return_to_handler().

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() so that it does not manipulate frame->pc
for the function graph tracer when the function has been marked in
set_graph_notrace.

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under
function graph tracer")

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
-   (frame->pc == (unsigned long)return_to_handler)) {
+   (frame->pc == (unsigned long)return_to_handler) &&
+   (frame->graph > -1)) {
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-25 Thread Pratyush Anand

Hi Mark, James and Will,

What is the take on this series?

Just to summarize:

- James took the first three patches of the series and replaced the 4th and 5th
patches with his 3-patch series [3].
- My patchset disables interrupts when exiting the HW breakpoint/watchpoint
and SW breakpoint handlers, and re-enables them after single-step handling.
It single-steps reliably even if an interrupt is generated while the
breakpoint/watchpoint handler is executing.
- James's approach keeps interrupts enabled, but it might fail if a hardware
breakpoint is placed on an ISR invoked between the breakpoint and step exceptions.


@James, I understand other things may take priority. Do you have a plan to
fix the pitfall and send a new series? If not, can I address Peter's concern
on patch 1/5 and send v4?


[3] https://www.spinics.net/lists/arm-kernel/msg598214.html

On Monday 31 July 2017 04:10 PM, Pratyush Anand wrote:

v2 -> v3
- Moved step_needed from uapi structure to kernel only structure
- Re-enable interrupt if stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel=149942910730496=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
   hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
   hw_breakpoint_needs_single_step(). (patch 2)
- Modified the sample test to set the step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel=149910958418708=2

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler. So,
even though we are not concerned about the above test, we will have to find a
solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, kgdb should hopefully be fine as
well; however, I have not tested that.
@Takahiro: would it be possible to test these patches with kgdb?

[1] http://marc.info/?l=linux-arm-kernel=149580777524910=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html


Pratyush Anand (5):
   hw_breakpoint: Add step_needed event attribute
   arm64: use hw_breakpoint_needs_single_step() to decide if step is
 needed
   register_wide_hw_breakpoint(): modify to accept step_needed arg
   arm64: disable irq between breakpoint and step exception
   arm64: fault: re-enable irq if it was disabled for single stepping

  arch/arm64/kernel/debug-monitors.c  |  3 +++
  arch/arm64/kernel/hw_breakpoint.c   | 10 +-
  arch/arm64/mm/fault.c   | 28 
  arch/x86/kernel/kgdb.c  |  2 +-
  include/linux/hw_breakpoint.h   | 10 --
  include/linux/perf_event.h  |  6 ++
  kernel/events/core.c|  2 ++
  kernel/events/hw_breakpoint.c   |  4 +++-
  samples/hw_breakpoint/data_breakpoint.c |  3 ++-
  9 files changed, 54 insertions(+), 14 deletions(-)



--
Regards
Pratyush


[PATCH] tracer: hwlat: allow to be default bootup tracer

2017-08-11 Thread Pratyush Anand
hwlat is not a production kernel tracer; however, it can also be used to
identify HW latency issues during kernel boot. Therefore, call
init_hwlat_tracer() through core_initcall(), so that we can pass
ftrace=hwlat as a kernel command-line parameter and have hwlat as the
default bootup tracer.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 kernel/trace/trace_hwlat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index d7c8e4ec3d9d..09f8f0950b6c 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -630,4 +630,4 @@ __init static int init_hwlat_tracer(void)
 
return 0;
 }
-late_initcall(init_hwlat_tracer);
+core_initcall(init_hwlat_tracer);
-- 
2.9.4
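With the initcall moved earlier, hwlat can be selected at boot time. A sketch
of the intended usage (the `ftrace=` boot parameter is the documented way to
pick a default tracer; `tracing_thresh=` is shown as an assumed optional
companion setting the latency threshold in microseconds):

```
# Kernel command line (e.g. appended in the bootloader config)
ftrace=hwlat tracing_thresh=100
```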



Re: RCU stall when using function_graph

2017-08-09 Thread Pratyush Anand



On Sunday 06 August 2017 10:32 PM, Paul E. McKenney wrote:

On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:

Dear All

As for me, after configuring function_graph as below, crash disappears.
"echo 0 > d/tracing/tracing_on"
"sleep 1"

"echo function_graph > d/tracing/current_tracer"
"sleep 1"

"echo smp_call_function_single > d/tracing/set_ftrace_filter"


It will limit the trace output to only the filtered function
(smp_call_function_single).



adb shell "sleep 1"

"echo 1 > d/tracing/tracing_on"
adb shell "sleep 1"

Right after function_graph is enabled, too many logs are traced upon IRQ
transaction which many times eventually causes stall.


That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing was with irqs disabled
or some such, but figured I should ask.

Does Guillermo's approach work for others?


Limited output with a couple of filtered functions will definitely not cause
an RCU stall. But the question is whether we should expect a full
function graph trace to work on every platform (especially one which
generates interrupts at a high rate)?


--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Peter,

On Wednesday 02 August 2017 01:44 PM, Peter Zijlstra wrote:

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

Hi,

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.


I'm sitting on this patch:

   
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core=15377ef4fe0c86eb7fa1099575b2e86357d99e42

please give that a spin.


Thanks for the pointer.
I tried it, but it did not help.

--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Marc,

On Wednesday 02 August 2017 02:14 PM, Marc Zyngier wrote:

On 02/08/17 09:08, Will Deacon wrote:

Hi Pratyush,

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.


When you say "also", which other kernel(s) show the problem? Is this a
recent regression? Which platform are you running on?

It would be interesting to know what the other CPUs are doing, in particular
the target of the cross-call. Either it crashed spectacularly and didn't
unlock the csd lock, or the IPI somehow wasn't delivered.

Do you see any other splats if you enable lock debugging?


Also, is that in a guest, or bare metal? If that's a guest, what's the
host's kernel version?


It's a host.
--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Will,

On Wednesday 02 August 2017 01:38 PM, Will Deacon wrote:

Hi Pratyush,

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.

When you say "also", which other kernel(s) show the problem? Is this a
recent regression? Which platform are you running on?


Other than 4.13.0-rc3, I had tested with a 4.11 based kernel (4.11 vanilla
plus some 4.12 patches backported); the log I attached was from that kernel.
That is why I mentioned that it reproduces with vanilla upstream as well.




It would be interesting to know what the other CPUs are doing, in particular
the target of the cross-call. Either it crashed spectacularly and didn't
unlock the csd lock, or the IPI somehow wasn't delivered.

Do you see any other splats if you enable lock debugging?


It was the same.

Following is the log from 4.13.0-rc3 + patch pointed by Peter:

[  173.649589] perf: interrupt took too long (4952 > 4902), lowering 
kernel.perf_event_max_sample_rate to 40300

[  201.340926] INFO: rcu_sched self-detected stall on CPU
[  201.345115] 	9-...: (6499 ticks this GP) idle=e1a/141/0 
softirq=334/334 fqs=3250

[  201.353617]   (t=6500 jiffies g=313 c=312 q=428)
[  201.358220] Task dump for CPU 9:
[  201.361431] perfR  running task0  1888   1864 0x0202
[  201.368462] Call trace:
[  201.370897] [] dump_backtrace+0x0/0x28c
[  201.376276] [] show_stack+0x24/0x2c
[  201.381312] [] sched_show_task+0x19c/0x26c
[  201.386952] [] dump_cpu_task+0x48/0x54
[  201.392250] [] rcu_dump_cpu_stacks+0xac/0xf4
[  201.398063] [] rcu_check_callbacks+0x908/0xc90
[  201.404053] [] update_process_times+0x34/0x5c
[  201.409957] [] tick_sched_handle.isra.16+0x4c/0x70
[  201.416292] [] tick_sched_timer+0x48/0x88
[  201.421847] [] __hrtimer_run_queues+0x17c/0x604
[  201.427924] [] hrtimer_interrupt+0xa4/0x1e8
[  201.433656] [] arch_timer_handler_phys+0x3c/0x48
[  201.439818] [] handle_percpu_devid_irq+0xdc/0x42c
[  201.446069] [] generic_handle_irq+0x34/0x4c
[  201.451796] [] __handle_domain_irq+0x6c/0xc4
[  201.457611] [] gic_handle_irq+0xa0/0x1b0
[  201.463080] Exception stack(0x8016df013a40 to 0x8016df013b70)
[  201.469504] 3a40:  0003  
8016df013bd0
[  201.477316] 3a60: 8016df013bd0 0008 8016df013bb8 
082113c8
[  201.485129] 3a80:  e507a9f0  

[  201.492941] 3aa0: 0005  002f547d23157399 
3a2a9f82ac9c
[  201.500754] 3ac0:   e507a7e0 
08f5b000
[  201.508566] 3ae0: 8016df013c08 08213fcc 0013 
8017616a7800
[  201.516379] 3b00: 08f5b000 082179d4  
088c1000
[  201.524191] 3b20: 8017616a7800 8016df013b70 0818bf04 
8016df013b70
[  201.532004] 3b40: 0818bf28 20400145 08213fcc 
0013

[  201.539816] 3b60: 0001 8016df013bb8
[  201.544677] [] el1_irq+0xb8/0x140
[  201.549539] [] smp_call_function_single+0x160/0x184
[  201.555965] [] cpu_function_call+0x40/0x64
[  201.561605] [] event_function_call+0x120/0x128
[  201.567594] [] _perf_event_disable+0x44/0x64
[  201.573410] [] perf_event_for_each_child+0x3c/0x84
[  201.579747] [] perf_ioctl+0x21c/0x9a4
[  201.584957] [] do_vfs_ioctl+0xcc/0x874
[  201.590250] [] sys_ioctl+0x90/0xa4
[  201.595198] [] __sys_trace_return+0x0/0x4
[  239.003035] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 
9-... } 6592 jiffies s: 1149 root: 0x1/.

[  239.012199] blocking rcu_node structures: l=1:0-14:0x200/.
[  239.017695] Task dump for CPU 9:
[  239.020880] perfR  running task0  1888   1864 0x0202
[  239.027929] Call trace:
[  239.030346] [] __switch_to+0x64/0x70
[  239.035484] [] free_pcppages_bulk+0x43c/0x640
[  262.304244] perf: interrupt took too long (6221 > 6190), lowering 
kernel.perf_event_max_sample_rate to 32100

[  367.009704] INFO: task kworker/15:2:1187 blocked for more than 120 seconds.
[  367.015713]   Tainted: GW   4.13.0-rc3+ #2
[  367.021200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[  367.028994] kworker/15:2D0  1187  2 0x
[  367.034481] Workqueue: usb_hub_wq hub_event
[  367.038629] Call trace:
[  367.041077] [] __switch_to+0x64/0x70
[  367.046184] [] __schedule+0x410/0xcbc
[  367.051403] [] schedule+0x40/0xa4
[  367.056253] [] 
_synchronize_rcu_expedited.constprop.64+0x370/0x3e8

[  367.063990] [] synchronize_sched_expedited+0x7c/0xf0
[  367.070501]

Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-02 Thread Pratyush Anand

Hi James,

Thanks for your analysis.

On Wednesday 02 August 2017 10:43 PM, James Morse wrote:

Hi Pratyush,

On 01/08/17 05:18, Pratyush Anand wrote:

On Monday 31 July 2017 10:45 PM, James Morse wrote:

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler. So,
even though we are not concerned about the above test, we will have to find a
solution for the perf issue.



You can easily reproduce the issue with the following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq


Thanks, that happily dump-stacks forever. Your first three patches fix the
stepping over the watchpoint; I've had a go at fixing the interrupt
interaction (instead of just masking interrupts).

gdb single-step works, as does kprobes, FWIW for those three:
Tested-by: James Morse <james.mo...@arm.com>



What causes your infinite loop?



Flow is like this:
- A SW or HW breakpoint exception is generated on a CPU, let's say CPU5.
- The breakpoint handler does something which causes an interrupt to become
active on the same CPU. In fact, there may be many other reasons for an
interrupt to be active on a CPU while the breakpoint handler is executing.
- So, as soon as we return from the breakpoint exception, we go to the IRQ
exception handler, while we were expecting a single-step exception.


What breaks when this happens?


I think that as soon as we call enable_dbg from el1_irq, a step exception will
be generated, and we will step the very first instruction after enable_dbg,
which is not what the single-step handler expects.




With your reproducer and the first three patches I see it hitting the watchpoint
multiple times and stepping the irq handler.
Let's say we were executing the instruction at address 0x2000 when the
watchpoint exception occurred. We programmed ELR with 0x2000 for single
stepping; however, we received an interrupt before the instruction at 0x2000
could be single-stepped.

Now, if 0x3010 is the address following enable_dbg in el1_irq, the
instruction at address 0x3010 will be single-stepped instead. After el1_irq
is handled we jump back to 0x2000 again, but that address is no longer armed
for single-stepping, so executing the instruction there lands us in the
watchpoint exception handler once again.


I do not have a HW debugger, but this is what it looked like while going
through the code.




I think we have two or three interacting bugs here. I'm not convinced masking
interrupts is the best fix as the data abort handler inherits this value. We
might mask interrupts for a fault that can't be handled with interrupts masked.



In my understanding, the problems are:
(1) Single-stepping of an unwanted instruction (i.e. the instruction
following enable_dbg in el1_irq).
(2) We do not have memory at the end of el1_irq where we could set the
watchpoint-generating instruction for single stepping.

I think we can find a way to take care of (2), but I am not sure how (1)
can be taken care of without the approach I am taking.



I will post some RFC/fixes, but need to get my head round the debug/exception
interaction in the ARM-ARM first!


Thanks,

James



--
Regards
Pratyush


Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-02 Thread Pratyush Anand

Hi James,

Thanks for your analysis.

On Wednesday 02 August 2017 10:43 PM, James Morse wrote:

Hi Pratyush,

On 01/08/17 05:18, Pratyush Anand wrote:

On Monday 31 July 2017 10:45 PM, James Morse wrote:

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even perf step exception can go into an infinite loop if CPU
receives an interrupt while executing breakpoint/watchpoint handler. So,
event though we are not concerned about above test, we will have to find a
solution for the perf issue.



You can easily reproduce the issue with following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq


Thanks, that happily dump-stacks forever. Your first three patches fix the
stepping over the watchpoint, I've had a go at fixing the interrupt interaction,
(instead of just masking interrupts).

gdb single-step works, as does kprobes, FWIW for those three:
Tested-by: James Morse 



What causes your infinite loop?



Flow is like this:
- A SW or HW breakpoint exception is being generated on a cpu lets say CPU5
- Breakpoint handler does something which causes an interrupt to be active on
the same CPU. In fact there might be many other reasons for an interrupt to be
active on a CPU while breakpoint handler was being executed.
- So, as soon as we return from breakpoint exception, we go to the IRQ exception
handler, while we were expecting a single step exception.


What breaks when this happens?


I think, as soon as we call enable_dbg from el1_irq, step exception will be 
generated, and we will be stepping very first instruction after enable_dbg 
which is not expected by single step handler.




With your reproducer and the first three patches I see it hitting the watchpoint
multiple times and stepping the irq handle
Lets say we were executing instruction from address 0x2000 when watchpoint 
exception occurred. We programmed, ELR with 0x2000 for single stepping, 
however we received an interrupt before instruction at 0x2000 could have been 
single stepped.


Now if 0x3010 is the address next to enable_dbg from el1_irq, then the 
instruction from address 0x3010 will be single stepped. We will jump back to 
0x2000 again after addressing el1_irq, but that is no more available for 
single-stepping and while executing instruction at that address we again land 
into watchpoint exception handler.


I do not have a HW debugger, but this is what it looked like while going
through the code.




I think we have two or three interacting bugs here. I'm not convinced masking
interrupts is the best fix as the data abort handler inherits this value. We
might mask interrupts for a fault that can't be handled with interrupts masked.



In my understanding, the problems are:
(1) Single stepping of an unwanted instruction (i.e. the instruction next to
enable_dbg in el1_irq).
(2) We do not have memory at the end of el1_irq where we could set up the
watchpoint-exception-generating instruction for single stepping.

I think we can find a way to take care of (2), but I am not sure how (1) can
be taken care of without the approach I am taking.



I will post some RFC/fixes, but need to get my head round the debug/exception
interaction in the ARM-ARM first!


Thanks,

James



--
Regards
Pratyush


Query: rcu_sched detected stalls with function_graph tracer

2017-08-02 Thread Pratyush Anand

Hi Steve,

I am using a 3.10 based kernel, and when I enable function_graph on one
particular x86_64 machine, I encounter an rcu_sched stall.


echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on

If I use 4.13-rc2, then it's better, but the stall still occurs after a bit of
stressing with stress-ng.


It looks like this behavior should be expected on a loaded system (such as one
running function-graph tracing for all available functions), which can cause an
RCU stall warning. But I am not an expert, so what's your opinion on it?


I tried to bisect, and there was no particular function which causes this.
It looked like once we had more than 5000 functions in set_ftrace_filter,
we started getting such warnings. I tried putting some of the most-hit
functions, like spin_lock* etc., in set_ftrace_notrace. Still no help. Do you
have any suggestions or further debugging pointers?


log with 3.10 based kernel
===
[  358.764071] perf: interrupt took too long (4503 > 4488), lowering 
kernel.perf_event_max_sample_rate to 44000
[  408.705034] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected 
by 7, t=60036 jiffies, g=21144, c=21143, q=21651)

[  408.706011] Task dump for CPU 0:
[  408.706011] swapper/0   R  running task0 0  0 0x0008
[  408.706011]  0003 0003 81aaf920 
819e4000
[  408.706011]  819e7ed0 816b6ef5  
81b1c820
[  408.706011]  819e4000 819e4000 819e4000 
819e4000

[  408.706011] Call Trace:
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] handle_irq_event_percpu+0x55/0x80
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] handle_irq_event+0x3c/0x60
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] pci_bus_read_config_dword+0x86/0xb0
[  408.706011]  [] ? rest_init+0x77/0x80
[  408.706011]  [] ? start_kernel+0x439/0x45a
[  408.706011]  [] ? repair_env_string+0x5c/0x5c
[  408.706011]  [] ? early_idt_handler_array+0x120/0x120
[  408.706011]  [] ? x86_64_start_reservations+0x24/0x26
[  408.706011]  [] ? x86_64_start_kernel+0x14f/0x172
[  377.243167] perf: interrupt took too long (5741 > 5628), lowering 
kernel.perf_event_max_sample_rate to 34000
[  437.522697] perf: interrupt took too long (8551 > 7176), lowering 
kernel.perf_event_max_sample_rate to 23000
[  463.908115] perf: interrupt took too long (11374 > 10688), lowering 
kernel.perf_event_max_sample_rate to 17000

[  480.954277] INFO: task yum:6726 blocked for more than 120 seconds.
[  480.991431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[  481.038470] yum D 880467bebf40 0  6726   6714 0x0080
[  481.081174]  88006489b850 0082 880467bebf40 
88006489bfd8
[  481.126085]  88006489bfd8 88006489bfd8 880467bebf40 
8802772656b0
[  481.171087]  7fff 0002  
880467bebf40

[  481.216034] Call Trace:
[  481.230832]  [] ftrace_graph_caller+0x85/0x85
[  481.266435]  [] schedule+0x29/0x70
[  481.296290]  [] ftrace_graph_caller+0x85/0x85
[  481.331966]  [] schedule_timeout+0x239/0x2c0
[  481.367082]  [] ? prepare_ftrace_return+0xb6/0xe0
[  481.404850]  [] ? __down_common+0x104/0x104
[  481.439492]  [] ftrace_graph_caller+0x85/0x85
[  481.475091]  [] __down_common+0xaa/0x104
[  481.508098]  [] ? ftrace_graph_caller+0x85/0x85
[  481.544778]  [] down+0x41/0x50
[  481.577943]  [] __down+0x1d/0x1f
[  481.611734]  [] ftrace_graph_caller+0x85/0x85
[  481.652576]  [] xfs_buf_lock+0x3c/0xd0 [xfs]
[  481.693178]  [] ftrace_graph_caller+0x85/0x85
[  481.734216]  [] _xfs_buf_find+0x170/0x330 [xfs]
[  481.775910]  [] ftrace_graph_caller+0x85/0x85
[  481.816266]  [] xfs_buf_get_map+0x2a/0x240 [xfs]
[  481.859119]  [] ftrace_graph_caller+0x85/0x85
[  481.899431]  [] xfs_buf_read_map+0x30/0x160 [xfs]
[  481.941875]  [] ftrace_graph_caller+0x85/0x85
[  481.982192]  [] xfs_trans_read_buf_map+0x211/0x400 [xfs]
[  482.027938]  [] ftrace_graph_caller+0x85/0x85
[  482.068655]  [] xfs_imap_to_bp+0x6e/0xf0 [xfs]
[  482.109275]  [] ftrace_graph_caller+0x85/0x85
[  482.149375]  [] xfs_iunlink_remove+0x295/0x3c0 [xfs]
[  482.193561]  [] ftrace_graph_caller+0x85/0x85
[  482.233577]  [] xfs_ifree+0x47/0x100 [xfs]
[  482.272198]  [] ftrace_graph_caller+0x85/0x85
[  482.312544]  [] xfs_inactive_ifree+0xd1/0x230 [xfs]
[  482.355816]  [] ftrace_graph_caller+0x85/0x85
[  482.395936]  [] xfs_inactive+0x8b/0x130 [xfs]
[  482.436204]  [] ftrace_graph_caller+0x85/0x85
[  482.477110]  [] xfs_fs_destroy_inode+0x95/0x180 [xfs]
[  482.521397]  [] ftrace_graph_caller+0x85/0x85
[  482.562184]  [] destroy_inode+0x38/0x60
[  482.599372]  [] ftrace_graph_caller+0x85/0x85
[  482.639831]  [] evict+0x10a/0x180
[  482.673827]  [] ftrace_graph_caller+0x85/0x85
[  482.714694]  [] iput+0xf9/0x190
[  


rcu_sched stall while waiting in csd_lock_wait()

2017-08-01 Thread Pratyush Anand

Hi,

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall. Any
help/pointers for further debugging would be very helpful. The problem also
reproduces with 4.13.0-rc3.



[11108.758804] task: 80178249 task.stack: 801782574000
[11108.764710] PC is at smp_call_function_single+0x128/0x178
[11108.770089] LR is at smp_call_function_single+0x11c/0x178
[11108.775470] pc : [] lr : [] pstate: 
20400145

[11108.782848] sp : 801782577b80
[11108.786147] x29: 801782577b80 x28: 8017815c3800
[11108.791442] x27: 087f1000 x26: 001d
[11108.796737] x25:  x24: 081ddc90
[11108.802032] x23: 08c73000 x22: 0006
[11108.807327] x21: 081da548 x20: 801782577c18
[11108.812623] x19: 08c73000 x18: fa011c30
[11108.817918] x17:  x16: 
[11108.823213] x15: 006ed000 x14: 00620170
[11108.828508] x13:  x12: 0005
[11108.833803] x11:  x10: 
[11108.839098] x9 : fa011e40 x8 : 001d
[11108.844393] x7 :  x6 : 801782577bc8
[11108.849689] x5 : 0040 x4 : 801782577be0
[11108.854984] x3 : 801782577be0 x2 : 
[11108.860279] x1 : 0003 x0 : 
[11108.865574]
[11141.885599] INFO: rcu_sched self-detected stall on CPU
[11141.889777] 	7-...: (5982 ticks this GP) idle=2fb/141/0 
softirq=604/604 fqs=2995

[11141.898281]   (t=6000 jiffies g=11855 c=11854 q=19196)
[11141.903404] Task dump for CPU 7:
[11141.906615] perfR  running task0  2357   2329 0x0202
[11141.913647] Call trace:
[11141.916081] [] dump_backtrace+0x0/0x270
[11141.921460] [] show_stack+0x24/0x2c
[11141.926496] [] sched_show_task+0xec/0x130
[11141.932050] [] dump_cpu_task+0x48/0x54
[11141.937347] [] rcu_dump_cpu_stacks+0xac/0xf4
[11141.943163] [] rcu_check_callbacks+0x7a0/0xa84
[11141.949152] [] update_process_times+0x34/0x5c
[11141.955055] [] tick_sched_handle.isra.15+0x38/0x70
[11141.961390] [] tick_sched_timer+0x48/0x88
[11141.966946] [] __hrtimer_run_queues+0x150/0x2e4
[11141.973022] [] hrtimer_interrupt+0xa0/0x1d4
[11141.978754] [] arch_timer_handler_phys+0x3c/0x48
[11141.984916] [] handle_percpu_devid_irq+0x98/0x210
[11141.991164] [] generic_handle_irq+0x34/0x4c
[11141.996893] [] __handle_domain_irq+0x6c/0xc4
[11142.002710] [] gic_handle_irq+0x94/0x190
[11142.008178] Exception stack(0x801782577a50 to 0x801782577b80)
[11142.014602] 7a40:    
0003
[11142.022414] 7a60:  801782577be0 801782577be0 
0040
[11142.030226] 7a80: 801782577bc8  001d 
fa011e40
[11142.038039] 7aa0:   0005 

[11142.045851] 7ac0: 00620170 006ed000  

[11142.053664] 7ae0: fa011c30 08c73000 801782577c18 
081da548
[11142.061477] 7b00: 0006 08c73000 081ddc90 

[11142.069289] 7b20: 001d 087f1000 8017815c3800 
801782577b80
[11142.077101] 7b40: 081615a0 801782577b80 081615ac 
20400145
[11142.084914] 7b60: 001b0019 801782577c18 0001 
801782577bc8

[11142.092727] [] el1_irq+0xb4/0x140
[11142.097588] [] smp_call_function_single+0x128/0x178
[11142.104014] [] cpu_function_call+0x40/0x64
[11142.109654] [] event_function_call+0x100/0x108
[11142.115643] [] _perf_event_disable+0x4c/0x74
[11142.121459] [] perf_event_for_each_child+0x38/0x98
[11142.127796] [] perf_ioctl+0xec/0x7b4
[11142.132920] [] do_vfs_ioctl+0xcc/0x7bc
[11142.138213] [] SyS_ioctl+0x90/0xa4
[11142.143161] [] __sys_trace_return+0x0/0x4
[11142.148717] INFO: rcu_sched detected stalls on CPUs/tasks:
[11142.154187] 	7-...: (5983 ticks this GP) idle=2fb/140/0 
softirq=604/604 fqs=2995

[11142.162692]  (detected by 8, t=6027 jiffies, g=11855, c=11854, q=19211)
[11142.169289] Task dump for CPU 7:
[11142.172500] perfR  running task0  2357   2329 0x0202
[11142.179531] Call trace:
[11142.181964] [] __switch_to+0x64/0x70
[11142.187088] [] cpu_number+0x0/0x8


--
Regards
Pratyush




Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-01 Thread Pratyush Anand

Hi Takahiro,

On Tuesday 01 August 2017 01:44 PM, AKASHI Takahiro wrote:

Hi Pratyush,

On Mon, Jul 31, 2017 at 04:10:28PM +0530, Pratyush Anand wrote:

v2 -> v3
- Moved step_needed from uapi structure to kernel only structure
- Re-enable interrupt if stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel=149942910730496=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
   hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
   hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel=149910958418708=2

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even a perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, I hope kgdb should also be
fine. However, I have not tested that.
@Takahiro: Will it be possible to test these patches for kgdb?


I have not yet understood the details of your patch, but
I gave it a try and didn't see any difference around the behavior
of kgdb's single stepping.

I also gave a try to James' patch, but again nothing different
as long as kgdb is concerned.
(I'm tackling some issue in single stepping at irq's kernel_exit,
in particular, 'eret'.)


You mean that you were expecting a step exception after eret (and this eret
was being called from the kgdb breakpoint exception handler), but you got an
irq exception? This is what I understood from your previous patch [0].


If that was the case, then I was expecting that this patch series should help.
See patch 4/5:
- The kgdb breakpoint handler kgdb_brk_fn() will be called from
arch/arm64/kernel/debug-monitors.c: brk_handler().
- If we are expecting a step exception after servicing this breakpoint
handler, then the kgdb code would have called kernel_enable_single_step(), so
kernel_active_single_step() should be true in brk_handler().
- If the above happens, then do_debug_exception() will make sure that the PSR
I bit is set before eret is called, and we should not see an IRQ exception
after eret.


Can you please help me with your reproducer test case?

[0]  http://lists.infradead.org/pipermail/linux-arm-kernel/2017-May/508066.html

--
Regards
Pratyush




Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand

Hi James,

On Monday 31 July 2017 10:45 PM, James Morse wrote:

Hi Pratyush,

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even a perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.


This caught my eye as I've been reworking the order the DAIF flags get
set/cleared[0].


Thanks for pointing to your series.


What causes your infinite loop? Is it single-stepping kernel_exit? If so patch 4
"arm64: entry.S mask all exceptions during kernel_exit" [1] may help.


The flow is like this:
- A SW or HW breakpoint exception is generated on a CPU, let's say CPU5.
- The breakpoint handler does something which causes an interrupt to become
active on the same CPU. In fact, there might be many other reasons for an
interrupt to be active on a CPU while the breakpoint handler is executing.
- So, as soon as we return from the breakpoint exception, we go to the IRQ
exception handler, while we were expecting a single-step exception.


I do not think that your patch 4 will help here. That patch disables interrupts
while kernel_exit executes. So, until we enable the PSR I bit, we cannot stop
an interrupt from being taken before the step exception.


You can easily reproduce the issue with the following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq

Where data_breakpoint.ko is module from samples/hw_breakpoint/data_breakpoint.c.



If it's more like "single stepping something we didn't expect", you will get
the same problem if we take an SError (which with that series is unmasked
~all the time).
Either way, this looks like a new and exciting way of hitting the 'known
issue' described in patch 12 [3].


Would disabling MDSCR_EL1.SS if we took an exception solve your problem?

If so, I think we should add a new flag, 'TIF_KSINGLESTEP', causing us to
save/restore MDSCR_EL1.SS into pt_regs on el1 exceptions. This would let us
single-step without modifying the DAIF flags for the location we are stepping,
and allow taking any kind of exception from that location.

We should disable nested users of single-step; we can do that by testing the
flag, printing a warning, and then pretending we missed the breakpoint. (Hence
it needs to be separate from the user single-step flag.)


Thanks,

James

[0] https://www.spinics.net/lists/arm-kernel/msg596684.html
[1] https://www.spinics.net/lists/arm-kernel/msg596686.html
[2] https://www.spinics.net/lists/arm-kernel/msg596689.html

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



--
Regards
Pratyush




Re: [PATCH RFC] arm64: introduce mm_context_t flags

2017-07-31 Thread Pratyush Anand



On Monday 31 July 2017 08:18 PM, Yury Norov wrote:

  - If we start using TIF flags here, we cannot easily add new mm_context
specific bits because they may mess with TIF ones.


This one seems a convincing argument. At the moment, do you have any
mm_context flag which you would like to incorporate?




I think that this is not what was intended when you added the new field in
mm_context_t.

In this patch the MMCF_AARCH32 flag is introduced, where the MMCF prefix
stands for mm_context_t flags. The new flag is used by the uprobes code
instead of TIF_32BIT.


--
Regards
Pratyush




[PATCH v3 2/5] arm64: use hw_breakpoint_needs_single_step() to decide if step is needed

2017-07-31 Thread Pratyush Anand
Currently we use is_default_overflow_handler() to decide whether a
"step" will be needed or not. However, is_default_overflow_handler() is
true only for the perf implementation. There can be custom kernel
module tests, like samples/hw_breakpoint/data_breakpoint.c, which rely
on the default step handler.

hw_breakpoint_needs_single_step() will be true if any hw_breakpoint user
wants to use the default step handler and sets step_needed in hw_perf_event.
Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 749f81779420..9a73f85ab9ad 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -661,7 +661,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
perf_bp_event(bp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(bp))
+   if (hw_breakpoint_needs_single_step(bp))
step = 1;
 unlock:
rcu_read_unlock();
@@ -789,7 +789,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
if (min_dist > 0 && min_dist != -1) {
@@ -800,7 +800,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
rcu_read_unlock();
-- 
2.9.4




[PATCH v3 1/5] hw_breakpoint: Add step_needed event attribute

2017-07-31 Thread Pratyush Anand
Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single step handler. However, some other users, such as a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default
step handler implementation.

Therefore, this patch introduces a flag 'step_needed' in struct
hw_perf_event, so that arch-specific code (especially on arm64) can make a
decision to enable single stepping.

Any architecture which does not use this field is unaffected.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 include/linux/hw_breakpoint.h | 6 ++++++
 include/linux/perf_event.h    | 6 ++++++
 kernel/events/core.c          | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..b9ac9629bf74 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
 }
 
+static inline bool
+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->hw.step_needed;
+}
+
 static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 {
return bp->attr.bp_len;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a635887f28..7da951f94b47 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -233,6 +233,12 @@ struct hw_perf_event {
 */
u64 freq_time_stamp;
u64 freq_count_stamp;
+   /*
+* A HW breakpoint user can either have its own step handling
+* mechanism or it can use the default step handling mechanism defined
+* by arch code. Set step_needed to use the default mechanism.
+*/
+   int step_needed;
 #endif
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523dc1e2..66ce5574e778 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9444,9 +9444,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
} else if (is_write_backward(event)){
event->overflow_handler = perf_event_output_backward;
event->overflow_handler_context = NULL;
+   event->hw.step_needed = 1;
} else {
event->overflow_handler = perf_event_output_forward;
event->overflow_handler_context = NULL;
+   event->hw.step_needed = 1;
}
 
perf_event__state_init(event);
-- 
2.9.4
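
Taken together with patch 2, the decision the arm64 handlers end up making can be modeled in a few lines. The struct layouts below are heavily simplified stand-ins; only the step_needed field and the hw_breakpoint_needs_single_step() helper correspond to the patch.

```c
/* Simplified model of the step_needed plumbing: the structs below are
 * stand-ins, not the real kernel definitions. */
#include <stdbool.h>

struct hw_perf_event { int step_needed; };
struct perf_event    { struct hw_perf_event hw; };

static bool hw_breakpoint_needs_single_step(const struct perf_event *bp)
{
    return bp->hw.step_needed;
}

/* After patch 2, breakpoint_handler()/watchpoint_handler() set step = 1
 * only when the event opted in to the default step handler. */
static int decide_step(const struct perf_event *bp)
{
    return hw_breakpoint_needs_single_step(bp) ? 1 : 0;
}
```

A perf-created event gets step_needed = 1 in perf_event_alloc(), while a user with its own step handler (e.g. GDB/ptrace) leaves it 0 and is skipped by the default stepping path.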




[PATCH v3 3/5] register_wide_hw_breakpoint(): modify to accept step_needed arg

2017-07-31 Thread Pratyush Anand
Arch like ARM64 expects 'step_needed = 1' in order to use the default
single step handler. Therefore, modify the register_wide_hw_breakpoint()
implementation, so that we can set this field in struct hw_perf_event to
be used later by arch-specific code.

Other architectures are unaffected, as they do not use this field so far.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/x86/kernel/kgdb.c  | 2 +-
 include/linux/hw_breakpoint.h   | 4 ++--
 kernel/events/hw_breakpoint.c   | 4 +++-
 samples/hw_breakpoint/data_breakpoint.c | 3 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..19b24c50a952 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -674,7 +674,7 @@ void kgdb_arch_late(void)
for (i = 0; i < HBP_NUM; i++) {
if (breakinfo[i].pev)
continue;
-   breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL);
+   breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL, 0);
if (IS_ERR((void * __force)breakinfo[i].pev)) {
printk(KERN_ERR "kgdb: Could not allocate hw"
   "breakpoints\nDisabling the kernel debugger\n");
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index b9ac9629bf74..8cbc8ded6d50 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -71,7 +71,7 @@ register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 extern struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context);
+   void *context, int step);
 
 extern int register_perf_hw_breakpoint(struct perf_event *bp);
 extern int __register_perf_hw_breakpoint(struct perf_event *bp);
@@ -110,7 +110,7 @@ register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 static inline struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context)  { return NULL; }
+   void *context, int step){ return NULL; }
 static inline int
 register_perf_hw_breakpoint(struct perf_event *bp) { return -ENOSYS; }
 static inline int
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 3f8cb1e14588..0dcb175276b2 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -492,13 +492,14 @@ EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
  * register_wide_hw_breakpoint - register a wide breakpoint in the kernel
  * @attr: breakpoint attributes
  * @triggered: callback to trigger when we hit the breakpoint
+ * @step: tells if framework can use default arch step handler
  *
  * @return a set of per_cpu pointers to perf events
  */
 struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context)
+   void *context, int step)
 {
struct perf_event * __percpu *cpu_events, *bp;
long err = 0;
@@ -512,6 +513,7 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
for_each_online_cpu(cpu) {
bp = perf_event_create_kernel_counter(attr, cpu, NULL,
  triggered, context);
+   bp->hw.step_needed = step;
if (IS_ERR(bp)) {
err = PTR_ERR(bp);
break;
diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
index ef7f32291852..f64e59f9fbc6 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -60,7 +60,8 @@ static int __init hw_break_module_init(void)
attr.bp_len = HW_BREAKPOINT_LEN_4;
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
 
-   sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, NULL);
+   sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler,
+NULL, 1);
if (IS_ERR((void __force *)sample_hbp)) {
ret = PTR_ERR((void __force *)sample_hbp);
goto fail;
-- 
2.9.4




[PATCH v3 4/5] arm64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and step handler, then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck.
If the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single step stage
- should return 1 if it expects a single step.
- A single step handler should always return 0.
- All handlers should return a -ve error in any other case.

Now, we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, then disable irqs and
re-enable them after single step handling.

Since we also fix brk_handler() for the above rule, all SW kernel
breakpoint handlers like kgdb and kprobes should behave similarly to perf
HW breakpoints. Interrupts will be disabled if a kgdb or kprobe breakpoint
handler requires single stepping.
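
As a rough user-space model of the return-value convention above (the per-CPU flag and PSTATE handling are simplified stand-ins, and the handler names are illustrative, not from the patch):

```c
/* Sketch of the debug-exception return convention described above:
 * <0 = unhandled error, 0 = handled with no step, 1 = a step will follow.
 * irqs_masked models PSTATE.I; irq_enable_needed models the per-CPU flag. */
#include <stdbool.h>

static bool irqs_masked;
static bool irq_enable_needed;

static int demo_handler_wants_step(void) { return 1; }
static int demo_handler_done(void)       { return 0; }

/* Model of do_debug_exception(): keep irqs off only when a step follows. */
static int do_debug_exception_model(int (*fn)(void))
{
    int rv = fn();

    if (rv < 0)
        return rv;                 /* unhandled: caller reports the error */
    if (rv == 1) {
        irqs_masked = true;        /* stay masked until the step exception */
        irq_enable_needed = true;
    }
    return 0;
}

/* Model of the single-step path: re-enable irqs once the step completes. */
static void single_step_done(void)
{
    if (irq_enable_needed) {
        irqs_masked = false;
        irq_enable_needed = false;
    }
}
```

The window between the two calls is exactly the breakpoint-to-step region the patch wants to keep interrupt-free.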

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
	{ do_bad,   SIGBUS, 0,  "unknown 7" },
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single step stage
+ * and 1 if it expects single step followed by its execution. A single step
+ * handler should always return 0. All handler should return a -ve error in
+ * any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
	arm64_notify_die("", regs, &info, 0);
-   rv = 0;
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {

[PATCH v3 5/5] arm64: fault: re-enable irq if it was disabled for single stepping

2017-07-31 Thread Pratyush Anand
We disable irqs before single stepping and re-enable them afterwards.
However, if the stepped instruction causes a fault, then we will enter
the fault handler with interrupts disabled, which is not desired.
But we should be safe if the fault handler re-enables interrupts when
they were disabled for single stepping.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/mm/fault.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ce5290dacba3..2b88807eb964 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -589,6 +589,8 @@ static const struct fault_info fault_info[] = {
	{ do_bad,   SIGBUS,  0, "unknown 63" },
 };
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 /*
  * Dispatch a data abort to the relevant handler.
  */
@@ -597,6 +599,12 @@ asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
 {
const struct fault_info *inf = esr_to_fault_info(esr);
struct siginfo info;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
+
+   if (*irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+   *irq_en_needed = false;
+   }
 
if (!inf->fn(addr, esr, regs))
return;
@@ -672,8 +680,6 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
-static DEFINE_PER_CPU(bool, irq_enable_needed);
-
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
-- 
2.9.4




[PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand
v2 -> v3:
- Moved step_needed from the uapi structure to a kernel-only structure
- Re-enable interrupts if the stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept a step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel&m=149942910730496&w=2

v1 -> v2:
- patch 1 of v1 has been split into patches 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step() (patch 2)
- Modified the sample test to set the step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if
the CPU receives an interrupt while executing the breakpoint/watchpoint
handler. So, even though we are not concerned about the above test, we will
have to find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, I hope kgdb should also be
fine. However, I have not tested that.
@Takahiro: Will it be possible to test these patches for kgdb?

[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html


Pratyush Anand (5):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  register_wide_hw_breakpoint(): modify to accept step_needed arg
  arm64: disable irq between breakpoint and step exception
  arm64: fault: re-enable irq if it was disabled for single stepping

 arch/arm64/kernel/debug-monitors.c      |  3 +++
 arch/arm64/kernel/hw_breakpoint.c       | 10 +++++-----
 arch/arm64/mm/fault.c                   | 28 ++++++++++++++++++++++++----
 arch/x86/kernel/kgdb.c                  |  2 +-
 include/linux/hw_breakpoint.h           | 10 ++++++++--
 include/linux/perf_event.h              |  6 ++++++
 kernel/events/core.c                    |  2 ++
 kernel/events/hw_breakpoint.c           |  4 +++-
 samples/hw_breakpoint/data_breakpoint.c |  3 ++-
 9 files changed, 54 insertions(+), 14 deletions(-)

-- 
2.9.4




Re: [PATCH V2 1/4] hw_breakpoint: Add step_needed event attribute

2017-07-25 Thread Pratyush Anand



On Tuesday 25 July 2017 06:57 PM, Will Deacon wrote:

On Fri, Jul 07, 2017 at 05:33:57PM +0530, Pratyush Anand wrote:

Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single-step handler. However, some other users, like a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default step
handler implementation. At the same time, others like GDB/ptrace may
implement their own step handler.

Therefore, this patch introduces a new perf_event_attr bit field, so
that arch-specific code (especially on arm64) can decide whether to
enable single stepping.

Architectures which do not use this field will see no side effect.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
  include/linux/hw_breakpoint.h | 6 ++
  include/uapi/linux/perf_event.h   | 3 ++-
  kernel/events/core.c  | 2 ++
  tools/include/uapi/linux/perf_event.h | 3 ++-
  4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..6173ae048cbc 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
  }
  
+static inline bool

+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->attr.step_needed;
+}
+
  static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
  {
return bp->attr.bp_len;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;


This needs documenting properly, as I really have no idea how userspace is
going to use it sensibly, especially as you silently overwrite it in some
cases below.


I too had thought of putting it under include/linux/perf_event.h : struct
perf_event. But see hw_break_module_init(), which has no knowledge of
this structure, and we need some way for a non-perf kernel module
implementation to tell that it needs the default arch step handler.


Do you see any alternative?

--
Regards
Pratyush



Re: [PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-25 Thread Pratyush Anand

Hi Will,

Thanks for your review.

On Tuesday 25 July 2017 06:55 PM, Will Deacon wrote:

On Fri, Jul 07, 2017 at 05:34:00PM +0530, Pratyush Anand wrote:

If an interrupt is generated between the breakpoint and step handler then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single-step stage
- should return 1 if it expects a single step.
- A single-step handler should always return 0.
- All handlers should return a negative error in any other case.

Now we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, disable irqs, and
re-enable them after single-step handling.


AFAICT, this is only a problem for kernel-mode breakpoints where we end up
stepping into the interrupt handler when trying to step over a breakpoint.


I think yes.



We'd probably be better off getting all users of kernel step (kprobes, kgdb
and perf) to run the step with irqs disabled,



That should be doable. We can easily manage all of them in 
do_debug_exception() if individual brk handlers return the correct value as 
per the rule mentioned in the commit log of this patch.


I think I can take care of kprobes and kgdb as well in the next version of the patch.


but I still have reservations
about that:


So, IIUC, you have a concern about faulting of an instruction being stepped. 
Since we will have a notion of *irq_en_needed*, I think, if needed, we can 
re-enable interrupts in the fault handler do_mem_abort().


What's your opinion here?



   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-May/508066.html
   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-June/510814.html

Wouldn't it be better to follow kprobes/kgdb and have perf run the step with
irqs disabled?

--
Regards
Pratyush



Re: [PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-16 Thread Pratyush Anand



On Friday 07 July 2017 05:33 PM, Pratyush Anand wrote:

v1 was here http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to find
a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.


Any comments/feedback?



[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)



--
Pratyush



Re: [PATCH] usb: gadget: functions: add ftrace export over USB

2017-07-13 Thread Pratyush Anand

Hi Felipe,

On Friday 09 June 2017 03:58 PM, Felipe Balbi wrote:

Felipe Balbi  writes:


Allow for ftrace data to be exported over a USB Gadget
Controller. With this, we have a potentially very fast pipe for
transmitting ftrace data to a Host PC for further analysis.

Note that in order to decode the data, one needs access to kernel
symbols in order to convert binary data into function names and what
not.

Signed-off-by: Felipe Balbi 
---

I wanted to take this through the gadget tree, but there is a
dependency with a previous patch of mine adding and extra argument to
the ->write() function. Hoping someone else will take it.


just as an extra note here. In order for this to be really useful, it
would be nice to be able to control what is going to be traced over USB


Probably you will also need to export *at least* symbol information. In the future, 
if this framework is extended to export tracepoint/kprobe/uprobe event data, 
then information like event formats etc. will also need to be exported.


What tool do you use on the host machine in order to extract information from this 
exported target ring buffer data?

IMHO, standard tools like trace-cmd will not be able to use it as it is.


as well, but that means exporting a few extra functions to GPL drivers.

Would that be okay? I could have a set of vendor-specific control
requests to set buffer size and to read/write ftrace filter functions.

The idea is that things like e.g. Android SDK could rely on this on
debug builds and the SDK itself would make sure to keep a copy of
vmlinux around for processing of the data coming through USB.



--
Pratyush



Re: [PATCH] usb: gadget: functions: add ftrace export over USB

2017-07-13 Thread Pratyush Anand

Hi Felipe,

On Friday 09 June 2017 11:43 AM, Felipe Balbi wrote:

+static void notrace ftrace_write(struct trace_export *ftrace, const void *buf,
+unsigned int len)
+{
+   struct usb_ftrace   *trace = ftrace_to_trace(ftrace);
+   struct usb_request  *req = next_request(&trace->list);
+
+   if (!req)
+   return;
+
+   if (!trace->in->enabled)
+   return;
+
+   req->buf = kmemdup(buf, len, GFP_ATOMIC);


Probably we can avoid the copy of trace data.

We can keep the write() call of "struct trace_export" as posted, and can have a 
write_complete() callback function implemented in struct trace_export, which 
can be called from your ftrace_complete().


We need to execute __buffer_unlock_commit() only in write_complete() in case 
ftrace_export is enabled.




+   req->length = len;
+   req->context = trace;
+   req->complete = ftrace_complete;
+   list_move_tail(&req->list, &trace->pending);
+
+   schedule_work(&trace->queue_work);
+}
+


--
Pratyush



[PATCH V2 2/4] arm64: use hw_breakpoint_needs_single_step() to decide if step is needed

2017-07-07 Thread Pratyush Anand
Currently we use is_default_overflow_handler() to decide whether a
"step" will be needed or not. However, is_default_overflow_handler() is
true only for the perf implementation. There can be custom kernel
module tests, like samples/hw_breakpoint/data_breakpoint.c, which
rely on the default step handler.

hw_breakpoint_needs_single_step() will be true if any hw_breakpoint user
wants to use the default step handler and sets step_needed in the attribute.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 749f81779420..9a73f85ab9ad 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -661,7 +661,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
perf_bp_event(bp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(bp))
+   if (hw_breakpoint_needs_single_step(bp))
step = 1;
 unlock:
rcu_read_unlock();
@@ -789,7 +789,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
if (min_dist > 0 && min_dist != -1) {
@@ -800,7 +800,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
rcu_read_unlock();
-- 
2.9.3



[PATCH V2 3/4] hw-breakpoint: sample test: set step_needed bit field

2017-07-07 Thread Pratyush Anand
Arches like ARM64 expect 'step_needed == true' in order to use the default
single-step handler. Therefore, set the bit field in the test case.
Other arches will see no effect, as they do not use it so far.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 samples/hw_breakpoint/data_breakpoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/hw_breakpoint/data_breakpoint.c 
b/samples/hw_breakpoint/data_breakpoint.c
index ef7f32291852..5a1919d01800 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -59,6 +59,7 @@ static int __init hw_break_module_init(void)
attr.bp_addr = kallsyms_lookup_name(ksym_name);
attr.bp_len = HW_BREAKPOINT_LEN_4;
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
+   attr.step_needed = true;
 
sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, 
NULL);
if (IS_ERR((void __force *)sample_hbp)) {
-- 
2.9.3




[PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
v1 was here http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to find
a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.

[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.9.3



[PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and step handler then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single-step stage
- should return 1 if it expects a single step.
- A single-step handler should always return 0.
- All handlers should return a negative error in any other case.

Now we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, disable irqs, and
re-enable them after single-step handling.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
{ do_bad,   SIGBUS, 0,  "unknown 7" 
},
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single-step stage,
+ * and 1 if it expects a single step followed by its execution. A single-step
+ * handler should always return 0. All handlers should return a negative
+ * error in any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct 
pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned 
long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, &info, 0);
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {
+   regs->pstate |= PSR_I_BIT;
+   *irq_en_needed = true;
+   } else if (rv == 0 && *irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+ 

[PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
v1 was here http://marc.info/?l=linux-arm-kernel=149910958418708=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.

[1] http://marc.info/?l=linux-arm-kernel=149580777524910=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.9.3



[PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and the step handler, then
the step handler cannot get the correct step address. This situation can easily
be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has just been working by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single step stage
- should return 1 if it expects a single step.
- A single step handler should always return 0.
- All handlers should return a -ve error in any other case.

Now, we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, then disable irqs and
re-enable them after single step handling.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
{ do_bad,   SIGBUS, 0,  "unknown 7" 
},
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single step stage
+ * and 1 if it expects single step followed by its execution. A single step
+ * handler should always return 0. All handler should return a -ve error in
+ * any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct 
pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned 
long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, , 0);
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {
+   regs->pstate |= PSR_I_BIT;
+   *irq_en_needed = true;
+   } else if (rv == 0 && *irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+   *irq_en_needed = false;
}
 
if (interrupts_enabled(regs))
-- 
2.9.3



[PATCH V2 1/4] hw_breakpoint: Add step_needed event attribute

2017-07-07 Thread Pratyush Anand
Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single step handler. However, some other users, like a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default
step handler implementation. At the same time, other users like GDB/ptrace
may implement their own step handler.

Therefore, this patch introduces a new perf_event_attr bit field, so
that arch specific code (especially on arm64) can make a decision to
enable single stepping.

Any architecture which is not using this field will not see any
side effect.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 include/linux/hw_breakpoint.h | 6 ++
 include/uapi/linux/perf_event.h   | 3 ++-
 kernel/events/core.c  | 2 ++
 tools/include/uapi/linux/perf_event.h | 3 ++-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..6173ae048cbc 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
 }
 
+static inline bool
+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->attr.step_needed;
+}
+
 static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 {
return bp->attr.bp_len;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;
 
union {
__u32   wakeup_events;/* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523dc1e2..220e26941475 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9444,9 +9444,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
} else if (is_write_backward(event)){
event->overflow_handler = perf_event_output_backward;
event->overflow_handler_context = NULL;
+   event->attr.step_needed = 1;
} else {
event->overflow_handler = perf_event_output_forward;
event->overflow_handler_context = NULL;
+   event->attr.step_needed = 1;
}
 
perf_event__state_init(event);
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;
 
union {
__u32   wakeup_events;/* wakeup every n events */
-- 
2.9.3



Re: [RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand



On Thursday 15 June 2017 08:00 PM, Steven Rostedt wrote:

On Thu, 15 Jun 2017 19:56:40 +0530
Pratyush Anand <pan...@redhat.com> wrote:


From: Steven Rostedt <rost...@goodmis.org>


Hi Steve,

Is there any plan for next revision of this patchset.



Heh, I forgot about this. Actually, I didn't.  I was just thinking
about the issue of SyS vs sys, but forgot I had patches to fix it.

I should push it again.

Thanks for the reminder.


Thanks for the quick reply :-)

Pratyush



Re: [RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand



On Thursday 15 June 2017 07:56 PM, Pratyush Anand wrote:

From: Steven Rostedt <rost...@goodmis.org>


Sorry, Please ignore this. I did not had this patch in my inbox and replied 
through patchwork mailbox. Forgot to change the "from" field. Here is the 
patch link, which could resolve issues with trace filtering of aliased system 
calls.


https://patchwork.kernel.org/patch/6351951/

Please let me know if I can help in working for next revision.

Pratyush



Hi Steve,

Is there any plan for next revision of this patchset.

Regards
Pratyush



[RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand
From: Steven Rostedt 


Hi Steve,

Is there any plan for next revision of this patchset.

Regards
Pratyush



[PATCH] mei: make sysfs modalias format similar as uevent modalias

2017-05-24 Thread Pratyush Anand
modprobe is not able to resolve sysfs modalias for mei devices.

 # cat
/sys/class/watchdog/watchdog0/device/watchdog/watchdog0/device/modalias
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
 # modprobe --set-version 4.9.6-200.fc25.x86_64 -R
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
modprobe: FATAL: Module mei::05b79a6f-4628-4d7f-899d-a91514cb32ab: not
found in directory /lib/modules/4.9.6-200.fc25.x86_64
 # cat /lib/modules/4.9.6-200.fc25.x86_64/modules.alias | grep
05b79a6f-4628-4d7f-899d-a91514cb32ab
alias mei:*:05b79a6f-4628-4d7f-899d-a91514cb32ab:*:* mei_wdt

commit b26864cad1c9 changed the uevent modalias to mei:S:uuid:N:*, however
the sysfs modalias is still in the format mei:S:uuid:*.

This patch makes the sysfs modalias format match the uevent modalias, so
that modprobe is able to resolve the aliases.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 drivers/misc/mei/bus.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index d1928fdd0f43..07aad8576334 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -763,8 +763,10 @@ static ssize_t modalias_show(struct device *dev, struct 
device_attribute *a,
 {
struct mei_cl_device *cldev = to_mei_cl_device(dev);
const uuid_le *uuid = mei_me_cl_uuid(cldev->me_cl);
+   u8 version = mei_me_cl_ver(cldev->me_cl);
 
-   return scnprintf(buf, PAGE_SIZE, "mei:%s:%pUl:", cldev->name, uuid);
+   return scnprintf(buf, PAGE_SIZE, "mei:%s:%pUl:%02X:",
+cldev->name, uuid, version);
 }
 static DEVICE_ATTR_RO(modalias);
 
-- 
2.9.3



Re: [PATCH v2] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Thursday 18 May 2017 04:23 PM, Bharat Bhushan wrote:

This patch has minor updates in Documentation for arm64, as arm64 supports a
relocatable kernel.
Also this patch updates documentation for using the uncompressed
image "Image" which is used for ARM64.

Signed-off-by: Bharat Bhushan <bharat.bhus...@nxp.com>
---
v1->v2
  - "a uncompressed" replaced with "an uncompressed"


Reviewed-by: Pratyush Anand <pan...@redhat.com>



 Documentation/kdump/kdump.txt | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 615434d..5181445 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -112,8 +112,8 @@ There are two possible methods of using Kdump.
 2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architectures which support a relocatable kernel. As
-   of today, i386, x86_64, ppc64, ia64 and arm architectures support 
relocatable
-   kernel.
+   of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
+   relocatable kernel.

 Building a relocatable kernel is advantageous from the point of view that
 one does not have to build a second kernel for capturing the dump. But
@@ -339,7 +339,7 @@ For arm:
 For arm64:
- Use vmlinux or Image

-If you are using a uncompressed vmlinux image then use following command
+If you are using an uncompressed vmlinux image then use following command
 to load dump-capture kernel.

kexec -p  \
@@ -361,6 +361,12 @@ to load dump-capture kernel.
--dtb= \
--append="root= "

+If you are using an uncompressed Image, then use following command
+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

 Please note, that --args-linux does not need to be specified for ia64.
 It is planned to make this a no-op on that architecture, but for now



Re: [PATCH] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Monday 22 May 2017 12:19 PM, Bharat Bhushan wrote:

On Friday 19 May 2017 09:15 AM, AKASHI Takahiro wrote:

+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

For uncompressed Image, dtb is not necessary?

Just for clarification, dtb is optional for both vmlinux and Image on
arm64. (This means you can specify it if you want.) But this is also
true for initrd and append(command line) to some extent.


Yes, I agree.


Should I mention "-dtb" also for "Image"?


No,I think it is fine.

This documentation represents a typical use case, so the above change is
OK for me. I think your v2 is fine.


~Pratyush



Also do we need to mention that it is optional somewhere in this document? I do not see 
"optional" is mentioned for other parameters and architecture.

Does this looks ok:

" -dtb= \"

Thanks
-Bharat



More precisely, whether these parameters are optional or not will
depend on architectures, not formats, I believe.


Maybe not the architecture, rather the distro environment.

For example, we should be able to work without --initrd for any arch if the
kernel has been compiled with CONFIG_INITRAMFS_SOURCE configured.

~Pratyush




Re: [PATCH] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Friday 19 May 2017 09:15 AM, AKASHI Takahiro wrote:

+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

For uncompressed Image, dtb is not necessary?

Just for clarification, dtb is optional for both vmlinux and Image
on arm64. (This means you can specify it if you want.)
But this is also true for initrd and append(command line) to some extent.


Yes, I agree.


More precisely, whether these parameters are optional or not will
depend on architectures, not formats, I believe.


Maybe not the architecture, rather the distro environment.

For example, we should be able to work without --initrd for any arch if the
kernel has been compiled with CONFIG_INITRAMFS_SOURCE configured.


~Pratyush



Re: [PATCH 1/2] perf probe: Fix wrong register name for arm64

2017-05-03 Thread Pratyush Anand
Just noticed that this patch fixes commit e47392bf9c06 (v4.8+). So,
Ccing sta...@vger.kernel.org
[e47392bf9c06 perf uprobe: Skip prologue if program compiled without
optimization]

After e47392bf9c06 on ARM64:
# cat > test.c << EOF
> #include 
>
> int main(int argc, void **argv)
> {
> printf("argc =%d\n", argc);
> }
> EOF
# gcc -g  -o test test.c
# perf probe -x ./test -a 'main argc'
Target program is compiled without optimization. Skipping prologue.
Probe on address 0x400600 to force probing at the function entry.

Failed to write event: Invalid argument
  Error: Failed to add events.

If this patch is applied on top of e47392bf9c06:
perf probe -x ./test -a 'main argc'
Target program is compiled without optimization. Skipping prologue.
Probe on address 0x400600 to force probing at the function entry.

Added new event:
  probe_test:main  (on main in
/home/panand/work/test_code/uprobe_test/test with argc)

You can now use it in all perf tools, such as:

perf record -e probe_test:main -aR sleep 1

~Pratyush

On Tue, Jan 24, 2017 at 4:00 PM, He Kuang  wrote:
> The register name of arm64 architecture is x0-x31 not r0-r31, this
> patch changes this typo.
>
> Before this patch:
>
>   # perf probe --definition 'sys_write count'
>   p:probe/sys_write _text+1502872 count=%r2:s64
>
>   # echo 'p:probe/sys_write _text+1502872 count=%r2:s64' > \
> /sys/kernel/debug/tracing/kprobe_events
>   Parse error at argument[0]. (-22)
>
> After this patch:
>
>   # perf probe --definition 'sys_write count'
>   p:probe/sys_write _text+1502872 count=%x2:s64
>
>   # echo 'p:probe/sys_write _text+1502872 count=%x2:s64' > \
> /sys/kernel/debug/tracing/kprobe_events
>   # echo 1 >/sys/kernel/debug/tracing/events/probe/enable
>   # cat /sys/kernel/debug/tracing/trace
>   ...
>   sh-422   [000] d... 650.495930: sys_write: (SyS_write+0x0/0xc8) count=22
>   sh-422   [000] d... 651.102389: sys_write: (SyS_write+0x0/0xc8) count=26
>   sh-422   [000] d... 651.358653: sys_write: (SyS_write+0x0/0xc8) count=86
>
> Signed-off-by: He Kuang 
> ---
>  tools/perf/arch/arm64/include/dwarf-regs-table.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/include/dwarf-regs-table.h 
> b/tools/perf/arch/arm64/include/dwarf-regs-table.h
> index 2675936..36e375f 100644
> --- a/tools/perf/arch/arm64/include/dwarf-regs-table.h
> +++ b/tools/perf/arch/arm64/include/dwarf-regs-table.h
> @@ -2,12 +2,12 @@
>  /* This is included in perf/util/dwarf-regs.c */
>
>  static const char * const aarch64_regstr_tbl[] = {
> -   "%r0", "%r1", "%r2", "%r3", "%r4",
> -   "%r5", "%r6", "%r7", "%r8", "%r9",
> -   "%r10", "%r11", "%r12", "%r13", "%r14",
> -   "%r15", "%r16", "%r17", "%r18", "%r19",
> -   "%r20", "%r21", "%r22", "%r23", "%r24",
> -   "%r25", "%r26", "%r27", "%r28", "%r29",
> +   "%x0", "%x1", "%x2", "%x3", "%x4",
> +   "%x5", "%x6", "%x7", "%x8", "%x9",
> +   "%x10", "%x11", "%x12", "%x13", "%x14",
> +   "%x15", "%x16", "%x17", "%x18", "%x19",
> +   "%x20", "%x21", "%x22", "%x23", "%x24",
> +   "%x25", "%x26", "%x27", "%x28", "%x29",
> "%lr", "%sp",
>  };
>  #endif
> --
> 1.8.5.2
>



Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand



On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:

On Thu, 27 Apr 2017 19:32:43 +0530
Pratyush Anand <pan...@redhat.com> wrote:


I will implement your review comments and send the next revision.
However, I had a couple of observations that I was unable to justify:

# ./trace-cmd top -s /tmp/test
# ./trace-cmd top -p /tmp/test | grep trace-cmd
   15292   trace-cmd   22144   15808


What does it give for your /tmp/test ?


/tmp/test is the name of the socket through which the trace aggregator (-s) and 
printer (-p) processes talk to each other.




Note, tracing doesn't start till after trace-cmd is loaded. Everything
before is not going to be seen.


Yes. So `./trace-cmd top -s /tmp/test` starts tracing and aggregates stats for 
all processes, including itself. After some time I execute 
`./trace-cmd top -p /tmp/test`, which prints stats for all processes, again 
including itself. In the example above, the peak memory calculated for the 
aggregator process (`./trace-cmd top -s /tmp/test`) was 22144 KB.





Here,15292 is the pid of trace-cmd task
22144 KB is the peak memory usage
15808 KB is the current memory usage

Now check rss component from statm
# cat /proc/15292/statm
   50 35 23 7 0 12 0 36

This is a result on ARM64 with a 64 KB page size. Therefore, as per statm, rss is
35 pages = 35*64 = 2240 KB.
I patched my kernel [2] for test purposes, so that statm reports peak memory as
well. Here, the last extra entry in the statm output is peak memory, and it is 36
pages = 2304 KB.
So, there is a huge difference between what statm reports and what we get from
trace-cmd.
I understand that the `trace-cmd top` output will only be approximate, because
some memory could be allocated by a task and freed in interrupt context, so the
numbers can often differ. But is such a huge difference justified? If yes, can
we count on the output of this utility to find early boot-time oom issues?


Doesn't it just trace the memory usage from when the tracing starts?


So, it should be less than what statm reports, right? But here we see that the 
value reported by trace-cmd is far more than the rss value from statm.


Well, there is one known bug in the code, for which I am still trying to find a 
fix. But that should not cause this huge difference.


When we receive a trace entry for any new process, we read the rss value from 
statm. We treat that as the memory consumed by the process before tracing 
started. So we initialize the peak and current memory values for that process 
with rss from statm, then increase current on each mm_page_alloc() entry and 
decrease it on each mm_page_free() entry. The peak value reached by current at 
any point is reported as the peak memory.


Now, let's consider a process p1 that existed before tracing started, with an 
rss of r1 KB. We receive the first trace entry for p1, which says 8 KB was 
allocated for p1.


p1->cur = r1 + 8;
p1->peak = r1 + 8;

We receive another entry with 4KB de-allocation.

p1->cur = r1 + 4;
p1->peak = r1 + 8;

Some of these allocations would also have been accounted for in statm/rss (r1 
here), but we do not know how many entries were already counted there. 
Currently, these n (unknown) entries [generated between (a) the first trace 
entry and (b) the moment trace-cmd top reads statm/rss] are counted twice, 
which is a bug.


~Pratyush



[1]
https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
[2]
https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517




Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand

Hi Steve,

On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:


Sorry for the late reply. I finally have time to start looking at
trace-cmd again.


Thanks a lot for your review :-)



On Wed,  1 Mar 2017 20:32:37 +0530
Pratyush Anand <pan...@redhat.com> wrote:



[...]


This is a nice idea, but it needs some cleanups. I tried it out just
to play with it.

First, I just did:

 trace-cmd top

And nothing happened. Either a help message needs to be printed, or it
starts something. Maybe have it be interactive, that is, print out data
as it comes in?



[...]



But then for some reason it -p stopped doing anything. And -e didn't
stop it either. But after killing it manually and removing the pid
file, it appeared to work normal again. I can't seem to repeat it, so
lets not worry about that now.



Yes, this version shared info between the two processes using files. I have a 
new version [1] that shares it through a socket, and that works better. The 
help issue is also fixed in that version.



More below.



[...]


+void top_disable_events(void)
+{
+   char *path;
+   char c;
+   int fd;
+   int ret;
+
+   /*
+* WARNING: We want only a few events related to memory allocation to
+* be enabled. Disable all events here. Later, selective events
+* will be enabled


trace-cmd reset does this. You may want to call that instead of doing
it manually here. Someone else wanted to pull out trace-cmd reset into
its own file. I'll have to get those patches. But then we can call that
and enable tracing again.

Or better yet, if we are only tracing memory events, just create your
own instance and use that. Why bother with normal tracing?

 mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory


Yes, it seems a good idea. Will do.




+*/
+   c = '0';
+   path = get_instance_file(_instance, "events/enable");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to events/enable");
+
+   path = get_instance_file(_instance, "tracing_on");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to tracing_on\n");
+
+   path = get_instance_file(_instance, "set_event");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+}
+
+void top_exit(void)
+{
+   int i;
+   /*
+* free all the dynamic memories which could have been allocated by
+* this program
+*/
+   if (kbuf)
+   kbuffer_free(kbuf);
+   if (record)
+   free_record(record);
+   if (pevent)
+   pevent_free(pevent);


The above three functions all can handle a NULL pointer. Remove the if
statements.


ok




+   if (top_task_stats) {
+   for (i = 0; i < top_task_cnt; i++)
+   free(top_task_stats[i].comm);
+   free(top_task_stats);
+   }
+
+   top_disable_events();
+   remove(TRACE_CMD_TOP_PID_FILE);
+}
+
+void top_initialize_events(void)
+{
+   char *path;
+   int fd;
+   char c;
+   int ret;
+   char *buf;
+   int size;
+   enum kbuffer_long_size long_size;
+   enum kbuffer_endian endian;
+   char id_str[8];
+


Might want to look at trace_stream_init() and use that instead. It is
made for live streaming of events.


OK, will look into trace_stream_init().




+   /* trace data is read from the trace_raw_pipe directly */
+   if (tracecmd_host_bigendian())
+   endian = KBUFFER_ENDIAN_BIG;
+   else
+   endian = KBUFFER_ENDIAN_LITTLE;
+


[...]


+void top_start(void)
+{
+   pid_t pid;
+   int fd, ret;
+
+   top_disable_events();
+   top_initialize_events();
+   page_size = getpagesize();
+
+   pid = fork();
+   if (!pid) {
+   do {
+   top_process_events();
+   sleep(1);


We should probably just sleep on the buffer, waiting for data.


OK




+   if (set_print_only) {
+   top_print_stats();
+   set_print_only = false;
+   }
+   if (set_print_and_exit) {
+   top_print_stats();
+   top_exit();
+   break;
+   }
+   } while (1);
+   } else {
+   fd = open(TRACE_CMD_TOP_

[trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-03-01 Thread Pratyush Anand
A new interface, "trace-cmd top", is introduced in this patch to
report the peak memory usage of any task.

$ trace-cmd top -s
It will start gathering statistics for all processes scheduled after this
command executes.
$ trace-cmd top -p
It will print the peak memory usage (in KB) of each scheduled task.
$ trace-cmd top -e
It will print the peak memory usage (in KB) of each scheduled task and
stop gathering statistics. It will also kill the child process
initiated by the -s option.

$ ./trace-cmd top -s
$ ./trace-cmd top -e

commpeak memory(in KB)
NetworkManager  15912
gnome-shell 114292
gnome-shell 186356
firefox 853244
bash5984
thunderbird 1896396
kworker/3:0 0
kworker/u16:1   0
trace-cmd   256

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 Makefile   |   2 +-
 trace-cmd.c|   4 +
 trace-local.h  |   3 +
 trace-record.c |   2 +-
 trace-top.c| 620 +
 trace-usage.c  |   8 +
 6 files changed, 637 insertions(+), 2 deletions(-)
 create mode 100644 trace-top.c

diff --git a/Makefile b/Makefile
index 62cb25b6c1bd..5324c141d3d6 100644
--- a/Makefile
+++ b/Makefile
@@ -331,7 +331,7 @@ TRACE_GUI_OBJS = trace-filter.o trace-compat.o 
trace-filter-hash.o trace-dialog.
trace-xml.o
 TRACE_CMD_OBJS = trace-cmd.o trace-record.o trace-read.o trace-split.o 
trace-listen.o \
 trace-stack.o trace-hist.o trace-mem.o trace-snapshot.o trace-stat.o \
-trace-hash.o trace-profile.o trace-stream.o
+trace-hash.o trace-profile.o trace-stream.o trace-top.o
 TRACE_VIEW_OBJS = trace-view.o trace-view-store.o
 TRACE_GRAPH_OBJS = trace-graph.o trace-plot.o trace-plot-cpu.o 
trace-plot-task.o
 TRACE_VIEW_MAIN_OBJS = trace-view-main.o $(TRACE_VIEW_OBJS) $(TRACE_GUI_OBJS)
diff --git a/trace-cmd.c b/trace-cmd.c
index 1a62faaf71ca..66e0f7cd37bc 100644
--- a/trace-cmd.c
+++ b/trace-cmd.c
@@ -495,6 +495,10 @@ int main (int argc, char **argv)
trace_stat(argc, argv);
exit(0);
 
+   } else if (strcmp(argv[1], "top") == 0) {
+   trace_top(argc, argv);
+   exit(0);
+
} else if (strcmp(argv[1], "options") == 0) {
show_plugin_options();
exit(0);
diff --git a/trace-local.h b/trace-local.h
index 62363d0f9248..c9eac07da79a 100644
--- a/trace-local.h
+++ b/trace-local.h
@@ -74,6 +74,8 @@ void trace_mem(int argc, char **argv);
 
 void trace_stat(int argc, char **argv);
 
+void trace_top(int argc, char **argv);
+
 struct hook_list;
 
 int trace_profile_record(struct tracecmd_input *handle,
@@ -184,5 +186,6 @@ void update_first_instance(struct buffer_instance 
*instance, int topt);
 
 void show_instance_file(struct buffer_instance *instance, const char *name);
 int count_cpus(void);
+char *read_file(char *file, int *psize);
 
 #endif /* __TRACE_LOCAL_H */
diff --git a/trace-record.c b/trace-record.c
index 22b6835c0d5b..b2ce59e7eb4f 100644
--- a/trace-record.c
+++ b/trace-record.c
@@ -3323,7 +3323,7 @@ static char *read_instance_file(struct buffer_instance 
*instance, char *file, in
return buf;
 }
 
-static char *read_file(char *file, int *psize)
+char *read_file(char *file, int *psize)
 {
return read_instance_file(_instance, file, psize);
 }
diff --git a/trace-top.c b/trace-top.c
new file mode 100644
index ..056ae669856a
--- /dev/null
+++ b/trace-top.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (C) 2017 Red Hat Inc, Pratyush Anand <pan...@redhat.com>
+ *
+ * ~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License (not later!)
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not,  see <http://www.gnu.org/licenses>
+ *
+ * ~~
+ * There are several scenarios where we land in the oom-killer early in the
+ * boot process, especially in a memory-constrained environment. It becomes
+ * very difficult to identify the user-space task that required more
+ * memory compared to its previously released version. This interface is
+ * an attempt to debug such issues, and will help us identify the peak
+ * memory usage of each task. mm_page_alloc() and mm_page_free() are the
+ * lowest-level kernel APIs which allocate and free memory from buddy. This
+ * tool enables the tracepoints of these two functions and

Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-23 Thread Pratyush Anand

Hi Andrew/Kees,

On Tuesday 14 February 2017 07:16 AM, Pratyush Anand wrote:


Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset information can help translate an
identity-mapped virtual address to a physical address. But that would be
an additional field in the PT_LOAD header structure, and an
arch-dependent value.

Moreover, sending a valid physical address like 0 does not seem right.
So, IMHO it is better to fix that and send valid physical address when
available (identity mapped).

Thanks for the review.


So, what's the decision on this patch? I see that it is sitting in 
next/master. Should I expect it in v4.11-rc1?


A couple of user-space makedumpfile modifications will depend on this 
patch, so we cannot get those makedumpfile patches merged until this 
patch hits upstream.


~Pratyush



Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-13 Thread Pratyush Anand



On Tuesday 14 February 2017 03:55 AM, Kees Cook wrote:

On Mon, Jan 30, 2017 at 11:00 AM, Pratyush Anand <pan...@redhat.com> wrote:

CCing Andrew and Kees for their review comments.


On Wednesday 25 January 2017 10:14 AM, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile needs to know physical address for
PT_LOAD segments of direct mapped regions. Therefore this patch updates
paddr for such regions. It also sets an invalid paddr (-1) for other
regions, so that user space tool can know whether a physical address
provided in PT_LOAD is correct or not.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
fs/proc/kcore.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int
nphdr, int dataoff)
phdr->p_flags = PF_R|PF_W|PF_X;
phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr = (size_t)m->addr;
- phdr->p_paddr = 0;
+ if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+ phdr->p_paddr = __pa(m->addr);
+ else
+ phdr->p_paddr = (elf_addr_t)-1;
phdr->p_filesz = phdr->p_memsz = m->size;
phdr->p_align = PAGE_SIZE;
}



Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset information can help translate an 
identity-mapped virtual address to a physical address. But that would be 
an additional field in the PT_LOAD header structure, and an arch-dependent value.


Moreover, sending a valid physical address like 0 does not seem right. So, 
IMHO, it is better to fix that and send the real physical address when it is 
available (identity mapped).


Thanks for the review.

~Pratyush


Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-13 Thread Pratyush Anand



On Tuesday 14 February 2017 03:55 AM, Kees Cook wrote:

On Mon, Jan 30, 2017 at 11:00 AM, Pratyush Anand  wrote:

CCing Andrew and Kees for their review comments.


On Wednesday 25 January 2017 10:14 AM, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile needs to know physical address for
PT_LOAD segments of direct mapped regions. Therefore this patch updates
paddr for such regions. It also sets an invalid paddr (-1) for other
regions, so that user space tool can know whether a physical address
provided in PT_LOAD is correct or not.

Signed-off-by: Pratyush Anand 
---
fs/proc/kcore.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
phdr->p_flags = PF_R|PF_W|PF_X;
phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr = (size_t)m->addr;
- phdr->p_paddr = 0;
+ if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+ phdr->p_paddr = __pa(m->addr);
+ else
+ phdr->p_paddr = (elf_addr_t)-1;
phdr->p_filesz = phdr->p_memsz = m->size;
phdr->p_align = PAGE_SIZE;
}



Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset would help translate an identity mapped
virtual address to a physical address. But that would require an
additional field in the PT_LOAD header structure, and its value would be
arch dependent.


Moreover, reporting 0 (a perfectly valid physical address) for every
segment does not seem right. So, IMHO it is better to fix that and report
the real physical address when one is available (i.e. for identity
mapped regions).


Thanks for the review.

~Pratyush


Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-01-24 Thread Pratyush Anand

Hi Dave,

On Wednesday 25 January 2017 11:59 AM, Dave Young wrote:

Hi Pratyush
On 01/25/17 at 10:14am, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

I do not know the history of /proc/kcore, so a question is why the
p_addr was set as 0, if there were some reasons and if this could cause
some risk or breakage.



I do not know why it was 0, which is a valid physical address. But 
certainly, it might break some user space tools, and those would need to 
be fixed. For example, see the following code from kexec-tools:


kexec/kexec-elf.c:build_mem_phdrs()

435 if ((phdr->p_paddr + phdr->p_memsz) < phdr->p_paddr) {
436 	/* The memory address wraps */
437 	if (probe_debug) {
438 		fprintf(stderr, "ELF address wrap around\n");
439 	}
440 	return -1;
441 }

We would not need to perform the above check for an invalid physical address.

I think kexec-tools and makedumpfile will need fixups. I already have 
those fixes, and they will be sent upstream once this patch makes it 
through. The advantage of this approach is that it helps calculate 
variables like page_offset and phys_base from PT_LOAD even when they are 
randomized, and therefore reduces the number of variable- and 
version-specific values hard-coded in user space tools.


~Pratyush


[PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-01-24 Thread Pratyush Anand
Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile need to know the physical address of
PT_LOAD segments for direct mapped regions. Therefore this patch updates
p_paddr for such regions. It also sets an invalid p_paddr (-1) for other
regions, so that user space tools can tell whether the physical address
provided in a PT_LOAD header is valid or not.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 fs/proc/kcore.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
phdr->p_flags   = PF_R|PF_W|PF_X;
phdr->p_offset  = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr   = (size_t)m->addr;
-   phdr->p_paddr   = 0;
+   if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+   phdr->p_paddr   = __pa(m->addr);
+   else
+   phdr->p_paddr   = (elf_addr_t)-1;
phdr->p_filesz  = phdr->p_memsz = m->size;
phdr->p_align   = PAGE_SIZE;
}
-- 
2.9.3



[PATCH V3 0/5] ARM64: More flexible HW watchpoint

2016-11-14 Thread Pratyush Anand
Currently, we do not support all the byte select options provided by the
ARM64 specs for a HW watchpoint.

This patch set helps users instrument a watchpoint with all possible
byte select options.

Changes since v2:
- used __ffs() instead of ffs() - 1. Similarly for fls().
- fixed ptrace_hbp_get_addr() to report correct address to user space
- handling stepping for inexact watchpoint as well now.

Changes since v1:
- Introduced a new patch 3/5 where it takes care of the situation when HW
does not report a watchpoint hit with the address that matches one of the
watchpoints set.
- Added corresponding test case to test that functionality.

Pavel Labath (1):
  arm64: hw_breakpoint: Handle inexact watchpoint addresses

Pratyush Anand (4):
  hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
  arm64: Allow hw watchpoint at varied offset from base address
  arm64: Allow hw watchpoint of length 3,5,6 and 7
  selftests: arm64: add test for unaligned/inexact watchpoint handling

 arch/arm64/include/asm/hw_breakpoint.h |   6 +-
 arch/arm64/kernel/hw_breakpoint.c  | 153 +
 arch/arm64/kernel/ptrace.c |   7 +-
 include/uapi/linux/hw_breakpoint.h |   4 +
 tools/include/uapi/linux/hw_breakpoint.h   |   4 +
 tools/testing/selftests/breakpoints/Makefile   |   5 +-
 .../selftests/breakpoints/breakpoint_test_arm64.c  | 236 +
 7 files changed, 372 insertions(+), 43 deletions(-)
 create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

-- 
2.7.4




[PATCH V3 5/5] selftests: arm64: add test for unaligned/inexact watchpoint handling

2016-11-14 Thread Pratyush Anand
ARM64 hardware expects a 64-bit aligned address for watchpoint
invocation. However, it provides a byte selection method to watch any
number of consecutive bytes (1-8) within the aligned doubleword.

This patch adds support to test all such byte selection options for
different memory write sizes.

Patch also adds a test for handling the case when the cpu does not
report an address which exactly matches one of the regions we have
been watching (which is a situation permitted by the spec if an
instruction accesses both watched and unwatched regions). The test
was failing on a MSM8996pro before this patch series and is
passing now.

Signed-off-by: Pavel Labath 
Signed-off-by: Pratyush Anand 
---
 tools/testing/selftests/breakpoints/Makefile   |   5 +-
 .../selftests/breakpoints/breakpoint_test_arm64.c  | 236 +
 2 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index 74e533fd4bc5..61b79e8df1f4 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -5,6 +5,9 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 ifeq ($(ARCH),x86)
 TEST_PROGS := breakpoint_test
 endif
+ifeq ($(ARCH),aarch64)
+TEST_PROGS := breakpoint_test_arm64
+endif
 
 TEST_PROGS += step_after_suspend_test
 
@@ -13,4 +16,4 @@ all: $(TEST_PROGS)
 include ../lib.mk
 
 clean:
-   rm -fr breakpoint_test step_after_suspend_test
+   rm -fr breakpoint_test breakpoint_test_arm64 step_after_suspend_test
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
new file mode 100644
index ..3897e996541e
--- /dev/null
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Original Code by Pavel Labath 
+ *
+ * Code modified by Pratyush Anand 
+ * for testing different byte select for each access size.
+ *
+ */
+
+#define _GNU_SOURCE
+
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ptrace.h>
+#include <sys/param.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <elf.h>
+#include <errno.h>
+#include <signal.h>
+
+#include "../kselftest.h"
+
+static volatile uint8_t var[96] __attribute__((__aligned__(32)));
+
+static void child(int size, int wr)
+{
+   volatile uint8_t *addr = &var[32 + wr];
+
+   if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) {
+   perror("ptrace(PTRACE_TRACEME) failed");
+   _exit(1);
+   }
+
+   if (raise(SIGSTOP) != 0) {
+   perror("raise(SIGSTOP) failed");
+   _exit(1);
+   }
+
+   if ((uintptr_t) addr % size) {
+   perror("Wrong address write for the given size\n");
+   _exit(1);
+   }
+   switch (size) {
+   case 1:
+   *addr = 47;
+   break;
+   case 2:
+   *(uint16_t *)addr = 47;
+   break;
+   case 4:
+   *(uint32_t *)addr = 47;
+   break;
+   case 8:
+   *(uint64_t *)addr = 47;
+   break;
+   case 16:
+   __asm__ volatile ("stp x29, x30, %0" : "=m" (addr[0]));
+   break;
+   case 32:
+   __asm__ volatile ("stp q29, q30, %0" : "=m" (addr[0]));
+   break;
+   }
+
+   _exit(0);
+}
+
+static bool set_watchpoint(pid_t pid, int size, int wp)
+{
+   const volatile uint8_t *addr = &var[32 + wp];
+   const int offset = (uintptr_t)addr % 8;
+   const unsigned int byte_mask = ((1 << size) - 1) << offset;
+   const unsigned int type = 2; /* Write */
+   const unsigned int enable = 1;
+   const unsigned int control = byte_mask << 5 | type << 3 | enable;
+   struct user_hwdebug_state dreg_state;
+   struct iovec iov;
+
+   memset(&dreg_state, 0, sizeof(dreg_state));
+   dreg_state.dbg_regs[0].addr = (uintptr_t)(addr - offset);
+   dreg_state.dbg_regs[0].ctrl = control;
+   iov.iov_base = &dreg_state;
+   iov.iov_len = offsetof(struct user_hwdebug_state, dbg_regs) +
+   sizeof(dreg_state.dbg_regs[0]);
+   if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_WATCH, &iov) == 0)
+   return true;
+
+   if (errno == EIO) {
+ 
