Re: [PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

2017-09-12 Thread Pratyush Anand



On Wednesday 13 September 2017 08:12 AM, Will Deacon wrote:

On Tue, Sep 12, 2017 at 10:54:28AM +0100, James Morse wrote:

Hi Pratyush,

On 01/09/17 06:48, Pratyush Anand wrote:

do_task_stat() calls get_wchan(), which in turn calls unwind_frame().
unwind_frame() restores frame->pc to its original value in case the function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if the function graph tracer has hit a filtered
function, then we can't unwind it, as ftrace_push_return_trace() has
biased the index (frame->graph) with a 'huge negative'
offset (-FTRACE_NOTRACE_DEPTH).

Moreover, the arm64 stack walker defines the index (frame->graph) as an
unsigned int, which cannot represent a negative number.

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for a negative value and
restore the index accordingly before restoring frame->pc.


I've just spotted arm64's profile_pc, which does this:
 From arch/arm64/kernel/time.c:profile_pc():

#ifdef CONFIG_FUNCTION_GRAPH_TRACER
frame.graph = -1; /* no task info */
#endif


Is this another elaborate way of hitting this problem?

I guess the options are skip any return-address restore in the unwinder if
frame.graph is -1. (and profile_pc may have a bug here). Or, put
current->curr_ret_stack in there.


I think we should go with the latter, i.e. assign frame.graph = 
current->curr_ret_stack in profile_pc().




profile_pc() always passes tsk=NULL, so the unwinder assumes it's current...
kernel/profile.c pulls the pt_regs from a per-cpu irq_regs variable that is
updated by handle_IPI ... so it looks like this should always be current...


Hmmm... is profile_pc the *only* case where frame->graph isn't equal to
tsk->curr_ret_stack in unwind_frame? If so, maybe unwind_frame should just


Yes, it is the only place.


use that, and we could kill the graph member of struct stackframe completely?



Hmm, not sure: we initialize frame->graph outside the while loop in 
unwind_frame()'s caller, and then keep decrementing it inside the loop.


--
Regards
Pratyush


[PATCH v2] arm64: fix unwind_frame() for filtered out fn for function graph tracing

2017-08-31 Thread Pratyush Anand
do_task_stat() calls get_wchan(), which in turn calls unwind_frame().
unwind_frame() restores frame->pc to its original value in case the function
graph tracer has modified a return address (LR) in a stack frame to hook
a function return. However, if the function graph tracer has hit a filtered
function, then we can't unwind it, as ftrace_push_return_trace() has
biased the index (frame->graph) with a 'huge negative'
offset (-FTRACE_NOTRACE_DEPTH).

Moreover, the arm64 stack walker defines the index (frame->graph) as an
unsigned int, which cannot represent a negative number.

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() to test the index for a negative value and
restore the index accordingly before restoring frame->pc.

Reproducer:

cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in:
Unable to handle kernel paging request at virtual address 801bd3d1e000
pgd = 8003cbe97c00
[801bd3d1e000] *pgd=, *pud=
Internal error: Oops: 9606 [#1] SMP
[...]
CPU: 5 PID: 11696 Comm: ps Not tainted 4.11.0+ #33
[...]
task: 8003c21ba000 task.stack: 8003cc6c
PC is at unwind_frame+0x12c/0x180
LR is at get_wchan+0xd4/0x134
pc : [] lr : [] pstate: 6145
sp : 8003cc6c3ab0
x29: 8003cc6c3ab0 x28: 0001
x27: 0026 x26: 0026
x25: 12d8 x24: 
x23: 8003c1c04000 x22: 08c83000
x21: 8003c1c0 x20: 000f
x19: 8003c1bc x18: fc593690
x17:  x16: 0001
x15: b855670e2b60 x14: 0003e97f22cf1d0f
x13: 0001 x12: 
x11: e8f4883e x10: 000154f47ec8
x9 : 70f367c0 x8 : 
x7 : 8003f729 x6 : 0018
x5 :  x4 : 8003c1c03cb0
x3 : 8003c1c03ca0 x2 : 0017ffe8
x1 : 8003cc6c3af8 x0 : 8003d3e9e000

Process ps (pid: 11696, stack limit = 0x8003cc6c)
Stack: (0x8003cc6c3ab0 to 0x8003cc6c4000)
[...]
[] unwind_frame+0x12c/0x180
[] do_task_stat+0x864/0x870
[] proc_tgid_stat+0x3c/0x48
[] proc_single_show+0x5c/0xb8
[] seq_read+0x160/0x414
[] __vfs_read+0x58/0x164
[] vfs_read+0x88/0x144
[] SyS_read+0x60/0xc0
[] __sys_trace_return+0x0/0x4

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under function graph tracer")
Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
v1 -> v2:
- improved commit log
- the index is now restored first, and frame->pc thereafter.

 arch/arm64/include/asm/stacktrace.h | 2 +-
 arch/arm64/kernel/stacktrace.c  | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/include/asm/stacktrace.h 
b/arch/arm64/include/asm/stacktrace.h
index 5b6eafccc5d8..816db5b9b874 100644
--- a/arch/arm64/include/asm/stacktrace.h
+++ b/arch/arm64/include/asm/stacktrace.h
@@ -23,7 +23,7 @@ struct stackframe {
unsigned long sp;
unsigned long pc;
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
-   unsigned int graph;
+   int graph;
 #endif
 };
 
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..4c47147d0554 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -75,6 +75,9 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
(frame->pc == (unsigned long)return_to_handler)) {
+   if (frame->graph < 0)
+   frame->graph += FTRACE_NOTRACE_DEPTH;
+
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



Re: [PATCH 03/14] resource: add walk_system_ram_res_rev()

2017-08-30 Thread Pratyush Anand



On Thursday 24 August 2017 01:48 PM, AKASHI Takahiro wrote:

This function, being a variant of walk_system_ram_res() introduced in
commit 8c86e70acead ("resource: provide new functions to walk through
resources"), walks through a list of all the resources of System RAM
in reversed order, i.e., from higher to lower.

It will be used in kexec_file implementation on arm64.

Signed-off-by: AKASHI Takahiro 
Cc: Vivek Goyal 
Cc: Andrew Morton 
Cc: Linus Torvalds 
---
  include/linux/ioport.h |  3 +++
  kernel/resource.c  | 48 
  2 files changed, 51 insertions(+)

diff --git a/include/linux/ioport.h b/include/linux/ioport.h
index 6230064d7f95..9a212266299f 100644
--- a/include/linux/ioport.h
+++ b/include/linux/ioport.h
@@ -271,6 +271,9 @@ extern int
  walk_system_ram_res(u64 start, u64 end, void *arg,
int (*func)(u64, u64, void *));
  extern int
+walk_system_ram_res_rev(u64 start, u64 end, void *arg,
+   int (*func)(u64, u64, void *));
+extern int
  walk_iomem_res_desc(unsigned long desc, unsigned long flags, u64 start, u64 
end,
void *arg, int (*func)(u64, u64, void *));
  
diff --git a/kernel/resource.c b/kernel/resource.c

index 9b5f04404152..1d6d734c75ac 100644
--- a/kernel/resource.c
+++ b/kernel/resource.c
@@ -23,6 +23,7 @@
  #include 
  #include 
  #include 
+#include 
  #include 
  
  
@@ -469,6 +470,53 @@ int walk_system_ram_res(u64 start, u64 end, void *arg,

return ret;
  }
  
+int walk_system_ram_res_rev(u64 start, u64 end, void *arg,

+   int (*func)(u64, u64, void *))
+{
+   struct resource res, *rams;
+   u64 orig_end;
+   int count, i;
+   int ret = -1;
+
+   count = 16; /* initial */
+again:
+   /* create a list */
+   rams = vmalloc(sizeof(struct resource) * count);
+   if (!rams)
+   return ret;
+
+   res.start = start;
+   res.end = end;
+   res.flags = IORESOURCE_SYSTEM_RAM | IORESOURCE_BUSY;
+   orig_end = res.end;
+   i = 0;
+   while ((res.start < res.end) &&
+   (!find_next_iomem_res(&res, IORES_DESC_NONE, true))) {
+   if (i >= count) {
+   /* unlikely but */
+   vfree(rams);
+   count += 16;
+   goto again;


Wouldn't it be better to allocate a bigger buffer, copy the previous values and 
free the previous pointer, instead of going *again*?



+   }
+
+   rams[i].start = res.start;
+   rams[i++].end = res.end;
+
+   res.start = res.end + 1;
+   res.end = orig_end;
+   }
+
+   /* go reverse */
+   for (i--; i >= 0; i--) {
+   ret = (*func)(rams[i].start, rams[i].end, arg);
+   if (ret)
+   break;
+   }
+
+   vfree(rams);
+   return ret;
+}
+
  #if !defined(CONFIG_ARCH_HAS_WALK_MEMORY)
  
  /*




--
Regards
Pratyush


Re: [PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-30 Thread Pratyush Anand



On Tuesday 29 August 2017 10:33 PM, James Morse wrote:

Hi Pratyush,

(Nit: get_maintainer.pl will give you the list of people to CC, you're reliant
on the maintainers being eagle-eyed to spot this!)


I noticed it after sending. I do use it, but there was a bug in my cccmd 
script. I have fixed it, and I hope it won't be missed next time.




On 28/08/17 13:53, Pratyush Anand wrote:

Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"


This didn't panic for me, it just killed the running task. (I guess you have
panic-on-oops set)



yes..




vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
  #0 [8003c4ff77b0] crash_kexec at 0816b9b8
  #1 [8003c4ff77e0] die at 08088b34
  #2 [8003c4ff7820] __do_kernel_fault at 0809b830
  #3 [8003c4ff7850] do_bad_area at 08098b90
  #4 [8003c4ff7880] do_translation_fault at 087c6cdc
  #5 [8003c4ff78b0] do_mem_abort at 08081334
  #6 [8003c4ff7ab0] el1_ia at 08082cc0
  PC: 0808811c  [unwind_frame+300]
  LR: 080858a8  [get_wchan+212]
  SP: 8003c4ff7ab0  PSTATE: 6145
 X29: 8003c4ff7ab0  X28: 0001  X27: 
 X26:   X25:   X24: 
 X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
 X20: 000f  X19: 8003c1bc7000  X18: 0010
 X17:   X16: 0001  X15: ffed
 X14: 0010  X13:   X12: 0004
 X11:   X10: 02dd14c0   X9: 1999
  X8: 003f   X7: 8003f71b   X6: 0018
  X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
  X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
  #7 [8003c4ff7b20] do_task_stat at 08304f0c
  #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
  #9 [8003c4ff7ca0] proc_single_show at 082fdd10
  #10 [8003c4ff7ce0] seq_read at 082b27bc
  #11 [8003c4ff7d70] __vfs_read at 08289e54
  #12 [8003c4ff7e30] vfs_read at 0828b14c
  #13 [8003c4ff7e70] sys_read at 0828c2d0
  #14 [8003c4ff7ed0] __sys_trace at 0808349c
  PC: 0006  LR:   SP: ffed  PSTATE: 003f
 X12: 0010 X11:  X10: 0004  X9: 7febe8d0
  X8:   X7: 1999  X6: 003f  X5: 000c
  X4: 7fce9c78  X3: 000c  X2:   X1: 
  X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 <unwind_frame+280>:  ldr w2, [x1,#24]
0x0808810c <unwind_frame+284>:  sub w6, w2, #0x1
0x08088110 <unwind_frame+288>:  str w6, [x1,#24]
0x08088114 <unwind_frame+292>:  mov w6, #0x18 // #24
0x08088118 <unwind_frame+296>:  umull   x2, w2, w6
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

The corresponding C statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

From the above data, frame->graph = task->curr_ret_stack, which should be
x2 / 0x18, i.e. -FTRACE_NOTRACE_DEPTH.


(0x18 is the size of struct ftrace_ret_stack for your config?)


yes.





OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case.



It means that when do_task_stat()
was called for task A (ps in this case) on CPUm, task A was in the middle
of execution on CPUn,


get_wchan() on a running process can't work: the stack may be modified while
we walk it.
 From arch/arm64/kernel/process.c::get_wchan():

if (!p || p == current || p->state == TASK_RUNNING)
return 0;


As far as I can see, if task A is running on CPU-N then CPU-M's attempt to
get_wchan() it will return 0.


You seem to be right.





and was in the middle of mcount() execution, where
curr_ret_stack had been decremented in ftrace_push_return_trace() after
hitting the schedule() function, but had not yet been incremented in
ftrace_return_to_handler().


So if the function-graph-tracer has hooked the return values on a stack and hit
a filtered function (schedule() in your example), we can't unwind it, as
ftrace_push_return_trace() has biased the index with a 'huge negative' offset to
flag it as 'this should be filtered out'.

The arm64 stack walker isn't awa

[PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-28 Thread Pratyush Anand
Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [8003c4ff77b0] crash_kexec at 0816b9b8
 #1 [8003c4ff77e0] die at 08088b34
 #2 [8003c4ff7820] __do_kernel_fault at 0809b830
 #3 [8003c4ff7850] do_bad_area at 08098b90
 #4 [8003c4ff7880] do_translation_fault at 087c6cdc
 #5 [8003c4ff78b0] do_mem_abort at 08081334
 #6 [8003c4ff7ab0] el1_ia at 08082cc0
 PC: 0808811c  [unwind_frame+300]
 LR: 080858a8  [get_wchan+212]
 SP: 8003c4ff7ab0  PSTATE: 6145
X29: 8003c4ff7ab0  X28: 0001  X27: 
X26:   X25:   X24: 
X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
X20: 000f  X19: 8003c1bc7000  X18: 0010
X17:   X16: 0001  X15: ffed
X14: 0010  X13:   X12: 0004
X11:   X10: 02dd14c0   X9: 1999
 X8: 003f   X7: 8003f71b   X6: 0018
 X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
 X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
 #7 [8003c4ff7b20] do_task_stat at 08304f0c
 #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
 #9 [8003c4ff7ca0] proc_single_show at 082fdd10
 #10 [8003c4ff7ce0] seq_read at 082b27bc
 #11 [8003c4ff7d70] __vfs_read at 08289e54
 #12 [8003c4ff7e30] vfs_read at 0828b14c
 #13 [8003c4ff7e70] sys_read at 0828c2d0
 #14 [8003c4ff7ed0] __sys_trace at 0808349c
 PC: 0006  LR:   SP: ffed  PSTATE: 003f
X12: 0010 X11:  X10: 0004  X9: 7febe8d0
 X8:   X7: 1999  X6: 003f  X5: 000c
 X4: 7fce9c78  X3: 000c  X2:   X1: 
 X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 <unwind_frame+280>:  ldr w2, [x1,#24]
0x0808810c <unwind_frame+284>:  sub w6, w2, #0x1
0x08088110 <unwind_frame+288>:  str w6, [x1,#24]
0x08088114 <unwind_frame+292>:  mov w6, #0x18 // #24
0x08088118 <unwind_frame+296>:  umull   x2, w2, w6
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

The corresponding C statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c <unwind_frame+300>:  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

From the above data, frame->graph = task->curr_ret_stack, which should be
x2 / 0x18, i.e. -FTRACE_NOTRACE_DEPTH.

OK, so problem is here:
do_task_stat() calls get_wchan(). Here p->curr_ret_stack  is
-FTRACE_NOTRACE_DEPTH in the failed case. It means, when do_task_stat()
has been called for task A (ps in this case) on CPUm,task A was in mid
of execution on CPUn, and was in the mid of mcount() execution where
curr_ret_stack had been decremented in ftrace_push_return_trace() for
hitting schedule() function, but it was yet to be incremented in
ftrace_return_to_handler().

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() so that it does not manipulate frame->pc
for the function graph tracer if the function has been marked in
set_graph_notrace.

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under function graph tracer")

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct 
stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
-   (frame->pc == (unsigned long)return_to_handler)) {
+   (frame->pc == (unsigned long)return_to_handler) &&
+   (frame->graph > -1)) {
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



[PATCH] arm64: fix unwind_frame() for filtered out fn via set_graph_notrace

2017-08-28 Thread Pratyush Anand
Testcase:
cd /sys/kernel/debug/tracing/
echo schedule > set_graph_notrace
echo 1 > options/display-graph
echo wakeup > current_tracer
ps -ef | grep -i agent

Above commands result in
PANIC: "Unable to handle kernel paging request at virtual address
801bcbde7000"

vmcore analysis:
1)
crash> bt
PID: 1561   TASK: 8003cb7e4000  CPU: 0   COMMAND: "ps"
 #0 [8003c4ff77b0] crash_kexec at 0816b9b8
 #1 [8003c4ff77e0] die at 08088b34
 #2 [8003c4ff7820] __do_kernel_fault at 0809b830
 #3 [8003c4ff7850] do_bad_area at 08098b90
 #4 [8003c4ff7880] do_translation_fault at 087c6cdc
 #5 [8003c4ff78b0] do_mem_abort at 08081334
 #6 [8003c4ff7ab0] el1_ia at 08082cc0
 PC: 0808811c  [unwind_frame+300]
 LR: 080858a8  [get_wchan+212]
 SP: 8003c4ff7ab0  PSTATE: 6145
X29: 8003c4ff7ab0  X28: 0001  X27: 
X26:   X25:   X24: 
X23: 8003c1c2  X22: 08c73000  X21: 8003c1c1c000
X20: 000f  X19: 8003c1bc7000  X18: 0010
X17:   X16: 0001  X15: ffed
X14: 0010  X13:   X12: 0004
X11:   X10: 02dd14c0   X9: 1999
 X8: 003f   X7: 8003f71b   X6: 0018
 X5:    X4: 8003c1c1fd20   X3: 8003c1c1fd10
 X2: 0017ffe8   X1: 8003c4ff7af8   X0: 8003cbf67000
 #7 [8003c4ff7b20] do_task_stat at 08304f0c
 #8 [8003c4ff7c60] proc_tgid_stat at 08305b48
 #9 [8003c4ff7ca0] proc_single_show at 082fdd10
 #10 [8003c4ff7ce0] seq_read at 082b27bc
 #11 [8003c4ff7d70] __vfs_read at 08289e54
 #12 [8003c4ff7e30] vfs_read at 0828b14c
 #13 [8003c4ff7e70] sys_read at 0828c2d0
 #14 [8003c4ff7ed0] __sys_trace at 0808349c
 PC: 0006  LR:   SP: ffed  PSTATE: 003f
X12: 0010 X11:  X10: 0004  X9: 7febe8d0
 X8:   X7: 1999  X6: 003f  X5: 000c
 X4: 7fce9c78  X3: 000c  X2:   X1: 
 X0: 0400

(2) Instruction at 0808811c caused IA/DA.

crash> dis -l 08088108 6
/usr/src/debug//xx/arch/arm64/kernel/stacktrace.c:
84
0x08088108 :  ldr w2, [x1,#24]
0x0808810c :  sub w6, w2, #0x1
0x08088110 :  str w6, [x1,#24]
0x08088114 :  mov w6, #0x18 // #24
0x08088118 :  umull   x2, w2, w6
0x0808811c :  ldr x0, [x0,x2]

Corresponding c statement is
frame->pc = tsk->ret_stack[frame->graph--].ret;

(3) So, it caused data abort while executing
0x0808811c :  ldr x0, [x0,x2]

x0 + x2 = 8003cbf67000 + 0017ffe8 = 801bcbde7000
Access of 801bcbde7000 resulted in "Unable to handle kernel paging
request"

from above data: frame->graph = task->curr_ret_stack which  should be,
x2 / 0x18 = , which is -FTRACE_NOTRACE_DEPTH.

OK, so the problem is here:
do_task_stat() calls get_wchan(). In the failing case, p->curr_ret_stack is
-FTRACE_NOTRACE_DEPTH. This means that when do_task_stat() was called for
task A (ps in this case) on CPUm, task A was in the middle of execution on
CPUn, inside mcount(): curr_ret_stack had been decremented in
ftrace_push_return_trace() on hitting the filtered schedule() function, but
had not yet been incremented back in ftrace_return_to_handler().

A similar problem can occur when walk_stackframe() is called from
save_stack_trace_tsk() or dump_backtrace().

This patch fixes unwind_frame() so that it does not manipulate frame->pc
for the function graph tracer when the function has been marked in
set_graph_notrace.

Fixes: 20380bb390a4 ("arm64: ftrace: fix a stack tracer's output under
function graph tracer")

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/stacktrace.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index 09d37d66b630..e79035d673b3 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -74,7 +74,8 @@ int notrace unwind_frame(struct task_struct *tsk, struct stackframe *frame)
 
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
if (tsk->ret_stack &&
-   (frame->pc == (unsigned long)return_to_handler)) {
+   (frame->pc == (unsigned long)return_to_handler) &&
+   (frame->graph > -1)) {
/*
 * This is a case where function graph tracer has
 * modified a return address (LR) in a stack frame
-- 
2.9.4



Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-25 Thread Pratyush Anand

Hi Mark, James and Will,

What is the take on this series?

Just to summarize:

- James took the first three patches of the series and replaced the 4th and 5th
patches with his 3-patch series [3].
- My patchset disables interrupts when exiting the HW breakpoint/watchpoint
and SW breakpoint handlers, and re-enables them after single-step handling.
It single-steps reliably even if an interrupt is generated while the
breakpoint/watchpoint handler is executing.
- James's approach keeps interrupts enabled, but it might fail if a hardware
breakpoint is placed on an ISR invoked between the breakpoint and step exceptions.


@James, I understand other things may take priority. Do you have a plan to
fix the pitfall and send a new series? If not, can I address Peter's concern
on patch 1/5 and send v4?


[3] https://www.spinics.net/lists/arm-kernel/msg598214.html

On Monday 31 July 2017 04:10 PM, Pratyush Anand wrote:

v2 -> v3
- Moved step_needed from uapi structure to kernel only structure
- Re-enable interrupt if stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel=149942910730496=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
   hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
   hw_breakpoint_needs_single_step(). (patch 2)
- Modified the sample test to set the step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel=149910958418708=2

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler. So,
even though we are not concerned about the above test, we will have to find a
solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, kgdb should hopefully be fine as
well; however, I have not tested that.
@Takahiro: would it be possible to test these patches with kgdb?

[1] http://marc.info/?l=linux-arm-kernel=149580777524910=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html


Pratyush Anand (5):
   hw_breakpoint: Add step_needed event attribute
   arm64: use hw_breakpoint_needs_single_step() to decide if step is
 needed
   register_wide_hw_breakpoint(): modify to accept step_needed arg
   arm64: disable irq between breakpoint and step exception
   arm64: fault: re-enable irq if it was disabled for single stepping

  arch/arm64/kernel/debug-monitors.c  |  3 +++
  arch/arm64/kernel/hw_breakpoint.c   | 10 +-
  arch/arm64/mm/fault.c   | 28 
  arch/x86/kernel/kgdb.c  |  2 +-
  include/linux/hw_breakpoint.h   | 10 --
  include/linux/perf_event.h  |  6 ++
  kernel/events/core.c|  2 ++
  kernel/events/hw_breakpoint.c   |  4 +++-
  samples/hw_breakpoint/data_breakpoint.c |  3 ++-
  9 files changed, 54 insertions(+), 14 deletions(-)



--
Regards
Pratyush


[PATCH] tracer: hwlat: allow to be default bootup tracer

2017-08-11 Thread Pratyush Anand
hwlat is not a production kernel tracer; however, it can also be used to
identify HW latency issues during kernel boot. Therefore, call
init_hwlat_tracer() through core_initcall(), so that we can pass
ftrace=hwlat as a kernel command-line parameter and have hwlat as the
default bootup tracer.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 kernel/trace/trace_hwlat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/trace/trace_hwlat.c b/kernel/trace/trace_hwlat.c
index d7c8e4ec3d9d..09f8f0950b6c 100644
--- a/kernel/trace/trace_hwlat.c
+++ b/kernel/trace/trace_hwlat.c
@@ -630,4 +630,4 @@ __init static int init_hwlat_tracer(void)
 
return 0;
 }
-late_initcall(init_hwlat_tracer);
+core_initcall(init_hwlat_tracer);
-- 
2.9.4
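With the initcall moved earlier, hwlat can be selected at boot time. A sketch
of the intended usage (the `ftrace=` boot parameter is the documented way to
pick a default tracer; `tracing_thresh=` is shown as an assumed optional
companion setting the latency threshold in microseconds):

```
# Kernel command line (e.g. appended in the bootloader config)
ftrace=hwlat tracing_thresh=100
```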



Re: RCU stall when using function_graph

2017-08-09 Thread Pratyush Anand



On Sunday 06 August 2017 10:32 PM, Paul E. McKenney wrote:

On Sat, Aug 05, 2017 at 02:24:21PM +0900, 김동현 wrote:

Dear All

As for me, after configuring function_graph as below, crash disappears.
"echo 0 > d/tracing/tracing_on"
"sleep 1"

"echo function_graph > d/tracing/current_tracer"
"sleep 1"

"echo smp_call_function_single > d/tracing/set_ftrace_filter"


It will limit the trace output to only the filtered function
(smp_call_function_single).



adb shell "sleep 1"

"echo 1 > d/tracing/tracing_on"
adb shell "sleep 1"

Right after function_graph is enabled, too many logs are traced upon IRQ
transaction which many times eventually causes stall.


That would do it!

Hmmm...

Steven, would it be helpful if RCU were to inform tracing (say) halfway
through the RCU CPU stall interval, allowing the tracer to do something
like cond_resched_rcu_qs()?  I can imagine all sorts of reasons why this
wouldn't work, for example, if all the tracing was with irqs disabled
or some such, but figured I should ask.

Does Guillermo's approach work for others?


Limited output with a couple of filtered functions will definitely not cause
an RCU stall. But the question is whether we should expect a full
function graph trace to work on every platform (especially one which
generates interrupts at a high rate)?


--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Peter,

On Wednesday 02 August 2017 01:44 PM, Peter Zijlstra wrote:

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

Hi,

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.


I'm sitting on this patch:

   
https://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git/commit/?h=locking/core=15377ef4fe0c86eb7fa1099575b2e86357d99e42

please give that a spin.


Thanks for the pointer.
I tried it, but it did not help.

--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Marc,

On Wednesday 02 August 2017 02:14 PM, Marc Zyngier wrote:

On 02/08/17 09:08, Will Deacon wrote:

Hi Pratyush,

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.


When you say "also", which other kernel(s) show the problem? Is this a
recent regression? Which platform are you running on?

It would be interesting to know what the other CPUs are doing, in particular
the target of the cross-call. Either it crashed spectacularly and didn't
unlock the csd lock, or the IPI somehow wasn't delivered.

Do you see any other splats if you enable lock debugging?


Also, is that in a guest, or bare metal? If that's a guest, what's the
host's kernel version?


It's a host.
--
Regards
Pratyush



Re: rcu_sched stall while waiting in csd_lock_wait()

2017-08-02 Thread Pratyush Anand

Hi Will,

On Wednesday 02 August 2017 01:38 PM, Will Deacon wrote:

Hi Pratyush,

On Wed, Aug 02, 2017 at 09:01:19AM +0530, Pratyush Anand wrote:

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall.
Any help/pointers for further debugging would be very helpful. The problem
also reproduces with 4.13.0-rc3.

When you say "also", which other kernel(s) show the problem? Is this a
recent regression? Which platform are you running on?


Other than 4.13.0-rc3, I had tested with a 4.11 based kernel (4.11 vanilla
plus some 4.12 patches backported); the log I attached was from that kernel.
That is why I mentioned that it reproduces with vanilla upstream as well.




It would be interesting to know what the other CPUs are doing, in particular
the target of the cross-call. Either it crashed spectacularly and didn't
unlock the csd lock, or the IPI somehow wasn't delivered.

Do you see any other splats if you enable lock debugging?


It was the same.

Following is the log from 4.13.0-rc3 + patch pointed by Peter:

[  173.649589] perf: interrupt took too long (4952 > 4902), lowering 
kernel.perf_event_max_sample_rate to 40300

[  201.340926] INFO: rcu_sched self-detected stall on CPU
[  201.345115] 	9-...: (6499 ticks this GP) idle=e1a/141/0 
softirq=334/334 fqs=3250

[  201.353617]   (t=6500 jiffies g=313 c=312 q=428)
[  201.358220] Task dump for CPU 9:
[  201.361431] perfR  running task0  1888   1864 0x0202
[  201.368462] Call trace:
[  201.370897] [] dump_backtrace+0x0/0x28c
[  201.376276] [] show_stack+0x24/0x2c
[  201.381312] [] sched_show_task+0x19c/0x26c
[  201.386952] [] dump_cpu_task+0x48/0x54
[  201.392250] [] rcu_dump_cpu_stacks+0xac/0xf4
[  201.398063] [] rcu_check_callbacks+0x908/0xc90
[  201.404053] [] update_process_times+0x34/0x5c
[  201.409957] [] tick_sched_handle.isra.16+0x4c/0x70
[  201.416292] [] tick_sched_timer+0x48/0x88
[  201.421847] [] __hrtimer_run_queues+0x17c/0x604
[  201.427924] [] hrtimer_interrupt+0xa4/0x1e8
[  201.433656] [] arch_timer_handler_phys+0x3c/0x48
[  201.439818] [] handle_percpu_devid_irq+0xdc/0x42c
[  201.446069] [] generic_handle_irq+0x34/0x4c
[  201.451796] [] __handle_domain_irq+0x6c/0xc4
[  201.457611] [] gic_handle_irq+0xa0/0x1b0
[  201.463080] Exception stack(0x8016df013a40 to 0x8016df013b70)
[  201.469504] 3a40:  0003  
8016df013bd0
[  201.477316] 3a60: 8016df013bd0 0008 8016df013bb8 
082113c8
[  201.485129] 3a80:  e507a9f0  

[  201.492941] 3aa0: 0005  002f547d23157399 
3a2a9f82ac9c
[  201.500754] 3ac0:   e507a7e0 
08f5b000
[  201.508566] 3ae0: 8016df013c08 08213fcc 0013 
8017616a7800
[  201.516379] 3b00: 08f5b000 082179d4  
088c1000
[  201.524191] 3b20: 8017616a7800 8016df013b70 0818bf04 
8016df013b70
[  201.532004] 3b40: 0818bf28 20400145 08213fcc 
0013

[  201.539816] 3b60: 0001 8016df013bb8
[  201.544677] [] el1_irq+0xb8/0x140
[  201.549539] [] smp_call_function_single+0x160/0x184
[  201.555965] [] cpu_function_call+0x40/0x64
[  201.561605] [] event_function_call+0x120/0x128
[  201.567594] [] _perf_event_disable+0x44/0x64
[  201.573410] [] perf_event_for_each_child+0x3c/0x84
[  201.579747] [] perf_ioctl+0x21c/0x9a4
[  201.584957] [] do_vfs_ioctl+0xcc/0x874
[  201.590250] [] sys_ioctl+0x90/0xa4
[  201.595198] [] __sys_trace_return+0x0/0x4
[  239.003035] INFO: rcu_sched detected expedited stalls on CPUs/tasks: { 
9-... } 6592 jiffies s: 1149 root: 0x1/.

[  239.012199] blocking rcu_node structures: l=1:0-14:0x200/.
[  239.017695] Task dump for CPU 9:
[  239.020880] perfR  running task0  1888   1864 0x0202
[  239.027929] Call trace:
[  239.030346] [] __switch_to+0x64/0x70
[  239.035484] [] free_pcppages_bulk+0x43c/0x640
[  262.304244] perf: interrupt took too long (6221 > 6190), lowering 
kernel.perf_event_max_sample_rate to 32100

[  367.009704] INFO: task kworker/15:2:1187 blocked for more than 120 seconds.
[  367.015713]   Tainted: GW   4.13.0-rc3+ #2
[  367.021200] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[  367.028994] kworker/15:2D0  1187  2 0x
[  367.034481] Workqueue: usb_hub_wq hub_event
[  367.038629] Call trace:
[  367.041077] [] __switch_to+0x64/0x70
[  367.046184] [] __schedule+0x410/0xcbc
[  367.051403] [] schedule+0x40/0xa4
[  367.056253] [] 
_synchronize_rcu_expedited.constprop.64+0x370/0x3e8

[  367.063990] [] synchronize_sched_expedited+0x7c/0xf0
[  367.070501]

Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-02 Thread Pratyush Anand

Hi James,

Thanks for your analysis.

On Wednesday 02 August 2017 10:43 PM, James Morse wrote:

Hi Pratyush,

On 01/08/17 05:18, Pratyush Anand wrote:

On Monday 31 July 2017 10:45 PM, James Morse wrote:

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler. So,
even though we are not concerned about the above test, we will have to find a
solution for the perf issue.



You can easily reproduce the issue with the following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq


Thanks, that happily dump-stacks forever. Your first three patches fix the
stepping over the watchpoint; I've had a go at fixing the interrupt
interaction (instead of just masking interrupts).

gdb single-step works, as does kprobes, FWIW for those three:
Tested-by: James Morse <james.mo...@arm.com>



What causes your infinite loop?



Flow is like this:
- A SW or HW breakpoint exception is generated on a CPU, let's say CPU5.
- The breakpoint handler does something which causes an interrupt to become
active on the same CPU. In fact, there may be many other reasons for an
interrupt to be active on a CPU while the breakpoint handler is executing.
- So, as soon as we return from the breakpoint exception, we go to the IRQ
exception handler, while we were expecting a single-step exception.


What breaks when this happens?


I think that as soon as we call enable_dbg from el1_irq, a step exception will
be generated, and we will step the very first instruction after enable_dbg,
which is not what the single-step handler expects.




With your reproducer and the first three patches I see it hitting the watchpoint
multiple times and stepping the irq handler.
Let's say we were executing the instruction at address 0x2000 when the
watchpoint exception occurred. We programmed ELR with 0x2000 for single
stepping; however, we received an interrupt before the instruction at 0x2000
could be single-stepped.

Now, if 0x3010 is the address following enable_dbg in el1_irq, the
instruction at address 0x3010 will be single-stepped instead. After el1_irq
is handled we jump back to 0x2000 again, but that address is no longer armed
for single-stepping, so executing the instruction there lands us in the
watchpoint exception handler once again.


I do not have a HW debugger, but this is what it looked like while going
through the code.




I think we have two or three interacting bugs here. I'm not convinced masking
interrupts is the best fix as the data abort handler inherits this value. We
might mask interrupts for a fault that can't be handled with interrupts masked.



In my understanding, the problems are:
(1) Single-stepping of an unwanted instruction (i.e. the instruction
following enable_dbg in el1_irq).
(2) We do not have memory at the end of el1_irq where we could set the
watchpoint-generating instruction for single stepping.

I think we can find a way to take care of (2), but I am not sure how (1)
can be taken care of without the approach I am taking.



I will post some RFC/fixes, but need to get my head round the debug/exception
interaction in the ARM-ARM first!


Thanks,

James



--
Regards
Pratyush


Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-02 Thread Pratyush Anand

Hi James,

Thanks for your analysis.

On Wednesday 02 August 2017 10:43 PM, James Morse wrote:

Hi Pratyush,

On 01/08/17 05:18, Pratyush Anand wrote:

On Monday 31 July 2017 10:45 PM, James Morse wrote:

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even perf step exception can go into an infinite loop if CPU
receives an interrupt while executing breakpoint/watchpoint handler. So,
event though we are not concerned about above test, we will have to find a
solution for the perf issue.



You can easily reproduce the issue with following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq


Thanks, that happily dump-stacks forever. Your first three patches fix the
stepping over the watchpoint, I've had a go at fixing the interrupt interaction,
(instead of just masking interrupts).

gdb single-step works, as does kprobes, FWIW for those three:
Tested-by: James Morse 



What causes your infinite loop?



Flow is like this:
- A SW or HW breakpoint exception is being generated on a cpu lets say CPU5
- Breakpoint handler does something which causes an interrupt to be active on
the same CPU. In fact there might be many other reasons for an interrupt to be
active on a CPU while breakpoint handler was being executed.
- So, as soon as we return from breakpoint exception, we go to the IRQ exception
handler, while we were expecting a single step exception.


What breaks when this happens?


I think, as soon as we call enable_dbg from el1_irq, step exception will be 
generated, and we will be stepping very first instruction after enable_dbg 
which is not expected by single step handler.




With your reproducer and the first three patches I see it hitting the watchpoint
multiple times and stepping the irq handle
Lets say we were executing instruction from address 0x2000 when watchpoint 
exception occurred. We programmed, ELR with 0x2000 for single stepping, 
however we received an interrupt before instruction at 0x2000 could have been 
single stepped.


Now if 0x3010 is the address next to enable_dbg from el1_irq, then the 
instruction from address 0x3010 will be single stepped. We will jump back to 
0x2000 again after addressing el1_irq, but that is no more available for 
single-stepping and while executing instruction at that address we again land 
into watchpoint exception handler.


I do not have a HW debugger, but this is what it looked like while going
through the code.




I think we have two or three interacting bugs here. I'm not convinced masking
interrupts is the best fix as the data abort handler inherits this value. We
might mask interrupts for a fault that can't be handled with interrupts masked.



In my understanding, the problems are:
(1) Single stepping of an unwanted instruction (i.e. the instruction next to
enable_dbg in el1_irq).
(2) We do not have memory at the end of el1_irq where we could set up the
watchpoint-exception-generating instruction for single stepping.

I think we can find a way to take care of (2), but I am not sure how (1) can
be taken care of without the approach I am taking.



I will post some RFC/fixes, but need to get my head round the debug/exception
interaction in the ARM-ARM first!


Thanks,

James



--
Regards
Pratyush


Query: rcu_sched detected stalls with function_graph tracer

2017-08-02 Thread Pratyush Anand

Hi Steve,

I am using a 3.10 based kernel, and when I enable function_graph on one
particular x86_64 machine, I encounter an rcu_sched stall.


echo function_graph > /sys/kernel/debug/tracing/current_tracer
echo 1 > /sys/kernel/debug/tracing/tracing_on

If I use 4.13-rc2, then it's better, but the stall still occurs after a bit of
stressing with stress-ng.


It looks like this behavior should be expected on a loaded system (such as one
running function-graph tracing for all available functions), which can cause an
RCU stall warning. But I am not an expert, so what's your opinion on it?


I tried to bisect, and there was no particular function which causes this.
It looked like once we had more than 5000 functions in set_ftrace_filter,
we started getting such warnings. I tried putting some of the most-hit
functions, like spin_lock* etc., in set_ftrace_notrace. Still no help. Do you
have any suggestions or further debugging pointers?


log with 3.10 based kernel
===
[  358.764071] perf: interrupt took too long (4503 > 4488), lowering 
kernel.perf_event_max_sample_rate to 44000
[  408.705034] INFO: rcu_sched detected stalls on CPUs/tasks: { 0} (detected 
by 7, t=60036 jiffies, g=21144, c=21143, q=21651)

[  408.706011] Task dump for CPU 0:
[  408.706011] swapper/0   R  running task0 0  0 0x0008
[  408.706011]  0003 0003 81aaf920 
819e4000
[  408.706011]  819e7ed0 816b6ef5  
81b1c820
[  408.706011]  819e4000 819e4000 819e4000 
819e4000

[  408.706011] Call Trace:
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] handle_irq_event_percpu+0x55/0x80
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] handle_irq_event+0x3c/0x60
[  408.706011]  [] ? ftrace_graph_caller+0x85/0x85
[  408.706011]  [] pci_bus_read_config_dword+0x86/0xb0
[  408.706011]  [] ? rest_init+0x77/0x80
[  408.706011]  [] ? start_kernel+0x439/0x45a
[  408.706011]  [] ? repair_env_string+0x5c/0x5c
[  408.706011]  [] ? early_idt_handler_array+0x120/0x120
[  408.706011]  [] ? x86_64_start_reservations+0x24/0x26
[  408.706011]  [] ? x86_64_start_kernel+0x14f/0x172
[  377.243167] perf: interrupt took too long (5741 > 5628), lowering 
kernel.perf_event_max_sample_rate to 34000
[  437.522697] perf: interrupt took too long (8551 > 7176), lowering 
kernel.perf_event_max_sample_rate to 23000
[  463.908115] perf: interrupt took too long (11374 > 10688), lowering 
kernel.perf_event_max_sample_rate to 17000

[  480.954277] INFO: task yum:6726 blocked for more than 120 seconds.
[  480.991431] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables 
this message.

[  481.038470] yum D 880467bebf40 0  6726   6714 0x0080
[  481.081174]  88006489b850 0082 880467bebf40 
88006489bfd8
[  481.126085]  88006489bfd8 88006489bfd8 880467bebf40 
8802772656b0
[  481.171087]  7fff 0002  
880467bebf40

[  481.216034] Call Trace:
[  481.230832]  [] ftrace_graph_caller+0x85/0x85
[  481.266435]  [] schedule+0x29/0x70
[  481.296290]  [] ftrace_graph_caller+0x85/0x85
[  481.331966]  [] schedule_timeout+0x239/0x2c0
[  481.367082]  [] ? prepare_ftrace_return+0xb6/0xe0
[  481.404850]  [] ? __down_common+0x104/0x104
[  481.439492]  [] ftrace_graph_caller+0x85/0x85
[  481.475091]  [] __down_common+0xaa/0x104
[  481.508098]  [] ? ftrace_graph_caller+0x85/0x85
[  481.544778]  [] down+0x41/0x50
[  481.577943]  [] __down+0x1d/0x1f
[  481.611734]  [] ftrace_graph_caller+0x85/0x85
[  481.652576]  [] xfs_buf_lock+0x3c/0xd0 [xfs]
[  481.693178]  [] ftrace_graph_caller+0x85/0x85
[  481.734216]  [] _xfs_buf_find+0x170/0x330 [xfs]
[  481.775910]  [] ftrace_graph_caller+0x85/0x85
[  481.816266]  [] xfs_buf_get_map+0x2a/0x240 [xfs]
[  481.859119]  [] ftrace_graph_caller+0x85/0x85
[  481.899431]  [] xfs_buf_read_map+0x30/0x160 [xfs]
[  481.941875]  [] ftrace_graph_caller+0x85/0x85
[  481.982192]  [] xfs_trans_read_buf_map+0x211/0x400 [xfs]
[  482.027938]  [] ftrace_graph_caller+0x85/0x85
[  482.068655]  [] xfs_imap_to_bp+0x6e/0xf0 [xfs]
[  482.109275]  [] ftrace_graph_caller+0x85/0x85
[  482.149375]  [] xfs_iunlink_remove+0x295/0x3c0 [xfs]
[  482.193561]  [] ftrace_graph_caller+0x85/0x85
[  482.233577]  [] xfs_ifree+0x47/0x100 [xfs]
[  482.272198]  [] ftrace_graph_caller+0x85/0x85
[  482.312544]  [] xfs_inactive_ifree+0xd1/0x230 [xfs]
[  482.355816]  [] ftrace_graph_caller+0x85/0x85
[  482.395936]  [] xfs_inactive+0x8b/0x130 [xfs]
[  482.436204]  [] ftrace_graph_caller+0x85/0x85
[  482.477110]  [] xfs_fs_destroy_inode+0x95/0x180 [xfs]
[  482.521397]  [] ftrace_graph_caller+0x85/0x85
[  482.562184]  [] destroy_inode+0x38/0x60
[  482.599372]  [] ftrace_graph_caller+0x85/0x85
[  482.639831]  [] evict+0x10a/0x180
[  482.673827]  [] ftrace_graph_caller+0x85/0x85
[  482.714694]  [] iput+0xf9/0x190
[  


rcu_sched stall while waiting in csd_lock_wait()

2017-08-01 Thread Pratyush Anand

Hi,

I am observing the following rcu_sched stall while executing `perf record -a --
sleep 1` on one of the arm64 platforms. It looks like the stalled CPU was
waiting in csd_lock_wait(), from where it never came out, hence the stall. Any
help/pointers for further debugging would be very helpful. The problem also
reproduces with 4.13.0-rc3.



[11108.758804] task: 80178249 task.stack: 801782574000
[11108.764710] PC is at smp_call_function_single+0x128/0x178
[11108.770089] LR is at smp_call_function_single+0x11c/0x178
[11108.775470] pc : [] lr : [] pstate: 
20400145

[11108.782848] sp : 801782577b80
[11108.786147] x29: 801782577b80 x28: 8017815c3800
[11108.791442] x27: 087f1000 x26: 001d
[11108.796737] x25:  x24: 081ddc90
[11108.802032] x23: 08c73000 x22: 0006
[11108.807327] x21: 081da548 x20: 801782577c18
[11108.812623] x19: 08c73000 x18: fa011c30
[11108.817918] x17:  x16: 
[11108.823213] x15: 006ed000 x14: 00620170
[11108.828508] x13:  x12: 0005
[11108.833803] x11:  x10: 
[11108.839098] x9 : fa011e40 x8 : 001d
[11108.844393] x7 :  x6 : 801782577bc8
[11108.849689] x5 : 0040 x4 : 801782577be0
[11108.854984] x3 : 801782577be0 x2 : 
[11108.860279] x1 : 0003 x0 : 
[11108.865574]
[11141.885599] INFO: rcu_sched self-detected stall on CPU
[11141.889777] 	7-...: (5982 ticks this GP) idle=2fb/141/0 
softirq=604/604 fqs=2995

[11141.898281]   (t=6000 jiffies g=11855 c=11854 q=19196)
[11141.903404] Task dump for CPU 7:
[11141.906615] perfR  running task0  2357   2329 0x0202
[11141.913647] Call trace:
[11141.916081] [] dump_backtrace+0x0/0x270
[11141.921460] [] show_stack+0x24/0x2c
[11141.926496] [] sched_show_task+0xec/0x130
[11141.932050] [] dump_cpu_task+0x48/0x54
[11141.937347] [] rcu_dump_cpu_stacks+0xac/0xf4
[11141.943163] [] rcu_check_callbacks+0x7a0/0xa84
[11141.949152] [] update_process_times+0x34/0x5c
[11141.955055] [] tick_sched_handle.isra.15+0x38/0x70
[11141.961390] [] tick_sched_timer+0x48/0x88
[11141.966946] [] __hrtimer_run_queues+0x150/0x2e4
[11141.973022] [] hrtimer_interrupt+0xa0/0x1d4
[11141.978754] [] arch_timer_handler_phys+0x3c/0x48
[11141.984916] [] handle_percpu_devid_irq+0x98/0x210
[11141.991164] [] generic_handle_irq+0x34/0x4c
[11141.996893] [] __handle_domain_irq+0x6c/0xc4
[11142.002710] [] gic_handle_irq+0x94/0x190
[11142.008178] Exception stack(0x801782577a50 to 0x801782577b80)
[11142.014602] 7a40:    
0003
[11142.022414] 7a60:  801782577be0 801782577be0 
0040
[11142.030226] 7a80: 801782577bc8  001d 
fa011e40
[11142.038039] 7aa0:   0005 

[11142.045851] 7ac0: 00620170 006ed000  

[11142.053664] 7ae0: fa011c30 08c73000 801782577c18 
081da548
[11142.061477] 7b00: 0006 08c73000 081ddc90 

[11142.069289] 7b20: 001d 087f1000 8017815c3800 
801782577b80
[11142.077101] 7b40: 081615a0 801782577b80 081615ac 
20400145
[11142.084914] 7b60: 001b0019 801782577c18 0001 
801782577bc8

[11142.092727] [] el1_irq+0xb4/0x140
[11142.097588] [] smp_call_function_single+0x128/0x178
[11142.104014] [] cpu_function_call+0x40/0x64
[11142.109654] [] event_function_call+0x100/0x108
[11142.115643] [] _perf_event_disable+0x4c/0x74
[11142.121459] [] perf_event_for_each_child+0x38/0x98
[11142.127796] [] perf_ioctl+0xec/0x7b4
[11142.132920] [] do_vfs_ioctl+0xcc/0x7bc
[11142.138213] [] SyS_ioctl+0x90/0xa4
[11142.143161] [] __sys_trace_return+0x0/0x4
[11142.148717] INFO: rcu_sched detected stalls on CPUs/tasks:
[11142.154187] 	7-...: (5983 ticks this GP) idle=2fb/140/0 
softirq=604/604 fqs=2995

[11142.162692]  (detected by 8, t=6027 jiffies, g=11855, c=11854, q=19211)
[11142.169289] Task dump for CPU 7:
[11142.172500] perfR  running task0  2357   2329 0x0202
[11142.179531] Call trace:
[11142.181964] [] __switch_to+0x64/0x70
[11142.187088] [] cpu_number+0x0/0x8


--
Regards
Pratyush




Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-08-01 Thread Pratyush Anand

Hi Takahiro,

On Tuesday 01 August 2017 01:44 PM, AKASHI Takahiro wrote:

Hi Pratyush,

On Mon, Jul 31, 2017 at 04:10:28PM +0530, Pratyush Anand wrote:

v2 -> v3
- Moved step_needed from uapi structure to kernel only structure
- Re-enable interrupt if stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel=149942910730496=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
   hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
   hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel=149910958418708=2

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even a perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, I hope kgdb should also be
fine. However, I have not tested that.
@Takahiro: Will it be possible to test these patches for kgdb?


I have not yet understood the details of your patch, but
I gave it a try and didn't see any difference around the behavior
of kgdb's single stepping.

I also gave a try to James' patch, but again nothing different
as long as kgdb is concerned.
(I'm tackling some issue in single stepping at irq's kernel_exit,
in particular, 'eret'.)


You mean that you were expecting a step exception after eret (and this eret
was being called from the kgdb breakpoint exception handler), but you got an
irq exception? This is what I understood from your previous patch [0].


If that was the case, then I was expecting that this patch series should help.
See patch 4/5:
- The kgdb breakpoint handler kgdb_brk_fn() will be called from
arch/arm64/kernel/debug-monitors.c: brk_handler().
- If we are expecting a step exception after servicing this breakpoint
handler, then the kgdb code would have called kernel_enable_single_step(), so
kernel_active_single_step() should be true in brk_handler().
- If the above happens, then do_debug_exception() will make sure that the PSR
I bit is set before eret is called, and we should not see an IRQ exception
after eret.


Can you please help me with your reproducer test case?

[0]  http://lists.infradead.org/pipermail/linux-arm-kernel/2017-May/508066.html

--
Regards
Pratyush




Re: [PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand

Hi James,

On Monday 31 July 2017 10:45 PM, James Morse wrote:

Hi Pratyush,

On 31/07/17 11:40, Pratyush Anand wrote:

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even a perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.


This caught my eye as I've been reworking the order the DAIF flags get
set/cleared[0].


Thanks for pointing to your series.


What causes your infinite loop? Is it single-stepping kernel_exit? If so patch 4
"arm64: entry.S mask all exceptions during kernel_exit" [1] may help.


The flow is like this:
- A SW or HW breakpoint exception is generated on a CPU, let's say CPU5.
- The breakpoint handler does something which causes an interrupt to become
active on the same CPU. In fact, there might be many other reasons for an
interrupt to be active on a CPU while the breakpoint handler is executing.
- So, as soon as we return from the breakpoint exception, we go to the IRQ
exception handler, while we were expecting a single-step exception.


I do not think that your patch 4 will help here. That patch disables interrupts
while kernel_exit executes. So, until we enable the PSR I bit, we cannot stop
an interrupt from being taken before the step exception.


You can easily reproduce the issue with the following:
# insmod data_breakpoint.ko ksym=__sysrq_enabled
# cat /proc/sys/kernel/sysrq

Where data_breakpoint.ko is module from samples/hw_breakpoint/data_breakpoint.c.



If it's more like "single stepping something we didn't expect", you will get
the same problem if we take an SError (which with that series is unmasked
~all the time).
Either way, this looks like a new and exciting way of hitting the 'known
issue' described in patch 12 [3].


Would disabling MDSCR_EL1.SS if we took an exception solve your problem?

If so, I think we should add a new flag, 'TIF_KSINGLESTEP', causing us to
save/restore MDSCR_EL1.SS into pt_regs on el1 exceptions. This would let us
single-step without modifying the DAIF flags for the location we are stepping,
and allow taking any kind of exception from that location.

We should disable nested users of single-step; we can do that by testing the
flag, printing a warning, and then pretending we missed the breakpoint. (Hence
it needs to be separate from the user single-step flag.)


Thanks,

James

[0] https://www.spinics.net/lists/arm-kernel/msg596684.html
[1] https://www.spinics.net/lists/arm-kernel/msg596686.html
[2] https://www.spinics.net/lists/arm-kernel/msg596689.html

___
linux-arm-kernel mailing list
linux-arm-ker...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel



--
Regards
Pratyush




Re: [PATCH RFC] arm64: introduce mm_context_t flags

2017-07-31 Thread Pratyush Anand



On Monday 31 July 2017 08:18 PM, Yury Norov wrote:

  - If we start using TIF flags here, we cannot easily add new mm_context
specific bits because they may mess with TIF ones.


This one seems a convincing argument. At the moment, do you have any
mm_context flag which you would like to incorporate?




I think that this is not what was intended when you added the new field in
mm_context_t.

In this patch the MMCF_AARCH32 flag is introduced, where the MMCF prefix
stands for mm_context_t flags. The new flag is used by the uprobes code
instead of TIF_32BIT.


--
Regards
Pratyush




[PATCH v3 2/5] arm64: use hw_breakpoint_needs_single_step() to decide if step is needed

2017-07-31 Thread Pratyush Anand
Currently we use is_default_overflow_handler() to decide whether a
"step" will be needed or not. However, is_default_overflow_handler() is
true only for the perf implementation. There can be custom kernel
module tests, like samples/hw_breakpoint/data_breakpoint.c, which rely
on the default step handler.

hw_breakpoint_needs_single_step() will be true if any hw_breakpoint user
wants to use the default step handler and sets step_needed in hw_perf_event.
Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 749f81779420..9a73f85ab9ad 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -661,7 +661,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
perf_bp_event(bp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(bp))
+   if (hw_breakpoint_needs_single_step(bp))
step = 1;
 unlock:
rcu_read_unlock();
@@ -789,7 +789,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
if (min_dist > 0 && min_dist != -1) {
@@ -800,7 +800,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
rcu_read_unlock();
-- 
2.9.4




[PATCH v3 1/5] hw_breakpoint: Add step_needed event attribute

2017-07-31 Thread Pratyush Anand
Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single step handler. However, some other users, such as a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default
step handler implementation.

Therefore, this patch introduces a flag 'step_needed' in struct
hw_perf_event, so that arch-specific code (especially on arm64) can make a
decision to enable single stepping.

Any architecture which does not use this field is unaffected.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 include/linux/hw_breakpoint.h | 6 ++++++
 include/linux/perf_event.h    | 6 ++++++
 kernel/events/core.c          | 2 ++
 3 files changed, 14 insertions(+)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..b9ac9629bf74 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
 }
 
+static inline bool
+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->hw.step_needed;
+}
+
 static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 {
return bp->attr.bp_len;
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 24a635887f28..7da951f94b47 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -233,6 +233,12 @@ struct hw_perf_event {
 */
u64 freq_time_stamp;
u64 freq_count_stamp;
+   /*
+* A HW breakpoint user can either have its own step handling
+* mechanism or it can use the default step handling mechanism defined
+* by arch code. Set step_needed to use the default mechanism.
+*/
+   int step_needed;
 #endif
 };
 
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523dc1e2..66ce5574e778 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9444,9 +9444,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
} else if (is_write_backward(event)){
event->overflow_handler = perf_event_output_backward;
event->overflow_handler_context = NULL;
+   event->hw.step_needed = 1;
} else {
event->overflow_handler = perf_event_output_forward;
event->overflow_handler_context = NULL;
+   event->hw.step_needed = 1;
}
 
perf_event__state_init(event);
-- 
2.9.4
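
Taken together with patch 2, the decision the arm64 handlers end up making can be modeled in a few lines. The struct layouts below are heavily simplified stand-ins; only the step_needed field and the hw_breakpoint_needs_single_step() helper correspond to the patch.

```c
/* Simplified model of the step_needed plumbing: the structs below are
 * stand-ins, not the real kernel definitions. */
#include <stdbool.h>

struct hw_perf_event { int step_needed; };
struct perf_event    { struct hw_perf_event hw; };

static bool hw_breakpoint_needs_single_step(const struct perf_event *bp)
{
    return bp->hw.step_needed;
}

/* After patch 2, breakpoint_handler()/watchpoint_handler() set step = 1
 * only when the event opted in to the default step handler. */
static int decide_step(const struct perf_event *bp)
{
    return hw_breakpoint_needs_single_step(bp) ? 1 : 0;
}
```

A perf-created event gets step_needed = 1 in perf_event_alloc(), while a user with its own step handler (e.g. GDB/ptrace) leaves it 0 and is skipped by the default stepping path.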




[PATCH v3 3/5] register_wide_hw_breakpoint(): modify to accept step_needed arg

2017-07-31 Thread Pratyush Anand
Arch like ARM64 expects 'step_needed = 1' in order to use the default
single step handler. Therefore, modify the register_wide_hw_breakpoint()
implementation, so that we can set this field in struct hw_perf_event to
be used later by arch-specific code.

Other architectures are unaffected, as they do not use this field so far.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/x86/kernel/kgdb.c  | 2 +-
 include/linux/hw_breakpoint.h   | 4 ++--
 kernel/events/hw_breakpoint.c   | 4 +++-
 samples/hw_breakpoint/data_breakpoint.c | 3 ++-
 4 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/kgdb.c b/arch/x86/kernel/kgdb.c
index 8e36f249646e..19b24c50a952 100644
--- a/arch/x86/kernel/kgdb.c
+++ b/arch/x86/kernel/kgdb.c
@@ -674,7 +674,7 @@ void kgdb_arch_late(void)
for (i = 0; i < HBP_NUM; i++) {
if (breakinfo[i].pev)
continue;
-   breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL);
+   breakinfo[i].pev = register_wide_hw_breakpoint(&attr, NULL, NULL, 0);
if (IS_ERR((void * __force)breakinfo[i].pev)) {
printk(KERN_ERR "kgdb: Could not allocate hw"
   "breakpoints\nDisabling the kernel debugger\n");
diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index b9ac9629bf74..8cbc8ded6d50 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -71,7 +71,7 @@ register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 extern struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context);
+   void *context, int step);
 
 extern int register_perf_hw_breakpoint(struct perf_event *bp);
 extern int __register_perf_hw_breakpoint(struct perf_event *bp);
@@ -110,7 +110,7 @@ register_wide_hw_breakpoint_cpu(struct perf_event_attr *attr,
 static inline struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context)  { return NULL; }
+   void *context, int step){ return NULL; }
 static inline int
 register_perf_hw_breakpoint(struct perf_event *bp) { return -ENOSYS; }
 static inline int
diff --git a/kernel/events/hw_breakpoint.c b/kernel/events/hw_breakpoint.c
index 3f8cb1e14588..0dcb175276b2 100644
--- a/kernel/events/hw_breakpoint.c
+++ b/kernel/events/hw_breakpoint.c
@@ -492,13 +492,14 @@ EXPORT_SYMBOL_GPL(unregister_hw_breakpoint);
  * register_wide_hw_breakpoint - register a wide breakpoint in the kernel
  * @attr: breakpoint attributes
  * @triggered: callback to trigger when we hit the breakpoint
+ * @step: tells if framework can use default arch step handler
  *
  * @return a set of per_cpu pointers to perf events
  */
 struct perf_event * __percpu *
 register_wide_hw_breakpoint(struct perf_event_attr *attr,
perf_overflow_handler_t triggered,
-   void *context)
+   void *context, int step)
 {
struct perf_event * __percpu *cpu_events, *bp;
long err = 0;
@@ -512,6 +513,7 @@ register_wide_hw_breakpoint(struct perf_event_attr *attr,
for_each_online_cpu(cpu) {
bp = perf_event_create_kernel_counter(attr, cpu, NULL,
  triggered, context);
+   bp->hw.step_needed = step;
if (IS_ERR(bp)) {
err = PTR_ERR(bp);
break;
diff --git a/samples/hw_breakpoint/data_breakpoint.c b/samples/hw_breakpoint/data_breakpoint.c
index ef7f32291852..f64e59f9fbc6 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -60,7 +60,8 @@ static int __init hw_break_module_init(void)
attr.bp_len = HW_BREAKPOINT_LEN_4;
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
 
-   sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, NULL);
+   sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler,
+NULL, 1);
if (IS_ERR((void __force *)sample_hbp)) {
ret = PTR_ERR((void __force *)sample_hbp);
goto fail;
-- 
2.9.4




[PATCH v3 4/5] arm64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and step handler, then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck.
If the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single step stage
- should return 1 if it expects a single step.
- A single step handler should always return 0.
- All handlers should return a -ve error in any other case.

Now, we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, then disable irqs and
re-enable them after single step handling.

Since we also fix brk_handler() for the above rule, all SW kernel
breakpoint handlers like kgdb and kprobes should behave similarly to perf
HW breakpoints. Interrupts will be disabled if a kgdb or kprobe breakpoint
handler requires single stepping.
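
As a rough user-space model of the return-value convention above (the per-CPU flag and PSTATE handling are simplified stand-ins, and the handler names are illustrative, not from the patch):

```c
/* Sketch of the debug-exception return convention described above:
 * <0 = unhandled error, 0 = handled with no step, 1 = a step will follow.
 * irqs_masked models PSTATE.I; irq_enable_needed models the per-CPU flag. */
#include <stdbool.h>

static bool irqs_masked;
static bool irq_enable_needed;

static int demo_handler_wants_step(void) { return 1; }
static int demo_handler_done(void)       { return 0; }

/* Model of do_debug_exception(): keep irqs off only when a step follows. */
static int do_debug_exception_model(int (*fn)(void))
{
    int rv = fn();

    if (rv < 0)
        return rv;                 /* unhandled: caller reports the error */
    if (rv == 1) {
        irqs_masked = true;        /* stay masked until the step exception */
        irq_enable_needed = true;
    }
    return 0;
}

/* Model of the single-step path: re-enable irqs once the step completes. */
static void single_step_done(void)
{
    if (irq_enable_needed) {
        irqs_masked = false;
        irq_enable_needed = false;
    }
}
```

The window between the two calls is exactly the breakpoint-to-step region the patch wants to keep interrupt-free.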

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
	{ do_bad,   SIGBUS, 0,  "unknown 7" },
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single step stage
+ * and 1 if it expects single step followed by its execution. A single step
+ * handler should always return 0. All handler should return a -ve error in
+ * any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
	arm64_notify_die("", regs, &info, 0);
-   rv = 0;
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {

[PATCH v3 5/5] arm64: fault: re-enable irq if it was disabled for single stepping

2017-07-31 Thread Pratyush Anand
We disable irqs before single stepping and re-enable them afterwards.
However, if the stepped instruction causes a fault, then we will enter
the fault handler with interrupts disabled, which is not desired.
But we should be safe if the fault handler re-enables interrupts when
they were disabled for single stepping.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/mm/fault.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index ce5290dacba3..2b88807eb964 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -589,6 +589,8 @@ static const struct fault_info fault_info[] = {
	{ do_bad,   SIGBUS,  0, "unknown 63" },
 };
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 /*
  * Dispatch a data abort to the relevant handler.
  */
@@ -597,6 +599,12 @@ asmlinkage void __exception do_mem_abort(unsigned long addr, unsigned int esr,
 {
const struct fault_info *inf = esr_to_fault_info(esr);
struct siginfo info;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
+
+   if (*irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+   *irq_en_needed = false;
+   }
 
if (!inf->fn(addr, esr, regs))
return;
@@ -672,8 +680,6 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
-static DEFINE_PER_CPU(bool, irq_enable_needed);
-
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
-- 
2.9.4




[PATCH v3 0/5] ARM64: disable irq between breakpoint and step exception

2017-07-31 Thread Pratyush Anand
v2 -> v3:
- Moved step_needed from the uapi structure to a kernel-only structure
- Re-enable interrupts if the stepped instruction faults
- Modified register_wide_hw_breakpoint() to accept a step_needed arg
v2 was here: http://marc.info/?l=linux-arm-kernel&m=149942910730496&w=2

v1 -> v2:
- patch 1 of v1 has been split into patches 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step() (patch 2)
- Modified the sample test to set the step_needed bit field (patch 3)
v1 was here: http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

samples/hw_breakpoint/data_breakpoint.c passes on x86_64 but fails on
ARM64. Even though it has been NAKed previously upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if
the CPU receives an interrupt while executing the breakpoint/watchpoint
handler. So, even though we are not concerned about the above test, we will
have to find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.
Since it also takes care of SW breakpoints, I hope kgdb should also be
fine. However, I have not tested that.
@Takahiro: Will it be possible to test these patches for kgdb?

[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html


Pratyush Anand (5):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  register_wide_hw_breakpoint(): modify to accept step_needed arg
  arm64: disable irq between breakpoint and step exception
  arm64: fault: re-enable irq if it was disabled for single stepping

 arch/arm64/kernel/debug-monitors.c      |  3 +++
 arch/arm64/kernel/hw_breakpoint.c       | 10 +++++-----
 arch/arm64/mm/fault.c                   | 28 ++++++++++++++++++++++++----
 arch/x86/kernel/kgdb.c                  |  2 +-
 include/linux/hw_breakpoint.h           | 10 ++++++++--
 include/linux/perf_event.h              |  6 ++++++
 kernel/events/core.c                    |  2 ++
 kernel/events/hw_breakpoint.c           |  4 +++-
 samples/hw_breakpoint/data_breakpoint.c |  3 ++-
 9 files changed, 54 insertions(+), 14 deletions(-)

-- 
2.9.4




Re: [PATCH V2 1/4] hw_breakpoint: Add step_needed event attribute

2017-07-25 Thread Pratyush Anand



On Tuesday 25 July 2017 06:57 PM, Will Deacon wrote:

On Fri, Jul 07, 2017 at 05:33:57PM +0530, Pratyush Anand wrote:

Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single-step handler. However, some other users, like a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default step
handler implementation. At the same time, others like GDB/ptrace may
implement their own step handler.

Therefore, this patch introduces a new perf_event_attr bit field, so
that arch-specific code (especially on arm64) can decide whether to
enable single stepping.

Architectures which do not use this field will see no side effect.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
  include/linux/hw_breakpoint.h | 6 ++
  include/uapi/linux/perf_event.h   | 3 ++-
  kernel/events/core.c  | 2 ++
  tools/include/uapi/linux/perf_event.h | 3 ++-
  4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..6173ae048cbc 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
  }
  
+static inline bool

+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->attr.step_needed;
+}
+
  static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
  {
return bp->attr.bp_len;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;


This needs documenting properly, as I really have no idea how userspace is
going to use it sensibly, especially as you silently overwrite it in some
cases below.


I too had thought of putting it under include/linux/perf_event.h : struct
perf_event. But see hw_break_module_init(), which has no knowledge of
this structure, and we need some way for a non-perf kernel module
implementation to tell that it needs the default arch step handler.


Do you see any alternative?

--
Regards
Pratyush



Re: [PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-25 Thread Pratyush Anand

Hi Will,

Thanks for your review.

On Tuesday 25 July 2017 06:55 PM, Will Deacon wrote:

On Fri, Jul 07, 2017 at 05:34:00PM +0530, Pratyush Anand wrote:

If an interrupt is generated between the breakpoint and step handler then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single-step stage
- should return 1 if it expects a single step.
- A single-step handler should always return 0.
- All handlers should return a negative error in any other case.

Now we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, disable irqs, and
re-enable them after single-step handling.


AFAICT, this is only a problem for kernel-mode breakpoints where we end up
stepping into the interrupt handler when trying to step over a breakpoint.


I think yes.



We'd probably be better off getting all users of kernel step (kprobes, kgdb
and perf) to run the step with irqs disabled,



That should be doable. We can easily manage all of them in 
do_debug_exception() if individual brk handlers return the correct value as 
per the rule mentioned in the commit log of this patch.


I think I can take care of kprobes and kgdb as well in the next version of the patch.


but I still have reservations
about that:


So, IIUC, you have a concern about faulting of an instruction being stepped. 
Since we will have a notion of *irq_en_needed*, I think, if needed, we can 
re-enable interrupts in the fault handler do_mem_abort().


What's your opinion here?



   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-May/508066.html
   http://lists.infradead.org/pipermail/linux-arm-kernel/2017-June/510814.html

Wouldn't it be better to follow kprobes/kgdb and have perf run the step with
irqs disabled?

--
Regards
Pratyush



Re: [PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-16 Thread Pratyush Anand



On Friday 07 July 2017 05:33 PM, Pratyush Anand wrote:

v1 was here http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to find
a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.


Any comments/feedback?



[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)



--
Pratyush



Re: [PATCH] usb: gadget: functions: add ftrace export over USB

2017-07-13 Thread Pratyush Anand

Hi Felipe,

On Friday 09 June 2017 03:58 PM, Felipe Balbi wrote:

Felipe Balbi  writes:


Allow for ftrace data to be exported over a USB Gadget
Controller. With this, we have a potentially very fast pipe for
transmitting ftrace data to a Host PC for further analysis.

Note that in order to decode the data, one needs access to kernel
symbols in order to convert binary data into function names and what
not.

Signed-off-by: Felipe Balbi 
---

I wanted to take this through the gadget tree, but there is a
dependency with a previous patch of mine adding and extra argument to
the ->write() function. Hoping someone else will take it.


just as an extra note here. In order for this to be really useful, it
would be nice to be able to control what is going to be traced over USB


Probably you will also need to export *at least* symbol information. In the future, 
if this framework is extended to export tracepoint/kprobe/uprobe event data, 
then information like event formats etc. will also need to be exported.


What tool do you use on the host machine in order to extract information from this 
exported target ring buffer data?

IMHO, standard tools like trace-cmd will not be able to use it as it is.


as well, but that means exporting a few extra functions to GPL drivers.

Would that be okay? I could have a set of vendor-specific control
requests to set buffer size and to read/write ftrace filter functions.

The idea is that things like e.g. Android SDK could rely on this on
debug builds and the SDK itself would make sure to keep a copy of
vmlinux around for processing of the data coming through USB.



--
Pratyush



Re: [PATCH] usb: gadget: functions: add ftrace export over USB

2017-07-13 Thread Pratyush Anand

Hi Felipe,

On Friday 09 June 2017 11:43 AM, Felipe Balbi wrote:

+static void notrace ftrace_write(struct trace_export *ftrace, const void *buf,
+unsigned int len)
+{
+   struct usb_ftrace   *trace = ftrace_to_trace(ftrace);
+   struct usb_request  *req = next_request(&trace->list);
+
+   if (!req)
+   return;
+
+   if (!trace->in->enabled)
+   return;
+
+   req->buf = kmemdup(buf, len, GFP_ATOMIC);


Probably we can avoid the copy of trace data.

We can keep the write() call of "struct trace_export" as posted, and can have a 
write_complete() callback function implemented in struct trace_export, which 
can be called from your ftrace_complete().


We need to execute __buffer_unlock_commit() only in write_complete() in case 
ftrace_export is enabled.




+   req->length = len;
+   req->context = trace;
+   req->complete = ftrace_complete;
+   list_move_tail(&req->list, &trace->pending);
+
+   schedule_work(&trace->queue_work);
+}
+


--
Pratyush



[PATCH V2 2/4] arm64: use hw_breakpoint_needs_single_step() to decide if step is needed

2017-07-07 Thread Pratyush Anand
Currently we use is_default_overflow_handler() to decide whether a
"step" will be needed or not. However, is_default_overflow_handler() is
true only for the perf implementation. There can be custom kernel
module tests, like samples/hw_breakpoint/data_breakpoint.c, which
rely on the default step handler.

hw_breakpoint_needs_single_step() will be true if any hw_breakpoint user
wants to use the default step handler and sets step_needed in the attribute.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/hw_breakpoint.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 749f81779420..9a73f85ab9ad 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -661,7 +661,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
perf_bp_event(bp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(bp))
+   if (hw_breakpoint_needs_single_step(bp))
step = 1;
 unlock:
rcu_read_unlock();
@@ -789,7 +789,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
if (min_dist > 0 && min_dist != -1) {
@@ -800,7 +800,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
perf_bp_event(wp, regs);
 
/* Do we need to handle the stepping? */
-   if (is_default_overflow_handler(wp))
+   if (hw_breakpoint_needs_single_step(wp))
step = 1;
}
rcu_read_unlock();
-- 
2.9.3



[PATCH V2 3/4] hw-breakpoint: sample test: set step_needed bit field

2017-07-07 Thread Pratyush Anand
Arches like ARM64 expect 'step_needed == true' in order to use the default
single-step handler. Therefore, set the bit field in the test case.
Other arches will see no effect, as they do not use it so far.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 samples/hw_breakpoint/data_breakpoint.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/samples/hw_breakpoint/data_breakpoint.c 
b/samples/hw_breakpoint/data_breakpoint.c
index ef7f32291852..5a1919d01800 100644
--- a/samples/hw_breakpoint/data_breakpoint.c
+++ b/samples/hw_breakpoint/data_breakpoint.c
@@ -59,6 +59,7 @@ static int __init hw_break_module_init(void)
attr.bp_addr = kallsyms_lookup_name(ksym_name);
attr.bp_len = HW_BREAKPOINT_LEN_4;
attr.bp_type = HW_BREAKPOINT_W | HW_BREAKPOINT_R;
+   attr.step_needed = true;
 
sample_hbp = register_wide_hw_breakpoint(&attr, sample_hbp_handler, 
NULL);
if (IS_ERR((void __force *)sample_hbp)) {
-- 
2.9.3




[PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
v1 was here http://marc.info/?l=linux-arm-kernel&m=149910958418708&w=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to find
a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.

[1] http://marc.info/?l=linux-arm-kernel&m=149580777524910&w=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.9.3



[PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and step handler then
the step handler cannot get the correct step address. This situation can
easily be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has been working fine just by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single-step stage
- should return 1 if it expects a single step.
- A single-step handler should always return 0.
- All handlers should return a negative error in any other case.

Now we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, disable irqs, and
re-enable them after single-step handling.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
{ do_bad,   SIGBUS, 0,  "unknown 7" 
},
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single-step stage,
+ * and 1 if it expects a single step followed by its execution. A single-step
+ * handler should always return 0. All handlers should return a negative
+ * error in any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct 
pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(&irq_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned 
long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, &info, 0);
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {
+   regs->pstate |= PSR_I_BIT;
+   *irq_en_needed = true;
+   } else if (rv == 0 && *irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+ 

[PATCH V2 0/4] ARM64: Fix irq generation between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
v1 was here http://marc.info/?l=linux-arm-kernel=149910958418708=2

v1 -> v2:
- patch 1 of v1 has been modified to patch 1-3 of v2.
- Introduced a new event attribute step_needed and implemented
  hw_breakpoint_needs_single_step() (patch 1)
- Replaced usage of is_default_overflow_handler() with
  hw_breakpoint_needs_single_step(). (patch 2)
- Modified sample test to set the step_needed bit field (patch 3)

samples/hw_breakpoint/data_breakpoint.c passes with x86_64 but fails with
ARM64. Even though it has been NAKed previously on upstream [1, 2], I have
tried to come up with patches which can resolve it for ARM64 as well.

I noticed that even the perf step exception can go into an infinite loop if the
CPU receives an interrupt while executing the breakpoint/watchpoint handler.
So, even though we are not concerned about the above test, we will have to
find a solution for the perf issue.

This patchset attempts to resolve both issues. Please review.

[1] http://marc.info/?l=linux-arm-kernel=149580777524910=2
[2] http://lists.infradead.org/pipermail/linux-arm-kernel/2016-April/425266.html

Pratyush Anand (4):
  hw_breakpoint: Add step_needed event attribute
  arm64: use hw_breakpoint_needs_single_step() to decide if step is
needed
  hw-breakpoint: sample test: set step_needed bit field
  arm64: disable irq between breakpoint and step exception

 arch/arm64/kernel/debug-monitors.c  |  3 +++
 arch/arm64/kernel/hw_breakpoint.c   | 10 +-
 arch/arm64/mm/fault.c   | 22 ++
 include/linux/hw_breakpoint.h   |  6 ++
 include/uapi/linux/perf_event.h |  3 ++-
 kernel/events/core.c|  2 ++
 samples/hw_breakpoint/data_breakpoint.c |  1 +
 tools/include/uapi/linux/perf_event.h   |  3 ++-
 8 files changed, 39 insertions(+), 11 deletions(-)

-- 
2.9.3



[PATCH V2 4/4] arm64: disable irq between breakpoint and step exception

2017-07-07 Thread Pratyush Anand
If an interrupt is generated between the breakpoint and the step handler, then
the step handler cannot get the correct step address. This situation can easily
be invoked by samples/hw_breakpoint/data_breakpoint.c. It can also be
reproduced if we insert any printk() statement or dump_stack() in the perf
overflow_handler. So, it seems that perf has just been working by luck:
if the CPU which is handling the perf breakpoint handler receives any
interrupt, the perf step handler will not execute sanely.

This patch improves do_debug_exception() handling, which now enforces
that an exception handler function:
- should return 0 from any software breakpoint and hw
breakpoint/watchpoint handler if it does not expect a single step stage
- should return 1 if it expects a single step.
- A single step handler should always return 0.
- All handlers should return a -ve error in any other case.

Now, we can know in do_debug_exception() whether a step exception will
follow or not. If a step exception will follow, then disable irqs and
re-enable them after single step handling.

Signed-off-by: Pratyush Anand 
---
 arch/arm64/kernel/debug-monitors.c |  3 +++
 arch/arm64/kernel/hw_breakpoint.c  |  4 ++--
 arch/arm64/mm/fault.c  | 22 ++
 3 files changed, 23 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kernel/debug-monitors.c 
b/arch/arm64/kernel/debug-monitors.c
index d618e25c3de1..16f29f853b54 100644
--- a/arch/arm64/kernel/debug-monitors.c
+++ b/arch/arm64/kernel/debug-monitors.c
@@ -325,6 +325,9 @@ static int brk_handler(unsigned long addr, unsigned int esr,
return -EFAULT;
}
 
+   if (kernel_active_single_step() || test_thread_flag(TIF_SINGLESTEP))
+   return 1;
+
return 0;
 }
 NOKPROBE_SYMBOL(brk_handler);
diff --git a/arch/arm64/kernel/hw_breakpoint.c 
b/arch/arm64/kernel/hw_breakpoint.c
index 9a73f85ab9ad..d39b8039c70e 100644
--- a/arch/arm64/kernel/hw_breakpoint.c
+++ b/arch/arm64/kernel/hw_breakpoint.c
@@ -697,7 +697,7 @@ static int breakpoint_handler(unsigned long unused, 
unsigned int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(breakpoint_handler);
 
@@ -840,7 +840,7 @@ static int watchpoint_handler(unsigned long addr, unsigned 
int esr,
}
}
 
-   return 0;
+   return 1;
 }
 NOKPROBE_SYMBOL(watchpoint_handler);
 
diff --git a/arch/arm64/mm/fault.c b/arch/arm64/mm/fault.c
index 37b95dff0b07..ce5290dacba3 100644
--- a/arch/arm64/mm/fault.c
+++ b/arch/arm64/mm/fault.c
@@ -653,6 +653,13 @@ static struct fault_info __refdata debug_fault_info[] = {
{ do_bad,   SIGBUS, 0,  "unknown 7" 
},
 };
 
+/*
+ * fn should return 0 from any software breakpoint and hw
+ * breakpoint/watchpoint handler if it does not expect a single step stage
+ * and 1 if it expects single step followed by its execution. A single step
+ * handler should always return 0. All handler should return a -ve error in
+ * any other case.
+ */
 void __init hook_debug_fault_code(int nr,
  int (*fn)(unsigned long, unsigned int, struct 
pt_regs *),
  int sig, int code, const char *name)
@@ -665,6 +672,8 @@ void __init hook_debug_fault_code(int nr,
debug_fault_info[nr].name   = name;
 }
 
+static DEFINE_PER_CPU(bool, irq_enable_needed);
+
 asmlinkage int __exception do_debug_exception(unsigned long addr,
  unsigned int esr,
  struct pt_regs *regs)
@@ -672,6 +681,7 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
const struct fault_info *inf = debug_fault_info + DBG_ESR_EVT(esr);
struct siginfo info;
int rv;
+   bool *irq_en_needed = this_cpu_ptr(_enable_needed);
 
/*
 * Tell lockdep we disabled irqs in entry.S. Do nothing if they were
@@ -680,9 +690,8 @@ asmlinkage int __exception do_debug_exception(unsigned long 
addr,
if (interrupts_enabled(regs))
trace_hardirqs_off();
 
-   if (!inf->fn(addr, esr, regs)) {
-   rv = 1;
-   } else {
+   rv = inf->fn(addr, esr, regs);
+   if (rv < 0) {
pr_alert("Unhandled debug exception: %s (0x%08x) at 0x%016lx\n",
 inf->name, esr, addr);
 
@@ -691,7 +700,12 @@ asmlinkage int __exception do_debug_exception(unsigned 
long addr,
info.si_code  = inf->code;
info.si_addr  = (void __user *)addr;
arm64_notify_die("", regs, , 0);
-   rv = 0;
+   } else if (rv == 1 && interrupts_enabled(regs)) {
+   regs->pstate |= PSR_I_BIT;
+   *irq_en_needed = true;
+   } else if (rv == 0 && *irq_en_needed) {
+   regs->pstate &= ~PSR_I_BIT;
+   *irq_en_needed = false;
}
 
if (interrupts_enabled(regs))
-- 
2.9.3



[PATCH V2 1/4] hw_breakpoint: Add step_needed event attribute

2017-07-07 Thread Pratyush Anand
Architectures like ARM64 currently allow only perf to use the default hw
breakpoint single step handler. However, some other users, like a few
systemtap tests or the kernel test in
samples/hw_breakpoint/data_breakpoint.c, can also work with the default
step handler implementation. At the same time, other users like GDB/ptrace
may implement their own step handler.

Therefore, this patch introduces a new perf_event_attr bit field, so
that arch specific code (especially on arm64) can make a decision to
enable single stepping.

Any architecture which is not using this field will not see any
side effect.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 include/linux/hw_breakpoint.h | 6 ++
 include/uapi/linux/perf_event.h   | 3 ++-
 kernel/events/core.c  | 2 ++
 tools/include/uapi/linux/perf_event.h | 3 ++-
 4 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/include/linux/hw_breakpoint.h b/include/linux/hw_breakpoint.h
index 0464c85e63fd..6173ae048cbc 100644
--- a/include/linux/hw_breakpoint.h
+++ b/include/linux/hw_breakpoint.h
@@ -38,6 +38,12 @@ static inline int hw_breakpoint_type(struct perf_event *bp)
return bp->attr.bp_type;
 }
 
+static inline bool
+hw_breakpoint_needs_single_step(struct perf_event *bp)
+{
+   return bp->attr.step_needed;
+}
+
 static inline unsigned long hw_breakpoint_len(struct perf_event *bp)
 {
return bp->attr.bp_len;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;
 
union {
__u32   wakeup_events;/* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 6c4e523dc1e2..220e26941475 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -9444,9 +9444,11 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
} else if (is_write_backward(event)){
event->overflow_handler = perf_event_output_backward;
event->overflow_handler_context = NULL;
+   event->attr.step_needed = 1;
} else {
event->overflow_handler = perf_event_output_forward;
event->overflow_handler_context = NULL;
+   event->attr.step_needed = 1;
}
 
perf_event__state_init(event);
diff --git a/tools/include/uapi/linux/perf_event.h 
b/tools/include/uapi/linux/perf_event.h
index b1c0b187acfe..00935808de0d 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -345,7 +345,8 @@ struct perf_event_attr {
context_switch :  1, /* context switch data */
write_backward :  1, /* Write ring buffer from 
end to beginning */
namespaces :  1, /* include namespaces data 
*/
-   __reserved_1   : 35;
+   step_needed:  1, /* Use arch step handler */
+   __reserved_1   : 34;
 
union {
__u32   wakeup_events;/* wakeup every n events */
-- 
2.9.3



Re: [RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand



On Thursday 15 June 2017 08:00 PM, Steven Rostedt wrote:

On Thu, 15 Jun 2017 19:56:40 +0530
Pratyush Anand <pan...@redhat.com> wrote:


From: Steven Rostedt <rost...@goodmis.org>


Hi Steve,

Is there any plan for next revision of this patchset.



Heh, I forgot about this. Actually, I didn't.  I was just thinking
about the issue of SyS vs sys, but forgot I had patches to fix it.

I should push it again.

Thanks for the reminder.


Thanks for the quick reply :-)

Pratyush



Re: [RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand



On Thursday 15 June 2017 07:56 PM, Pratyush Anand wrote:

From: Steven Rostedt <rost...@goodmis.org>


Sorry, Please ignore this. I did not had this patch in my inbox and replied 
through patchwork mailbox. Forgot to change the "from" field. Here is the 
patch link, which could resolve issues with trace filtering of aliased system 
calls.


https://patchwork.kernel.org/patch/6351951/

Please let me know if I can help in working for next revision.

Pratyush



Hi Steve,

Is there any plan for next revision of this patchset.

Regards
Pratyush



[RFC, 2/2] kallsyms: Do not display SyS_foo() syscall aliases in kallsyms

2017-06-15 Thread Pratyush Anand
From: Steven Rostedt 


Hi Steve,

Is there any plan for next revision of this patchset.

Regards
Pratyush



[PATCH] mei: make sysfs modalias format similar as uevent modalias

2017-05-24 Thread Pratyush Anand
modprobe is not able to resolve sysfs modalias for mei devices.

 # cat
/sys/class/watchdog/watchdog0/device/watchdog/watchdog0/device/modalias
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
 # modprobe --set-version 4.9.6-200.fc25.x86_64 -R
mei::05b79a6f-4628-4d7f-899d-a91514cb32ab:
modprobe: FATAL: Module mei::05b79a6f-4628-4d7f-899d-a91514cb32ab: not
found in directory /lib/modules/4.9.6-200.fc25.x86_64
 # cat /lib/modules/4.9.6-200.fc25.x86_64/modules.alias | grep
05b79a6f-4628-4d7f-899d-a91514cb32ab
alias mei:*:05b79a6f-4628-4d7f-899d-a91514cb32ab:*:* mei_wdt

commit b26864cad1c9 changed the uevent modalias to mei:S:uuid:N:*, however
the sysfs modalias is still in the format mei:S:uuid:*.

This patch makes the sysfs modalias format match the uevent modalias, so
that modprobe is able to resolve the aliases.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 drivers/misc/mei/bus.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/misc/mei/bus.c b/drivers/misc/mei/bus.c
index d1928fdd0f43..07aad8576334 100644
--- a/drivers/misc/mei/bus.c
+++ b/drivers/misc/mei/bus.c
@@ -763,8 +763,10 @@ static ssize_t modalias_show(struct device *dev, struct 
device_attribute *a,
 {
struct mei_cl_device *cldev = to_mei_cl_device(dev);
const uuid_le *uuid = mei_me_cl_uuid(cldev->me_cl);
+   u8 version = mei_me_cl_ver(cldev->me_cl);
 
-   return scnprintf(buf, PAGE_SIZE, "mei:%s:%pUl:", cldev->name, uuid);
+   return scnprintf(buf, PAGE_SIZE, "mei:%s:%pUl:%02X:",
+cldev->name, uuid, version);
 }
 static DEVICE_ATTR_RO(modalias);
 
-- 
2.9.3



Re: [PATCH v2] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Thursday 18 May 2017 04:23 PM, Bharat Bhushan wrote:

This patch has minor updates in Documentation for arm64, as arm64 supports a
relocatable kernel.
Also this patch updates documentation for using the uncompressed
image "Image" which is used for ARM64.

Signed-off-by: Bharat Bhushan <bharat.bhus...@nxp.com>
---
v1->v2
  - "a uncompressed" replaced with "an uncompressed"


Reviewed-by: Pratyush Anand <pan...@redhat.com>



 Documentation/kdump/kdump.txt | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/Documentation/kdump/kdump.txt b/Documentation/kdump/kdump.txt
index 615434d..5181445 100644
--- a/Documentation/kdump/kdump.txt
+++ b/Documentation/kdump/kdump.txt
@@ -112,8 +112,8 @@ There are two possible methods of using Kdump.
 2) Or use the system kernel binary itself as dump-capture kernel and there is
no need to build a separate dump-capture kernel. This is possible
only with the architectures which support a relocatable kernel. As
-   of today, i386, x86_64, ppc64, ia64 and arm architectures support 
relocatable
-   kernel.
+   of today, i386, x86_64, ppc64, ia64, arm and arm64 architectures support
+   relocatable kernel.

 Building a relocatable kernel is advantageous from the point of view that
 one does not have to build a second kernel for capturing the dump. But
@@ -339,7 +339,7 @@ For arm:
 For arm64:
- Use vmlinux or Image

-If you are using a uncompressed vmlinux image then use following command
+If you are using an uncompressed vmlinux image then use following command
 to load dump-capture kernel.

kexec -p  \
@@ -361,6 +361,12 @@ to load dump-capture kernel.
--dtb= \
--append="root= "

+If you are using an uncompressed Image, then use following command
+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

 Please note, that --args-linux does not need to be specified for ia64.
 It is planned to make this a no-op on that architecture, but for now



Re: [PATCH] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Monday 22 May 2017 12:19 PM, Bharat Bhushan wrote:

On Friday 19 May 2017 09:15 AM, AKASHI Takahiro wrote:

+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

For uncompressed Image, dtb is not necessary?

Just for clarification, dtb is optional for both vmlinux and Image on
arm64. (This means you can specify it if you want.) But this is also
true for initrd and append(command line) to some extent.


Yes, I agree.


Should I mention "-dtb" also for "Image"?


No,I think it is fine.

This documentation represents a typical use case, so the above change is
OK for me. I think your v2 is fine.


~Pratyush



Also do we need to mention that it is optional somewhere in this document? I do not see 
"optional" is mentioned for other parameters and architecture.

Does this looks ok:

" -dtb= \"

Thanks
-Bharat



More precisely, whether these parameters are optional or not will
depend on architectures, not formats, I believe.


Maybe not the architecture, rather the distro environment.

For example, we should be able to work without --initrd for any arch if the
kernel has been compiled with CONFIG_INITRAMFS_SOURCE configured.

~Pratyush




Re: [PATCH] kexec/kdump: Minor Documentation updates for arm64 and Image

2017-05-22 Thread Pratyush Anand



On Friday 19 May 2017 09:15 AM, AKASHI Takahiro wrote:

+to load dump-capture kernel.
+
+   kexec -p  \
+   --initrd= \
+   --append="root= "

For uncompressed Image, dtb is not necessary?

Just for clarification, dtb is optional for both vmlinux and Image
on arm64. (This means you can specify it if you want.)
But this is also true for initrd and append(command line) to some extent.


Yes, I agree.


More precisely, whether these parameters are optional or not will
depend on architectures, not formats, I believe.


Maybe not the architecture, rather the distro environment.

For example, we should be able to work without --initrd for any arch if the
kernel has been compiled with CONFIG_INITRAMFS_SOURCE configured.


~Pratyush



Re: [PATCH 1/2] perf probe: Fix wrong register name for arm64

2017-05-03 Thread Pratyush Anand
Just noticed that this patch fixes commit e47392bf9c06 (v4.8+). So,
Ccing sta...@vger.kernel.org
[e47392bf9c06 perf uprobe: Skip prologue if program compiled without
optimization]

After e47392bf9c06 on ARM64:
# cat > test.c << EOF
> #include 
>
> int main(int argc, void **argv)
> {
> printf("argc =%d\n", argc);
> }
> EOF
# gcc -g  -o test test.c
# perf probe -x ./test -a 'main argc'
Target program is compiled without optimization. Skipping prologue.
Probe on address 0x400600 to force probing at the function entry.

Failed to write event: Invalid argument
  Error: Failed to add events.

If this patch is applied on top of e47392bf9c06:
perf probe -x ./test -a 'main argc'
Target program is compiled without optimization. Skipping prologue.
Probe on address 0x400600 to force probing at the function entry.

Added new event:
  probe_test:main  (on main in
/home/panand/work/test_code/uprobe_test/test with argc)

You can now use it in all perf tools, such as:

perf record -e probe_test:main -aR sleep 1

~Pratyush

On Tue, Jan 24, 2017 at 4:00 PM, He Kuang  wrote:
> The register name of arm64 architecture is x0-x31 not r0-r31, this
> patch changes this typo.
>
> Before this patch:
>
>   # perf probe --definition 'sys_write count'
>   p:probe/sys_write _text+1502872 count=%r2:s64
>
>   # echo 'p:probe/sys_write _text+1502872 count=%r2:s64' > \
> /sys/kernel/debug/tracing/kprobe_events
>   Parse error at argument[0]. (-22)
>
> After this patch:
>
>   # perf probe --definition 'sys_write count'
>   p:probe/sys_write _text+1502872 count=%x2:s64
>
>   # echo 'p:probe/sys_write _text+1502872 count=%x2:s64' > \
> /sys/kernel/debug/tracing/kprobe_events
>   # echo 1 >/sys/kernel/debug/tracing/events/probe/enable
>   # cat /sys/kernel/debug/tracing/trace
>   ...
>   sh-422   [000] d... 650.495930: sys_write: (SyS_write+0x0/0xc8) count=22
>   sh-422   [000] d... 651.102389: sys_write: (SyS_write+0x0/0xc8) count=26
>   sh-422   [000] d... 651.358653: sys_write: (SyS_write+0x0/0xc8) count=86
>
> Signed-off-by: He Kuang 
> ---
>  tools/perf/arch/arm64/include/dwarf-regs-table.h | 12 ++--
>  1 file changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/tools/perf/arch/arm64/include/dwarf-regs-table.h 
> b/tools/perf/arch/arm64/include/dwarf-regs-table.h
> index 2675936..36e375f 100644
> --- a/tools/perf/arch/arm64/include/dwarf-regs-table.h
> +++ b/tools/perf/arch/arm64/include/dwarf-regs-table.h
> @@ -2,12 +2,12 @@
>  /* This is included in perf/util/dwarf-regs.c */
>
>  static const char * const aarch64_regstr_tbl[] = {
> -   "%r0", "%r1", "%r2", "%r3", "%r4",
> -   "%r5", "%r6", "%r7", "%r8", "%r9",
> -   "%r10", "%r11", "%r12", "%r13", "%r14",
> -   "%r15", "%r16", "%r17", "%r18", "%r19",
> -   "%r20", "%r21", "%r22", "%r23", "%r24",
> -   "%r25", "%r26", "%r27", "%r28", "%r29",
> +   "%x0", "%x1", "%x2", "%x3", "%x4",
> +   "%x5", "%x6", "%x7", "%x8", "%x9",
> +   "%x10", "%x11", "%x12", "%x13", "%x14",
> +   "%x15", "%x16", "%x17", "%x18", "%x19",
> +   "%x20", "%x21", "%x22", "%x23", "%x24",
> +   "%x25", "%x26", "%x27", "%x28", "%x29",
> "%lr", "%sp",
>  };
>  #endif
> --
> 1.8.5.2
>



Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand



On Thursday 27 April 2017 10:19 PM, Steven Rostedt wrote:

On Thu, 27 Apr 2017 19:32:43 +0530
Pratyush Anand <pan...@redhat.com> wrote:


I will implement your review comments and send the next revision.
However, I had a couple of observations that I was unable to justify:

# ./trace-cmd top -s /tmp/test
# ./trace-cmd top -p /tmp/test | grep trace-cmd
   15292   trace-cmd   22144   15808


What does it give for your /tmp/test ?


/tmp/test is the name of the socket through which the trace aggregator (-s) and 
printer (-p) processes talk to each other.




Note, tracing doesn't start till after trace-cmd is loaded. Everything
before is not going to be seen.


Yes. So `./trace-cmd top -s /tmp/test` starts tracing and aggregates stats for 
all processes, including itself. After some time I execute 
`./trace-cmd top -p /tmp/test`, which prints stats for all processes, again 
including itself. In the example above, the peak memory calculated for the 
aggregator process (`./trace-cmd top -s /tmp/test`) was 22144 KB.





Here,15292 is the pid of trace-cmd task
22144 KB is the peak memory usage
15808 KB is the current memory usage

Now check rss component from statm
# cat /proc/15292/statm
   50 35 23 7 0 12 0 36

This is a result on ARM64 with a 64 KB page size. Therefore, as per statm, rss is
35 pages = 35*64 = 2240 KB.
I patched my kernel [2] for test purposes, so that statm reports peak memory as
well. Here, the last extra entry in the statm output is peak memory, and it is 36
pages = 2304 KB.
So, there is a huge difference between what statm reports and what we get from
trace-cmd.
I understand that the `trace-cmd top` output will only be approximate, because
some memory could be allocated by a task and freed in interrupt context, so the
numbers can often differ. But is such a huge difference justified? If yes, can
we count on the output of this utility to find early boot-time oom issues?


Doesn't it just trace the memory usage from when the tracing starts?


So, it should be less than what statm reports, right? But here we see that the 
value reported by trace-cmd is far more than the rss value from statm.


Well, there is one known bug in the code, for which I am still trying to find a 
fix. But that should not cause this huge difference.


When we receive a trace entry for any new process, we read the rss value from 
statm. We treat that as the memory consumed by the process before tracing 
started. So we initialize the peak and current memory values for that process 
with rss from statm, then increase current on each mm_page_alloc() entry and 
decrease it on each mm_page_free() entry. The peak value reached by current at 
any point is reported as the peak memory.


Now, let's consider a process p1 that existed before tracing started, with an 
rss of r1 KB. We receive the first trace entry for p1, which says 8 KB was 
allocated for p1.


p1->cur = r1 + 8;
p1->peak = r1 + 8;

We receive another entry with 4KB de-allocation.

p1->cur = r1 + 4;
p1->peak = r1 + 8;

Some of these allocations would also have been accounted for in statm/rss (r1 
here), but we do not know how many entries were already counted there. 
Currently, these n (unknown) entries [generated between (a) the first trace 
entry and (b) the moment trace-cmd top reads statm/rss] are counted twice, 
which is a bug.


~Pratyush



[1]
https://github.com/pratyushanand/trace-cmd/commit/602c2cd96aa613633ad20c6d382e41f7db37a2a4
[2]
https://github.com/pratyushanand/linux/commit/197e2045361b6b70085c46c78ea6607d8afce517




Re: [trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-04-27 Thread Pratyush Anand

Hi Steve,

On Wednesday 26 April 2017 07:31 PM, Steven Rostedt wrote:


Sorry for the late reply. I finally have time to start looking at
trace-cmd again.


Thanks a lot for your review :-)



On Wed,  1 Mar 2017 20:32:37 +0530
Pratyush Anand <pan...@redhat.com> wrote:



[...]


This is a nice idea, but it needs some cleanups. I tried it out just
to play with it.

First, I just did:

 trace-cmd top

And nothing happened. Either a help message needs to be printed, or it
starts something. Maybe have it be interactive, that is, print out data
as it comes in?



[...]



But then for some reason it -p stopped doing anything. And -e didn't
stop it either. But after killing it manually and removing the pid
file, it appeared to work normal again. I can't seem to repeat it, so
lets not worry about that now.



Yes, this version shared info between the two processes using files. I have a 
new version [1] that shares it through a socket, and that works better. The 
help issue is also fixed in that version.



More below.



[...]


+void top_disable_events(void)
+{
+   char *path;
+   char c;
+   int fd;
+   int ret;
+
+   /*
+* WARNING: We want only a few events related to memory allocation to
+* be enabled. Disable all events here. Later, selective events
+* will be enabled


trace-cmd reset does this. You may want to call that instead of doing
it manually here. Someone else wanted to pull out trace-cmd reset into
its own file. I'll have to get those patches. But then we can call that
and enable tracing again.

Or better yet, if we are only tracing memory events, just create your
own instance and use that. Why bother with normal tracing?

 mkdir /sys/kernel/debug/tracing/instances/trace-cmd-memory


Yes, it seems a good idea. Will do.




+*/
+   c = '0';
+   path = get_instance_file(_instance, "events/enable");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to events/enable");
+
+   path = get_instance_file(_instance, "tracing_on");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   ret = write(fd, , 1);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+   if (ret < 1)
+   die("failed to write 0 to tracing_on\n");
+
+   path = get_instance_file(_instance, "set_event");
+   fd = open(path, O_WRONLY);
+   if (fd < 0)
+   die("failed to open '%s'", path);
+   close(fd);
+   tracecmd_put_tracing_file(path);
+}
+
+void top_exit(void)
+{
+   int i;
+   /*
+* free all the dynamic memories which could have been allocated by
+* this program
+*/
+   if (kbuf)
+   kbuffer_free(kbuf);
+   if (record)
+   free_record(record);
+   if (pevent)
+   pevent_free(pevent);


The above three functions all can handle a NULL pointer. Remove the if
statements.


ok




+   if (top_task_stats) {
+   for (i = 0; i < top_task_cnt; i++)
+   free(top_task_stats[i].comm);
+   free(top_task_stats);
+   }
+
+   top_disable_events();
+   remove(TRACE_CMD_TOP_PID_FILE);
+}
+
+void top_initialize_events(void)
+{
+   char *path;
+   int fd;
+   char c;
+   int ret;
+   char *buf;
+   int size;
+   enum kbuffer_long_size long_size;
+   enum kbuffer_endian endian;
+   char id_str[8];
+


Might want to look at trace_stream_init() and use that instead. It is
made for live streaming of events.


OK, will look into trace_stream_init().




+   /* trace data is read from the trace_raw_pipe directly */
+   if (tracecmd_host_bigendian())
+   endian = KBUFFER_ENDIAN_BIG;
+   else
+   endian = KBUFFER_ENDIAN_LITTLE;
+


[...]


+void top_start(void)
+{
+   pid_t pid;
+   int fd, ret;
+
+   top_disable_events();
+   top_initialize_events();
+   page_size = getpagesize();
+
+   pid = fork();
+   if (!pid) {
+   do {
+   top_process_events();
+   sleep(1);


We should probably just sleep on the buffer, waiting for data.


OK




+   if (set_print_only) {
+   top_print_stats();
+   set_print_only = false;
+   }
+   if (set_print_and_exit) {
+   top_print_stats();
+   top_exit();
+   break;
+   }
+   } while (1);
+   } else {
+   fd = open(TRACE_CMD_TOP_

[trace-cmd Patch RFC] trace-cmd: top: A new interface to detect peak memory

2017-03-01 Thread Pratyush Anand
A new interface, "trace-cmd top", is introduced in this patch to
report the peak memory usage of any task.

$ trace-cmd top -s
It will start gathering statistics for all processes scheduled after this
command executes.
$ trace-cmd top -p
It will print the peak memory usage (in KB) of each scheduled task.
$ trace-cmd top -e
It will print the peak memory usage (in KB) of each scheduled task and
stop gathering statistics. It will also kill the child process
initiated by the -s option.

$ ./trace-cmd top -s
$ ./trace-cmd top -e

commpeak memory(in KB)
NetworkManager  15912
gnome-shell 114292
gnome-shell 186356
firefox 853244
bash5984
thunderbird 1896396
kworker/3:0 0
kworker/u16:1   0
trace-cmd   256

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 Makefile   |   2 +-
 trace-cmd.c|   4 +
 trace-local.h  |   3 +
 trace-record.c |   2 +-
 trace-top.c| 620 +
 trace-usage.c  |   8 +
 6 files changed, 637 insertions(+), 2 deletions(-)
 create mode 100644 trace-top.c

diff --git a/Makefile b/Makefile
index 62cb25b6c1bd..5324c141d3d6 100644
--- a/Makefile
+++ b/Makefile
@@ -331,7 +331,7 @@ TRACE_GUI_OBJS = trace-filter.o trace-compat.o 
trace-filter-hash.o trace-dialog.
trace-xml.o
 TRACE_CMD_OBJS = trace-cmd.o trace-record.o trace-read.o trace-split.o 
trace-listen.o \
 trace-stack.o trace-hist.o trace-mem.o trace-snapshot.o trace-stat.o \
-trace-hash.o trace-profile.o trace-stream.o
+trace-hash.o trace-profile.o trace-stream.o trace-top.o
 TRACE_VIEW_OBJS = trace-view.o trace-view-store.o
 TRACE_GRAPH_OBJS = trace-graph.o trace-plot.o trace-plot-cpu.o 
trace-plot-task.o
 TRACE_VIEW_MAIN_OBJS = trace-view-main.o $(TRACE_VIEW_OBJS) $(TRACE_GUI_OBJS)
diff --git a/trace-cmd.c b/trace-cmd.c
index 1a62faaf71ca..66e0f7cd37bc 100644
--- a/trace-cmd.c
+++ b/trace-cmd.c
@@ -495,6 +495,10 @@ int main (int argc, char **argv)
trace_stat(argc, argv);
exit(0);
 
+   } else if (strcmp(argv[1], "top") == 0) {
+   trace_top(argc, argv);
+   exit(0);
+
} else if (strcmp(argv[1], "options") == 0) {
show_plugin_options();
exit(0);
diff --git a/trace-local.h b/trace-local.h
index 62363d0f9248..c9eac07da79a 100644
--- a/trace-local.h
+++ b/trace-local.h
@@ -74,6 +74,8 @@ void trace_mem(int argc, char **argv);
 
 void trace_stat(int argc, char **argv);
 
+void trace_top(int argc, char **argv);
+
 struct hook_list;
 
 int trace_profile_record(struct tracecmd_input *handle,
@@ -184,5 +186,6 @@ void update_first_instance(struct buffer_instance 
*instance, int topt);
 
 void show_instance_file(struct buffer_instance *instance, const char *name);
 int count_cpus(void);
+char *read_file(char *file, int *psize);
 
 #endif /* __TRACE_LOCAL_H */
diff --git a/trace-record.c b/trace-record.c
index 22b6835c0d5b..b2ce59e7eb4f 100644
--- a/trace-record.c
+++ b/trace-record.c
@@ -3323,7 +3323,7 @@ static char *read_instance_file(struct buffer_instance 
*instance, char *file, in
return buf;
 }
 
-static char *read_file(char *file, int *psize)
+char *read_file(char *file, int *psize)
 {
return read_instance_file(_instance, file, psize);
 }
diff --git a/trace-top.c b/trace-top.c
new file mode 100644
index ..056ae669856a
--- /dev/null
+++ b/trace-top.c
@@ -0,0 +1,620 @@
+/*
+ * Copyright (C) 2017 Red Hat Inc, Pratyush Anand <pan...@redhat.com>
+ *
+ * ~~
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; version 2 of the License (not later!)
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not,  see <http://www.gnu.org/licenses>
+ *
+ * ~~
+ * There are several scenarios where we land in the oom-killer early in the
+ * boot process, especially in a memory-constrained environment. It becomes
+ * very difficult to identify the user-space task that required more
+ * memory compared to its previously released version. This interface is
+ * an attempt to debug such issues, and will help us identify the peak
+ * memory usage of each task. mm_page_alloc() and mm_page_free() are the
+ * lowest-level kernel APIs which allocate and free memory from buddy. This
+ * tool enables the tracepoints of these two functions and

Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-23 Thread Pratyush Anand

Hi Andrew/Kees,

On Tuesday 14 February 2017 07:16 AM, Pratyush Anand wrote:


Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset information can help translate an
identity-mapped virtual address to a physical address. But that would be
an additional field in the PT_LOAD header structure, and an
arch-dependent value.

Moreover, sending a valid physical address like 0 does not seem right.
So, IMHO it is better to fix that and send valid physical address when
available (identity mapped).

Thanks for the review.


So, what's the decision on this patch? I see that it is sitting in 
next/master. Should I expect it in v4.11-rc1?


A couple of user-space makedumpfile modifications will depend on this 
patch, so we cannot get those makedumpfile patches merged until this 
patch hits upstream.


~Pratyush



Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-13 Thread Pratyush Anand



On Tuesday 14 February 2017 03:55 AM, Kees Cook wrote:

On Mon, Jan 30, 2017 at 11:00 AM, Pratyush Anand <pan...@redhat.com> wrote:

CCing Andrew and Kees for their review comments.


On Wednesday 25 January 2017 10:14 AM, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile needs to know physical address for
PT_LOAD segments of direct mapped regions. Therefore this patch updates
paddr for such regions. It also sets an invalid paddr (-1) for other
regions, so that user space tool can know whether a physical address
provided in PT_LOAD is correct or not.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
fs/proc/kcore.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int
nphdr, int dataoff)
phdr->p_flags = PF_R|PF_W|PF_X;
phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr = (size_t)m->addr;
- phdr->p_paddr = 0;
+ if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+ phdr->p_paddr = __pa(m->addr);
+ else
+ phdr->p_paddr = (elf_addr_t)-1;
phdr->p_filesz = phdr->p_memsz = m->size;
phdr->p_align = PAGE_SIZE;
}



Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset information can help translate an 
identity-mapped virtual address to a physical address. But that would be 
an additional field in the PT_LOAD header structure, and an arch-dependent value.


Moreover, sending a valid physical address like 0 does not seem right. So, 
IMHO, it is better to fix that and send the real physical address when it is 
available (identity mapped).


Thanks for the review.

~Pratyush


Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-02-13 Thread Pratyush Anand



On Tuesday 14 February 2017 03:55 AM, Kees Cook wrote:

On Mon, Jan 30, 2017 at 11:00 AM, Pratyush Anand  wrote:

CCing Andrew and Kees for their review comments.


On Wednesday 25 January 2017 10:14 AM, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile needs to know physical address for
PT_LOAD segments of direct mapped regions. Therefore this patch updates
paddr for such regions. It also sets an invalid paddr (-1) for other
regions, so that user space tool can know whether a physical address
provided in PT_LOAD is correct or not.

Signed-off-by: Pratyush Anand 
---
fs/proc/kcore.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
phdr->p_flags = PF_R|PF_W|PF_X;
phdr->p_offset = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr = (size_t)m->addr;
- phdr->p_paddr = 0;
+ if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+ phdr->p_paddr = __pa(m->addr);
+ else
+ phdr->p_paddr = (elf_addr_t)-1;
phdr->p_filesz = phdr->p_memsz = m->size;
phdr->p_align = PAGE_SIZE;
}



Well, CONFIG_PROC_KCORE is a generalized root KASLR exposure (though
there are lots of such exposures). Why is the actual physical address
needed? Can this just report the virtual address instead? Then the
tool can build a map, but it looks like an identity map, rather than
creating a new physical/virtual memory ASLR offset exposure?


Well, having the ASLR offset would help translate an identity mapped
virtual address to a physical address. But that would require an
additional field in the PT_LOAD header structure, and its value would be
arch dependent.


Moreover, reporting 0 (a perfectly valid physical address) for every
segment does not seem right. So, IMHO it is better to fix that and report
the real physical address when one is available (i.e. for identity
mapped regions).


Thanks for the review.

~Pratyush


Re: [PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-01-24 Thread Pratyush Anand

Hi Dave,

On Wednesday 25 January 2017 11:59 AM, Dave Young wrote:

Hi Pratyush
On 01/25/17 at 10:14am, Pratyush Anand wrote:

Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

I do not know the history of /proc/kcore, so a question is why the
p_addr was set as 0, if there were some reasons and if this could cause
some risk or breakage.



I do not know why it was 0, which is a valid physical address. But 
certainly, it might break some user space tools, and those would need to 
be fixed. For example, see the following code from kexec-tools:


kexec/kexec-elf.c:build_mem_phdrs()

435 if ((phdr->p_paddr + phdr->p_memsz) < phdr->p_paddr) {
436 	/* The memory address wraps */
437 	if (probe_debug) {
438 		fprintf(stderr, "ELF address wrap around\n");
439 	}
440 	return -1;
441 }

We would not need to perform the above check for an invalid physical address.

I think kexec-tools and makedumpfile will need fixups. I already have 
those fixes, and they will be sent upstream once this patch makes it 
through. The advantage of this approach is that it helps calculate 
variables like page_offset and phys_base from PT_LOAD even when they are 
randomized, and therefore reduces the number of variable- and 
version-specific values hard-coded in user space tools.


~Pratyush


[PATCH] /proc/kcore: Update physical address for kcore ram and text

2017-01-24 Thread Pratyush Anand
Currently all the p_paddr of PT_LOAD headers are assigned to 0, which is
not true and could be misleading, since 0 is a valid physical address.

User space tools like makedumpfile need to know the physical address of
PT_LOAD segments for direct mapped regions. Therefore this patch updates
p_paddr for such regions. It also sets an invalid p_paddr (-1) for other
regions, so that user space tools can tell whether the physical address
provided in a PT_LOAD header is valid or not.

Signed-off-by: Pratyush Anand <pan...@redhat.com>
---
 fs/proc/kcore.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/fs/proc/kcore.c b/fs/proc/kcore.c
index 0b80ad87b4d6..ea9f3d1ae830 100644
--- a/fs/proc/kcore.c
+++ b/fs/proc/kcore.c
@@ -373,7 +373,10 @@ static void elf_kcore_store_hdr(char *bufp, int nphdr, int dataoff)
phdr->p_flags   = PF_R|PF_W|PF_X;
phdr->p_offset  = kc_vaddr_to_offset(m->addr) + dataoff;
phdr->p_vaddr   = (size_t)m->addr;
-   phdr->p_paddr   = 0;
+   if (m->type == KCORE_RAM || m->type == KCORE_TEXT)
+   phdr->p_paddr   = __pa(m->addr);
+   else
+   phdr->p_paddr   = (elf_addr_t)-1;
phdr->p_filesz  = phdr->p_memsz = m->size;
phdr->p_align   = PAGE_SIZE;
}
-- 
2.9.3



[PATCH V3 0/5] ARM64: More flexible HW watchpoint

2016-11-14 Thread Pratyush Anand
Currently, we do not support all the byte select options provided by the
ARM64 specs for a HW watchpoint.

This patch set helps users instrument a watchpoint with all possible
byte select options.

Changes since v2:
- used __ffs() instead of ffs() - 1. Similarly for fls().
- fixed ptrace_hbp_get_addr() to report correct address to user space
- handling stepping for inexact watchpoint as well now.

Changes since v1:
- Introduced a new patch 3/5 where it takes care of the situation when HW
does not report a watchpoint hit with the address that matches one of the
watchpoints set.
- Added corresponding test case to test that functionality.

Pavel Labath (1):
  arm64: hw_breakpoint: Handle inexact watchpoint addresses

Pratyush Anand (4):
  hw_breakpoint: Allow watchpoint of length 3,5,6 and 7
  arm64: Allow hw watchpoint at varied offset from base address
  arm64: Allow hw watchpoint of length 3,5,6 and 7
  selftests: arm64: add test for unaligned/inexact watchpoint handling

 arch/arm64/include/asm/hw_breakpoint.h |   6 +-
 arch/arm64/kernel/hw_breakpoint.c  | 153 +
 arch/arm64/kernel/ptrace.c |   7 +-
 include/uapi/linux/hw_breakpoint.h |   4 +
 tools/include/uapi/linux/hw_breakpoint.h   |   4 +
 tools/testing/selftests/breakpoints/Makefile   |   5 +-
 .../selftests/breakpoints/breakpoint_test_arm64.c  | 236 +
 7 files changed, 372 insertions(+), 43 deletions(-)
 create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

-- 
2.7.4




[PATCH V3 5/5] selftests: arm64: add test for unaligned/inexact watchpoint handling

2016-11-14 Thread Pratyush Anand
ARM64 hardware expects a 64-bit aligned address for watchpoint
invocation. However, it provides a byte selection method to watch any
number of consecutive bytes (1-8) within the aligned doubleword.

This patch adds support to test all such byte selection options for
different memory write sizes.

Patch also adds a test for handling the case when the cpu does not
report an address which exactly matches one of the regions we have
been watching (which is a situation permitted by the spec if an
instruction accesses both watched and unwatched regions). The test
was failing on a MSM8996pro before this patch series and is
passing now.

Signed-off-by: Pavel Labath 
Signed-off-by: Pratyush Anand 
---
 tools/testing/selftests/breakpoints/Makefile   |   5 +-
 .../selftests/breakpoints/breakpoint_test_arm64.c  | 236 +
 2 files changed, 240 insertions(+), 1 deletion(-)
 create mode 100644 tools/testing/selftests/breakpoints/breakpoint_test_arm64.c

diff --git a/tools/testing/selftests/breakpoints/Makefile b/tools/testing/selftests/breakpoints/Makefile
index 74e533fd4bc5..61b79e8df1f4 100644
--- a/tools/testing/selftests/breakpoints/Makefile
+++ b/tools/testing/selftests/breakpoints/Makefile
@@ -5,6 +5,9 @@ ARCH ?= $(shell echo $(uname_M) | sed -e s/i.86/x86/ -e s/x86_64/x86/)
 ifeq ($(ARCH),x86)
 TEST_PROGS := breakpoint_test
 endif
+ifeq ($(ARCH),aarch64)
+TEST_PROGS := breakpoint_test_arm64
+endif
 
 TEST_PROGS += step_after_suspend_test
 
@@ -13,4 +16,4 @@ all: $(TEST_PROGS)
 include ../lib.mk
 
 clean:
-   rm -fr breakpoint_test step_after_suspend_test
+   rm -fr breakpoint_test breakpoint_test_arm64 step_after_suspend_test
diff --git a/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
new file mode 100644
index ..3897e996541e
--- /dev/null
+++ b/tools/testing/selftests/breakpoints/breakpoint_test_arm64.c
@@ -0,0 +1,236 @@
+/*
+ * Copyright (C) 2016 Google, Inc.
+ *
+ * This software is licensed under the terms of the GNU General Public
+ * License version 2, as published by the Free Software Foundation, and
+ * may be copied, distributed, and modified under those terms.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Original Code by Pavel Labath 
+ *
+ * Code modified by Pratyush Anand 
+ * for testing different byte select for each access size.
+ *
+ */
+
+#define _GNU_SOURCE
+
+#include <sys/types.h>
+#include <sys/wait.h>
+#include <sys/ptrace.h>
+#include <sys/param.h>
+#include <sys/uio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <stddef.h>
+#include <string.h>
+#include <stdio.h>
+#include <unistd.h>
+#include <elf.h>
+#include <errno.h>
+#include <signal.h>
+
+#include "../kselftest.h"
+
+static volatile uint8_t var[96] __attribute__((__aligned__(32)));
+
+static void child(int size, int wr)
+{
+   volatile uint8_t *addr = &var[32 + wr];
+
+   if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0) {
+   perror("ptrace(PTRACE_TRACEME) failed");
+   _exit(1);
+   }
+
+   if (raise(SIGSTOP) != 0) {
+   perror("raise(SIGSTOP) failed");
+   _exit(1);
+   }
+
+   if ((uintptr_t) addr % size) {
+   perror("Wrong address write for the given size\n");
+   _exit(1);
+   }
+   switch (size) {
+   case 1:
+   *addr = 47;
+   break;
+   case 2:
+   *(uint16_t *)addr = 47;
+   break;
+   case 4:
+   *(uint32_t *)addr = 47;
+   break;
+   case 8:
+   *(uint64_t *)addr = 47;
+   break;
+   case 16:
+   __asm__ volatile ("stp x29, x30, %0" : "=m" (addr[0]));
+   break;
+   case 32:
+   __asm__ volatile ("stp q29, q30, %0" : "=m" (addr[0]));
+   break;
+   }
+
+   _exit(0);
+}
+
+static bool set_watchpoint(pid_t pid, int size, int wp)
+{
+   const volatile uint8_t *addr = &var[32 + wp];
+   const int offset = (uintptr_t)addr % 8;
+   const unsigned int byte_mask = ((1 << size) - 1) << offset;
+   const unsigned int type = 2; /* Write */
+   const unsigned int enable = 1;
+   const unsigned int control = byte_mask << 5 | type << 3 | enable;
+   struct user_hwdebug_state dreg_state;
+   struct iovec iov;
+
+   memset(&dreg_state, 0, sizeof(dreg_state));
+   dreg_state.dbg_regs[0].addr = (uintptr_t)(addr - offset);
+   dreg_state.dbg_regs[0].ctrl = control;
+   iov.iov_base = &dreg_state;
+   iov.iov_len = offsetof(struct user_hwdebug_state, dbg_regs) +
+   sizeof(dreg_state.dbg_regs[0]);
+   if (ptrace(PTRACE_SETREGSET, pid, NT_ARM_HW_WATCH, &iov) == 0)
+   return true;
+
+   if (errno == EIO) {
+ 
