Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-28 Thread Peter Zijlstra
On Fri, Feb 27, 2015 at 03:57:02PM -0800, Andi Kleen wrote:
 On Fri, Feb 27, 2015 at 11:05:45PM +0100, Peter Zijlstra wrote:
  On Fri, Feb 27, 2015 at 09:54:34AM -0800, Andi Kleen wrote:
     perf record doesn't show where you're currently blocked.
    
    Of course it does; look at perf inject -s.
   
   Trace points don't support the LBR stack.
  
  Yes, indeed. But would it not make much more sense to squirrel the LBR
  state into sched:sched_switch and teach that inject -s thing to dtrt,
  than to make a proc file that's available on all archs but will only
  work on 1-2 x86 uarchs and only if you're also running the right magic
  perf record at the same time?
 
 Yes. It would be nice to capture the whole PMU state in trace points.
 There are use models for this where it can work better than
 sampling.
 
 But that would be a lot bigger project than this simple file,
 which is already quite useful with minimal effort.

It's also the most horrible hack of an interface ever, so no go.
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-27 Thread Andi Kleen
On Fri, Feb 27, 2015 at 08:58:29AM +0100, Peter Zijlstra wrote:
 On Mon, Feb 23, 2015 at 09:44:48AM -0800, Andi Kleen wrote:
  On Mon, Feb 23, 2015 at 05:49:57PM +0100, Peter Zijlstra wrote:
   On Mon, Feb 23, 2015 at 03:43:41AM +, kan.li...@intel.com wrote:
    From: Kan Liang kan.li...@intel.com
    
    Haswell has a new feature that utilizes the existing Last Branch Record
    facility to record call chains. It has been implemented in perf. The
    call chains information is saved during perf event context.
    
    This patch exposes a /proc/pid/lbr_stack file that shows the saved LBR
    call chain information.
   
   But why? I mean, this thing is only useful if you have a concurrently
   running perf record that selects the LBR-stack stuff.
   
   And if you have that, you might as well look at its output instead. Why
   add this unconditional proc file that doesn't function on its own?
  
  perf record doesn't show where you're currently blocked.
 
 Of course it does; look at perf inject -s.

Trace points don't support the LBR stack.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-27 Thread Peter Zijlstra
On Fri, Feb 27, 2015 at 09:54:34AM -0800, Andi Kleen wrote:
   perf record doesn't show where you're currently blocked.
  
  Of course it does; look at perf inject -s.
 
 Trace points don't support the LBR stack.

Yes, indeed. But would it not make much more sense to squirrel the LBR
state into sched:sched_switch and teach that inject -s thing to dtrt,
than to make a proc file that's available on all archs but will only
work on 1-2 x86 uarchs and only if you're also running the right magic
perf record at the same time?


Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-27 Thread Andi Kleen
On Fri, Feb 27, 2015 at 11:05:45PM +0100, Peter Zijlstra wrote:
 On Fri, Feb 27, 2015 at 09:54:34AM -0800, Andi Kleen wrote:
    perf record doesn't show where you're currently blocked.
   
   Of course it does; look at perf inject -s.
  
  Trace points don't support the LBR stack.
 
 Yes, indeed. But would it not make much more sense to squirrel the LBR
 state into sched:sched_switch and teach that inject -s thing to dtrt,
 than to make a proc file that's available on all archs but will only
 work on 1-2 x86 uarchs and only if you're also running the right magic
 perf record at the same time?

Yes. It would be nice to capture the whole PMU state in trace points.
There are use models for this where it can work better than
sampling.

But that would be a lot bigger project than this simple file,
which is already quite useful with minimal effort.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-26 Thread Peter Zijlstra
On Mon, Feb 23, 2015 at 09:44:48AM -0800, Andi Kleen wrote:
 On Mon, Feb 23, 2015 at 05:49:57PM +0100, Peter Zijlstra wrote:
  On Mon, Feb 23, 2015 at 03:43:41AM +, kan.li...@intel.com wrote:
   From: Kan Liang kan.li...@intel.com
   
   Haswell has a new feature that utilizes the existing Last Branch Record
   facility to record call chains. It has been implemented in perf. The
   call chains information is saved during perf event context.
   
   This patch exposes a /proc/pid/lbr_stack file that shows the saved LBR
   call chain information.
  
  But why? I mean, this thing is only useful if you have a concurrently
  running perf record that selects the LBR-stack stuff.
  
  And if you have that, you might as well look at its output instead. Why
  add this unconditional proc file that doesn't function on its own?
 
 perf record doesn't show where you're currently blocked.

Of course it does; look at perf inject -s.

  http://article.gmane.org/gmane.linux.kernel/1225774
  http://article.gmane.org/gmane.linux.kernel/1225775


[RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-23 Thread kan . liang
From: Kan Liang kan.li...@intel.com

Haswell has a new feature that utilizes the existing Last Branch Record
facility to record call chains. It has been implemented in perf. The
call chains information is saved during perf event context.

This patch exposes a /proc/pid/lbr_stack file that shows the saved LBR
call chain information.

Some existing tools can already dump a stack (e.g. gstack). However,
all of these tools rely on frame pointers or DWARF information.
The LBR call stack facility provides an alternative way to get the
stack. It doesn't need debug information to construct the stack.
One common case is backtracing through the libpthread library in glibc,
which is partially written in assembler and has neither full DWARF
annotations nor frame pointers.
It's also helpful for JITed code.

Here are some examples. perf_stack uses /proc/pid/lbr_stack to dump
stack information.
Example 1:
tchain_edit is a binary with debug information.

./tchain_edit 
[1] 8058

gstack 8058
0  0x0040054d in f3 ()
1  0x00400587 in f2 ()
2  0x004005b3 in f1 ()
3  0x004005f4 in main ()

./perf_stack 8058
0  0x00400540: f3 at ??:?
1  0x0040057d: f2 at ??:?
2  0x004005ae: f1 at ??:?
3  0x004005f9: main at ??:?

With debug information, both gstack and perf_stack dump stack
information.

Example 2:
tchain_edit_ch is a binary that includes neither DWARF nor frame
pointer information.

./tchain_edit_ch 
[1] 8084

gstack 8084
0  0x00400568 in ?? ()
1  0x7fff134a7960 in ?? ()
2  0x00400587 in ?? ()
3  0x7fff134a7aa8 in ?? ()
4  0x0046 in ?? ()
5  0x7fff134a7980 in ?? ()
6  0x004005b8 in ?? ()
7  0x in ?? ()

gstack shows the wrong stack.

./perf_stack 8084
0  0x00400540: ?? ??:0
1  0x00400582: ?? ??:0
2  0x004005ae: ?? ??:0
3  0x004005f9: ?? ??:0

LBR call stack shows the correct stack.

Here is the perf_stack script.

perf record --call-graph lbr --pid $1 &
perf_pid=$!
running_cpu=`cat /proc/$1/stat | awk '{print $39}'`
cpu_tmp=$((1 << $running_cpu))
cpu=`printf 0x%X $cpu_tmp`
# run something on the same CPU to force a context switch
taskset $cpu sleep 2
# dump LBR call stack
i=0
while read -r line
do
	function=$(addr2line $line -e /proc/$1/exe -fap)
	echo "#$i  $function"
	i=`expr $i + 1`
done < /proc/$1/lbr_stack
kill -9 $perf_pid
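
A note on the two values the script derives: field 39 of
/proc/pid/stat is the CPU the task last ran on, and taskset takes a hex
bitmask with that CPU's bit set. The sketch below is only an
illustration, not part of the patch; the helper names
last_cpu_from_stat and cpu_to_mask are hypothetical. It strips the comm
field before counting, on the assumption that a command name containing
spaces would otherwise shift the awk field numbers.

```shell
# Print field 39 (last CPU) from one line of /proc/<pid>/stat.
# Field 2 (comm) is wrapped in parentheses and may contain spaces,
# so drop everything up to the last ") " before counting fields.
last_cpu_from_stat() {
    stat_line=$1
    rest=${stat_line##*) }            # now starts at field 3 (state)
    echo "$rest" | awk '{print $37}'  # field 39 overall is field 37 here
}

# Turn a CPU number into the hex affinity mask that taskset expects.
cpu_to_mask() {
    printf '0x%X\n' $((1 << $1))
}
```

For example, cpu_to_mask 3 prints 0x8, which pins the helper command to
CPU 3.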

The LBR call stack has the following known limitations:
 - Only available on Haswell and later platforms
 - Only dumps the user stack
 - Exception handling such as setjmp/longjmp leaves calls and returns
   unmatched
 - Pushing a different return address onto the stack leaves calls and
   returns unmatched
 - If the call stack is deeper than the LBR, only the last entries are
   captured
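
The last three limitations can be pictured with a toy model. This is a
sketch under a simplifying assumption (a call pushes an entry, a return
pops one, and the oldest entry is dropped once the depth is exceeded),
not a description of the exact hardware behaviour; lbr_model is a
hypothetical helper.

```shell
# Toy model of a call-stack style branch record.
# Usage: lbr_model DEPTH EVENT...
# Each EVENT is either "ret" (pop the newest entry) or a function
# name (push a call to that function). Prints the surviving entries,
# oldest first, on one line.
lbr_model() {
    depth=$1
    shift
    stack=""    # live entries, oldest first, each followed by one space
    count=0
    for ev in "$@"; do
        if [ "$ev" = ret ]; then
            [ "$count" -gt 0 ] || continue   # nothing left to pop
            stack=${stack% }                 # trim the trailing space
            newest=${stack##* }              # last word is the newest entry
            stack=${stack%"$newest"}
            count=$((count - 1))
        else
            stack="$stack$ev "
            count=$((count + 1))
            if [ "$count" -gt "$depth" ]; then
                stack=${stack#* }            # record is full: drop the oldest
                count=$((count - 1))
            fi
        fi
    done
    echo "${stack% }"
}
```

If f2 is left via longjmp (so no "ret" is seen) before f3 is called,
lbr_model 16 main f1 f2 f3 prints "main f1 f2 f3" even though the real
stack is main, f1, f3. With a depth of 2, lbr_model 2 main f1 f2 f3
prints "f2 f3": only the last entries survive.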

Signed-off-by: Kan Liang kan.li...@intel.com
---
 arch/x86/include/asm/perf_event.h  |  2 ++
 arch/x86/kernel/cpu/perf_event.c   |  9 +++
 arch/x86/kernel/cpu/perf_event.h   |  9 +--
 arch/x86/kernel/cpu/perf_event_intel.c |  1 +
 arch/x86/kernel/cpu/perf_event_intel_lbr.c | 16 +++
 fs/proc/base.c | 43 ++
 include/linux/perf_event.h |  8 +-
 7 files changed, 85 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/perf_event.h b/arch/x86/include/asm/perf_event.h
index dc0f6ed..70f07fd 100644
--- a/arch/x86/include/asm/perf_event.h
+++ b/arch/x86/include/asm/perf_event.h
@@ -11,6 +11,8 @@
 
 #define X86_PMC_IDX_MAX   64
 
+#define MAX_LBR_ENTRIES   16
+
 #define MSR_ARCH_PERFMON_PERFCTR0 0xc1
 #define MSR_ARCH_PERFMON_PERFCTR1 0xc2
 
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index e0dab5c..0b39f72 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -1922,6 +1922,14 @@ static void x86_pmu_sched_task(struct perf_event_context *ctx, bool sched_in)
 		x86_pmu.sched_task(ctx, sched_in);
 }
 
+static void x86_pmu_save_lbr_stack(struct perf_event_context *ctx,
+				   __u64 *lbr_nr,
+				   struct perf_branch_entry *lbr_entries)
+{
+	if (x86_pmu.save_lbr_stack)
+		x86_pmu.save_lbr_stack(ctx, lbr_nr, lbr_entries);
+}
+
 void perf_check_microcode(void)
 {
 	if (x86_pmu.check_microcode)
@@ -1952,6 +1960,7 @@ static struct pmu pmu = {
 
 	.event_idx  = x86_pmu_event_idx,
 	.sched_task = x86_pmu_sched_task,
+	.save_lbr_stack = x86_pmu_save_lbr_stack,
 	.task_ctx_size  = sizeof(struct x86_perf_task_context),
 };
 
diff --git a/arch/x86/kernel/cpu/perf_event.h b/arch/x86/kernel/cpu/perf_event.h
index a371d27..29d8b14 100644
--- a/arch/x86/kernel/cpu/perf_event.h
+++ b/arch/x86/kernel/cpu/perf_event.h

Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-23 Thread Peter Zijlstra
On Mon, Feb 23, 2015 at 03:43:41AM +, kan.li...@intel.com wrote:
 From: Kan Liang kan.li...@intel.com
 
 Haswell has a new feature that utilizes the existing Last Branch Record
 facility to record call chains. It has been implemented in perf. The
 call chains information is saved during perf event context.
 
 This patch exposes a /proc/pid/lbr_stack file that shows the saved LBR
 call chain information.

But why? I mean, this thing is only useful if you have a concurrently
running perf record that selects the LBR-stack stuff.

And if you have that, you might as well look at its output instead. Why
add this unconditional proc file that doesn't function on its own?


Re: [RFC PATCH 1/1] proc: introduce /proc/pid/lbr_stack

2015-02-23 Thread Andi Kleen
On Mon, Feb 23, 2015 at 05:49:57PM +0100, Peter Zijlstra wrote:
 On Mon, Feb 23, 2015 at 03:43:41AM +, kan.li...@intel.com wrote:
  From: Kan Liang kan.li...@intel.com
  
  Haswell has a new feature that utilizes the existing Last Branch Record
  facility to record call chains. It has been implemented in perf. The
  call chains information is saved during perf event context.
  
  This patch exposes a /proc/pid/lbr_stack file that shows the saved LBR
  call chain information.
 
 But why? I mean, this thing is only useful if you have a concurrently
 running perf record that selects the LBR-stack stuff.
 
 And if you have that, you might as well look at its output instead. Why
 add this unconditional proc file that doesn't function on its own?

perf record doesn't show where you're currently blocked.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only