Re: [PATCH v2 1/3] perf tools: Use offset instead of dwarfnum in register table.

2017-02-04 Thread Hekuang

hi

在 2017/2/3 21:00, Will Deacon 写道:

On Fri, Feb 03, 2017 at 11:06:05AM +, He Kuang wrote:

This patch changes the 'dwarfnum' to 'offset' in register table, so
the index of array becomes the dwarfnum (the index of each register
defined by DWARF) and the "offset" member means the byte-offset of the
register in (user_)pt_regs. This change makes the code consistent with
x86.

Acked-by: Masami Hiramatsu 
Signed-off-by: He Kuang 
---
  tools/perf/arch/arm64/util/dwarf-regs.c | 107 
  1 file changed, 52 insertions(+), 55 deletions(-)

Thanks for splitting this up. Comment below.


diff --git a/tools/perf/arch/arm64/util/dwarf-regs.c 
b/tools/perf/arch/arm64/util/dwarf-regs.c
index d49efeb..090f36b 100644
--- a/tools/perf/arch/arm64/util/dwarf-regs.c
+++ b/tools/perf/arch/arm64/util/dwarf-regs.c
@@ -9,72 +9,69 @@
   */
  
  #include 

+#include  /* for struct user_pt_regs */
  #include 
  
-struct pt_regs_dwarfnum {

+struct pt_regs_offset {
const char *name;
-   unsigned int dwarfnum;
+   int offset;
  };
  
-#define STR(s) #s

-#define REG_DWARFNUM_NAME(r, num) {.name = r, .dwarfnum = num}
-#define GPR_DWARFNUM_NAME(num) \
-   {.name = STR(%x##num), .dwarfnum = num}
-#define REG_DWARFNUM_END {.name = NULL, .dwarfnum = 0}
-
  /*
   * Reference:
   * 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0057b/IHI0057B_aadwarf64.pdf
   */
-static const struct pt_regs_dwarfnum regdwarfnum_table[] = {
-   GPR_DWARFNUM_NAME(0),
-   GPR_DWARFNUM_NAME(1),
-   GPR_DWARFNUM_NAME(2),
-   GPR_DWARFNUM_NAME(3),
-   GPR_DWARFNUM_NAME(4),
-   GPR_DWARFNUM_NAME(5),
-   GPR_DWARFNUM_NAME(6),
-   GPR_DWARFNUM_NAME(7),
-   GPR_DWARFNUM_NAME(8),
-   GPR_DWARFNUM_NAME(9),
-   GPR_DWARFNUM_NAME(10),
-   GPR_DWARFNUM_NAME(11),
-   GPR_DWARFNUM_NAME(12),
-   GPR_DWARFNUM_NAME(13),
-   GPR_DWARFNUM_NAME(14),
-   GPR_DWARFNUM_NAME(15),
-   GPR_DWARFNUM_NAME(16),
-   GPR_DWARFNUM_NAME(17),
-   GPR_DWARFNUM_NAME(18),
-   GPR_DWARFNUM_NAME(19),
-   GPR_DWARFNUM_NAME(20),
-   GPR_DWARFNUM_NAME(21),
-   GPR_DWARFNUM_NAME(22),
-   GPR_DWARFNUM_NAME(23),
-   GPR_DWARFNUM_NAME(24),
-   GPR_DWARFNUM_NAME(25),
-   GPR_DWARFNUM_NAME(26),
-   GPR_DWARFNUM_NAME(27),
-   GPR_DWARFNUM_NAME(28),
-   GPR_DWARFNUM_NAME(29),
-   REG_DWARFNUM_NAME("%lr", 30),
-   REG_DWARFNUM_NAME("%sp", 31),
-   REG_DWARFNUM_END,
-};
+#define REG_OFFSET_NAME(r, num) {.name = "%" #r, \
+   .offset = offsetof(struct user_pt_regs, regs[num])}

Whilst this works in practice, this is undefined behaviour for "sp", since
you'll go off the end of the regs array.


It's not undefined behaviour here,
struct user_pt_regs {
__u64   regs[31];
__u64   sp;
__u64   pc;
__u64   pstate;
};
user_pt_regs->regs[31] is user_pt_regs->sp and the offset value is correct.


I still think you're better off sticking with the dwarfnum, then just having
a dwarfnum2offset macro that multiplies by the size of a register.

Will
I think add a ptregs_offset field is more suitable and makes the code 
indepent

to struct user_pt_regs layout, for example if the structure changed to this:

struct user_pt_regs {
__u64   sp;
__u64   pc;
__u64   pstate;
__u64   regs[31];
};

The multiply result will be incorrect.
Patch updated and the change is similar to commit "4679bccaa308"
 (perf tools powerpc: Add support for generating bpf prologue)

Please review, thanks.




Re: [PATCH v2 1/3] perf tools: Use offset instead of dwarfnum in register table.

2017-02-04 Thread Hekuang

hi

在 2017/2/3 21:00, Will Deacon 写道:

On Fri, Feb 03, 2017 at 11:06:05AM +, He Kuang wrote:

This patch changes the 'dwarfnum' to 'offset' in register table, so
the index of array becomes the dwarfnum (the index of each register
defined by DWARF) and the "offset" member means the byte-offset of the
register in (user_)pt_regs. This change makes the code consistent with
x86.

Acked-by: Masami Hiramatsu 
Signed-off-by: He Kuang 
---
  tools/perf/arch/arm64/util/dwarf-regs.c | 107 
  1 file changed, 52 insertions(+), 55 deletions(-)

Thanks for splitting this up. Comment below.


diff --git a/tools/perf/arch/arm64/util/dwarf-regs.c 
b/tools/perf/arch/arm64/util/dwarf-regs.c
index d49efeb..090f36b 100644
--- a/tools/perf/arch/arm64/util/dwarf-regs.c
+++ b/tools/perf/arch/arm64/util/dwarf-regs.c
@@ -9,72 +9,69 @@
   */
  
  #include 

+#include  /* for struct user_pt_regs */
  #include 
  
-struct pt_regs_dwarfnum {

+struct pt_regs_offset {
const char *name;
-   unsigned int dwarfnum;
+   int offset;
  };
  
-#define STR(s) #s

-#define REG_DWARFNUM_NAME(r, num) {.name = r, .dwarfnum = num}
-#define GPR_DWARFNUM_NAME(num) \
-   {.name = STR(%x##num), .dwarfnum = num}
-#define REG_DWARFNUM_END {.name = NULL, .dwarfnum = 0}
-
  /*
   * Reference:
   * 
http://infocenter.arm.com/help/topic/com.arm.doc.ihi0057b/IHI0057B_aadwarf64.pdf
   */
-static const struct pt_regs_dwarfnum regdwarfnum_table[] = {
-   GPR_DWARFNUM_NAME(0),
-   GPR_DWARFNUM_NAME(1),
-   GPR_DWARFNUM_NAME(2),
-   GPR_DWARFNUM_NAME(3),
-   GPR_DWARFNUM_NAME(4),
-   GPR_DWARFNUM_NAME(5),
-   GPR_DWARFNUM_NAME(6),
-   GPR_DWARFNUM_NAME(7),
-   GPR_DWARFNUM_NAME(8),
-   GPR_DWARFNUM_NAME(9),
-   GPR_DWARFNUM_NAME(10),
-   GPR_DWARFNUM_NAME(11),
-   GPR_DWARFNUM_NAME(12),
-   GPR_DWARFNUM_NAME(13),
-   GPR_DWARFNUM_NAME(14),
-   GPR_DWARFNUM_NAME(15),
-   GPR_DWARFNUM_NAME(16),
-   GPR_DWARFNUM_NAME(17),
-   GPR_DWARFNUM_NAME(18),
-   GPR_DWARFNUM_NAME(19),
-   GPR_DWARFNUM_NAME(20),
-   GPR_DWARFNUM_NAME(21),
-   GPR_DWARFNUM_NAME(22),
-   GPR_DWARFNUM_NAME(23),
-   GPR_DWARFNUM_NAME(24),
-   GPR_DWARFNUM_NAME(25),
-   GPR_DWARFNUM_NAME(26),
-   GPR_DWARFNUM_NAME(27),
-   GPR_DWARFNUM_NAME(28),
-   GPR_DWARFNUM_NAME(29),
-   REG_DWARFNUM_NAME("%lr", 30),
-   REG_DWARFNUM_NAME("%sp", 31),
-   REG_DWARFNUM_END,
-};
+#define REG_OFFSET_NAME(r, num) {.name = "%" #r, \
+   .offset = offsetof(struct user_pt_regs, regs[num])}

Whilst this works in practice, this is undefined behaviour for "sp", since
you'll go off the end of the regs array.


It's not undefined behaviour here,
struct user_pt_regs {
__u64   regs[31];
__u64   sp;
__u64   pc;
__u64   pstate;
};
user_pt_regs->regs[31] is user_pt_regs->sp and the offset value is correct.


I still think you're better off sticking with the dwarfnum, then just having
a dwarfnum2offset macro that multiplies by the size of a register.

Will
I think add a ptregs_offset field is more suitable and makes the code 
indepent

to struct user_pt_regs layout, for example if the structure changed to this:

struct user_pt_regs {
__u64   sp;
__u64   pc;
__u64   pstate;
__u64   regs[31];
};

The multiply result will be incorrect.
Patch updated and the change is similar to commit "4679bccaa308"
 (perf tools powerpc: Add support for generating bpf prologue)

Please review, thanks.




Re: [PATCH 2/2 v2] perf tools: Enable bpf prologue for arm64

2017-02-03 Thread Hekuang

hi,

在 2017/1/27 3:31, Arnaldo Carvalho de Melo 写道:

Em Thu, Jan 26, 2017 at 04:52:12PM +, Will Deacon escreveu:

On Thu, Jan 26, 2017 at 10:49:16AM +0900, Masami Hiramatsu wrote:

On Wed, 25 Jan 2017 13:32:01 +
Will Deacon  wrote:


On Wed, Jan 25, 2017 at 07:23:11AM +, He Kuang wrote:

Since HAVE_KPROBES can be enabled in arm64, this patch introduces
regs_query_register_offset() to convert register name to offset for
arm64, so the BPF prologue feature is ready to use.

This patch also changes the 'dwarfnum' to 'offset' in register table,
so the related functions are consistent with x86.

Wouldn't it be an awful lot simpler just to leave the code as-is, and
implement regs_query_register_offset in the same way that we implement
get_arch_regstr but return the dwarfnum?

No, since the offset is not same as dwarfnum.

With this style, the index of array becomes the dwarfnum (the index of
each register defined by DWARF) and the "offset" member means the
byte-offset of the register in (user_)pt_regs. Those should be different.

Ok, then do it as two patches then, rather than introduce functionality
along with the renaming.


I don't really see the point of all the refactoring.

Also, from the maintenance point of view, this rewrite work makes
the code simply similar to x86 implementation, that will be easier to
maintain :)

Right, apart from the two howling bugs in the version that was nearly merged
initially :p. I tend to err on the "if it ain't broke, don't fix it" side
of the argument but if you really want the refactoring lets keep it as a
separate change.

So, He, can you do that? How do we proceed?

- Arnaldo


I split the patch as Will suggested and resend them. Sorry for
late response, just back from Spring festival.




Re: [PATCH 2/2 v2] perf tools: Enable bpf prologue for arm64

2017-02-03 Thread Hekuang

hi,

在 2017/1/27 3:31, Arnaldo Carvalho de Melo 写道:

Em Thu, Jan 26, 2017 at 04:52:12PM +, Will Deacon escreveu:

On Thu, Jan 26, 2017 at 10:49:16AM +0900, Masami Hiramatsu wrote:

On Wed, 25 Jan 2017 13:32:01 +
Will Deacon  wrote:


On Wed, Jan 25, 2017 at 07:23:11AM +, He Kuang wrote:

Since HAVE_KPROBES can be enabled in arm64, this patch introduces
regs_query_register_offset() to convert register name to offset for
arm64, so the BPF prologue feature is ready to use.

This patch also changes the 'dwarfnum' to 'offset' in register table,
so the related functions are consistent with x86.

Wouldn't it be an awful lot simpler just to leave the code as-is, and
implement regs_query_register_offset in the same way that we implement
get_arch_regstr but return the dwarfnum?

No, since the offset is not same as dwarfnum.

With this style, the index of array becomes the dwarfnum (the index of
each register defined by DWARF) and the "offset" member means the
byte-offset of the register in (user_)pt_regs. Those should be different.

Ok, then do it as two patches then, rather than introduce functionality
along with the renaming.


I don't really see the point of all the refactoring.

Also, from the maintenance point of view, this rewrite work makes
the code simply similar to x86 implementation, that will be easier to
maintain :)

Right, apart from the two howling bugs in the version that was nearly merged
initially :p. I tend to err on the "if it ain't broke, don't fix it" side
of the argument but if you really want the refactoring lets keep it as a
separate change.

So, He, can you do that? How do we proceed?

- Arnaldo


I split the patch as Will suggested and resend them. Sorry for
late response, just back from Spring festival.




Re: [PATCH 2/2] perf tools: Introduce regs_query_register_offset() for arm64

2017-01-24 Thread Hekuang

hi

在 2017/1/25 3:09, Arnaldo Carvalho de Melo 写道:

Em Tue, Jan 24, 2017 at 06:25:18PM +, Will Deacon escreveu:

On Tue, Jan 24, 2017 at 10:30:15AM +, He Kuang wrote:

Since HAVE_KPROBES can be enabled in arm64, this patch introduces
regs_query_register_offset() to convert register name to offset for
arm64, so the BPF prologue feature is ready to use.

This patch also changes the 'dwarfnum' to 'offset' in register table,
so the related functions are consistent with x86.

Signed-off-by: He Kuang 
  

It would've been nice to have been cc'd on this. In future, please at least
cc linux-arm-kernel for patches directly changing arm/arm64 code.
  

+   GPR_OFFSET_NAME(30),
+   {.name = "lr", .offset = offsetof(struct user_pt_regs, regs[30])},
+   REG_OFFSET_NAME(sp),
  

Don't sp and lr need the leading '%'?
  

+   REG_OFFSET_NAME(pc),
+   REG_OFFSET_NAME(pstate),
  

The AArch64 DWARF spec says that DWARF register 32 is "RESERVED" and
register 33 is the ELR, so these pc/pstate entries are wrong.
  

However, with those changes, I think this patch can simply be ignored and
mainline is doing the right thing.

Ok, thanks for checking, dropping this patch then.

- Arnaldo



The purpose of this patch is mainly on enable bpf prologue on arm64,
a new v2 version is sent and fix the problem mentioned by Will, thank
you for reviewing this.



Re: [PATCH 2/2] perf tools: Introduce regs_query_register_offset() for arm64

2017-01-24 Thread Hekuang

hi

在 2017/1/25 3:09, Arnaldo Carvalho de Melo 写道:

Em Tue, Jan 24, 2017 at 06:25:18PM +, Will Deacon escreveu:

On Tue, Jan 24, 2017 at 10:30:15AM +, He Kuang wrote:

Since HAVE_KPROBES can be enabled in arm64, this patch introduces
regs_query_register_offset() to convert register name to offset for
arm64, so the BPF prologue feature is ready to use.

This patch also changes the 'dwarfnum' to 'offset' in register table,
so the related functions are consistent with x86.

Signed-off-by: He Kuang 
  

It would've been nice to have been cc'd on this. In future, please at least
cc linux-arm-kernel for patches directly changing arm/arm64 code.
  

+   GPR_OFFSET_NAME(30),
+   {.name = "lr", .offset = offsetof(struct user_pt_regs, regs[30])},
+   REG_OFFSET_NAME(sp),
  

Don't sp and lr need the leading '%'?
  

+   REG_OFFSET_NAME(pc),
+   REG_OFFSET_NAME(pstate),
  

The AArch64 DWARF spec says that DWARF register 32 is "RESERVED" and
register 33 is the ELR, so these pc/pstate entries are wrong.
  

However, with those changes, I think this patch can simply be ignored and
mainline is doing the right thing.

Ok, thanks for checking, dropping this patch then.

- Arnaldo



The purpose of this patch is mainly on enable bpf prologue on arm64,
a new v2 version is sent and fix the problem mentioned by Will, thank
you for reviewing this.



Re: [PATCH 2/2] perf script: Don't disable use_callchain if input is pipe

2016-08-15 Thread Hekuang

ping.

在 2016/8/4 19:25, He Kuang 写道:

Because perf data from pipe do not have a header with evsel attr, we
should not check that and disable symbol_conf.use_callchain. Otherwise,
perf script won't show callchains even if the data stream contains
callchain.

Before:
   $ perf record -g -o - uname |perf script
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.000 MB - ]
   uname  1828 182630.186578:  25 cpu-clock:  ..b9499 setup_arg_pages
   uname  1828 182630.186850:  25 cpu-clock:  ..83b20 ___might_sleep
   uname  1828 182630.187153:  25 cpu-clock:  ..4b6be file_map_prot_ch
   ...

After:
   $ perf record -g -o - uname |perf script
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.000 MB - ]
   uname  1833 182675.927099: 25 cpu-clock:
   ba5520 _raw_spin_lock+0xfe200040 ([kernel.kallsyms])
   389dd4 expand_downwards+0xfe200154 ([kernel.kallsyms])
   389f34 expand_stack+0xfe200024 ([kernel.kallsyms])
   3b957e setup_arg_pages+0xfe20019e ([kernel.kallsyms])
   40c80f load_elf_binary+0xfe20042f ([kernel.kallsyms])
   ...

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d47aef9..ec8df8f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -371,14 +371,16 @@ static int perf_session__check_output_opt(struct 
perf_session *session)
  
  	if (!no_callchain) {

bool use_callchain = false;
+   bool not_pipe = false;
  
  		evlist__for_each_entry(session->evlist, evsel) {

+   not_pipe = true;
if (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
use_callchain = true;
break;
}
}
-   if (!use_callchain)
+   if (not_pipe && !use_callchain)
symbol_conf.use_callchain = false;
}
  





Re: [PATCH 2/2] perf script: Don't disable use_callchain if input is pipe

2016-08-15 Thread Hekuang

ping.

在 2016/8/4 19:25, He Kuang 写道:

Because perf data from pipe do not have a header with evsel attr, we
should not check that and disable symbol_conf.use_callchain. Otherwise,
perf script won't show callchains even if the data stream contains
callchain.

Before:
   $ perf record -g -o - uname |perf script
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.000 MB - ]
   uname  1828 182630.186578:  25 cpu-clock:  ..b9499 setup_arg_pages
   uname  1828 182630.186850:  25 cpu-clock:  ..83b20 ___might_sleep
   uname  1828 182630.187153:  25 cpu-clock:  ..4b6be file_map_prot_ch
   ...

After:
   $ perf record -g -o - uname |perf script
   Linux
   [ perf record: Woken up 1 times to write data ]
   [ perf record: Captured and wrote 0.000 MB - ]
   uname  1833 182675.927099: 25 cpu-clock:
   ba5520 _raw_spin_lock+0xfe200040 ([kernel.kallsyms])
   389dd4 expand_downwards+0xfe200154 ([kernel.kallsyms])
   389f34 expand_stack+0xfe200024 ([kernel.kallsyms])
   3b957e setup_arg_pages+0xfe20019e ([kernel.kallsyms])
   40c80f load_elf_binary+0xfe20042f ([kernel.kallsyms])
   ...

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 4 +++-
  1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index d47aef9..ec8df8f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -371,14 +371,16 @@ static int perf_session__check_output_opt(struct 
perf_session *session)
  
  	if (!no_callchain) {

bool use_callchain = false;
+   bool not_pipe = false;
  
  		evlist__for_each_entry(session->evlist, evsel) {

+   not_pipe = true;
if (evsel->attr.sample_type & PERF_SAMPLE_CALLCHAIN) {
use_callchain = true;
break;
}
}
-   if (!use_callchain)
+   if (not_pipe && !use_callchain)
symbol_conf.use_callchain = false;
}
  





Re: perf unwind: Odd message about x86 unwind

2016-06-30 Thread Hekuang

hi

在 2016/6/30 20:06, Arnaldo Carvalho de Melo 写道:

Hi He,

While testing a patch by Peter Zijlstra to the --stdio
annotation code I came accross these messages:

[acme@jouet linux]$ perf annotate __vdso_gettimeofday 2>&1 | head -20
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
  Percent | Source code & Disassembly of perf-vdso.so-E5tFUx for cycles:u
-
  :
  :
  :
  : Disassembly of section .text:
  :
  : 0cd0 <__vdso_gettimeofday@@LINUX_2.6>:
 0.00 :   cd0:   push   %rbp
 0.00 :   cd1:   mov%rsp,%rbp
 0.00 :   cd4:   push   %r15
 0.00 :   cd6:   push   %r14
 0.00 :   cd8:   push   %r13
 0.00 :   cda:   push   %r12
 0.00 :   cdc:   push   %rbx
 0.00 :   cdd:   sub$0x10,%rsp
[acme@jouet linux]$

And bisected it down to:

commit 52ffe0ff02fc053a025c381d5808e9ecd3206dfe
Author: He Kuang 
Date:   Fri Jun 3 03:33:22 2016 +

 perf callchain: Support x86 target platform
 
 Support x86(32-bit) cross platform callchain unwind.
 
 Signed-off-by: He Kuang 

 Acked-by: Jiri Olsa 

--

The source code where this message is emitted is:

struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;

unwind__prepare_access()
{
struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;

if (!strcmp(arch, "x86")) {
 if (dso_type != DSO__TYPE_64BIT)
 ops = x86_32_unwind_libunwind_ops;
 }

 if (!ops) {
 pr_err("unwind: target platform=%s is not supported\n", arch);
 return -1;
 }

So, this should fallback to local_unwind_libunwind_ops, why is this not being
set properly?

Feature detection says:

... libunwind: [ on  ]
...libdw-dwarf-unwind: [ on  ]

This is:

[acme@jouet linux]$ uname -a
Linux jouet 4.5.7-300.fc24.x86_64 #1 SMP Wed Jun 8 18:12:45 UTC 2016 x86_64 
x86_64 x86_64 GNU/Linuxo

Can you please check this?


I think it's because of the perf.data contains both 32bit/64bit
threads while perf is only build with local unwind(unwind for
x86_64).

Then I have a simple test for this:

  $ file hello_x86_64
  hello_x86_64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 3.0.0, 
BuildID[sha1]=adc94932cbcb2be80f3b62df530b8c109f40478a, not strippe


  $ file hello_i686
  hello_i686: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 4.5.0, not stripped


  $ perf record -o perf.data.x86_64_singlethreadhello_x86_64
  $ perf record -o perf.data.x86_64_allcpu   -a hello_x86_64
  $ perf record -o perf.data.i686_singlethread  hello_i686
  $ perf record -o perf.data.i686_allcpu -a hello_i686

Now I get 4 perf.data files:

  IDname32bit64bit

  1perf.data.x86_64_singlethreadNY
  2perf.data.x86_64_allcpuNY
  3perf.data.i686_singlethreadYN
  4perf.data.i686_allcpuYY

Because perf only supports local unwind(for x86_64), cases
contain 32bit thread should show the warnings, i.e. case3 and
case4. And perf.data contains only 32bit threads will show nothing
except the warnings while the one contains both 32bit/64bit will
show warnings for 32bit thread and annoate the 64bit thread.

Then check:

  $ perf annotate -i perf.data.x86_64_singlethread

   Percent |  Source code & Disassembly of hello for cycles:pp
  
   :
   :
   :
   :  Disassembly of section .text:
   :
   :  004005c0 :
   :  fib():
   :  #endif

  $ perf annotate -i perf.data.x86_64_allcpu

  Percent |  Source code & Disassembly of hello for cycles:pp
  
   :
   :
   :
   :  Disassembly of section .text:
   :
   :  004005c0 :
   :  fib():
   :  #endif
   :
   :  volatile int z = 0;
   :
   :  int fib(int x)

  $ perf annotate -i perf.data.i686_singlethread

  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported

  $ perf annotate -i perf.data.i686_allcpu

  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 

Re: perf unwind: Odd message about x86 unwind

2016-06-30 Thread Hekuang

hi

在 2016/6/30 20:06, Arnaldo Carvalho de Melo 写道:

Hi He,

While testing a patch by Peter Zijlstra to the --stdio
annotation code I came accross these messages:

[acme@jouet linux]$ perf annotate __vdso_gettimeofday 2>&1 | head -20
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
unwind: target platform=x86 is not supported
  Percent | Source code & Disassembly of perf-vdso.so-E5tFUx for cycles:u
-
  :
  :
  :
  : Disassembly of section .text:
  :
  : 0cd0 <__vdso_gettimeofday@@LINUX_2.6>:
 0.00 :   cd0:   push   %rbp
 0.00 :   cd1:   mov%rsp,%rbp
 0.00 :   cd4:   push   %r15
 0.00 :   cd6:   push   %r14
 0.00 :   cd8:   push   %r13
 0.00 :   cda:   push   %r12
 0.00 :   cdc:   push   %rbx
 0.00 :   cdd:   sub$0x10,%rsp
[acme@jouet linux]$

And bisected it down to:

commit 52ffe0ff02fc053a025c381d5808e9ecd3206dfe
Author: He Kuang 
Date:   Fri Jun 3 03:33:22 2016 +

 perf callchain: Support x86 target platform
 
 Support x86(32-bit) cross platform callchain unwind.
 
 Signed-off-by: He Kuang 

 Acked-by: Jiri Olsa 

--

The source code where this message is emitted is:

struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;

unwind__prepare_access()
{
struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;

if (!strcmp(arch, "x86")) {
 if (dso_type != DSO__TYPE_64BIT)
 ops = x86_32_unwind_libunwind_ops;
 }

 if (!ops) {
 pr_err("unwind: target platform=%s is not supported\n", arch);
 return -1;
 }

So, this should fallback to local_unwind_libunwind_ops, why is this not being
set properly?

Feature detection says:

... libunwind: [ on  ]
...libdw-dwarf-unwind: [ on  ]

This is:

[acme@jouet linux]$ uname -a
Linux jouet 4.5.7-300.fc24.x86_64 #1 SMP Wed Jun 8 18:12:45 UTC 2016 x86_64 
x86_64 x86_64 GNU/Linuxo

Can you please check this?


I think it's because of the perf.data contains both 32bit/64bit
threads while perf is only build with local unwind(unwind for
x86_64).

Then I have a simple test for this:

  $ file hello_x86_64
  hello_x86_64: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 3.0.0, 
BuildID[sha1]=adc94932cbcb2be80f3b62df530b8c109f40478a, not strippe


  $ file hello_i686
  hello_i686: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 4.5.0, not stripped


  $ perf record -o perf.data.x86_64_singlethreadhello_x86_64
  $ perf record -o perf.data.x86_64_allcpu   -a hello_x86_64
  $ perf record -o perf.data.i686_singlethread  hello_i686
  $ perf record -o perf.data.i686_allcpu -a hello_i686

Now I get 4 perf.data files:

  IDname32bit64bit

  1perf.data.x86_64_singlethreadNY
  2perf.data.x86_64_allcpuNY
  3perf.data.i686_singlethreadYN
  4perf.data.i686_allcpuYY

Because perf only supports local unwind(for x86_64), cases
contain 32bit thread should show the warnings, i.e. case3 and
case4. And perf.data contains only 32bit threads will show nothing
except the warnings while the one contains both 32bit/64bit will
show warnings for 32bit thread and annoate the 64bit thread.

Then check:

  $ perf annotate -i perf.data.x86_64_singlethread

   Percent |  Source code & Disassembly of hello for cycles:pp
  
   :
   :
   :
   :  Disassembly of section .text:
   :
   :  004005c0 :
   :  fib():
   :  #endif

  $ perf annotate -i perf.data.x86_64_allcpu

  Percent |  Source code & Disassembly of hello for cycles:pp
  
   :
   :
   :
   :  Disassembly of section .text:
   :
   :  004005c0 :
   :  fib():
   :  #endif
   :
   :  volatile int z = 0;
   :
   :  int fib(int x)

  $ perf annotate -i perf.data.i686_singlethread

  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported

  $ perf annotate -i perf.data.i686_allcpu

  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported
  unwind: target platform=x86 is not supported

  Percent |  Source code & 

Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-29 Thread Hekuang

hi

在 2016/6/28 22:57, Alexei Starovoitov 写道:


 return 0;
  }
@@ -465,7 +465,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
   *
   * Decode and execute eBPF instructions.
   */
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
yes. that is good.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.

I still think that what two .c files without embeded llvm or
one .c with embedded is a better way.
You'll have full C that is fast on x86 or arm instead of
executing things in ubpf.
Or use py/lua wrappers. Equally easy.


Our goal is the same as you described, that to have one .c file
and embeded llvm into perf for compiling it to bpf target for
kernel and native for userspace.

But there's two problems we may encounter by this way on the
phone, which is the most common scenario our work focus on.

The first one is the size of bcc/llvm library. It's more than
800MB for libbcc.so and I guess the llvm part takes most of
them. Shortly we can run perf as a daemon after the
overwrite/control channel be merged (wangnan's recently patches),
such a huge memory consumption is not acceptable.

Second, I've browsed the bcc source briefly and see that there's
two frontend for loading .b and .c, we have to integrate the x86
backend for compiling bpf to native code. That's possible but we
still need extra works and it is not ready to use for now.

Then we have two other approaches, the first is as 'ubpf v2'
which uses one .c file and introduces bpf vm to perf, the second
is like you said, use two .c files and compile userspace bpf to
native code by using llvm externally.

Both the two ways are easy to implement, but we prefer the first
one between them because it uses one .c file which is the same as
our final approach, and it does not face the huge memory
consumption problem, finally, after we solve problems on embeded
llvm in perf and lower the memory consumption, we can keep the
user interface and replace the bpf vm to llvm
frontend+backend.

So what's your opinion on this?

Thank you.



Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-29 Thread Hekuang

hi

在 2016/6/28 22:57, Alexei Starovoitov 写道:


 return 0;
  }
@@ -465,7 +465,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
   *
   * Decode and execute eBPF instructions.
   */
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
yes. that is good.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.

I still think that what two .c files without embeded llvm or
one .c with embedded is a better way.
You'll have full C that is fast on x86 or arm instead of
executing things in ubpf.
Or use py/lua wrappers. Equally easy.


Our goal is the same as you described, that to have one .c file
and embeded llvm into perf for compiling it to bpf target for
kernel and native for userspace.

But there's two problems we may encounter by this way on the
phone, which is the most common scenario our work focus on.

The first one is the size of bcc/llvm library. It's more than
800MB for libbcc.so and I guess the llvm part takes most of
them. Shortly we can run perf as a daemon after the
overwrite/control channel be merged (wangnan's recently patches),
such a huge memory consumption is not acceptable.

Second, I've browsed the bcc source briefly and see that there's
two frontend for loading .b and .c, we have to integrate the x86
backend for compiling bpf to native code. That's possible but we
still need extra works and it is not ready to use for now.

Then we have two other approaches, the first is as 'ubpf v2'
which uses one .c file and introduces bpf vm to perf, the second
is like you said, use two .c files and compile userspace bpf to
native code by using llvm externally.

Both the two ways are easy to implement, but we prefer the first
one between them because it uses one .c file which is the same as
our final approach, and it does not face the huge memory
consumption problem, finally, after we solve problems on embeded
llvm in perf and lower the memory consumption, we can keep the
user interface and replace the bpf vm to llvm
frontend+backend.

So what's your opinion on this?

Thank you.



Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-28 Thread Hekuang



在 2016/6/27 4:48, Alexei Starovoitov 写道:

On Sun, Jun 26, 2016 at 11:20:52AM +, He Kuang wrote:

  bounds check just like ubpf library does.

hmm. I don't think I suggested to hack bpf/core.c into separate file
and compile it for userspace...


Maybe I misunderstood your suggestion. Now I just let perf check 
bpf/core.o in
kernel output directory, if it exsits, perf will link it. The missing 
functions referenced by

bpf/core.o can be defined empty in perf.

The above way leaves two minor changes in bpf/core.c:

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b94a365..0fc6c23 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -452,7 +452,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct 
bpf_prog *prog)

  * therefore keeping it non-static as well; will also be used by JITs
  * anyway later on, so do not let the compiler omit it.
  */
-noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+noinline u64 __weak __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
return 0;
 }
@@ -465,7 +465,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
  *
  * Decode and execute eBPF instructions.
  */
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
 {
u64 stack[MAX_BPF_STACK / sizeof(u64)];
u64 regs[MAX_BPF_REG], tmp;

How about this?

Thank you.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.







Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-28 Thread Hekuang



在 2016/6/27 4:48, Alexei Starovoitov 写道:

On Sun, Jun 26, 2016 at 11:20:52AM +, He Kuang wrote:

  bounds check just like ubpf library does.

hmm. I don't think I suggested to hack bpf/core.c into separate file
and compile it for userspace...


Maybe I misunderstood your suggestion. Now I just let perf check 
bpf/core.o in
kernel output directory, if it exsits, perf will link it. The missing 
functions referenced by

bpf/core.o can be defined empty in perf.

The above way leaves two minor changes in bpf/core.c:

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index b94a365..0fc6c23 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -452,7 +452,7 @@ struct bpf_prog *bpf_jit_blind_constants(struct 
bpf_prog *prog)

  * therefore keeping it non-static as well; will also be used by JITs
  * anyway later on, so do not let the compiler omit it.
  */
-noinline u64 __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
+noinline u64 __weak __bpf_call_base(u64 r1, u64 r2, u64 r3, u64 r4, u64 r5)
 {
return 0;
 }
@@ -465,7 +465,7 @@ EXPORT_SYMBOL_GPL(__bpf_call_base);
  *
  * Decode and execute eBPF instructions.
  */
-static unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
+unsigned int __bpf_prog_run(void *ctx, const struct bpf_insn *insn)
 {
u64 stack[MAX_BPF_STACK / sizeof(u64)];
u64 regs[MAX_BPF_REG], tmp;

How about this?

Thank you.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.







Re: [RFC PATCH v2 05/26] tools include: Sync math64.h and div64.h

2016-06-26 Thread Hekuang

hi

在 2016/6/27 5:08, Nilay Vaish 写道:

On 26 June 2016 at 06:20, He Kuang  wrote:

From: Wang Nan 

This patch copies "include/linux/math64.h" into
"tools/include/linux/math64.h" and copies
"include/asm-generic/div64.h" into
"tools/include/asm-generic/div64.h", to enable other libraries use
arithmetic operation defined in them.

This probably is a newbie question.  What prevents us from using the
header files directly?

--
Nilay


For not being influenced by kernel headers update too much, perf adopts
the header files in its own folder.

Thank you.







Re: [RFC PATCH v2 05/26] tools include: Sync math64.h and div64.h

2016-06-26 Thread Hekuang

hi

在 2016/6/27 5:08, Nilay Vaish 写道:

On 26 June 2016 at 06:20, He Kuang  wrote:

From: Wang Nan 

This patch copies "include/linux/math64.h" into
"tools/include/linux/math64.h" and copies
"include/asm-generic/div64.h" into
"tools/include/asm-generic/div64.h", to enable other libraries use
arithmetic operation defined in them.

This probably is a newbie question.  What prevents us from using the
header files directly?

--
Nilay


For not being influenced by kernel headers update too much, perf adopts
the header files in its own folder.

Thank you.







Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-26 Thread Hekuang

hi

在 2016/6/27 4:48, Alexei Starovoitov 写道:

On Sun, Jun 26, 2016 at 11:20:52AM +, He Kuang wrote:

This patchset is based on Wang Nan's v1:
  http://thread.gmane.org/gmane.linux.kernel/2203717/focus=2203707

"""
   This patch set allows to perf invoke some user space BPF scripts on
   some point. uBPF scripts and kernel BPF scripts reside in one BPF
   object.  They communicate with each other with BPF maps. uBPF
   scripts can invoke helper functions provided by perf.
   
   At least following new features can be achieved based on uBPF

   support:
   
1) Report statistical result:


   Like DTrace, perf print statistical report before quit. No need
   to extract data using 'perf report'. Statistical method is
   controled by user.
   
2) Control perf's behavior:


   Dynamically adjust period of different events. Policy is defined
   by user.
"""

and modified by following the reviewers' suggestions.

v1-v2:

   - Split bpf vm part out of kernel/bpf/core.c and link to it instead
 of using ubpf library(Suggested by Alexei Starovoitov). And add
 runtime bounds check just like ubpf library does.

hmm. I don't think I suggested to hack bpf/core.c into separate file
and compile it for userspace...

"""

Also ubpf was written from scratch with apache2, while perf is gpl,
so you can just link kernel/bpf/core.o directly instead of using external
libraries.
"""
This is your comment on ubpf v1 thread.

I thought you was suggesting to use code in kernel/bpf/core.o,
but because there're difference in __bpf_prog_run() between userspace
and kernel, for example the __bpf_call_base is used in kernel,
in userspace we get funcs from ubpf function list, we have to modify
the existing code in kernel/bpf/core.c.

I've got the source code of 'bcc' project, but it seems that bcc does not
involve bpf virtual machine, so if we do not use 'kernel/bpf/core.o' solution,
and can't use 'ubpf' because of the license reason, any other choices?

Thank you.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.







Re: [RFC PATCH v2 00/26] perf tools: Support uBPF script

2016-06-26 Thread Hekuang

hi

在 2016/6/27 4:48, Alexei Starovoitov 写道:

On Sun, Jun 26, 2016 at 11:20:52AM +, He Kuang wrote:

This patchset is based on Wang Nan's v1:
  http://thread.gmane.org/gmane.linux.kernel/2203717/focus=2203707

"""
   This patch set allows to perf invoke some user space BPF scripts on
   some point. uBPF scripts and kernel BPF scripts reside in one BPF
   object.  They communicate with each other with BPF maps. uBPF
   scripts can invoke helper functions provided by perf.
   
   At least following new features can be achieved based on uBPF

   support:
   
1) Report statistical result:


   Like DTrace, perf print statistical report before quit. No need
   to extract data using 'perf report'. Statistical method is
   controled by user.
   
2) Control perf's behavior:


   Dynamically adjust period of different events. Policy is defined
   by user.
"""

and modified by following the reviewers' suggestions.

v1-v2:

   - Split bpf vm part out of kernel/bpf/core.c and link to it instead
 of using ubpf library(Suggested by Alexei Starovoitov). And add
 runtime bounds check just like ubpf library does.

hmm. I don't think I suggested to hack bpf/core.c into separate file
and compile it for userspace...

"""

Also ubpf was written from scratch with apache2, while perf is gpl,
so you can just link kernel/bpf/core.o directly instead of using external
libraries.
"""
This is your comment on ubpf v1 thread.

I thought you was suggesting to use code in kernel/bpf/core.o,
but because there're difference in __bpf_prog_run() between userspace
and kernel, for example the __bpf_call_base is used in kernel,
in userspace we get funcs from ubpf function list, we have to modify
the existing code in kernel/bpf/core.c.

I've got the source code of 'bcc' project, but it seems that bcc does not
involve bpf virtual machine, so if we do not use 'kernel/bpf/core.o' solution,
and can't use 'ubpf' because of the license reason, any other choices?

Thank you.


Also I think the prior experience taught us that sharing code between
kernel and user space will have lots of headaches long term.
I think it makes more sense to use bcc approach. Just have c+py
or c+lua or c+c. llvm has x86 backend too. If you integrate
clang/llvm (bcc approach) you can compile different functions with
different backends... if you don't want to embed the compiler,
have two .c files. Compile one for bpf target and another for native.







Re: [PATCH 5/5] perf tools: Fix NULL pointer deference when vdso not found

2016-06-22 Thread Hekuang



在 2016/6/23 10:02, Wangnan (F) 写道:

Hi,

This patch fixes a real crash problem when we do 'perf report'
on an arm64 platform with arm32 program.
It is introduced by commit f9b2bdf228 ("perf tools: Find vdso
with the consider of cross-platform"). From dmesg report, perf
crashes in dso__type() because dso is NULL.

Still don't know why on x86 it never crash, but it is obviously


This is because the fault only occured while
dso_type==DSO__TYPE_32BIT, run 64bit executable won't enter the
fault branch, but if we run a 32bit executable on x86_64, this
bug can be reproduced easily.

  # file ~/hello
  ~/hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 4.5.0, not stripped


  # perf record -g hello
  Segmentation fault

Thank you.


that we need to check the return vaule from __dso__find(): it can
be NULL.

So please consider pulling.

Thank you.

On 2016/6/22 14:57, He Kuang wrote:

We should check if 'dso' is a null pointer before passing it to the
function dso__type(), otherwise a segfault will be raised in
dso__data_get_fd(). In function machine__find_vdso(), the return value
checking of 'dso' is missed and this patch fixes this issue.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 8f81c41..7bdcad4 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -296,7 +296,7 @@ static struct dso *machine__find_vdso(struct 
machine *machine,

  if (!dso) {
  dso = __dsos__find(>dsos, DSO__NAME_VDSO,
 true);
-if (dso_type != dso__type(dso, machine))
+if (dso && dso_type != dso__type(dso, machine))
  dso = NULL;
  }
  break;









Re: [PATCH 5/5] perf tools: Fix NULL pointer deference when vdso not found

2016-06-22 Thread Hekuang



在 2016/6/23 10:02, Wangnan (F) 写道:

Hi,

This patch fixes a real crash problem when we do 'perf report'
on an arm64 platform with arm32 program.
It is introduced by commit f9b2bdf228 ("perf tools: Find vdso
with the consider of cross-platform"). From dmesg report, perf
crashes in dso__type() because dso is NULL.

Still don't know why on x86 it never crash, but it is obviously


This is because the fault only occured while
dso_type==DSO__TYPE_32BIT, run 64bit executable won't enter the
fault branch, but if we run a 32bit executable on x86_64, this
bug can be reproduced easily.

  # file ~/hello
  ~/hello: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), 
dynamically linked (uses shared libs), for GNU/Linux 4.5.0, not stripped


  # perf record -g hello
  Segmentation fault

Thank you.


that we need to check the return vaule from __dso__find(): it can
be NULL.

So please consider pulling.

Thank you.

On 2016/6/22 14:57, He Kuang wrote:

We should check if 'dso' is a null pointer before passing it to the
function dso__type(), otherwise a segfault will be raised in
dso__data_get_fd(). In function machine__find_vdso(), the return value
checking of 'dso' is missed and this patch fixes this issue.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 8f81c41..7bdcad4 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -296,7 +296,7 @@ static struct dso *machine__find_vdso(struct 
machine *machine,

  if (!dso) {
  dso = __dsos__find(>dsos, DSO__NAME_VDSO,
 true);
-if (dso_type != dso__type(dso, machine))
+if (dso && dso_type != dso__type(dso, machine))
  dso = NULL;
  }
  break;









Re: [PATCH 0/3] Fixes on remote unwind

2016-06-20 Thread Hekuang



在 2016/6/8 18:15, He Kuang 写道:

The remote unwind can supported scenario where we collect on a x86_64
machine and want to do analysis on a ARM64 or x86-32 machine. Though
this is not tested, after Arnaldo questioned the above issue, I tested
and found a bug.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but those
macros are assigned to the host platform, we should redefine them in
the wrapper file, for example in "util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on aarch64
machine.

To Arnaldo, I noticed that this patch
"perf tools: Find vdso supporting cross-platform analysis"
is not on "origin/perf/unwind" branch, this patch was applied, please
check if it was missed, the above test needs this patch.

Thanks for solving this, and please have a look at the patches in
this thread.


He Kuang (3):
   perf unwind: Change macro names of perf register
   perf unwind: Fix wrongly used regs for x86_32 unwind
   perf unwind: Fix wrongly used regs for aarch64 unwind

  tools/perf/util/libunwind/arm64.c| 5 +
  tools/perf/util/libunwind/x86_32.c   | 6 ++
  tools/perf/util/unwind-libunwind-local.c | 6 --
  tools/perf/util/unwind.h | 9 +
  4 files changed, 24 insertions(+), 2 deletions(-)






Re: [PATCH 0/3] Fixes on remote unwind

2016-06-20 Thread Hekuang



在 2016/6/8 18:15, He Kuang 写道:

The remote unwind can supported scenario where we collect on a x86_64
machine and want to do analysis on a ARM64 or x86-32 machine. Though
this is not tested, after Arnaldo questioned the above issue, I tested
and found a bug.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but those
macros are assigned to the host platform, we should redefine them in
the wrapper file, for example in "util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on aarch64
machine.

To Arnaldo, I noticed that this patch
"perf tools: Find vdso supporting cross-platform analysis"
is not on "origin/perf/unwind" branch, this patch was applied, please
check if it was missed, the above test needs this patch.

Thanks for solving this, and please have a look at the patches in
this thread.


He Kuang (3):
   perf unwind: Change macro names of perf register
   perf unwind: Fix wrongly used regs for x86_32 unwind
   perf unwind: Fix wrongly used regs for aarch64 unwind

  tools/perf/util/libunwind/arm64.c| 5 +
  tools/perf/util/libunwind/x86_32.c   | 6 ++
  tools/perf/util/unwind-libunwind-local.c | 6 --
  tools/perf/util/unwind.h | 9 +
  4 files changed, 24 insertions(+), 2 deletions(-)






Re: [PATCH v2 1/2] tools include: Adopt byte ordering macros from byteorder/generic.h

2016-06-16 Thread Hekuang



在 2016/6/16 19:36, Jiri Olsa 写道:

On Thu, Jun 16, 2016 at 10:41:31AM +, He Kuang wrote:

From: Wang Nan 

This patch adopts the macros for byte order conversion from
"include/linux/byteorder/generic.h" to
"tools/include/linux/byteorder/generic.h"

tools/perf/MANIFEST is also updated for 'make perf-*-src-pkg'.

other than explanation for the kbuild test robot email:
   http://marc.info/?l=linux-kernel=146605819924421=2

I think that's because of yesterday I sent these two patches
separately while the second one deponds on the first.

Thank you.


it looks ok..

jirka






Re: [PATCH v2 1/2] tools include: Adopt byte ordering macros from byteorder/generic.h

2016-06-16 Thread Hekuang



在 2016/6/16 19:36, Jiri Olsa 写道:

On Thu, Jun 16, 2016 at 10:41:31AM +, He Kuang wrote:

From: Wang Nan 

This patch adopts the macros for byte order conversion from
"include/linux/byteorder/generic.h" to
"tools/include/linux/byteorder/generic.h"

tools/perf/MANIFEST is also updated for 'make perf-*-src-pkg'.

other than explanation for the kbuild test robot email:
   http://marc.info/?l=linux-kernel=146605819924421=2

I think that's because of yesterday I sent these two patches
separately while the second one deponds on the first.

Thank you.


it looks ok..

jirka






Re: [PATCH] tools/perf: fix the word selected in find_*_bit

2016-06-15 Thread Hekuang



在 2016/6/16 5:29, Arnaldo Carvalho de Melo 写道:

Em Thu, Jun 16, 2016 at 12:11:04AM +0300, Yury Norov escreveu:

On Wed, Jun 15, 2016 at 10:51:27PM +0300, Yury Norov wrote:

Maybe there already is some macro doing the conversion for you...

yes it is, cpu_to_le64() is what you want

Beware that the cpu_to_le64() in tools/perf is bogus, we need to grab a
copy from the kernel sources.

- Arnaldo


[PATCH 1/2] tools include: Sync byteorder/generic.h
[PATCH 2/2] tools include: Fix wrong macro definitions for cpu_to_le* 
for big endian


Here're two patches related to this issue, sorry for wrongly sent two more
reduntant mails.

Thank you.



Re: [PATCH] tools/perf: fix the word selected in find_*_bit

2016-06-15 Thread Hekuang



在 2016/6/16 5:29, Arnaldo Carvalho de Melo 写道:

Em Thu, Jun 16, 2016 at 12:11:04AM +0300, Yury Norov escreveu:

On Wed, Jun 15, 2016 at 10:51:27PM +0300, Yury Norov wrote:

Maybe there already is some macro doing the conversion for you...

yes it is, cpu_to_le64() is what you want

Beware that the cpu_to_le64() in tools/perf is bogus, we need to grab a
copy from the kernel sources.

- Arnaldo


[PATCH 1/2] tools include: Sync byteorder/generic.h
[PATCH 2/2] tools include: Fix wrong macro definitions for cpu_to_le* 
for big endian


Here're two patches related to this issue, sorry for wrongly sent two more
reduntant mails.

Thank you.



Re: [PATCH v3 1/7 UPDATE2] perf tools: Find vdso with the consider of cross-platform

2016-06-15 Thread Hekuang

hi

在 2016/6/15 21:34, Arnaldo Carvalho de Melo 写道:

Em Tue, May 17, 2016 at 09:04:54AM +, He Kuang escreveu:

There's a problem in machine__findnew_vdso(), vdso buildid generated
by a 32-bit machine stores it with the name 'vdso', but when
processing buildid on a 64-bit machine with the same 'perf.data', perf
will search for vdso named as 'vdso32' and get failed.

Without looking at the code, just trying to understand your patch by
reading your description: Why would we look for names if we have
build-ids? All this type and name comparasions seems wrong if we have a
build-id, no?

Adrian?

- Arnaldo


We should find a binary with the {name, buildid} pair, the key is
name not buildid.

There're more than one vdso binaries in system, we store
parts/all of them when recording and should find out which one to
use for unwinding in the 'perf script' stage, by comparing the
name and the thread's dso_type.

Thank you.


This patch tries to find the exsiting dsos in machine->dsos by thread
dso_type. 64-bit thread tries to find vdso with name 'vdso', because
all 64-bit vdso is named as that. 32-bit thread first tries to find
vdso with name 'vdso32' if this thread was run on 64-bit machine, if
failed, then it tries 'vdso' which indicates that the thread was run
on 32-bit machine when recording.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 40 +---
  1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 44d440d..8f81c41 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -134,8 +134,6 @@ static struct dso *__machine__addnew_vdso(struct machine 
*machine, const char *s
return dso;
  }
  
-#if BITS_PER_LONG == 64

-
  static enum dso_type machine__thread_dso_type(struct machine *machine,
  struct thread *thread)
  {
@@ -156,6 +154,8 @@ static enum dso_type machine__thread_dso_type(struct 
machine *machine,
return dso_type;
  }
  
+#if BITS_PER_LONG == 64

+
  static int vdso__do_copy_compat(FILE *f, int fd)
  {
char buf[4096];
@@ -283,8 +283,38 @@ static int __machine__findnew_vdso_compat(struct machine 
*machine,
  
  #endif
  
+static struct dso *machine__find_vdso(struct machine *machine,

+ struct thread *thread)
+{
+   struct dso *dso = NULL;
+   enum dso_type dso_type;
+
+   dso_type = machine__thread_dso_type(machine, thread);
+   switch (dso_type) {
+   case DSO__TYPE_32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO32, true);
+   if (!dso) {
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);
+   if (dso_type != dso__type(dso, machine))
+   dso = NULL;
+   }
+   break;
+   case DSO__TYPE_X32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSOX32, true);
+   break;
+   case DSO__TYPE_64BIT:
+   case DSO__TYPE_UNKNOWN:
+   default:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO, true);
+   break;
+   }
+
+   return dso;
+}
+
  struct dso *machine__findnew_vdso(struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct thread *thread)
  {
struct vdso_info *vdso_info;
struct dso *dso = NULL;
@@ -297,6 +327,10 @@ struct dso *machine__findnew_vdso(struct machine *machine,
if (!vdso_info)
goto out_unlock;
  
+	dso = machine__find_vdso(machine, thread);

+   if (dso)
+   goto out_unlock;
+
  #if BITS_PER_LONG == 64
if (__machine__findnew_vdso_compat(machine, thread, vdso_info, ))
goto out_unlock;
--
1.8.5.2





Re: [PATCH v3 1/7 UPDATE2] perf tools: Find vdso with the consider of cross-platform

2016-06-15 Thread Hekuang

hi

在 2016/6/15 21:34, Arnaldo Carvalho de Melo 写道:

Em Tue, May 17, 2016 at 09:04:54AM +, He Kuang escreveu:

There's a problem in machine__findnew_vdso(), vdso buildid generated
by a 32-bit machine stores it with the name 'vdso', but when
processing buildid on a 64-bit machine with the same 'perf.data', perf
will search for vdso named as 'vdso32' and get failed.

Without looking at the code, just trying to understand your patch by
reading your description: Why would we look for names if we have
build-ids? All this type and name comparasions seems wrong if we have a
build-id, no?

Adrian?

- Arnaldo


We should find a binary with the {name, buildid} pair, the key is
name not buildid.

There're more than one vdso binaries in system, we store
parts/all of them when recording and should find out which one to
use for unwinding in the 'perf script' stage, by comparing the
name and the thread's dso_type.

Thank you.


This patch tries to find the exsiting dsos in machine->dsos by thread
dso_type. 64-bit thread tries to find vdso with name 'vdso', because
all 64-bit vdso is named as that. 32-bit thread first tries to find
vdso with name 'vdso32' if this thread was run on 64-bit machine, if
failed, then it tries 'vdso' which indicates that the thread was run
on 32-bit machine when recording.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 40 +---
  1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 44d440d..8f81c41 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -134,8 +134,6 @@ static struct dso *__machine__addnew_vdso(struct machine 
*machine, const char *s
return dso;
  }
  
-#if BITS_PER_LONG == 64

-
  static enum dso_type machine__thread_dso_type(struct machine *machine,
  struct thread *thread)
  {
@@ -156,6 +154,8 @@ static enum dso_type machine__thread_dso_type(struct 
machine *machine,
return dso_type;
  }
  
+#if BITS_PER_LONG == 64

+
  static int vdso__do_copy_compat(FILE *f, int fd)
  {
char buf[4096];
@@ -283,8 +283,38 @@ static int __machine__findnew_vdso_compat(struct machine 
*machine,
  
  #endif
  
+static struct dso *machine__find_vdso(struct machine *machine,

+ struct thread *thread)
+{
+   struct dso *dso = NULL;
+   enum dso_type dso_type;
+
+   dso_type = machine__thread_dso_type(machine, thread);
+   switch (dso_type) {
+   case DSO__TYPE_32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO32, true);
+   if (!dso) {
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);
+   if (dso_type != dso__type(dso, machine))
+   dso = NULL;
+   }
+   break;
+   case DSO__TYPE_X32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSOX32, true);
+   break;
+   case DSO__TYPE_64BIT:
+   case DSO__TYPE_UNKNOWN:
+   default:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO, true);
+   break;
+   }
+
+   return dso;
+}
+
  struct dso *machine__findnew_vdso(struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct thread *thread)
  {
struct vdso_info *vdso_info;
struct dso *dso = NULL;
@@ -297,6 +327,10 @@ struct dso *machine__findnew_vdso(struct machine *machine,
if (!vdso_info)
goto out_unlock;
  
+	dso = machine__find_vdso(machine, thread);

+   if (dso)
+   goto out_unlock;
+
  #if BITS_PER_LONG == 64
if (__machine__findnew_vdso_compat(machine, thread, vdso_info, ))
goto out_unlock;
--
1.8.5.2





Re: [PATCH 0/3] Fixes on remote unwind

2016-06-12 Thread Hekuang



在 2016/6/8 18:15, He Kuang 写道:

The remote unwind can supported scenario where we collect on a x86_64
machine and want to do analysis on a ARM64 or x86-32 machine. Though
this is not tested, after Arnaldo questioned the above issue, I tested
and found a bug.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but those
macros are assigned to the host platform, we should redefine them in
the wrapper file, for example in "util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on aarch64
machine.

To Arnaldo, I noticed that this patch
"perf tools: Find vdso supporting cross-platform analysis"
is not on "origin/perf/unwind" branch, this patch was applied, please
check if it was missed, the above test needs this patch.
 I also checked tip.git, this patch is not there either, is this patch 
missed?

This is the previous git pull request mail:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1148601.html

Thank you.


Thank you.

He Kuang (3):
   perf unwind: Change macro names of perf register
   perf unwind: Fix wrongly used regs for x86_32 unwind
   perf unwind: Fix wrongly used regs for aarch64 unwind

  tools/perf/util/libunwind/arm64.c| 5 +
  tools/perf/util/libunwind/x86_32.c   | 6 ++
  tools/perf/util/unwind-libunwind-local.c | 6 --
  tools/perf/util/unwind.h | 9 +
  4 files changed, 24 insertions(+), 2 deletions(-)






Re: [PATCH 0/3] Fixes on remote unwind

2016-06-12 Thread Hekuang



在 2016/6/8 18:15, He Kuang 写道:

The remote unwind can supported scenario where we collect on a x86_64
machine and want to do analysis on a ARM64 or x86-32 machine. Though
this is not tested, after Arnaldo questioned the above issue, I tested
and found a bug.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but those
macros are assigned to the host platform, we should redefine them in
the wrapper file, for example in "util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on aarch64
machine.

To Arnaldo, I noticed that this patch
"perf tools: Find vdso supporting cross-platform analysis"
is not on "origin/perf/unwind" branch, this patch was applied, please
check if it was missed, the above test needs this patch.
 I also checked tip.git, this patch is not there either, is this patch 
missed?

This is the previous git pull request mail:

https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1148601.html

Thank you.


Thank you.

He Kuang (3):
   perf unwind: Change macro names of perf register
   perf unwind: Fix wrongly used regs for x86_32 unwind
   perf unwind: Fix wrongly used regs for aarch64 unwind

  tools/perf/util/libunwind/arm64.c| 5 +
  tools/perf/util/libunwind/x86_32.c   | 6 ++
  tools/perf/util/unwind-libunwind-local.c | 6 --
  tools/perf/util/unwind.h | 9 +
  4 files changed, 24 insertions(+), 2 deletions(-)






Re: [PATCH v9 00/14] Add support for remote unwind

2016-06-08 Thread Hekuang

hi

在 2016/6/8 3:44, Arnaldo Carvalho de Melo 写道:

Em Fri, Jun 03, 2016 at 09:06:29AM +0200, Jiri Olsa escreveu:

On Fri, Jun 03, 2016 at 03:33:09AM +, He Kuang wrote:

SNIP


For using remote libunwind libraries, reference this:
   http://thread.gmane.org/gmane.linux.kernel/2224430

and now we can use LIBUNWIND_DIR to specific custom dirctories
containing libunwind libs.

Acked-by: Jiri Olsa  for most patches except:

v9:
  - Change function unwind__register_ops() to static.
  - Move up unwind__prepare_access() in thread__insert_map() and save
map_groups__remove() call.
  - Enclose multiple line if/else into braces.
  - Fix miss modified function declaration for unwind__prepare_access()
in patch 10.

for patchset:

Acked-by: Jiri Olsa 

Ok, I'm applying it, after fixing 'perf test unwind', 'perf top --call-graph 
dwarf'
and 'perf trace --call-graph dwarf', but I have one question, is the
scenario where we collect on a x86_64 machine and want to do analysis on
a ARM64 or x86-32 machine supported? This should be the odd case now,

Yes, it's supported.

But I never tested this before, so I just compiled libunwind for
aarch64, and tested unwinding i686 perf.data on aarch64. Then I
found another issue I've considered but missed at some version of
this patch series.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but
those macros are assigned to the host platform, we should
redefine them in the wrapper file, for example in
"util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on
aarch64 machine.  Since you've already applied the v9 patches,
should I send patches based on the lastest tree as bug fixes or
just update v9 patches?

Thank you.

Here is the modified part:

diff --git a/tools/perf/util/libunwind/arm64.c 
b/tools/perf/util/libunwind/arm64.c

index 4fb5395..8a5c2fc 100644
--- a/tools/perf/util/libunwind/arm64.c
+++ b/tools/perf/util/libunwind/arm64.c
@@ -29,6 +29,11 @@
 #ifdef NO_LIBUNWIND_DEBUG_FRAME_AARCH64
 #define NO_LIBUNWIND_DEBUG_FRAME
 #endif
+
+#undef PERF_REG_IP
+#undef PERF_REG_SP
+#define PERF_REG_IP PERF_REG_ARM64_PC
+#define PERF_REG_SP PERF_REG_ARM64_SP
 #include "util/unwind-libunwind-local.c"

 struct unwind_libunwind_ops *
diff --git a/tools/perf/util/libunwind/x86_32.c 
b/tools/perf/util/libunwind/x86_32.c

index d98c17e..de21a39 100644
--- a/tools/perf/util/libunwind/x86_32.c
+++ b/tools/perf/util/libunwind/x86_32.c
@@ -31,6 +31,11 @@
 #ifndef NO_LIBUNWIND_DEBUG_FRAME
 #define NO_LIBUNWIND_DEBUG_FRAME
 #endif
+
+#undef PERF_REG_IP
+#undef PERF_REG_SP
+#define PERF_REG_IP PERF_REG_X86_IP
+#define PERF_REG_SP PERF_REG_X86_SP
 #include "util/unwind-libunwind-local.c"

 struct unwind_libunwind_ops *



but from a quick look I couldn't see this as being supported, is that
true or I was just lazy not to have tried this?

- Arnaldo






Re: [PATCH v9 00/14] Add support for remote unwind

2016-06-08 Thread Hekuang

hi

在 2016/6/8 3:44, Arnaldo Carvalho de Melo 写道:

Em Fri, Jun 03, 2016 at 09:06:29AM +0200, Jiri Olsa escreveu:

On Fri, Jun 03, 2016 at 03:33:09AM +, He Kuang wrote:

SNIP


For using remote libunwind libraries, reference this:
   http://thread.gmane.org/gmane.linux.kernel/2224430

and now we can use LIBUNWIND_DIR to specific custom dirctories
containing libunwind libs.

Acked-by: Jiri Olsa  for most patches except:

v9:
  - Change function unwind__register_ops() to static.
  - Move up unwind__prepare_access() in thread__insert_map() and save
map_groups__remove() call.
  - Enclose multiple line if/else into braces.
  - Fix miss modified function declaration for unwind__prepare_access()
in patch 10.

for patchset:

Acked-by: Jiri Olsa 

Ok, I'm applying it, after fixing 'perf test unwind', 'perf top --call-graph 
dwarf'
and 'perf trace --call-graph dwarf', but I have one question, is the
scenario where we collect on a x86_64 machine and want to do analysis on
a ARM64 or x86-32 machine supported? This should be the odd case now,

Yes, it's supported.

But I never tested this before, so I just compiled libunwind for
aarch64, and tested unwinding i686 perf.data on aarch64. Then I
found another issue I've considered but missed at some version of
this patch series.

In util/unwind-libunwind-local.c, PERF_REG_SP/IP is used, but
those macros are assigned to the host platform, we should
redefine them in the wrapper file, for example in
"util/libunwind/x86_32.c".

After fixing this problem, i686 perf.data can be parsed on
aarch64 machine.  Since you've already applied the v9 patches,
should I send patches based on the lastest tree as bug fixes or
just update v9 patches?

Thank you.

Here is the modified part:

diff --git a/tools/perf/util/libunwind/arm64.c 
b/tools/perf/util/libunwind/arm64.c

index 4fb5395..8a5c2fc 100644
--- a/tools/perf/util/libunwind/arm64.c
+++ b/tools/perf/util/libunwind/arm64.c
@@ -29,6 +29,11 @@
 #ifdef NO_LIBUNWIND_DEBUG_FRAME_AARCH64
 #define NO_LIBUNWIND_DEBUG_FRAME
 #endif
+
+#undef PERF_REG_IP
+#undef PERF_REG_SP
+#define PERF_REG_IP PERF_REG_ARM64_PC
+#define PERF_REG_SP PERF_REG_ARM64_SP
 #include "util/unwind-libunwind-local.c"

 struct unwind_libunwind_ops *
diff --git a/tools/perf/util/libunwind/x86_32.c 
b/tools/perf/util/libunwind/x86_32.c

index d98c17e..de21a39 100644
--- a/tools/perf/util/libunwind/x86_32.c
+++ b/tools/perf/util/libunwind/x86_32.c
@@ -31,6 +31,11 @@
 #ifndef NO_LIBUNWIND_DEBUG_FRAME
 #define NO_LIBUNWIND_DEBUG_FRAME
 #endif
+
+#undef PERF_REG_IP
+#undef PERF_REG_SP
+#define PERF_REG_IP PERF_REG_X86_IP
+#define PERF_REG_SP PERF_REG_X86_SP
 #include "util/unwind-libunwind-local.c"

 struct unwind_libunwind_ops *



but from a quick look I couldn't see this as being supported, is that
true or I was just lazy not to have tried this?

- Arnaldo






Re: [PATCH v9 00/14] Add support for remote unwind

2016-06-03 Thread Hekuang



在 2016/6/4 5:09, Arnaldo Carvalho de Melo 写道:

Em Fri, Jun 03, 2016 at 06:06:02PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Fri, Jun 03, 2016 at 04:42:05PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Fri, Jun 03, 2016 at 09:06:29AM +0200, Jiri Olsa escreveu:

On Fri, Jun 03, 2016 at 03:33:09AM +, He Kuang wrote:

v9:
  - Change function unwind__register_ops() to static.
  - Move up unwind__prepare_access() in thread__insert_map() and save
map_groups__remove() call.
  - Enclose multiple line if/else into braces.
  - Fix miss modified function declaration for unwind__prepare_access()
in patch 10.
  

for patchset:
  

Acked-by: Jiri Olsa 

Thanks, applied, build testing.

Build tested went ok, but then 'perf top' crashes:

[root@jouet ~]# perf top
perf: Segmentation fault
 backtrace 
perf[0x55591b]
/lib64/libc.so.6(+0x34ab0)[0x7f38ad9c1ab0]
perf(normalize_arch+0x27)[0x534797]
perf(unwind__prepare_access+0xbb)[0x52b15b]
perf(thread__insert_map+0x27)[0x4d4837]
perf(machine__process_mmap2_event+0xd7)[0x4ca187]
perf(perf_event__synthesize_mmap_events+0x3e2)[0x491b32]
perf(perf_event__synthesize_threads+0x445)[0x492635]
perf(cmd_top+0xee0)[0x442f50]
perf[0x486a91]
perf(main+0x6ee)[0x42485e]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38ad9ad580]
perf(_start+0x29)[0x424949]
[0x0]
[root@jouet ~]#

And I bet that 'perf trace' will too, lemme see, well, it crashes even
more spetacularly, but that is the topic of another bug report, will
send soon.

Anyway, please try your patchkit with 'perf top' and 'perf trace', as
both don't use perf.data files, i.e. they work 'live', so probably
things that you touch in normalize_arch() are not initialized and need
to be setup.

Ah, no need to resend the whole patchkit, just find out what is the bug
and send me a patch and I'll insert it at the right point to avoid
introducing a bisect breaking point.

Your patchkit is in my perf/unwind branch at my tree, I already added
Jiri's Acked-by in all the patches.

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

Thanks,

- Arnaldo
-


I send the updated one after PATCH 10/14,  env->arch is null in live mode,
comments are added and "perf top/trace" works now.

Thanks.




Re: [PATCH v9 00/14] Add support for remote unwind

2016-06-03 Thread Hekuang



在 2016/6/4 5:09, Arnaldo Carvalho de Melo 写道:

Em Fri, Jun 03, 2016 at 06:06:02PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Fri, Jun 03, 2016 at 04:42:05PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Fri, Jun 03, 2016 at 09:06:29AM +0200, Jiri Olsa escreveu:

On Fri, Jun 03, 2016 at 03:33:09AM +, He Kuang wrote:

v9:
  - Change function unwind__register_ops() to static.
  - Move up unwind__prepare_access() in thread__insert_map() and save
map_groups__remove() call.
  - Enclose multiple line if/else into braces.
  - Fix miss modified function declaration for unwind__prepare_access()
in patch 10.
  

for patchset:
  

Acked-by: Jiri Olsa 

Thanks, applied, build testing.

Build tested went ok, but then 'perf top' crashes:

[root@jouet ~]# perf top
perf: Segmentation fault
 backtrace 
perf[0x55591b]
/lib64/libc.so.6(+0x34ab0)[0x7f38ad9c1ab0]
perf(normalize_arch+0x27)[0x534797]
perf(unwind__prepare_access+0xbb)[0x52b15b]
perf(thread__insert_map+0x27)[0x4d4837]
perf(machine__process_mmap2_event+0xd7)[0x4ca187]
perf(perf_event__synthesize_mmap_events+0x3e2)[0x491b32]
perf(perf_event__synthesize_threads+0x445)[0x492635]
perf(cmd_top+0xee0)[0x442f50]
perf[0x486a91]
perf(main+0x6ee)[0x42485e]
/lib64/libc.so.6(__libc_start_main+0xf0)[0x7f38ad9ad580]
perf(_start+0x29)[0x424949]
[0x0]
[root@jouet ~]#

And I bet that 'perf trace' will too, lemme see, well, it crashes even
more spetacularly, but that is the topic of another bug report, will
send soon.

Anyway, please try your patchkit with 'perf top' and 'perf trace', as
both don't use perf.data files, i.e. they work 'live', so probably
things that you touch in normalize_arch() are not initialized and need
to be setup.

Ah, no need to resend the whole patchkit, just find out what is the bug
and send me a patch and I'll insert it at the right point to avoid
introducing a bisect breaking point.

Your patchkit is in my perf/unwind branch at my tree, I already added
Jiri's Acked-by in all the patches.

git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git

Thanks,

- Arnaldo
-


I send the updated one after PATCH 10/14,  env->arch is null in live mode,
comments are added and "perf top/trace" works now.

Thanks.




Re: [PATCH v7 10/14] perf tools: Check the target platform before assigning unwind methods

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:08AM +, He Kuang wrote:

SNIP


-int unwind__prepare_access(struct thread *thread)
+int unwind__prepare_access(struct thread *thread, struct map *map)
  {
-   unwind__register_ops(thread, local_unwind_libunwind_ops);
+   const char *arch;
+   enum dso_type dso_type;
+   struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;
  
-	return thread->unwind_libunwind_ops->prepare_access(thread);

+   if (!thread->mg->machine->env)
+   return 0;
+
+   dso_type = dso__type(map->dso, thread->mg->machine);
+   if (dso_type == DSO__TYPE_UNKNOWN)
+   return 0;
+
+   if (thread->addr_space)
+   pr_debug("unwind: thread map already set, 64bit is %d, 
dso=%s\n",
+dso_type == DSO__TYPE_64BIT, map->dso->name);

should we leave once the address space is set? resseting it over
again seems like memory leak unless I'm missing something...

also this check should be probably the first thing we do in here


Sure, I must miss the return statement here.

thanks,
jirka


+
+   arch = normalize_arch(thread->mg->machine->env->arch);
+   pr_debug("unwind: target platform=%s\n", arch);
+
+   unwind__register_ops(thread, ops);
+
+   if (thread->unwind_libunwind_ops)
+   return thread->unwind_libunwind_ops->prepare_access(thread);
+   else
+   return 0;
  }

SNIP






Re: [PATCH v7 10/14] perf tools: Check the target platform before assigning unwind methods

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:08AM +, He Kuang wrote:

SNIP


-int unwind__prepare_access(struct thread *thread)
+int unwind__prepare_access(struct thread *thread, struct map *map)
  {
-   unwind__register_ops(thread, local_unwind_libunwind_ops);
+   const char *arch;
+   enum dso_type dso_type;
+   struct unwind_libunwind_ops *ops = local_unwind_libunwind_ops;
  
-	return thread->unwind_libunwind_ops->prepare_access(thread);

+   if (!thread->mg->machine->env)
+   return 0;
+
+   dso_type = dso__type(map->dso, thread->mg->machine);
+   if (dso_type == DSO__TYPE_UNKNOWN)
+   return 0;
+
+   if (thread->addr_space)
+   pr_debug("unwind: thread map already set, 64bit is %d, 
dso=%s\n",
+dso_type == DSO__TYPE_64BIT, map->dso->name);

should we leave once the address space is set? resseting it over
again seems like memory leak unless I'm missing something...

also this check should be probably the first thing we do in here


Sure, I must miss the return statement here.

thanks,
jirka


+
+   arch = normalize_arch(thread->mg->machine->env->arch);
+   pr_debug("unwind: target platform=%s\n", arch);
+
+   unwind__register_ops(thread, ops);
+
+   if (thread->unwind_libunwind_ops)
+   return thread->unwind_libunwind_ops->prepare_access(thread);
+   else
+   return 0;
  }

SNIP






Re: [PATCH v7 14/14] perf callchain: Support aarch64 cross-platform

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:12AM +, He Kuang wrote:

SNIP


diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 7746e09..fced833 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -102,6 +102,7 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
  libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind-local.o
  libperf-$(CONFIG_LIBUNWIND)  += unwind-libunwind.o
  libperf-$(CONFIG_LIBUNWIND_X86)  += libunwind/x86_32.o
+libperf-$(CONFIG_LIBUNWIND_AARCH64)  += libunwind/arm64.o
  
  libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
  
diff --git a/tools/perf/util/libunwind/arm64.c b/tools/perf/util/libunwind/arm64.c

new file mode 100644
index 000..99c0d42
--- /dev/null
+++ b/tools/perf/util/libunwind/arm64.c
@@ -0,0 +1,18 @@

also please add some comments in here describing how this file works
like that it setups defines to compile arch specific binary from the
generic one, which is then represented by the arm64_unwind_libunwind_ops,
which get assigned for each arm64 thread

or something along those lines

thanks,
jirka

ok

+#define REMOTE_UNWIND_LIBUNWIND
+
+#define LIBUNWIND__ARCH_REG_ID libunwind__arm64_reg_id
+
+#include "unwind.h"
+#include "debug.h"
+#include "libunwind-aarch64.h"
+#include <../../../../arch/arm64/include/uapi/asm/perf_regs.h>
+#include "../../arch/arm64/util/unwind-libunwind.c"
+
+#undef NO_LIBUNWIND_DEBUG_FRAME
+#ifdef NO_LIBUNWIND_DEBUG_FRAME_AARCH64
+#define NO_LIBUNWIND_DEBUG_FRAME
+#endif
+#include "util/unwind-libunwind-local.c"
+
+struct unwind_libunwind_ops *
+arm64_unwind_libunwind_ops = &_unwind_libunwind_ops;

SNIP






Re: [PATCH v7 14/14] perf callchain: Support aarch64 cross-platform

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:12AM +, He Kuang wrote:

SNIP


diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 7746e09..fced833 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -102,6 +102,7 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
  libperf-$(CONFIG_LOCAL_LIBUNWIND)+= unwind-libunwind-local.o
  libperf-$(CONFIG_LIBUNWIND)  += unwind-libunwind.o
  libperf-$(CONFIG_LIBUNWIND_X86)  += libunwind/x86_32.o
+libperf-$(CONFIG_LIBUNWIND_AARCH64)  += libunwind/arm64.o
  
  libperf-$(CONFIG_LIBBABELTRACE) += data-convert-bt.o
  
diff --git a/tools/perf/util/libunwind/arm64.c b/tools/perf/util/libunwind/arm64.c

new file mode 100644
index 000..99c0d42
--- /dev/null
+++ b/tools/perf/util/libunwind/arm64.c
@@ -0,0 +1,18 @@

also please add some comments in here describing how this file works
like that it setups defines to compile arch specific binary from the
generic one, which is then represented by the arm64_unwind_libunwind_ops,
which get assigned for each arm64 thread

or something along those lines

thanks,
jirka

ok

+#define REMOTE_UNWIND_LIBUNWIND
+
+#define LIBUNWIND__ARCH_REG_ID libunwind__arm64_reg_id
+
+#include "unwind.h"
+#include "debug.h"
+#include "libunwind-aarch64.h"
+#include <../../../../arch/arm64/include/uapi/asm/perf_regs.h>
+#include "../../arch/arm64/util/unwind-libunwind.c"
+
+#undef NO_LIBUNWIND_DEBUG_FRAME
+#ifdef NO_LIBUNWIND_DEBUG_FRAME_AARCH64
+#define NO_LIBUNWIND_DEBUG_FRAME
+#endif
+#include "util/unwind-libunwind-local.c"
+
+struct unwind_libunwind_ops *
+arm64_unwind_libunwind_ops = &_unwind_libunwind_ops;

SNIP






Re: [PATCH v7 13/14] perf callchain: Support x86 target platform

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:11AM +, He Kuang wrote:

SNIP


diff --git a/tools/perf/util/unwind-libunwind.c 
b/tools/perf/util/unwind-libunwind.c
index e183390..5774317 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -5,6 +5,7 @@
  #include "arch/common.h"
  
  struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;

+struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;
  
  void unwind__register_ops(struct thread *thread,

  struct unwind_libunwind_ops *ops)
@@ -30,7 +31,13 @@ int unwind__prepare_access(struct thread *thread, struct map 
*map)
 dso_type == DSO__TYPE_64BIT, map->dso->name);
  
  	arch = normalize_arch(thread->mg->machine->env->arch);

-   pr_debug("unwind: target platform=%s\n", arch);
+
+   if (!strcmp(arch, "x86"))
+   if (dso_type != DSO__TYPE_64BIT)
+   ops = x86_32_unwind_libunwind_ops;
+
+   if (!ops)
+   pr_err("unwind: target platform=%s is not supported\n", arch);

how could ops become NULL in here? it starts with local_unwind_libunwind_ops
I dont think this check is needed in here


x86_32_unwind_libunwind_ops is a null pointer when x86_32
libunwind is not supported.

There's a weak defination if CONFIG_LIBUNWIND_X86 is not set.

   struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;

In this case, ops is null and "x86_32 is not supported" error
message is showed up.


jirka






Re: [PATCH v7 13/14] perf callchain: Support x86 target platform

2016-06-01 Thread Hekuang



在 2016/6/1 16:40, Jiri Olsa 写道:

On Tue, May 31, 2016 at 11:19:11AM +, He Kuang wrote:

SNIP


diff --git a/tools/perf/util/unwind-libunwind.c 
b/tools/perf/util/unwind-libunwind.c
index e183390..5774317 100644
--- a/tools/perf/util/unwind-libunwind.c
+++ b/tools/perf/util/unwind-libunwind.c
@@ -5,6 +5,7 @@
  #include "arch/common.h"
  
  struct unwind_libunwind_ops __weak *local_unwind_libunwind_ops;

+struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;
  
  void unwind__register_ops(struct thread *thread,

  struct unwind_libunwind_ops *ops)
@@ -30,7 +31,13 @@ int unwind__prepare_access(struct thread *thread, struct map 
*map)
 dso_type == DSO__TYPE_64BIT, map->dso->name);
  
  	arch = normalize_arch(thread->mg->machine->env->arch);

-   pr_debug("unwind: target platform=%s\n", arch);
+
+   if (!strcmp(arch, "x86"))
+   if (dso_type != DSO__TYPE_64BIT)
+   ops = x86_32_unwind_libunwind_ops;
+
+   if (!ops)
+   pr_err("unwind: target platform=%s is not supported\n", arch);

how could ops become NULL in here? it starts with local_unwind_libunwind_ops
I dont think this check is needed in here


x86_32_unwind_libunwind_ops is a null pointer when x86_32
libunwind is not supported.

There's a weak defination if CONFIG_LIBUNWIND_X86 is not set.

   struct unwind_libunwind_ops __weak *x86_32_unwind_libunwind_ops;

In this case, ops is null and "x86_32 is not supported" error
message is showed up.


jirka






Re: [PATCH v6 10/11] perf callchain: Support x86 target platform

2016-05-30 Thread Hekuang



在 2016/5/30 17:30, Jiri Olsa 写道:

On Mon, May 30, 2016 at 05:11:35PM +0800, Hekuang wrote:

hi

在 2016/5/30 16:53, Jiri Olsa 写道:

On Sat, May 28, 2016 at 11:59:59AM +, He Kuang wrote:

Support x86(32-bit) cross platform callchain unwind.

Signed-off-by: He Kuang <heku...@huawei.com>
---
   tools/perf/arch/Build  |  1 +
   tools/perf/arch/x86/util/unwind-libunwind.c|  7 ---
   tools/perf/arch/x86/util/unwind-libunwind_x86_32.c | 21 +
   tools/perf/util/unwind-libunwind-local.c   |  4 
   tools/perf/util/unwind-libunwind.c | 19 +--
   tools/perf/util/unwind.h   | 10 ++
   6 files changed, 53 insertions(+), 9 deletions(-)
   create mode 100644 tools/perf/arch/x86/util/unwind-libunwind_x86_32.c

diff --git a/tools/perf/arch/Build b/tools/perf/arch/Build
index 109eb75..3fc4af1 100644
--- a/tools/perf/arch/Build
+++ b/tools/perf/arch/Build
@@ -1,2 +1,3 @@
   libperf-y += common.o
   libperf-y += $(ARCH)/
+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

we have Build file directly in arch/x86/util/

if you do it like this to include generic file easily
we better fix the include then

This is because "libperf-y += $(ARCH)" will only sink into $(ARCH) folder,
for example on x86_64, only tools/perf/arch/x86 will be built. But for
  remote libunwind, we also need
  'tools/perf/arch/arm64/util/unwind-libunwind.o', while arm64 folder is
not added to libperf-y. Is there a gracefull to deal with this?

you just need to include the file, right?

I think it's ok to include arch/arm/c
from arch/x86/util/unwind-libunwind-arm64.c

jirka


By following your advise, if ARCH=x86, the file tree will
be like this:

arch/x86
-arch/x86/util/unwind-libunwind-arm64.c
-arch/x86/util/unwind-libunwind-x86_32.c
-arch/x86/util/unwind-libunwind-x86_64.c
-arch/x86/util/unwind-libunwind-arm.c

And for ARCH=arm (host machine is arm, it should be considered)
arch/arm
-arch/arm/util/unwind-libunwind-arm64.c
-arch/arm/util/unwind-libunwind-x86_32.c
-arch/arm/util/unwind-libunwind-x86_64.c
-arch/arm/util/unwind-libunwind-arm.c

For arm64:
arch/arm64
-arch/arm64/util/unwind-libunwind-arm64.c
-arch/arm64/util/unwind-libunwind-x86_32.c
-arch/arm64/util/unwind-libunwind-x86_64.c
-arch/arm64/util/unwind-libunwind-arm.c

But in my patch, the file tree is like this:

arch
-arch/arm64/util/unwind-libunwind-arm64.c
-arch/x86/util/unwind-libunwind-x86_64.c
-arch/x86/util/unwind-libunwind-x86_32.c
-arch/arm/util/unwind-libunwind-arm.c

I admit that

+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

is not so good, but do you think the above file tree is
too redunctant?

Thank you.







Re: [PATCH v6 10/11] perf callchain: Support x86 target platform

2016-05-30 Thread Hekuang



在 2016/5/30 17:30, Jiri Olsa 写道:

On Mon, May 30, 2016 at 05:11:35PM +0800, Hekuang wrote:

hi

在 2016/5/30 16:53, Jiri Olsa 写道:

On Sat, May 28, 2016 at 11:59:59AM +, He Kuang wrote:

Support x86(32-bit) cross platform callchain unwind.

Signed-off-by: He Kuang 
---
   tools/perf/arch/Build  |  1 +
   tools/perf/arch/x86/util/unwind-libunwind.c|  7 ---
   tools/perf/arch/x86/util/unwind-libunwind_x86_32.c | 21 +
   tools/perf/util/unwind-libunwind-local.c   |  4 
   tools/perf/util/unwind-libunwind.c | 19 +--
   tools/perf/util/unwind.h   | 10 ++
   6 files changed, 53 insertions(+), 9 deletions(-)
   create mode 100644 tools/perf/arch/x86/util/unwind-libunwind_x86_32.c

diff --git a/tools/perf/arch/Build b/tools/perf/arch/Build
index 109eb75..3fc4af1 100644
--- a/tools/perf/arch/Build
+++ b/tools/perf/arch/Build
@@ -1,2 +1,3 @@
   libperf-y += common.o
   libperf-y += $(ARCH)/
+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

we have Build file directly in arch/x86/util/

if you do it like this to include generic file easily
we better fix the include then

This is because "libperf-y += $(ARCH)" will only sink into $(ARCH) folder,
for example on x86_64, only tools/perf/arch/x86 will be built. But for
  remote libunwind, we also need
  'tools/perf/arch/arm64/util/unwind-libunwind.o', while arm64 folder is
not added to libperf-y. Is there a gracefull to deal with this?

you just need to include the file, right?

I think it's ok to include arch/arm/c
from arch/x86/util/unwind-libunwind-arm64.c

jirka


By following your advise, if ARCH=x86, the file tree will
be like this:

arch/x86
-arch/x86/util/unwind-libunwind-arm64.c
-arch/x86/util/unwind-libunwind-x86_32.c
-arch/x86/util/unwind-libunwind-x86_64.c
-arch/x86/util/unwind-libunwind-arm.c

And for ARCH=arm (host machine is arm, it should be considered)
arch/arm
-arch/arm/util/unwind-libunwind-arm64.c
-arch/arm/util/unwind-libunwind-x86_32.c
-arch/arm/util/unwind-libunwind-x86_64.c
-arch/arm/util/unwind-libunwind-arm.c

For arm64:
arch/arm64
-arch/arm64/util/unwind-libunwind-arm64.c
-arch/arm64/util/unwind-libunwind-x86_32.c
-arch/arm64/util/unwind-libunwind-x86_64.c
-arch/arm64/util/unwind-libunwind-arm.c

But in my patch, the file tree is like this:

arch
-arch/arm64/util/unwind-libunwind-arm64.c
-arch/x86/util/unwind-libunwind-x86_64.c
-arch/x86/util/unwind-libunwind-x86_32.c
-arch/arm/util/unwind-libunwind-arm.c

I admit that

+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

is not so good, but do you think the above file tree is
too redunctant?

Thank you.







Re: [PATCH v6 10/11] perf callchain: Support x86 target platform

2016-05-30 Thread Hekuang

hi

在 2016/5/30 16:53, Jiri Olsa 写道:

On Sat, May 28, 2016 at 11:59:59AM +, He Kuang wrote:

Support x86(32-bit) cross platform callchain unwind.

Signed-off-by: He Kuang 
---
  tools/perf/arch/Build  |  1 +
  tools/perf/arch/x86/util/unwind-libunwind.c|  7 ---
  tools/perf/arch/x86/util/unwind-libunwind_x86_32.c | 21 +
  tools/perf/util/unwind-libunwind-local.c   |  4 
  tools/perf/util/unwind-libunwind.c | 19 +--
  tools/perf/util/unwind.h   | 10 ++
  6 files changed, 53 insertions(+), 9 deletions(-)
  create mode 100644 tools/perf/arch/x86/util/unwind-libunwind_x86_32.c

diff --git a/tools/perf/arch/Build b/tools/perf/arch/Build
index 109eb75..3fc4af1 100644
--- a/tools/perf/arch/Build
+++ b/tools/perf/arch/Build
@@ -1,2 +1,3 @@
  libperf-y += common.o
  libperf-y += $(ARCH)/
+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

we have Build file directly in arch/x86/util/

if you do it like this to include generic file easily
we better fix the include then


This is because "libperf-y += $(ARCH)" will only sink into $(ARCH) folder,
for example on x86_64, only tools/perf/arch/x86 will be built. But for
 remote libunwind, we also need
 'tools/perf/arch/arm64/util/unwind-libunwind.o', while arm64 folder is
not added to libperf-y. Is there a gracefull to deal with this?


jirka






Re: [PATCH v6 10/11] perf callchain: Support x86 target platform

2016-05-30 Thread Hekuang

hi

在 2016/5/30 16:53, Jiri Olsa 写道:

On Sat, May 28, 2016 at 11:59:59AM +, He Kuang wrote:

Support x86(32-bit) cross platform callchain unwind.

Signed-off-by: He Kuang 
---
  tools/perf/arch/Build  |  1 +
  tools/perf/arch/x86/util/unwind-libunwind.c|  7 ---
  tools/perf/arch/x86/util/unwind-libunwind_x86_32.c | 21 +
  tools/perf/util/unwind-libunwind-local.c   |  4 
  tools/perf/util/unwind-libunwind.c | 19 +--
  tools/perf/util/unwind.h   | 10 ++
  6 files changed, 53 insertions(+), 9 deletions(-)
  create mode 100644 tools/perf/arch/x86/util/unwind-libunwind_x86_32.c

diff --git a/tools/perf/arch/Build b/tools/perf/arch/Build
index 109eb75..3fc4af1 100644
--- a/tools/perf/arch/Build
+++ b/tools/perf/arch/Build
@@ -1,2 +1,3 @@
  libperf-y += common.o
  libperf-y += $(ARCH)/
+libperf-$(CONFIG_LIBUNWIND_X86)  += x86/util/unwind-libunwind_x86_32.o

we have Build file directly in arch/x86/util/

if you do it like this to include generic file easily
we better fix the include then


This is because "libperf-y += $(ARCH)" will only sink into $(ARCH) folder,
for example on x86_64, only tools/perf/arch/x86 will be built. But for
 remote libunwind, we also need
 'tools/perf/arch/arm64/util/unwind-libunwind.o', while arm64 folder is
not added to libperf-y. Is there a gracefull to deal with this?


jirka






Re: [PATCH v6 06/11] perf tools: Extract local libunwind code out of unwind-libunwind.c

2016-05-29 Thread Hekuang



在 2016/5/28 20:10, Wangnan (F) 写道:



On 2016/5/28 19:59, He Kuang wrote:

This patch extracts codes related to specific arithecture out of
unwind-libunwind.c. The extrated part are only built if local
libunwind is supported.

Signed-off-by: He Kuang 
---
  tools/perf/util/Build|   1 +
  tools/perf/util/unwind-libunwind-local.c | 677 
++
  tools/perf/util/unwind-libunwind.c   | 694 
+--

  3 files changed, 679 insertions(+), 693 deletions(-)
  create mode 100644 tools/perf/util/unwind-libunwind-local.c


Please use 'git format-patch -M' to avoid such a large patch.


Currently is not a 100% similar rename, I split this patch into a
rename and a small diff.


Thank you.







Re: [PATCH v6 06/11] perf tools: Extract local libunwind code out of unwind-libunwind.c

2016-05-29 Thread Hekuang



在 2016/5/28 20:10, Wangnan (F) 写道:



On 2016/5/28 19:59, He Kuang wrote:

This patch extracts codes related to specific arithecture out of
unwind-libunwind.c. The extrated part are only built if local
libunwind is supported.

Signed-off-by: He Kuang 
---
  tools/perf/util/Build|   1 +
  tools/perf/util/unwind-libunwind-local.c | 677 
++
  tools/perf/util/unwind-libunwind.c   | 694 
+--

  3 files changed, 679 insertions(+), 693 deletions(-)
  create mode 100644 tools/perf/util/unwind-libunwind-local.c


Please use 'git format-patch -M' to avoid such a large patch.


Currently is not a 100% similar rename, I split this patch into a
rename and a small diff.


Thank you.







Re: [PATCH v5 3/5] perf callchain: Add support for cross-platform unwind

2016-05-27 Thread Hekuang



在 2016/5/27 15:38, Jiri Olsa 写道:

On Fri, May 27, 2016 at 03:13:04PM +0800, Hekuang wrote:

SNIP


- I understand we need to compile 3 objects from unwind-libunwind.c,
  how about we create 3 files like:

  util/unwind-libunwind-local.c
  util/unwind-libunwind-x86_32.c
  util/unwind-libunwind-arm64.c

  which would setup all necessary defines and include unwind-libunwind.c 
like:

  ---
  /* comments explaining every define ;-) */
  ...
  #define LOCAL... REMOTE..
  ...
  #include 
  ...
  

  this way we will keep all the special setup for given unwind object
  in one place and you can also use simple rule in the Build file like
  without defining special rule:

  libperf-$(CONFIG_LIBUNWIND_X86)  += unwind-libunwind_x86_32.o
  libperf-$(CONFIG_LIBUNWIND_AARCH64)  += unwind-libunwind_arm64.o

  the same way for the arch object:

  arch/x86/util/unwind-libunwind-local.c
  arch/x86/util/unwind-libunwind-x86_32.c


Not sure I thought everything through, but I think this way
we'll keep it more maintainable and readable..

let me know what you think

The only concern is that, if later we support more platforms,
there will be too much files named as 'tools/perf/util/unwind-libunwind*.c'
Is it acceptable or not?

And I thought all files belongs to specific archs should
go to folder under 'tools/perf/arch/xxx', is that right?

hum, I wouldn't worry about that.. but you're right,
let's put them under arch


But only 'tools/perf/arch/$(host platform)' will be built, in our case,
we should built the unwind-libunwind-$(arch) as long as we have
the remote libunwind libraries. So, I think there's a conflict in the
existing build script and not easy to 'put them under arch'. That's why
I choose a complex way in my previous patch.

Do you have some suggestions?


thanks,
jirka






Re: [PATCH v5 3/5] perf callchain: Add support for cross-platform unwind

2016-05-27 Thread Hekuang



在 2016/5/27 15:38, Jiri Olsa 写道:

On Fri, May 27, 2016 at 03:13:04PM +0800, Hekuang wrote:

SNIP


- I understand we need to compile 3 objects from unwind-libunwind.c,
  how about we create 3 files like:

  util/unwind-libunwind-local.c
  util/unwind-libunwind-x86_32.c
  util/unwind-libunwind-arm64.c

  which would setup all necessary defines and include unwind-libunwind.c 
like:

  ---
  /* comments explaining every define ;-) */
  ...
  #define LOCAL... REMOTE..
  ...
  #include 
  ...
  

  this way we will keep all the special setup for given unwind object
  in one place and you can also use simple rule in the Build file like
  without defining special rule:

  libperf-$(CONFIG_LIBUNWIND_X86)  += unwind-libunwind_x86_32.o
  libperf-$(CONFIG_LIBUNWIND_AARCH64)  += unwind-libunwind_arm64.o

  the same way for the arch object:

  arch/x86/util/unwind-libunwind-local.c
  arch/x86/util/unwind-libunwind-x86_32.c


Not sure I thought everything through, but I think this way
we'll keep it more maintainable and readable..

let me know what you think

The only concern is that, if later we support more platforms,
there will be too much files named as 'tools/perf/util/unwind-libunwind*.c'
Is it acceptable or not?

And I thought all files belongs to specific archs should
go to folder under 'tools/perf/arch/xxx', is that right?

hum, I wouldn't worry about that.. but you're right,
let's put them under arch


But only 'tools/perf/arch/$(host platform)' will be built, in our case,
we should built the unwind-libunwind-$(arch) as long as we have
the remote libunwind libraries. So, I think there's a conflict in the
existing build script and not easy to 'put them under arch'. That's why
I choose a complex way in my previous patch.

Do you have some suggestions?


thanks,
jirka






Re: [PATCH v5 3/5] perf callchain: Add support for cross-platform unwind

2016-05-27 Thread Hekuang

hi

在 2016/5/27 1:42, Jiri Olsa 写道:

On Tue, May 24, 2016 at 09:20:27AM +, He Kuang wrote:

Use thread specific unwind ops to unwind cross-platform callchains.

Currently, unwind methods is suitable for local unwind, this patch
changes the fixed methods to thread/map related. Each time a map is
inserted, we find the target arch and see if this platform can be
remote unwind. We test for x86 platform and only show proper
messages. The real unwind methods are not implemented, will be
introduced in next patch.

CONFIG_LIBUNWIND/NO_LIBUNWIND are changed to
CONFIG_LOCAL_LIBUNWIND/NO_LOCAL_LIBUNWIND for retaining local unwind
features. CONFIG_LIBUNWIND stands for either local or remote or both
unwind are supported and NO_LIBUNWIND means neither local nor remote
libunwind are supported.

hi,
I think this is too complex and error prone, I'd rather see it split
to several pieces. Basically every logicaly single piece should be
in separate patch for better readablebility and review.

I might be missing some but I'd mainly sepatate following:

   - introducing struct unwind_libunwind_ops for local unwind
   - moving unwind__prepare_access from thread_new into thread__insert_map
   - keep unwind__prepare_access name instead of unwind__get_arch
 and keep the return value, we need to bail out in case of error
   - I wouldn't use null ops.. just check for for ops == NULL in wrapper 
function


OK

   - I understand we need to compile 3 objects from unwind-libunwind.c,
 how about we create 3 files like:

 util/unwind-libunwind-local.c
 util/unwind-libunwind-x86_32.c
 util/unwind-libunwind-arm64.c

 which would setup all necessary defines and include unwind-libunwind.c 
like:

 ---
 /* comments explaining every define ;-) */
 ...
 #define LOCAL... REMOTE..
 ...
 #include 
 ...
 

 this way we will keep all the special setup for given unwind object
 in one place and you can also use simple rule in the Build file like
 without defining special rule:

 libperf-$(CONFIG_LIBUNWIND_X86)  += unwind-libunwind_x86_32.o
 libperf-$(CONFIG_LIBUNWIND_AARCH64)  += unwind-libunwind_arm64.o

 the same way for the arch object:

 arch/x86/util/unwind-libunwind-local.c
 arch/x86/util/unwind-libunwind-x86_32.c


Not sure I thought everything through, but I think this way
we'll keep it more maintainable and readable..

let me know what you think


The only concern is that, if later we support more platforms,
there will be too much files named as 'tools/perf/util/unwind-libunwind*.c'
Is it acceptable or not?

And I thought all files belongs to specific archs should
go to folder under 'tools/perf/arch/xxx', is that right?

Thanks.

thanks,
jirka






Re: [PATCH v5 3/5] perf callchain: Add support for cross-platform unwind

2016-05-27 Thread Hekuang

hi

在 2016/5/27 1:42, Jiri Olsa 写道:

On Tue, May 24, 2016 at 09:20:27AM +, He Kuang wrote:

Use thread specific unwind ops to unwind cross-platform callchains.

Currently, unwind methods is suitable for local unwind, this patch
changes the fixed methods to thread/map related. Each time a map is
inserted, we find the target arch and see if this platform can be
remote unwind. We test for x86 platform and only show proper
messages. The real unwind methods are not implemented, will be
introduced in next patch.

CONFIG_LIBUNWIND/NO_LIBUNWIND are changed to
CONFIG_LOCAL_LIBUNWIND/NO_LOCAL_LIBUNWIND for retaining local unwind
features. CONFIG_LIBUNWIND stands for either local or remote or both
unwind are supported and NO_LIBUNWIND means neither local nor remote
libunwind are supported.

hi,
I think this is too complex and error prone, I'd rather see it split
to several pieces. Basically every logicaly single piece should be
in separate patch for better readablebility and review.

I might be missing some but I'd mainly sepatate following:

   - introducing struct unwind_libunwind_ops for local unwind
   - moving unwind__prepare_access from thread_new into thread__insert_map
   - keep unwind__prepare_access name instead of unwind__get_arch
 and keep the return value, we need to bail out in case of error
   - I wouldn't use null ops.. just check for for ops == NULL in wrapper 
function


OK

   - I understand we need to compile 3 objects from unwind-libunwind.c,
 how about we create 3 files like:

 util/unwind-libunwind-local.c
 util/unwind-libunwind-x86_32.c
 util/unwind-libunwind-arm64.c

 which would setup all necessary defines and include unwind-libunwind.c 
like:

 ---
 /* comments explaining every define ;-) */
 ...
 #define LOCAL... REMOTE..
 ...
 #include 
 ...
 

 this way we will keep all the special setup for given unwind object
 in one place and you can also use simple rule in the Build file like
 without defining special rule:

 libperf-$(CONFIG_LIBUNWIND_X86)  += unwind-libunwind_x86_32.o
 libperf-$(CONFIG_LIBUNWIND_AARCH64)  += unwind-libunwind_arm64.o

 the same way for the arch object:

 arch/x86/util/unwind-libunwind-local.c
 arch/x86/util/unwind-libunwind-x86_32.c


Not sure I thought everything through, but I think this way
we'll keep it more maintainable and readable..

let me know what you think


The only concern is that, if later we support more platforms,
there will be too much files named as 'tools/perf/util/unwind-libunwind*.c'
Is it acceptable or not?

And I thought all files belongs to specific archs should
go to folder under 'tools/perf/arch/xxx', is that right?

Thanks.

thanks,
jirka






Re: [PATCH] perf script: Fix display inconsitency when call-graph config is used

2016-05-26 Thread Hekuang

ping..

在 2016/5/16 12:51, He Kuang 写道:

There's a display inconsistency when 'call-graph' config event appears
in different position. The problem can be reproduced like this:

We record signal_deliver with call-graph and signal_generate without it.

   $ perf record -g -a -e signal:signal_deliver -e 
signal:signal_generate/call-graph=no/

   [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

   $ perf script

   kworker/u2:113 [000]  6563.875949: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1313 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
   perf  1313 [000]  6563.877584:  signal:signal_deliver: sig=2 errno=0 
code=128 sa_handler=43115e sa_flags=1400
   7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
   7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
   7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
   ...

Then we exchange the order of these two events in commandline, and keep
signal_generate without call-graph.

   $ perf record -g -a -e signal:signal_generate/call-graph=no/ -e 
signal:signal_deliver

   [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

   $ perf script

 kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 
errno=0 code=128 comm=perf pid=1321 grp=1 res=0
 perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 
errno=0 code=128 sa_handler=43115e sa_flags=1400

This time, the callchain of the event signal_deliver disappeared. The
problem is caused by that perf only checks for the first evsel in evlist
and decides if callchain should be printed.

This patch travseres all evsels in evlist to see if any of them have
callchains, and shows the right result:

   $ perf script

   kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1321 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
   perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 errno=0 
code=128 sa_handler=43115e sa_flags=1400
   7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
   7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
   7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
   ...

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 23 +--
  1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index efca816..7a18b92 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -339,7 +339,7 @@ static void set_print_ip_opts(struct perf_event_attr *attr)
   */
  static int perf_session__check_output_opt(struct perf_session *session)
  {
-   int j;
+   unsigned int j;
struct perf_evsel *evsel;
  
  	for (j = 0; j < PERF_TYPE_MAX; ++j) {

@@ -388,17 +388,20 @@ static int perf_session__check_output_opt(struct 
perf_session *session)
struct perf_event_attr *attr;
  
  		j = PERF_TYPE_TRACEPOINT;

-   evsel = perf_session__find_first_evtype(session, j);
-   if (evsel == NULL)
-   goto out;
  
-		attr = >attr;

+   evlist__for_each(session->evlist, evsel) {
+   if (evsel->attr.type != j)
+   continue;
+
+   attr = >attr;
  
-		if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {

-   output[j].fields |= PERF_OUTPUT_IP;
-   output[j].fields |= PERF_OUTPUT_SYM;
-   output[j].fields |= PERF_OUTPUT_DSO;
-   set_print_ip_opts(attr);
+   if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {
+   output[j].fields |= PERF_OUTPUT_IP;
+   output[j].fields |= PERF_OUTPUT_SYM;
+   output[j].fields |= PERF_OUTPUT_DSO;
+   set_print_ip_opts(attr);
+   goto out;
+   }
}
}
  





Re: [PATCH] perf script: Fix display inconsitency when call-graph config is used

2016-05-26 Thread Hekuang

ping..

在 2016/5/16 12:51, He Kuang 写道:

There's a display inconsistency when 'call-graph' config event appears
in different position. The problem can be reproduced like this:

We record signal_deliver with call-graph and signal_generate without it.

   $ perf record -g -a -e signal:signal_deliver -e 
signal:signal_generate/call-graph=no/

   [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

   $ perf script

   kworker/u2:113 [000]  6563.875949: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1313 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
   perf  1313 [000]  6563.877584:  signal:signal_deliver: sig=2 errno=0 
code=128 sa_handler=43115e sa_flags=1400
   7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
   7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
   7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
   ...

Then we exchange the order of these two events in commandline, and keep
signal_generate without call-graph.

   $ perf record -g -a -e signal:signal_generate/call-graph=no/ -e 
signal:signal_deliver

   [ perf record: Captured and wrote 0.017 MB perf.data (2 samples) ]

   $ perf script

 kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 
errno=0 code=128 comm=perf pid=1321 grp=1 res=0
 perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 
errno=0 code=128 sa_handler=43115e sa_flags=1400

This time, the callchain of the event signal_deliver disappeared. The
problem is caused by that perf only checks for the first evsel in evlist
and decides if callchain should be printed.

This patch travseres all evsels in evlist to see if any of them have
callchains, and shows the right result:

   $ perf script

   kworker/u2:2  1314 [000]  6933.353060: signal:signal_generate: sig=2 errno=0 
code=128 comm=perf pid=1321 grp=1 res=0 ff61cc __send_signal+0x3ec 
([kernel.kallsyms])
   perf  1321 [000]  6933.353872:  signal:signal_deliver: sig=2 errno=0 
code=128 sa_handler=43115e sa_flags=1400
   7314 get_signal+0x80007f0023a4 ([kernel.kallsyms])
   7fffe358 do_signal+0x80007f002028 ([kernel.kallsyms])
   7fffa5e8 exit_to_usermode_loop+0x80007f002053 ([kernel.kallsyms])
   ...

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 23 +--
  1 file changed, 13 insertions(+), 10 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index efca816..7a18b92 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -339,7 +339,7 @@ static void set_print_ip_opts(struct perf_event_attr *attr)
   */
  static int perf_session__check_output_opt(struct perf_session *session)
  {
-   int j;
+   unsigned int j;
struct perf_evsel *evsel;
  
  	for (j = 0; j < PERF_TYPE_MAX; ++j) {

@@ -388,17 +388,20 @@ static int perf_session__check_output_opt(struct 
perf_session *session)
struct perf_event_attr *attr;
  
  		j = PERF_TYPE_TRACEPOINT;

-   evsel = perf_session__find_first_evtype(session, j);
-   if (evsel == NULL)
-   goto out;
  
-		attr = >attr;

+   evlist__for_each(session->evlist, evsel) {
+   if (evsel->attr.type != j)
+   continue;
+
+   attr = >attr;
  
-		if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {

-   output[j].fields |= PERF_OUTPUT_IP;
-   output[j].fields |= PERF_OUTPUT_SYM;
-   output[j].fields |= PERF_OUTPUT_DSO;
-   set_print_ip_opts(attr);
+   if (attr->sample_type & PERF_SAMPLE_CALLCHAIN) {
+   output[j].fields |= PERF_OUTPUT_IP;
+   output[j].fields |= PERF_OUTPUT_SYM;
+   output[j].fields |= PERF_OUTPUT_DSO;
+   set_print_ip_opts(attr);
+   goto out;
+   }
}
}
  





Re: [PATCH v4 2/6] perf tools: Promote proper messages for cross-platform unwind

2016-05-19 Thread Hekuang



在 2016/5/20 0:19, Jiri Olsa 写道:

On Thu, May 19, 2016 at 11:50:10AM -0300, Arnaldo Carvalho de Melo wrote:

Em Thu, May 19, 2016 at 11:47:38AM +, He Kuang escreveu:

Currently, perf script uses host unwind methods to parse perf.data
callchain info regardless of the target architecture. So we get wrong
result and no promotion when unwinding callchains of x86(32-bit) on

What you mean by "promotion" here? Can you use some other synonym so
that I can make sense of this description?

I'd say something like: s/and no promotion/without any warning/

jirka


Yes, that's what I want to express.




Re: [PATCH v4 2/6] perf tools: Promote proper messages for cross-platform unwind

2016-05-19 Thread Hekuang



在 2016/5/20 0:19, Jiri Olsa 写道:

On Thu, May 19, 2016 at 11:50:10AM -0300, Arnaldo Carvalho de Melo wrote:

Em Thu, May 19, 2016 at 11:47:38AM +, He Kuang escreveu:

Currently, perf script uses host unwind methods to parse perf.data
callchain info regardless of the target architecture. So we get wrong
result and no promotion when unwinding callchains of x86(32-bit) on

What you mean by "promotion" here? Can you use some other synonym so
that I can make sense of this description?

I'd say something like: s/and no promotion/without any warning/

jirka


Yes, that's what I want to express.




Re: [PATCH v4 2/6] perf tools: Promote proper messages for cross-platform unwind

2016-05-19 Thread Hekuang

hi

在 2016/5/20 0:46, Jiri Olsa 写道:

On Thu, May 19, 2016 at 11:47:38AM +, He Kuang wrote:

SNIP


  #endif /* ARCH_PERF_COMMON_H */
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 1e46277..a86b864 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -345,6 +345,12 @@ ifeq ($(ARCH),powerpc)
  endif
  
  ifndef NO_LIBUNWIND

+  ifeq ($(feature-libunwind-x86), 1)
+LIBUNWIND_LIBS += -lunwind-x86
+$(call detected,CONFIG_LIBUNWIND_X86)
+CFLAGS += -DHAVE_LIBUNWIND_X86_SUPPORT
+  endif
+

how does one install that lirary?

thanks,
jirka


My work environment is on an old suse distribution, so it's
difficult to find libunwind-$arch rpm packages, so I build them
from source.

The git repository url is here:
http://git.savannah.gnu.org/r/libunwind.git(master)

Then flow the build step in README, first for i686:

  $ ./autogen.sh
  $ ./configure prefix=/xx/dst_i686 --target=i686-oe-linux 
CC=x86_64-oe-linux-gcc

  $ make && make install

Similar for aarch64:

  $ make clean
  $ ./configure prefix=/xx/dst_aarch64 --target=i686-oe-linux 
CC=x86_64-oe-linux-gcc

  $ make && make install

NOTICE: the contents in '--target' should be like
'i686-oe-linux', only give 'i686' cause strange build errors.

It looks like that libunwind don't support building for multiple
platforms at the same time, so I build them separately into
different directories.

Finally, copy the outputs into /usr/include and /usr/lib64, now
perf can detect them:

  $ make VF=1 ARCH=x86_64 CROSS_COMPILE=x86_64-oe-linux- 
EXTRA_CFLAGS="-m64"

  ...

  ... libunwind-x86: [ on  ]
  ...  libunwind-x86_64: [ OFF ]
  ... libunwind-arm: [ OFF ]
  ... libunwind-aarch64: [ on  ]

Thanks.




Re: [PATCH v4 2/6] perf tools: Promote proper messages for cross-platform unwind

2016-05-19 Thread Hekuang

hi

在 2016/5/20 0:46, Jiri Olsa 写道:

On Thu, May 19, 2016 at 11:47:38AM +, He Kuang wrote:

SNIP


  #endif /* ARCH_PERF_COMMON_H */
diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index 1e46277..a86b864 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -345,6 +345,12 @@ ifeq ($(ARCH),powerpc)
  endif
  
  ifndef NO_LIBUNWIND

+  ifeq ($(feature-libunwind-x86), 1)
+LIBUNWIND_LIBS += -lunwind-x86
+$(call detected,CONFIG_LIBUNWIND_X86)
+CFLAGS += -DHAVE_LIBUNWIND_X86_SUPPORT
+  endif
+

how does one install that lirary?

thanks,
jirka


My work environment is on an old suse distribution, so it's
difficult to find libunwind-$arch rpm packages, so I build them
from source.

The git repository url is here:
http://git.savannah.gnu.org/r/libunwind.git(master)

Then flow the build step in README, first for i686:

  $ ./autogen.sh
  $ ./configure prefix=/xx/dst_i686 --target=i686-oe-linux 
CC=x86_64-oe-linux-gcc

  $ make && make install

Similar for aarch64:

  $ make clean
  $ ./configure prefix=/xx/dst_aarch64 --target=i686-oe-linux 
CC=x86_64-oe-linux-gcc

  $ make && make install

NOTICE: the contents in '--target' should be like
'i686-oe-linux', only give 'i686' cause strange build errors.

It looks like that libunwind don't support building for multiple
platforms at the same time, so I build them separately into
different directories.

Finally, copy the outputs into /usr/include and /usr/lib64, now
perf can detect them:

  $ make VF=1 ARCH=x86_64 CROSS_COMPILE=x86_64-oe-linux- 
EXTRA_CFLAGS="-m64"

  ...

  ... libunwind-x86: [ on  ]
  ...  libunwind-x86_64: [ OFF ]
  ... libunwind-arm: [ OFF ]
  ... libunwind-aarch64: [ on  ]

Thanks.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang



在 2016/5/18 9:51, David Ahern 写道:

On 5/17/16 7:47 PM, Hekuang wrote:



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?



With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd ->
../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   │   └── 84
│   │   └── dbd75729adba57cc42f5544b25de571c0c8731 ->
../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│   └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all binaries we need are included in the symfs dir. I think
this is consistent with your idea explained in previous mails.

With this symfs, we do not need buildid dir anymore and what's
your idea on 'perf buildid-cache' needs symfs option? after all,
that only effects on buildid dir.


I don't understand why dso-prefix option is needed? Why make me type 
yet more options to the analysis command? Why can't the directory be 
located under the symfs tree in a known location and populated the 
same way it is without symfs?




Because the default buidid folder path is $HOME/.debug/.buildid,
and this $HOME is on the target machine, not the same as $HOME
on the host. Without that option, we need to copy $HOME/.debug/.buildid
to the 'known location in symfs', that's also an extra work.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang



在 2016/5/18 9:51, David Ahern 写道:

On 5/17/16 7:47 PM, Hekuang wrote:



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?



With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd ->
../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   │   └── 84
│   │   └── dbd75729adba57cc42f5544b25de571c0c8731 ->
../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731
│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│   └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all binaries we need are included in the symfs dir. I think
this is consistent with your idea explained in previous mails.

With this symfs, we do not need buildid dir anymore and what's
your idea on 'perf buildid-cache' needs symfs option? after all,
that only effects on buildid dir.


I don't understand why dso-prefix option is needed? Why make me type 
yet more options to the analysis command? Why can't the directory be 
located under the symfs tree in a known location and populated the 
same way it is without symfs?




Because the default buidid folder path is $HOME/.debug/.buildid,
and this $HOME is on the target machine, not the same as $HOME
on the host. Without that option, we need to copy $HOME/.debug/.buildid
to the 'known location in symfs', that's also an extra work.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?



With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> 
../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd

│   │   └── 84
│   │   └── dbd75729adba57cc42f5544b25de571c0c8731 -> 
../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731

│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│   └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all binaries we need are included in the symfs dir. I think
this is consistent with your idea explained in previous mails.

With this symfs, we do not need buildid dir anymore and what's
your idea on 'perf buildid-cache' needs symfs option? after all,
that only effects on buildid dir.

Thanks.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-17 Thread Hekuang



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?



With this patch 'PATCH v3 3/7 UPDATE', the tree of symfs dir is
like this:

├── debug($(dso-prefix))
│   ├── .build-id
│   │   ├── 3a
│   │   │   └── e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd -> 
../../[kernel.kallsyms]/3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd

│   │   └── 84
│   │   └── dbd75729adba57cc42f5544b25de571c0c8731 -> 
../../[vdso32]/84dbd75729adba57cc42f5544b25de571c0c8731

│   ├── [kernel.kallsyms]
│   │   └── 3ae5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
│   ├── [vdso]
│   │   └── 84dbd75729adba57cc42f5544b25de571c0c8731
│   └── [vdso32]
│   └── 84dbd75729adba57cc42f5544b25de571c0c8731
├── lib
│   ├── ld-2.22.so
│   └── libc-2.22.so
├── tmp
│   └── hello
└── xxx

So all binaries we need are included in the symfs dir. I think
this is consistent with your idea explained in previous mails.

With this symfs, we do not need buildid dir anymore and what's
your idea on 'perf buildid-cache' needs symfs option? after all,
that only effects on buildid dir.

Thanks.




Re: [PATCH v3 1/7 UPDATE] perf tools: Find vdso with the consider of cross-platform

2016-05-17 Thread Hekuang



在 2016/5/17 15:33, Adrian Hunter 写道:

On 13/05/16 11:51, He Kuang wrote:

There's a problem in machine__findnew_vdso(), vdso buildid generated
by a 32-bit machine stores it with the name 'vdso', but when
processing buildid on a 64-bit machine with the same 'perf.data', perf
will search for vdso named as 'vdso32' and get failed.

This patch tries to find the exsiting dsos in machine->dsos by thread
dso_type. 64-bit thread tries to find vdso with name 'vdso', because
all 64-bit vdso is named as that. 32-bit thread first tries to find
vdso with name 'vdso32' if this thread was run on 64-bit machine, if
failed, then it tries 'vdso' which indicates that the thread was run
on 32-bit machine when recording.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 40 +---
  1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 44d440d..99f4a3d 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -134,8 +134,6 @@ static struct dso *__machine__addnew_vdso(struct machine 
*machine, const char *s
return dso;
  }
  
-#if BITS_PER_LONG == 64

-
  static enum dso_type machine__thread_dso_type(struct machine *machine,
  struct thread *thread)
  {
@@ -156,6 +154,8 @@ static enum dso_type machine__thread_dso_type(struct 
machine *machine,
return dso_type;
  }
  
+#if BITS_PER_LONG == 64

+
  static int vdso__do_copy_compat(FILE *f, int fd)
  {
char buf[4096];
@@ -283,8 +283,38 @@ static int __machine__findnew_vdso_compat(struct machine 
*machine,
  
  #endif
  
+static struct dso *machine__find_vdso(struct machine *machine,

+ struct thread *thread)
+{
+   struct dso *dso = NULL;
+   enum dso_type dso_type;
+
+   dso_type = machine__thread_dso_type(machine, thread);
+   switch (dso_type) {
+   case DSO__TYPE_32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO32, true);
+   if (!dso)
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);

So if we have not yet added the 32-bit vdso but have added the 64-bit vdso,
we will return the wrong one.

Can we check it? e.g.

if (!dso)
dso = __dsos__find(>dsos, DSO__NAME_VDSO,
   true);
if (dso_type != dso__type(dso, machine))
dso = NULL;


+   break;
+   case DSO__TYPE_X32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSOX32, true);
+   if (!dso)
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);

The x32 vdso is never called DSO__NAME_VDSO so this is not correct, but for
the same reason we don't need this __dsos__find() anyway.


Thanks, update

+   break;
+   case DSO__TYPE_64BIT:
+   case DSO__TYPE_UNKNOWN:
+   default:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO, true);
+   break;
+   }
+
+   return dso;
+}
+
  struct dso *machine__findnew_vdso(struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct thread *thread)
  {
struct vdso_info *vdso_info;
struct dso *dso = NULL;
@@ -297,6 +327,10 @@ struct dso *machine__findnew_vdso(struct machine *machine,
if (!vdso_info)
goto out_unlock;
  
+	dso = machine__find_vdso(machine, thread);

+   if (dso)
+   goto out_unlock;
+
  #if BITS_PER_LONG == 64
if (__machine__findnew_vdso_compat(machine, thread, vdso_info, ))
goto out_unlock;








Re: [PATCH v3 1/7 UPDATE] perf tools: Find vdso with the consider of cross-platform

2016-05-17 Thread Hekuang



在 2016/5/17 15:33, Adrian Hunter 写道:

On 13/05/16 11:51, He Kuang wrote:

There's a problem in machine__findnew_vdso(), vdso buildid generated
by a 32-bit machine stores it with the name 'vdso', but when
processing buildid on a 64-bit machine with the same 'perf.data', perf
will search for vdso named as 'vdso32' and get failed.

This patch tries to find the exsiting dsos in machine->dsos by thread
dso_type. 64-bit thread tries to find vdso with name 'vdso', because
all 64-bit vdso is named as that. 32-bit thread first tries to find
vdso with name 'vdso32' if this thread was run on 64-bit machine, if
failed, then it tries 'vdso' which indicates that the thread was run
on 32-bit machine when recording.

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.c | 40 +---
  1 file changed, 37 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/vdso.c b/tools/perf/util/vdso.c
index 44d440d..99f4a3d 100644
--- a/tools/perf/util/vdso.c
+++ b/tools/perf/util/vdso.c
@@ -134,8 +134,6 @@ static struct dso *__machine__addnew_vdso(struct machine 
*machine, const char *s
return dso;
  }
  
-#if BITS_PER_LONG == 64

-
  static enum dso_type machine__thread_dso_type(struct machine *machine,
  struct thread *thread)
  {
@@ -156,6 +154,8 @@ static enum dso_type machine__thread_dso_type(struct 
machine *machine,
return dso_type;
  }
  
+#if BITS_PER_LONG == 64

+
  static int vdso__do_copy_compat(FILE *f, int fd)
  {
char buf[4096];
@@ -283,8 +283,38 @@ static int __machine__findnew_vdso_compat(struct machine 
*machine,
  
  #endif
  
+static struct dso *machine__find_vdso(struct machine *machine,

+ struct thread *thread)
+{
+   struct dso *dso = NULL;
+   enum dso_type dso_type;
+
+   dso_type = machine__thread_dso_type(machine, thread);
+   switch (dso_type) {
+   case DSO__TYPE_32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO32, true);
+   if (!dso)
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);

So if we have not yet added the 32-bit vdso but have added the 64-bit vdso,
we will return the wrong one.

Can we check it? e.g.

if (!dso)
dso = __dsos__find(>dsos, DSO__NAME_VDSO,
   true);
if (dso_type != dso__type(dso, machine))
dso = NULL;


+   break;
+   case DSO__TYPE_X32BIT:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSOX32, true);
+   if (!dso)
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO,
+  true);

The x32 vdso is never called DSO__NAME_VDSO so this is not correct, but for
the same reason we don't need this __dsos__find() anyway.


Thanks, update

+   break;
+   case DSO__TYPE_64BIT:
+   case DSO__TYPE_UNKNOWN:
+   default:
+   dso = __dsos__find(>dsos, DSO__NAME_VDSO, true);
+   break;
+   }
+
+   return dso;
+}
+
  struct dso *machine__findnew_vdso(struct machine *machine,
- struct thread *thread __maybe_unused)
+ struct thread *thread)
  {
struct vdso_info *vdso_info;
struct dso *dso = NULL;
@@ -297,6 +327,10 @@ struct dso *machine__findnew_vdso(struct machine *machine,
if (!vdso_info)
goto out_unlock;
  
+	dso = machine__find_vdso(machine, thread);

+   if (dso)
+   goto out_unlock;
+
  #if BITS_PER_LONG == 64
if (__machine__findnew_vdso_compat(machine, thread, vdso_info, ))
goto out_unlock;








Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-16 Thread Hekuang



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?




No, for the host, we don't reference any files in buildid-cache
if symfs is givin.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-16 Thread Hekuang



在 2016/5/16 10:50, David Ahern 写道:

On 5/15/16 7:30 PM, Hekuang wrote:

In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.


So 'perf buildid-cache' needs the symfs option?




No, for the host, we don't reference any files in buildid-cache
if symfs is givin.




Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-15 Thread Hekuang



在 2016/5/14 22:43, David Ahern 写道:

On 5/14/16 2:19 AM, He Kuang wrote:

In the cross-platform perf record/script scenario, we need vdsos in
buildid-cache dir and other libs in symfs dir at the same time. For
the reason that to have every single file opened by perf is relative
to symfs dirctory, perf skips the buildid dir if symfs is given.

This patch references the buildid dir under symfs if '--symfs' is
used, and adds new option '--dso-prefix' to specify the subdir path in
symfs which contains the buildid dsos.


In the previous version of this patch you just wanted to drop the 
symfs check. That means there is a path that perf searches for buildid 
files and reading it worked for you. Why is adding symsfs to that path 
not enough? ie., Why do you need to specify a different location under 
the symfs?



In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.

Currently, $(BUILDID_ROOT)/.debug/.buildid has a directory
structure organized by the buildid value, like this:

 .build-id
 - 3a
 |e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
 - 84
  dbd75729adba57cc42f5544b25de571c0c8731

perf searchs for buildid binaries by the buildid value.

And symfs is a normal file tree and perf searchs for binaryies
by the file name.

 lib
 - libc.so
 - ld.so

Tt's inappropriate to add symfs to .build-id dir becuase they
have different dirctory structure and perf searchs for them by
differnt ways.

So in this patch I add .buildid to symfs and got dirctory tree
like this:

 symfs
 - $(DSO_PREFIX)
 - .build-id
   - 3a
   - 84
 - libc.so
 - ld.so

Thanks



Re: [PATCH v3 3/7 UPDATE] perf tools: Add option for the path of buildid dsos under symfs

2016-05-15 Thread Hekuang



在 2016/5/14 22:43, David Ahern 写道:

On 5/14/16 2:19 AM, He Kuang wrote:

In the cross-platform perf record/script scenario, we need vdsos in
buildid-cache dir and other libs in symfs dir at the same time. For
the reason that to have every single file opened by perf is relative
to symfs dirctory, perf skips the buildid dir if symfs is given.

This patch references the buildid dir under symfs if '--symfs' is
used, and adds new option '--dso-prefix' to specify the subdir path in
symfs which contains the buildid dsos.


In the previous version of this patch you just wanted to drop the 
symfs check. That means there is a path that perf searches for buildid 
files and reading it worked for you. Why is adding symsfs to that path 
not enough? ie., Why do you need to specify a different location under 
the symfs?



In previous patch, I use 'perf buildid-cache -a' to add vdso
binary into the HOST buildid dir.

Currently, $(BUILDID_ROOT)/.debug/.buildid has a directory
structure organized by the buildid value, like this:

 .build-id
 - 3a
 |e5ba6d4e532ad529e43ccf1ce1ddf8a64a4fdd
 - 84
  dbd75729adba57cc42f5544b25de571c0c8731

perf searchs for buildid binaries by the buildid value.

And symfs is a normal file tree and perf searchs for binaryies
by the file name.

 lib
 - libc.so
 - ld.so

Tt's inappropriate to add symfs to .build-id dir becuase they
have different dirctory structure and perf searchs for them by
differnt ways.

So in this patch I add .buildid to symfs and got dirctory tree
like this:

 symfs
 - $(DSO_PREFIX)
 - .build-id
   - 3a
   - 84
 - libc.so
 - ld.so

Thanks



Re: [PATCH v3 1/7] perf tools: Set vdso name to vdso[64,32] depending on platform

2016-05-13 Thread Hekuang



在 2016/5/12 18:06, Adrian Hunter 写道:

On 12/05/16 11:43, He Kuang wrote:

This is a preparation for cross-platform vdso lookup.

There is a naming confusion about vdso name, vdso buildid generated by
a 32-bit machine stores it with the name 'vdso', but when processing
buildid on a 64-bit machine with the same 'perf.data', perf will
search for vdso named as 'vdso32' and get failed.

This patch uses different names when storing the buildid, i.e. vdso64
for 64-bit machine and vdso32 for 32-bit machine, and eliminates this
naming confusion.

That looks like it will break existing perf.data files because they will
have a different name recorded in the buildid section.

Also it doesn't look like it would work the other way around i.e. recording
on a 64-bit machine and processing on a 32-bit machine.


Yes, please have a look at the new patch.

Thanks

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.h | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/vdso.h b/tools/perf/util/vdso.h
index cdc4fab..45e9ef4 100644
--- a/tools/perf/util/vdso.h
+++ b/tools/perf/util/vdso.h
@@ -4,10 +4,15 @@
  #include 
  #include 
  #include 
+#include "util.h"
  
  #define VDSO__MAP_NAME "[vdso]"
  
-#define DSO__NAME_VDSO"[vdso]"

+#if BITS_PER_LONG == 64
+#define DSO__NAME_VDSO"[vdso64]"
+#else
+#define DSO__NAME_VDSO"[vdso32]"
+#endif
  #define DSO__NAME_VDSO32  "[vdso32]"
  #define DSO__NAME_VDSOX32 "[vdsox32]"
  








Re: [PATCH v3 1/7] perf tools: Set vdso name to vdso[64,32] depending on platform

2016-05-13 Thread Hekuang



在 2016/5/12 18:06, Adrian Hunter 写道:

On 12/05/16 11:43, He Kuang wrote:

This is a preparation for cross-platform vdso lookup.

There is a naming confusion about vdso name, vdso buildid generated by
a 32-bit machine stores it with the name 'vdso', but when processing
buildid on a 64-bit machine with the same 'perf.data', perf will
search for vdso named as 'vdso32' and get failed.

This patch uses different names when storing the buildid, i.e. vdso64
for 64-bit machine and vdso32 for 32-bit machine, and eliminates this
naming confusion.

That looks like it will break existing perf.data files because they will
have a different name recorded in the buildid section.

Also it doesn't look like it would work the other way around i.e. recording
on a 64-bit machine and processing on a 32-bit machine.


Yes, please have a look at the new patch.

Thanks

Signed-off-by: He Kuang 
---
  tools/perf/util/vdso.h | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/vdso.h b/tools/perf/util/vdso.h
index cdc4fab..45e9ef4 100644
--- a/tools/perf/util/vdso.h
+++ b/tools/perf/util/vdso.h
@@ -4,10 +4,15 @@
  #include 
  #include 
  #include 
+#include "util.h"
  
  #define VDSO__MAP_NAME "[vdso]"
  
-#define DSO__NAME_VDSO"[vdso]"

+#if BITS_PER_LONG == 64
+#define DSO__NAME_VDSO"[vdso64]"
+#else
+#define DSO__NAME_VDSO"[vdso32]"
+#endif
  #define DSO__NAME_VDSO32  "[vdso32]"
  #define DSO__NAME_VDSOX32 "[vdsox32]"
  








Re: [PATCH v3 3/7] perf tools: Remove the logical that skip buildid cache if symfs is given

2016-05-13 Thread Hekuang

hi

在 2016/5/13 4:23, David Ahern 写道:

On 5/12/16 7:09 AM, Arnaldo Carvalho de Melo wrote:

Em Thu, May 12, 2016 at 08:43:12AM +, He Kuang escreveu:

Symfs dir and buildid dir are two places that perf looks into for
symbols, currently, if symfs dir is given, buildid-cache is skipped.

In the cross-platform perf record/script scenario, we need vdsos in
buildid-cache dir and other libs in symfs dir at the same time. And
consider that the binaries indexed by buildid do not cause ambiguity,
this patch simply removes that logical.


Makes perfect sense, David, do you have any concern? Can I have your
Acked-by?


seems odd to me you want to look in the buildid-cache when a symfs is 
given. The point of symfs was "go look for everything under here."


I believe dso__load is going to hit DSO_BINARY_TYPE__BUILD_ID_CACHE 
before any of the others and there are probably cases where a stale 
cache entry would be hit before a build tree entry (e.g., symfs).


Build id entries recorded in perf.data reflect the current dso,
so if buildid is matched , how can it be a stale one?



What about putting the build id cache under the symfs? so instead of 
dropping the symfs check and it to the path for the build id cache.




I think your intention is to reference symbol files in one place
instead of two. So there're two possible approaches, one is all
in buildid-cache, but in practice, I found lots of binaries in
symfs even not contains valid buildid, so this way is not work.

The other one is all in symfs. It seems ok, but one problem I
should point out, with my test environment as an example, the
symfsdir is $(TARGET_ROOTFS),and by default buildid_dir is
$(TARGET_ROOTFS)/$(HOME)/.debug/, host perf does not know
$(HOME) folder in target and we should copy the debug folder
 to $(TARGET_ROOTFS), which is readonly in the target. For me, it's
easier to use 'buildid-cache -a vdso-' to add that into host
buildid-cache than copy debug folder from $(HOME) to readonly
$(TARGET_ROOTFS).

Without the stale concern, I prefer the two places(buildid-dir in
host and target symfs) way.

Thanks.



Re: [PATCH v3 3/7] perf tools: Remove the logical that skip buildid cache if symfs is given

2016-05-13 Thread Hekuang

hi

在 2016/5/13 4:23, David Ahern 写道:

On 5/12/16 7:09 AM, Arnaldo Carvalho de Melo wrote:

Em Thu, May 12, 2016 at 08:43:12AM +, He Kuang escreveu:

Symfs dir and buildid dir are two places that perf looks into for
symbols, currently, if symfs dir is given, buildid-cache is skipped.

In the cross-platform perf record/script scenario, we need vdsos in
buildid-cache dir and other libs in symfs dir at the same time. And
consider that the binaries indexed by buildid do not cause ambiguity,
this patch simply removes that logical.


Makes perfect sense, David, do you have any concern? Can I have your
Acked-by?


seems odd to me you want to look in the buildid-cache when a symfs is 
given. The point of symfs was "go look for everything under here."


I believe dso__load is going to hit DSO_BINARY_TYPE__BUILD_ID_CACHE 
before any of the others and there are probably cases where a stale 
cache entry would be hit before a build tree entry (e.g., symfs).


Build id entries recorded in perf.data reflect the current dso,
so if buildid is matched , how can it be a stale one?



What about putting the build id cache under the symfs? so instead of 
dropping the symfs check and it to the path for the build id cache.




I think your intention is to reference symbol files in one place
instead of two. So there're two possible approaches, one is all
in buildid-cache, but in practice, I found lots of binaries in
symfs even not contains valid buildid, so this way is not work.

The other one is all in symfs. It seems ok, but one problem I
should point out, with my test environment as an example, the
symfsdir is $(TARGET_ROOTFS),and by default buildid_dir is
$(TARGET_ROOTFS)/$(HOME)/.debug/, host perf does not know
$(HOME) folder in target and we should copy the debug folder
 to $(TARGET_ROOTFS), which is readonly in the target. For me, it's
easier to use 'buildid-cache -a vdso-' to add that into host
buildid-cache than copy debug folder from $(HOME) to readonly
$(TARGET_ROOTFS).

Without the stale concern, I prefer the two places(buildid-dir in
host and target symfs) way.

Thanks.



Re: [PATCH v2 2/9] perf script: Add options for custom vdso path

2016-05-10 Thread Hekuang



在 2016/5/10 21:44, Arnaldo Carvalho de Melo 写道:

Em Tue, May 10, 2016 at 07:40:30AM +, He Kuang escreveu:

When unwinding callchains on a different machine, vdso info should be
provided so the unwind process won't be interrupted if address falls
into vdso region.

Currently, perf does try to read vdso binary in '.debug' folder, but
the filename of the vdso file is generated randomly based on
VDSO__TEMP_FILE_NAME template, such a filename is not reliable and
users need a way to provide the path of their own vdso binary file.

But this becomes one more burden to the user :-\ Why not calculate a
build-id from the VDSO file contents, copy it to
~/.debug/.build-id/ab/cdef01020304 and insert that into the perf.data
file so that at analysis time we do all this automatically?

We can of course have a way to manually provide it, but having it as the
main way for users doesn't seem a good interface. Am I missing
something?

- Arnaldo


OK, I'll try as you suggested!
  

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 2 ++
  tools/perf/util/dso.c   | 7 +++
  tools/perf/util/dso.h   | 1 +
  tools/perf/util/symbol.c| 1 +
  tools/perf/util/symbol.h| 1 +
  5 files changed, 12 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8f6ab2a..c88b547 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2001,6 +2001,8 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
   "file", "vmlinux pathname"),
OPT_STRING(0, "kallsyms", _conf.kallsyms_name,
   "file", "kallsyms pathname"),
+   OPT_STRING(0, "vdso", _conf.vdso_name,
+  "file", "vdso pathname"),
OPT_BOOLEAN('G', "hide-call-graph", _callchain,
"When printing symbols do not display call chain"),
OPT_STRING(0, "symfs", _conf.symfs, "directory",
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 8e639543..6ed1cce 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -13,6 +13,7 @@ char dso__symtab_origin(const struct dso *dso)
static const char origin[] = {
[DSO_BINARY_TYPE__KALLSYMS] = 'k',
[DSO_BINARY_TYPE__VMLINUX]  = 'v',
+   [DSO_BINARY_TYPE__VDSO] = 'D',
[DSO_BINARY_TYPE__JAVA_JIT] = 'j',
[DSO_BINARY_TYPE__DEBUGLINK]= 'l',
[DSO_BINARY_TYPE__BUILD_ID_CACHE]   = 'B',
@@ -113,6 +114,11 @@ int dso__read_binary_type_filename(const struct dso *dso,
 build_id_hex, build_id_hex + 2);
break;
  
+	case DSO_BINARY_TYPE__VDSO:

+   {
+   snprintf(filename, size, "%s", symbol_conf.vdso_name);
+   break;
+   }
case DSO_BINARY_TYPE__VMLINUX:
case DSO_BINARY_TYPE__GUEST_VMLINUX:
case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
@@ -487,6 +493,7 @@ static void try_to_open_dso(struct dso *dso, struct machine 
*machine)
enum dso_binary_type binary_type_data[] = {
DSO_BINARY_TYPE__BUILD_ID_CACHE,
DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+   DSO_BINARY_TYPE__VDSO,
DSO_BINARY_TYPE__NOT_FOUND,
};
int i = 0;
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 0953280..05fac98 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -30,6 +30,7 @@ enum dso_binary_type {
DSO_BINARY_TYPE__KCORE,
DSO_BINARY_TYPE__GUEST_KCORE,
DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO,
+   DSO_BINARY_TYPE__VDSO,
DSO_BINARY_TYPE__NOT_FOUND,
  };
  
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c

index e7588dc..4630751 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1363,6 +1363,7 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO:
+   case DSO_BINARY_TYPE__VDSO:
return !kmod && dso->kernel == DSO_TYPE_USER;
  
  	case DSO_BINARY_TYPE__KALLSYMS:

diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c8b7544..4e6910e 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -114,6 +114,7 @@ struct symbol_conf {
report_hierarchy;
const char  *vmlinux_name,
*kallsyms_name,
+   *vdso_name,
*source_prefix,
*field_sep;
const char  *default_guest_vmlinux_name,
--
1.8.5.2





Re: [PATCH v2 2/9] perf script: Add options for custom vdso path

2016-05-10 Thread Hekuang



在 2016/5/10 21:44, Arnaldo Carvalho de Melo 写道:

Em Tue, May 10, 2016 at 07:40:30AM +, He Kuang escreveu:

When unwinding callchains on a different machine, vdso info should be
provided so the unwind process won't be interrupted if address falls
into vdso region.

Currently, perf does try to read vdso binary in '.debug' folder, but
the filename of the vdso file is generated randomly based on
VDSO__TEMP_FILE_NAME template, such a filename is not reliable and
users need a way to provide the path of their own vdso binary file.

But this becomes one more burden to the user :-\ Why not calculate a
build-id from the VDSO file contents, copy it to
~/.debug/.build-id/ab/cdef01020304 and insert that into the perf.data
file so that at analysis time we do all this automatically?

We can of course have a way to manually provide it, but having it as the
main way for users doesn't seem a good interface. Am I missing
something?

- Arnaldo


OK, I'll try as you suggested!
  

Signed-off-by: He Kuang 
---
  tools/perf/builtin-script.c | 2 ++
  tools/perf/util/dso.c   | 7 +++
  tools/perf/util/dso.h   | 1 +
  tools/perf/util/symbol.c| 1 +
  tools/perf/util/symbol.h| 1 +
  5 files changed, 12 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 8f6ab2a..c88b547 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -2001,6 +2001,8 @@ int cmd_script(int argc, const char **argv, const char 
*prefix __maybe_unused)
   "file", "vmlinux pathname"),
OPT_STRING(0, "kallsyms", _conf.kallsyms_name,
   "file", "kallsyms pathname"),
+   OPT_STRING(0, "vdso", _conf.vdso_name,
+  "file", "vdso pathname"),
OPT_BOOLEAN('G', "hide-call-graph", _callchain,
"When printing symbols do not display call chain"),
OPT_STRING(0, "symfs", _conf.symfs, "directory",
diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index 8e639543..6ed1cce 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -13,6 +13,7 @@ char dso__symtab_origin(const struct dso *dso)
static const char origin[] = {
[DSO_BINARY_TYPE__KALLSYMS] = 'k',
[DSO_BINARY_TYPE__VMLINUX]  = 'v',
+   [DSO_BINARY_TYPE__VDSO] = 'D',
[DSO_BINARY_TYPE__JAVA_JIT] = 'j',
[DSO_BINARY_TYPE__DEBUGLINK]= 'l',
[DSO_BINARY_TYPE__BUILD_ID_CACHE]   = 'B',
@@ -113,6 +114,11 @@ int dso__read_binary_type_filename(const struct dso *dso,
 build_id_hex, build_id_hex + 2);
break;
  
+	case DSO_BINARY_TYPE__VDSO:

+   {
+   snprintf(filename, size, "%s", symbol_conf.vdso_name);
+   break;
+   }
case DSO_BINARY_TYPE__VMLINUX:
case DSO_BINARY_TYPE__GUEST_VMLINUX:
case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
@@ -487,6 +493,7 @@ static void try_to_open_dso(struct dso *dso, struct machine 
*machine)
enum dso_binary_type binary_type_data[] = {
DSO_BINARY_TYPE__BUILD_ID_CACHE,
DSO_BINARY_TYPE__SYSTEM_PATH_DSO,
+   DSO_BINARY_TYPE__VDSO,
DSO_BINARY_TYPE__NOT_FOUND,
};
int i = 0;
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 0953280..05fac98 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -30,6 +30,7 @@ enum dso_binary_type {
DSO_BINARY_TYPE__KCORE,
DSO_BINARY_TYPE__GUEST_KCORE,
DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO,
+   DSO_BINARY_TYPE__VDSO,
DSO_BINARY_TYPE__NOT_FOUND,
  };
  
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c

index e7588dc..4630751 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1363,6 +1363,7 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO:
+   case DSO_BINARY_TYPE__VDSO:
return !kmod && dso->kernel == DSO_TYPE_USER;
  
  	case DSO_BINARY_TYPE__KALLSYMS:

diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index c8b7544..4e6910e 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -114,6 +114,7 @@ struct symbol_conf {
report_hierarchy;
const char  *vmlinux_name,
*kallsyms_name,
+   *vdso_name,
*source_prefix,
*field_sep;
const char  *default_guest_vmlinux_name,
--
1.8.5.2





Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang



在 2016/5/10 19:59, Adrian Hunter 写道:

On 10/05/16 14:38, Hekuang wrote:


在 2016/5/10 18:34, Adrian Hunter 写道:

On 10/05/16 12:49, Hekuang wrote:

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?

I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

That is just initialization i.e. before we know what it is we assume it is
the same as the host.


This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is assigned
after dso
loaded. So I think we should provide individual methods to get that value.

Are you saying you don't load dsos?  Or that is_64_bit is set incorrectly?


Yes, I know it's the inital value, but the correct value is
assigned in function dso__load_sym(), and have a look at the call
stack(gdb):

#0  dso__load_sym
#1  in dso__load
#2  in map__load
#3  in map__find_symbol
#4  in thread__find_addr_location
#5  in entry
#6  in get_entries
#7  in _Ux86__unwind__get_entries
#8  in thread__resolve_callchain

I think we should choose the right unwind method before
dso__load_sym(). i.e. line#7, which is called before dso__load_sym().

I'm not very familiar with this, what's your opinion?

Have you considered calling map__load() instead of dso_is_64_bit()


IMO, map__load() is heavy than dso_is_64_bit() test, because the
dso tested by unwind_get_arch(thread, map) may or may not be
referenced in the unwind process. For example, a thread can
have a large map but that map never appeared in the callcains
in perf.data.

Or in other words, should we load all symbols for that elf flag or
add a new method targeted for that purpose only.

Thanks.









Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang



在 2016/5/10 19:59, Adrian Hunter 写道:

On 10/05/16 14:38, Hekuang wrote:


在 2016/5/10 18:34, Adrian Hunter 写道:

On 10/05/16 12:49, Hekuang wrote:

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?

I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

That is just initialization i.e. before we know what it is we assume it is
the same as the host.


This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is assigned
after dso
loaded. So I think we should provide individual methods to get that value.

Are you saying you don't load dsos?  Or that is_64_bit is set incorrectly?


Yes, I know it's the inital value, but the correct value is
assigned in function dso__load_sym(), and have a look at the call
stack(gdb):

#0  dso__load_sym
#1  in dso__load
#2  in map__load
#3  in map__find_symbol
#4  in thread__find_addr_location
#5  in entry
#6  in get_entries
#7  in _Ux86__unwind__get_entries
#8  in thread__resolve_callchain

I think we should choose the right unwind method before
dso__load_sym(). i.e. line#7, which is called before dso__load_sym().

I'm not very familiar with this, what's your opinion?

Have you considered calling map__load() instead of dso_is_64_bit()


IMO, map__load() is heavy than dso_is_64_bit() test, because the
dso tested by unwind_get_arch(thread, map) may or may not be
referenced in the unwind process. For example, a thread can
have a large map but that map never appeared in the callcains
in perf.data.

Or in other words, should we load all symbols for that elf flag or
add a new method targeted for that purpose only.

Thanks.









Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang



在 2016/5/10 18:34, Adrian Hunter 写道:

On 10/05/16 12:49, Hekuang wrote:

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?

I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

That is just initialization i.e. before we know what it is we assume it is
the same as the host.


This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is assigned
after dso
loaded. So I think we should provide individual methods to get that value.

Are you saying you don't load dsos?  Or that is_64_bit is set incorrectly?



Yes, I know it's the inital value, but the correct value is
assigned in function dso__load_sym(), and have a look at the call
stack(gdb):

#0  dso__load_sym
#1  in dso__load
#2  in map__load
#3  in map__find_symbol
#4  in thread__find_addr_location
#5  in entry
#6  in get_entries
#7  in _Ux86__unwind__get_entries
#8  in thread__resolve_callchain

I think we should choose the right unwind method before
dso__load_sym(). i.e. line#7, which is called before dso__load_sym().

I'm not very familiar with this, what's your opinion?

Thanks.

Thanks.


Signed-off-by: He Kuang <heku...@huawei.com>
---
   tools/perf/util/symbol-elf.c | 16 +++
   tools/perf/util/symbol.c | 49

   tools/perf/util/symbol.h |  2 ++
   3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3f9d679..9f290b9 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -636,6 +636,22 @@ bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
   return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
   }
   +int elf_is_64_bit(char *name)
+{
+Elf *elf;
+int fd;
+
+fd = open(name, O_RDONLY);
+if (fd < 0)
+return -1;
+
+elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+if (elf == NULL)
+return -1;
+
+return (gelf_getclass(elf) == ELFCLASS64);
+}
+
   int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
enum dso_binary_type type)
   {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 4630751..592bf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1395,6 +1395,55 @@ static bool dso__is_compatible_symtab_type(struct
dso *dso, bool kmod,
   }
   }
   +int dso_is_64_bit(struct dso *dso, struct map *map)
+{
+char *name;
+u_int i;
+bool kmod;
+char *root_dir = (char *) "";
+struct machine *machine;
+
+if (map->groups && map->groups->machine)
+machine = map->groups->machine;
+else
+machine = NULL;
+
+if (machine)
+root_dir = machine->root_dir;
+
+name = malloc(PATH_MAX);
+if (!name)
+return -1;
+
+kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE ||
+dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+
+/*
+ * Iterate over candidate debug images.
+ * Keep track of "interesting" ones (those which have a symtab, dynsym,
+ * and/or opd section) for processing.
+ */
+for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
+enum dso_binary_type symtab_type = binary_type_symtab[i];
+
+if (!dso__is_compatible_symtab_type(dso, kmod, symtab_type))
+continue;
+
+if (dso__read_binary_type_filename(dso, symtab_type,
+   root_dir, name, PATH_MAX))
+continue;
+
+if (!is_regular_file(name))
+continue;
+
+return elf_is_64_bit(name);
+}
+
+return -1;
+}
+
   int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
   {
   char *name;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 4e6910e..d33fbf4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -308,6 +308,8 @@ int setup_list(struct strlist **list, const char
*list_str,
  const char *list_name);
   int setup_intlist(struct intlist **list, const char *list_str,
 const char *list_name);
+int elf_is_64_bit(char *name);
+int dso_is_64_bit(struct dso *dso, struct map *map);
 #ifdef HAVE_LIBELF_SUPPORT
   bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);











Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang



在 2016/5/10 18:34, Adrian Hunter 写道:

On 10/05/16 12:49, Hekuang wrote:

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?

I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

That is just initialization i.e. before we know what it is we assume it is
the same as the host.


This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is assigned
after dso
loaded. So I think we should provide individual methods to get that value.

Are you saying you don't load dsos?  Or that is_64_bit is set incorrectly?



Yes, I know it's the inital value, but the correct value is
assigned in function dso__load_sym(), and have a look at the call
stack(gdb):

#0  dso__load_sym
#1  in dso__load
#2  in map__load
#3  in map__find_symbol
#4  in thread__find_addr_location
#5  in entry
#6  in get_entries
#7  in _Ux86__unwind__get_entries
#8  in thread__resolve_callchain

I think we should choose the right unwind method before
dso__load_sym(). i.e. line#7, which is called before dso__load_sym().

I'm not very familiar with this, what's your opinion?

Thanks.

Thanks.


Signed-off-by: He Kuang 
---
   tools/perf/util/symbol-elf.c | 16 +++
   tools/perf/util/symbol.c | 49

   tools/perf/util/symbol.h |  2 ++
   3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3f9d679..9f290b9 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -636,6 +636,22 @@ bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
   return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
   }
   +int elf_is_64_bit(char *name)
+{
+Elf *elf;
+int fd;
+
+fd = open(name, O_RDONLY);
+if (fd < 0)
+return -1;
+
+elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+if (elf == NULL)
+return -1;
+
+return (gelf_getclass(elf) == ELFCLASS64);
+}
+
   int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
enum dso_binary_type type)
   {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 4630751..592bf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1395,6 +1395,55 @@ static bool dso__is_compatible_symtab_type(struct
dso *dso, bool kmod,
   }
   }
   +int dso_is_64_bit(struct dso *dso, struct map *map)
+{
+char *name;
+u_int i;
+bool kmod;
+char *root_dir = (char *) "";
+struct machine *machine;
+
+if (map->groups && map->groups->machine)
+machine = map->groups->machine;
+else
+machine = NULL;
+
+if (machine)
+root_dir = machine->root_dir;
+
+name = malloc(PATH_MAX);
+if (!name)
+return -1;
+
+kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE ||
+dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+
+/*
+ * Iterate over candidate debug images.
+ * Keep track of "interesting" ones (those which have a symtab, dynsym,
+ * and/or opd section) for processing.
+ */
+for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
+enum dso_binary_type symtab_type = binary_type_symtab[i];
+
+if (!dso__is_compatible_symtab_type(dso, kmod, symtab_type))
+continue;
+
+if (dso__read_binary_type_filename(dso, symtab_type,
+   root_dir, name, PATH_MAX))
+continue;
+
+if (!is_regular_file(name))
+continue;
+
+return elf_is_64_bit(name);
+}
+
+return -1;
+}
+
   int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
   {
   char *name;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 4e6910e..d33fbf4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -308,6 +308,8 @@ int setup_list(struct strlist **list, const char
*list_str,
  const char *list_name);
   int setup_intlist(struct intlist **list, const char *list_str,
 const char *list_name);
+int elf_is_64_bit(char *name);
+int dso_is_64_bit(struct dso *dso, struct map *map);
 #ifdef HAVE_LIBELF_SUPPORT
   bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);











Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?


I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is 
assigned after dso

loaded. So I think we should provide individual methods to get that value.

Thanks.




Signed-off-by: He Kuang 
---
  tools/perf/util/symbol-elf.c | 16 +++
  tools/perf/util/symbol.c | 49 
  tools/perf/util/symbol.h |  2 ++
  3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3f9d679..9f290b9 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -636,6 +636,22 @@ bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
  }
  
+int elf_is_64_bit(char *name)

+{
+   Elf *elf;
+   int fd;
+
+   fd = open(name, O_RDONLY);
+   if (fd < 0)
+   return -1;
+
+   elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+   if (elf == NULL)
+   return -1;
+
+   return (gelf_getclass(elf) == ELFCLASS64);
+}
+
  int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 enum dso_binary_type type)
  {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 4630751..592bf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1395,6 +1395,55 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
}
  }
  
+int dso_is_64_bit(struct dso *dso, struct map *map)

+{
+   char *name;
+   u_int i;
+   bool kmod;
+   char *root_dir = (char *) "";
+   struct machine *machine;
+
+   if (map->groups && map->groups->machine)
+   machine = map->groups->machine;
+   else
+   machine = NULL;
+
+   if (machine)
+   root_dir = machine->root_dir;
+
+   name = malloc(PATH_MAX);
+   if (!name)
+   return -1;
+
+   kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+   dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+   dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE ||
+   dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+
+   /*
+* Iterate over candidate debug images.
+* Keep track of "interesting" ones (those which have a symtab, dynsym,
+* and/or opd section) for processing.
+*/
+   for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
+   enum dso_binary_type symtab_type = binary_type_symtab[i];
+
+   if (!dso__is_compatible_symtab_type(dso, kmod, symtab_type))
+   continue;
+
+   if (dso__read_binary_type_filename(dso, symtab_type,
+  root_dir, name, PATH_MAX))
+   continue;
+
+   if (!is_regular_file(name))
+   continue;
+
+   return elf_is_64_bit(name);
+   }
+
+   return -1;
+}
+
  int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
  {
char *name;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 4e6910e..d33fbf4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -308,6 +308,8 @@ int setup_list(struct strlist **list, const char *list_str,
   const char *list_name);
  int setup_intlist(struct intlist **list, const char *list_str,
  const char *list_name);
+int elf_is_64_bit(char *name);
+int dso_is_64_bit(struct dso *dso, struct map *map);
  
  #ifdef HAVE_LIBELF_SUPPORT

  bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);








Re: [PATCH v2 5/9] perf tools: Add methods to test dso is 64-bit or 32-bit

2016-05-10 Thread Hekuang

hi

在 2016/5/10 16:08, Adrian Hunter 写道:

On 10/05/16 10:40, He Kuang wrote:

32-bit programs can be run on 64-bit machines, so we should choose
unwind methods according to 'thread->map' instead of the host
architecture.

This patch adds methods to test whether a dso is 64-bit or 32-bit by
the class info in elf.

What about using dso->is_64_bit set by dso__load_sym() ?


I've noticed this variable, but it's value is not as its name said:

util/dso.c: 1067dso->is_64_bit = (sizeof(void *) == 8);

This is only related to the host architecture.

A closer one is 'is_64_bit' in 'struct symsrc', but the value is 
assigned after dso

loaded. So I think we should provide individual methods to get that value.

Thanks.




Signed-off-by: He Kuang 
---
  tools/perf/util/symbol-elf.c | 16 +++
  tools/perf/util/symbol.c | 49 
  tools/perf/util/symbol.h |  2 ++
  3 files changed, 67 insertions(+)

diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index 3f9d679..9f290b9 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -636,6 +636,22 @@ bool __weak elf__needs_adjust_symbols(GElf_Ehdr ehdr)
return ehdr.e_type == ET_EXEC || ehdr.e_type == ET_REL;
  }
  
+int elf_is_64_bit(char *name)

+{
+   Elf *elf;
+   int fd;
+
+   fd = open(name, O_RDONLY);
+   if (fd < 0)
+   return -1;
+
+   elf = elf_begin(fd, PERF_ELF_C_READ_MMAP, NULL);
+   if (elf == NULL)
+   return -1;
+
+   return (gelf_getclass(elf) == ELFCLASS64);
+}
+
  int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 enum dso_binary_type type)
  {
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 4630751..592bf8c 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1395,6 +1395,55 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
}
  }
  
+int dso_is_64_bit(struct dso *dso, struct map *map)

+{
+   char *name;
+   u_int i;
+   bool kmod;
+   char *root_dir = (char *) "";
+   struct machine *machine;
+
+   if (map->groups && map->groups->machine)
+   machine = map->groups->machine;
+   else
+   machine = NULL;
+
+   if (machine)
+   root_dir = machine->root_dir;
+
+   name = malloc(PATH_MAX);
+   if (!name)
+   return -1;
+
+   kmod = dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE ||
+   dso->symtab_type == DSO_BINARY_TYPE__SYSTEM_PATH_KMODULE_COMP ||
+   dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE ||
+   dso->symtab_type == DSO_BINARY_TYPE__GUEST_KMODULE_COMP;
+
+   /*
+* Iterate over candidate debug images.
+* Keep track of "interesting" ones (those which have a symtab, dynsym,
+* and/or opd section) for processing.
+*/
+   for (i = 0; i < DSO_BINARY_TYPE__SYMTAB_CNT; i++) {
+   enum dso_binary_type symtab_type = binary_type_symtab[i];
+
+   if (!dso__is_compatible_symtab_type(dso, kmod, symtab_type))
+   continue;
+
+   if (dso__read_binary_type_filename(dso, symtab_type,
+  root_dir, name, PATH_MAX))
+   continue;
+
+   if (!is_regular_file(name))
+   continue;
+
+   return elf_is_64_bit(name);
+   }
+
+   return -1;
+}
+
  int dso__load(struct dso *dso, struct map *map, symbol_filter_t filter)
  {
char *name;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 4e6910e..d33fbf4 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -308,6 +308,8 @@ int setup_list(struct strlist **list, const char *list_str,
   const char *list_name);
  int setup_intlist(struct intlist **list, const char *list_str,
  const char *list_name);
+int elf_is_64_bit(char *name);
+int dso_is_64_bit(struct dso *dso, struct map *map);
  
  #ifdef HAVE_LIBELF_SUPPORT

  bool elf__needs_adjust_symbols(GElf_Ehdr ehdr);








Re: [PATCH 1/8] perf tools: Omit DWARF judgement when recording dwarf callchain

2016-05-09 Thread Hekuang

ok

在 2016/5/10 0:16, Arnaldo Carvalho de Melo 写道:

Em Sat, May 07, 2016 at 08:03:46PM +0200, Jiri Olsa escreveu:

On Fri, May 06, 2016 at 08:59:07AM +, He Kuang wrote:

There's no need for dwarf support when perf recording with callchain.

Signed-off-by: He Kuang 

Acked-by: Jiri Olsa 

Ok, due to your Ack I went on to read the code and indeed, with this
changeset description:

"There is no need to check for DWARF unwinding support when using the
'dwarf' callchain record method, as this will only ask the kernel to
collect stack dumps for later DWARF CFI processing, which can be done
in another machine, where the support for DWARF unwinding need to be
present."

Thanks,

- Arnaldo
  

thanks,
jirka


---
  tools/perf/util/util.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index b7766c5..e5ebfd4 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -471,7 +471,6 @@ int parse_callchain_record(const char *arg, struct 
callchain_param *param)
   "needed for --call-graph fp\n");
break;
  
-#ifdef HAVE_DWARF_UNWIND_SUPPORT

/* Dwarf style */
} else if (!strncmp(name, "dwarf", sizeof("dwarf"))) {
const unsigned long default_stack_dump_size = 8192;
@@ -487,7 +486,6 @@ int parse_callchain_record(const char *arg, struct 
callchain_param *param)
ret = get_stack_size(tok, );
param->dump_size = size;
}
-#endif /* HAVE_DWARF_UNWIND_SUPPORT */
} else if (!strncmp(name, "lbr", sizeof("lbr"))) {
if (!strtok_r(NULL, ",", )) {
param->record_mode = CALLCHAIN_LBR;
--
1.8.5.2






Re: [PATCH 1/8] perf tools: Omit DWARF judgement when recording dwarf callchain

2016-05-09 Thread Hekuang

ok

在 2016/5/10 0:16, Arnaldo Carvalho de Melo 写道:

Em Sat, May 07, 2016 at 08:03:46PM +0200, Jiri Olsa escreveu:

On Fri, May 06, 2016 at 08:59:07AM +, He Kuang wrote:

There's no need for dwarf support when perf recording with callchain.

Signed-off-by: He Kuang 

Acked-by: Jiri Olsa 

Ok, due to your Ack I went on to read the code and indeed, with this
changeset description:

"There is no need to check for DWARF unwinding support when using the
'dwarf' callchain record method, as this will only ask the kernel to
collect stack dumps for later DWARF CFI processing, which can be done
in another machine, where the support for DWARF unwinding need to be
present."

Thanks,

- Arnaldo
  

thanks,
jirka


---
  tools/perf/util/util.c | 2 --
  1 file changed, 2 deletions(-)

diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index b7766c5..e5ebfd4 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -471,7 +471,6 @@ int parse_callchain_record(const char *arg, struct 
callchain_param *param)
   "needed for --call-graph fp\n");
break;
  
-#ifdef HAVE_DWARF_UNWIND_SUPPORT

/* Dwarf style */
} else if (!strncmp(name, "dwarf", sizeof("dwarf"))) {
const unsigned long default_stack_dump_size = 8192;
@@ -487,7 +486,6 @@ int parse_callchain_record(const char *arg, struct 
callchain_param *param)
ret = get_stack_size(tok, );
param->dump_size = size;
}
-#endif /* HAVE_DWARF_UNWIND_SUPPORT */
} else if (!strncmp(name, "lbr", sizeof("lbr"))) {
if (!strtok_r(NULL, ",", )) {
param->record_mode = CALLCHAIN_LBR;
--
1.8.5.2






Re: [PATCH] perf tools: Fix build errors on tsc functions for archs other than x86

2016-04-01 Thread Hekuang


hi,
在 2016/4/1 21:13, Arnaldo Carvalho de Melo 写道:

Em Fri, Apr 01, 2016 at 03:49:32AM +, He Kuang escreveu:

Build errors on aarch64:

   libperf.a(libperf-in.o): In function `convert_timestamp':
   util/jitdump.c:356: undefined reference to `tsc_to_perf_time'
   collect2: error: ld returned 1 exit status
   Makefile.perf:347: recipe for target 'perf' failed
   make[1]: *** [perf] Error 1
   Makefile:68: recipe for target 'all' failed
   make: *** [all] Error 2

Got it, my cross-compile environment for aarch64 doesn't build jitdump.c
because it needs a package I haven't found in ubuntu:

config/Makefile:416: No libcrypto.h found, disables jitted code injection, 
please install libssl-devel or libssl-dev
minimal-ubuntu-x-arm64: Ok

BTW, He, what environment do you use to build for arm64, is it a cross
compile one? Yocto?


I use buildroot, though my colleague prefers yocto(oxygen).

In my environment, libcrypto is on.
... libcrypto: [ on  ]



- Arnaldo


  

Since tsc conversion functions were moved out from arch dir, move
'tsc.h' out from x86 dir to make it possible to compile for other archs.

Signed-off-by: He Kuang 
---
  tools/perf/arch/x86/util/tsc.c |  1 -
  tools/perf/arch/x86/util/tsc.h | 17 -
  tools/perf/util/Build  |  3 +--
  tools/perf/util/tsc.h  | 11 ++-
  4 files changed, 11 insertions(+), 21 deletions(-)
  delete mode 100644 tools/perf/arch/x86/util/tsc.h

diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
index 70ff7c1..357f1b1 100644
--- a/tools/perf/arch/x86/util/tsc.c
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -7,7 +7,6 @@
  #include 
  #include "../../util/debug.h"
  #include "../../util/tsc.h"
-#include "tsc.h"
  
  int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,

 struct perf_tsc_conversion *tc)
diff --git a/tools/perf/arch/x86/util/tsc.h b/tools/perf/arch/x86/util/tsc.h
deleted file mode 100644
index 2edc4d3..000
--- a/tools/perf/arch/x86/util/tsc.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
-#define TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
-
-#include 
-
-struct perf_tsc_conversion {
-   u16 time_shift;
-   u32 time_mult;
-   u64 time_zero;
-};
-
-struct perf_event_mmap_page;
-
-int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
-struct perf_tsc_conversion *tc);
-
-#endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index da48fd8..85ceff3 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,8 +69,7 @@ libperf-y += stat-shadow.o
  libperf-y += record.o
  libperf-y += srcline.o
  libperf-y += data.o
-libperf-$(CONFIG_X86) += tsc.o
-libperf-$(CONFIG_AUXTRACE) += tsc.o
+libperf-y += tsc.o
  libperf-y += cloexec.o
  libperf-y += thread-stack.o
  libperf-$(CONFIG_AUXTRACE) += auxtrace.o
diff --git a/tools/perf/util/tsc.h b/tools/perf/util/tsc.h
index 280ddc0..d5b11e2 100644
--- a/tools/perf/util/tsc.h
+++ b/tools/perf/util/tsc.h
@@ -4,7 +4,16 @@
  #include 
  
  #include "event.h"

-#include "../arch/x86/util/tsc.h"
+
+struct perf_tsc_conversion {
+   u16 time_shift;
+   u32 time_mult;
+   u64 time_zero;
+};
+struct perf_event_mmap_page;
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+struct perf_tsc_conversion *tc);
  
  u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);

  u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
--
1.8.5.2





Re: [PATCH] perf tools: Fix build errors on tsc functions for archs other than x86

2016-04-01 Thread Hekuang


hi,
在 2016/4/1 21:13, Arnaldo Carvalho de Melo 写道:

Em Fri, Apr 01, 2016 at 03:49:32AM +, He Kuang escreveu:

Build errors on aarch64:

   libperf.a(libperf-in.o): In function `convert_timestamp':
   util/jitdump.c:356: undefined reference to `tsc_to_perf_time'
   collect2: error: ld returned 1 exit status
   Makefile.perf:347: recipe for target 'perf' failed
   make[1]: *** [perf] Error 1
   Makefile:68: recipe for target 'all' failed
   make: *** [all] Error 2

Got it, my cross-compile environment for aarch64 doesn't build jitdump.c
because it needs a package I haven't found in ubuntu:

config/Makefile:416: No libcrypto.h found, disables jitted code injection, 
please install libssl-devel or libssl-dev
minimal-ubuntu-x-arm64: Ok

BTW, He, what environment do you use to build for arm64, is it a cross
compile one? Yocto?


I use buildroot, though my colleague prefers yocto(oxygen).

In my environment, libcrypto is on.
... libcrypto: [ on  ]



- Arnaldo


  

Since tsc conversion functions were moved out from arch dir, move
'tsc.h' out from x86 dir to make it possible to compile for other archs.

Signed-off-by: He Kuang 
---
  tools/perf/arch/x86/util/tsc.c |  1 -
  tools/perf/arch/x86/util/tsc.h | 17 -
  tools/perf/util/Build  |  3 +--
  tools/perf/util/tsc.h  | 11 ++-
  4 files changed, 11 insertions(+), 21 deletions(-)
  delete mode 100644 tools/perf/arch/x86/util/tsc.h

diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
index 70ff7c1..357f1b1 100644
--- a/tools/perf/arch/x86/util/tsc.c
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -7,7 +7,6 @@
  #include 
  #include "../../util/debug.h"
  #include "../../util/tsc.h"
-#include "tsc.h"
  
  int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,

 struct perf_tsc_conversion *tc)
diff --git a/tools/perf/arch/x86/util/tsc.h b/tools/perf/arch/x86/util/tsc.h
deleted file mode 100644
index 2edc4d3..000
--- a/tools/perf/arch/x86/util/tsc.h
+++ /dev/null
@@ -1,17 +0,0 @@
-#ifndef TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
-#define TOOLS_PERF_ARCH_X86_UTIL_TSC_H__
-
-#include 
-
-struct perf_tsc_conversion {
-   u16 time_shift;
-   u32 time_mult;
-   u64 time_zero;
-};
-
-struct perf_event_mmap_page;
-
-int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
-struct perf_tsc_conversion *tc);
-
-#endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index da48fd8..85ceff3 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -69,8 +69,7 @@ libperf-y += stat-shadow.o
  libperf-y += record.o
  libperf-y += srcline.o
  libperf-y += data.o
-libperf-$(CONFIG_X86) += tsc.o
-libperf-$(CONFIG_AUXTRACE) += tsc.o
+libperf-y += tsc.o
  libperf-y += cloexec.o
  libperf-y += thread-stack.o
  libperf-$(CONFIG_AUXTRACE) += auxtrace.o
diff --git a/tools/perf/util/tsc.h b/tools/perf/util/tsc.h
index 280ddc0..d5b11e2 100644
--- a/tools/perf/util/tsc.h
+++ b/tools/perf/util/tsc.h
@@ -4,7 +4,16 @@
  #include 
  
  #include "event.h"

-#include "../arch/x86/util/tsc.h"
+
+struct perf_tsc_conversion {
+   u16 time_shift;
+   u32 time_mult;
+   u64 time_zero;
+};
+struct perf_event_mmap_page;
+
+int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
+struct perf_tsc_conversion *tc);
  
  u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);

  u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
--
1.8.5.2





Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

hi

在 2016/3/31 9:39, Zefan Li 写道:

On 2016/3/31 9:14, Hekuang wrote:

Hi

在 2016/3/30 19:10, Michal Hocko 写道:

On Wed 30-03-16 18:51:12, Hekuang wrote:

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?


I've mistakenly thought that per_cpu variable can only be accessed by that
cpu.

git blame would point you to 998d39cb236f ("mm/page_alloc: protect
pcp->batch accesses with ACCESS_ONCE"). I haven't looked into the code
deeply to confirm this is still the case but it would be a good lead
that this is not that simple. ACCESS_ONCE resp. {READ,WRITE}_ONCE are
usually quite subtle so I would encourage you or anybody else who try to
remove them to study the code and the history deeper before removing
them.


Thank you for responding, I've read that commit and related articles and not 
sending
mail casually, though you may think it's a stupid patch. I'm a beginner and I 
think
sending mails to maillist is a effective way to learn kernel, And, sure i'll be 
more careful and
well prepared next time :)


pcp->batch can be changed in a different cpu. You may read 
percpu_pagelist_fraction_sysctl_handler()
to see how that can happen.



OK. got it!



Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

hi

在 2016/3/31 9:39, Zefan Li 写道:

On 2016/3/31 9:14, Hekuang wrote:

Hi

在 2016/3/30 19:10, Michal Hocko 写道:

On Wed 30-03-16 18:51:12, Hekuang wrote:

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?


I've mistakenly thought that per_cpu variable can only be accessed by that
cpu.

git blame would point you to 998d39cb236f ("mm/page_alloc: protect
pcp->batch accesses with ACCESS_ONCE"). I haven't looked into the code
deeply to confirm this is still the case but it would be a good lead
that this is not that simple. ACCESS_ONCE resp. {READ,WRITE}_ONCE are
usually quite subtle so I would encourage you or anybody else who try to
remove them to study the code and the history deeper before removing
them.


Thank you for responding, I've read that commit and related articles and not 
sending
mail casually, though you may think it's a stupid patch. I'm a beginner and I 
think
sending mails to maillist is a effective way to learn kernel, And, sure i'll be 
more careful and
well prepared next time :)


pcp->batch can be changed in a different cpu. You may read 
percpu_pagelist_fraction_sysctl_handler()
to see how that can happen.



OK. got it!



Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

Hi

在 2016/3/30 19:10, Michal Hocko 写道:

On Wed 30-03-16 18:51:12, Hekuang wrote:

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?


I've mistakenly thought that per_cpu variable can only be accessed by that
cpu.

git blame would point you to 998d39cb236f ("mm/page_alloc: protect
pcp->batch accesses with ACCESS_ONCE"). I haven't looked into the code
deeply to confirm this is still the case but it would be a good lead
that this is not that simple. ACCESS_ONCE resp. {READ,WRITE}_ONCE are
usually quite subtle so I would encourage you or anybody else who try to
remove them to study the code and the history deeper before removing
them.

Thank you for responding, I've read that commit and related articles and 
not sending
mail casually, though you may think it's a stupid patch. I'm a beginner 
and I think
sending mails to maillist is a effective way to learn kernel, And, sure 
i'll be more careful and

well prepared next time :)



Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

Hi

在 2016/3/30 19:10, Michal Hocko 写道:

On Wed 30-03-16 18:51:12, Hekuang wrote:

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?


I've mistakenly thought that per_cpu variable can only be accessed by that
cpu.

git blame would point you to 998d39cb236f ("mm/page_alloc: protect
pcp->batch accesses with ACCESS_ONCE"). I haven't looked into the code
deeply to confirm this is still the case but it would be a good lead
that this is not that simple. ACCESS_ONCE resp. {READ,WRITE}_ONCE are
usually quite subtle so I would encourage you or anybody else who try to
remove them to study the code and the history deeper before removing
them.

Thank you for responding, I've read that commit and related articles and 
not sending
mail casually, though you may think it's a stupid patch. I'm a beginner 
and I think
sending mails to maillist is a effective way to learn kernel, And, sure 
i'll be more careful and

well prepared next time :)



Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?

I've mistakenly thought that per_cpu variable can only be accessed by 
that cpu.

Thanks.



Re: [PATCH] Revert "mm/page_alloc: protect pcp->batch accesses with ACCESS_ONCE"

2016-03-30 Thread Hekuang

hi

在 2016/3/30 18:38, Mel Gorman 写道:

On Wed, Mar 30, 2016 at 10:22:07AM +, He Kuang wrote:

This reverts commit 998d39cb236fe464af86a3492a24d2f67ee1efc2.

When local irq is disabled, a percpu variable does not change, so we can
remove the access macros and let the compiler optimize the code safely.


batch can be changed from other contexts. Why is this safe?

I've mistakenly thought that per_cpu variable can only be accessed by 
that cpu.

Thanks.



[BUG ARM64/perf] Perf record on hardware breakpoint causes application to hang

2016-03-03 Thread Hekuang

This problem can be reproduced as follows:

We know cat /proc/version will read the memory of symbol
linux_proc_banner, then we make a hardware memory access
breakpoint on that address.

on terminal 1:

  $ perf record -e mem:0x$(cat /proc/kallsyms|grep 
linux_proc_banner|cut -d " " -f 1):rw --no-buffer -a


on terminal 2:

  $ cat /proc/version

Then our 'cat' process on terminal 2 will be hanged, until we press
'^C' to stop perf from recording events.

The sample numbers recorded by perf is extraordinary too:

  [ perf record: Captured and wrote 0.879 MB perf.data (22691 samples) ]

The right result can be produced by removing the 'no-buffer'
argument in perf command line, and the result should be like
this:

  $ perf record -e mem:0x$(cat /proc/kallsyms|grep linux_proc_
   banner|cut -d " " -f 1):rw  -a
  ^C
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]

Report this bug to you and hope for answers.

Thanks.



[BUG ARM64/perf] Perf record on hardware breakpoint causes application to hang

2016-03-03 Thread Hekuang

This problem can be reproduced as follows:

We know cat /proc/version will read the memory of symbol
linux_proc_banner, then we make a hardware memory access
breakpoint on that address.

on terminal 1:

  $ perf record -e mem:0x$(cat /proc/kallsyms|grep 
linux_proc_banner|cut -d " " -f 1):rw --no-buffer -a


on terminal 2:

  $ cat /proc/version

Then our 'cat' process on terminal 2 will be hanged, until we press
'^C' to stop perf from recording events.

The sample numbers recorded by perf is extraordinary too:

  [ perf record: Captured and wrote 0.879 MB perf.data (22691 samples) ]

The right result can be produced by removing the 'no-buffer'
argument in perf command line, and the result should be like
this:

  $ perf record -e mem:0x$(cat /proc/kallsyms|grep linux_proc_
   banner|cut -d " " -f 1):rw  -a
  ^C
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.013 MB perf.data (10 samples) ]

Report this bug to you and hope for answers.

Thanks.



Re: [PATCH] perf probe: Fix failure to probe events on arm

2015-06-16 Thread hekuang

hi, Arnaldo

On 06/15/2015 10:49 PM, Arnaldo Carvalho de Melo wrote:

Em Mon, Jun 15, 2015 at 08:06:53AM +, He Kuang escreveu:

Fix failure to probe events on arm, problem is introduced by commit
5a51fcd1f30c ("perf probe: Skip kernel symbols which is out of
.text"). For some architectures, label '_etext' is not in the .text
section(in .notes section for arm/arm64). Label out of .text section is
not loaded as symbols and we got a zero value when look up its address,
which causes all events be wrongly skiped.

This patch uses kernel map->end when failed to get the address of
'_etext' and fixes the problem.

Masami, can't we always use map->end then? Can you please take a look at
this patch and ack/nack it?

- Arnaldo


I think _etext is more accurate than kernel map->end, because
__map_groups__fixup_end() is called at the end of
dso__load_sym(), which fixes map->end to next_map->start.

Comparative result as this:
  etext_addr=819a1b85, map->end=81ff1000.

So if possible, we should use _etext.

Thanks.
  

Problem can be reproduced on arm as following:

   # perf probe --add='generic_perform_write'
   generic_perform_write+0 is out of .text, skip it.
   Probe point 'generic_perform_write' not found.
 Error: Failed to add events. Reason: No such file or directory (Code: -2)

After this patch:

   # perf probe --add='generic_perform_write'
   Added new event:
 probe:generic_perform_write (on generic_perform_write)

   You can now use it in all perf tools, such as:

 perf record -e probe:generic_perform_write -aR sleep 1

Signed-off-by: He Kuang 
---
  tools/perf/util/probe-event.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index daa24a2..ee26961 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -575,8 +575,22 @@ static int post_process_probe_trace_events(struct 
probe_trace_event *tevs,
pr_warning("Relocated base symbol is not found!\n");
return -EINVAL;
}
-   /* Get the address of _etext for checking non-probable text symbol */
+   /* Get the address of _etext for checking non-probable text symbol,
+  for some architectures (e.g. arm, arm64), _etext is out of .text
+  section and not loaded as symbols, use kernel map->end instead.
+*/
etext_addr = kernel_get_symbol_address_by_name("_etext", false);
+   if (etext_addr == 0) {
+   struct map *map;
+
+   map = kernel_get_module_map(NULL);
+   if (!map) {
+   pr_err("Failed to get a map for kernel\n");
+   return -EINVAL;
+   }
+
+   etext_addr = map->end;
+   }
  
  	for (i = 0; i < ntevs; i++) {

if (tevs[i].point.address && !tevs[i].point.retprobe) {
--
1.8.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf probe: Fix failure to probe events on arm

2015-06-16 Thread hekuang

hi, Arnaldo

On 06/15/2015 10:49 PM, Arnaldo Carvalho de Melo wrote:

Em Mon, Jun 15, 2015 at 08:06:53AM +, He Kuang escreveu:

Fix failure to probe events on arm, problem is introduced by commit
5a51fcd1f30c (perf probe: Skip kernel symbols which is out of
.text). For some architectures, label '_etext' is not in the .text
section(in .notes section for arm/arm64). Label out of .text section is
not loaded as symbols and we got a zero value when look up its address,
which causes all events be wrongly skiped.

This patch uses kernel map-end when failed to get the address of
'_etext' and fixes the problem.

Masami, can't we always use map-end then? Can you please take a look at
this patch and ack/nack it?

- Arnaldo


I think _etext is more accurate than kernel map-end, because
__map_groups__fixup_end() is called at the end of
dso__load_sym(), which fixes map-end to next_map-start.

Comparative result as this:
  etext_addr=819a1b85, map-end=81ff1000.

So if possible, we should use _etext.

Thanks.
  

Problem can be reproduced on arm as following:

   # perf probe --add='generic_perform_write'
   generic_perform_write+0 is out of .text, skip it.
   Probe point 'generic_perform_write' not found.
 Error: Failed to add events. Reason: No such file or directory (Code: -2)

After this patch:

   # perf probe --add='generic_perform_write'
   Added new event:
 probe:generic_perform_write (on generic_perform_write)

   You can now use it in all perf tools, such as:

 perf record -e probe:generic_perform_write -aR sleep 1

Signed-off-by: He Kuang heku...@huawei.com
---
  tools/perf/util/probe-event.c | 16 +++-
  1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/probe-event.c b/tools/perf/util/probe-event.c
index daa24a2..ee26961 100644
--- a/tools/perf/util/probe-event.c
+++ b/tools/perf/util/probe-event.c
@@ -575,8 +575,22 @@ static int post_process_probe_trace_events(struct 
probe_trace_event *tevs,
pr_warning(Relocated base symbol is not found!\n);
return -EINVAL;
}
-   /* Get the address of _etext for checking non-probable text symbol */
+   /* Get the address of _etext for checking non-probable text symbol,
+  for some architectures (e.g. arm, arm64), _etext is out of .text
+  section and not loaded as symbols, use kernel map-end instead.
+*/
etext_addr = kernel_get_symbol_address_by_name(_etext, false);
+   if (etext_addr == 0) {
+   struct map *map;
+
+   map = kernel_get_module_map(NULL);
+   if (!map) {
+   pr_err(Failed to get a map for kernel\n);
+   return -EINVAL;
+   }
+
+   etext_addr = map-end;
+   }
  
  	for (i = 0; i  ntevs; i++) {

if (tevs[i].point.address  !tevs[i].point.retprobe) {
--
1.8.5.2

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] tools lib traceevent: Export dynamic symbols used by traceevent plugins

2015-05-12 Thread hekuang

Hi, jirka

On 05/12/2015 08:37 PM, Jiri Olsa wrote:

On Tue, May 12, 2015 at 06:41:56AM +, He Kuang wrote:

SNIP


$(Q)$(MAKE) $(build)=perf
  
-$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN)

-   $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(PERF_IN) $(LIBS) -o $@
+LD_LIBTRACEEVENT_FLAGS += -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYNAMIC_LIST)
+$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN) $(LIBTRACEEVENT_DYNAMIC_LIST)
+   $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(LD_LIBTRACEEVENT_FLAGS) 
$(PERF_IN) $(LIBS) -o $@
  
  $(GTK_IN): FORCE

$(Q)$(MAKE) $(build)=gtk
@@ -373,7 +375,13 @@ $(LIB_FILE): $(LIBPERF_IN)
  LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ)
  
  $(LIBTRACEEVENT): FORCE

-   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent.a plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent.a
+
+libtraceevent_plugins: FORCE
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
plugins
+
+$(LIBTRACEEVENT_DYNAMIC_LIST): libtraceevent_plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent-dynamic-list

I thought the idea of v2 was not to introduce new target,
something like in attached patch (not completely tested)

jirka


There is a problem, the target perf executable is dependent on
the dynamic-list-file, so we should add plugins or the
dynamic-list-file to perf's dependencies.

As your patch below, the dynamic-list-file is built implictly
when building plugins, so we should not add it directly to the
dependency list of perf.

It seems new targets are needed. In the v2 patch,
dynamic-list-file is extracted out from plugins as perf's
dependncy.


---
diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da335e3d..57eed0403e6a 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -23,6 +23,7 @@ endef
  # Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
  $(call allow-override,CC,$(CROSS_COMPILE)gcc)
  $(call allow-override,AR,$(CROSS_COMPILE)ar)
+$(call allow-override,NM,$(CROSS_COMPILE)nm)
  
  EXT = -std=gnu99

  INSTALL = install
@@ -169,7 +170,10 @@ $(OUTPUT)libtraceevent.so: $(TE_IN)
  $(OUTPUT)libtraceevent.a: $(TE_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
  
-plugins: $(PLUGINS)

+$(OUTPUT)libtraceevent-dynamic-list: $(PLUGINS)
+   $(QUIET_GEN)$(call do_generate_dynamic_list_file, $(PLUGINS), $@)
+
+plugins: $(OUTPUT)libtraceevent-dynamic-list
  
  __plugin_obj = $(notdir $@)

plugin_obj = $(__plugin_obj:-in.o=)
@@ -238,6 +242,13 @@ define do_install_plugins
done
  endef
  
+define do_generate_dynamic_list_file

+   (echo '{';  \
+   $(NM) -u -D $1 | awk 'NF>1 {print "\t"$$2";"}' | sort -u;\
+   echo '};';  \
+   ) > $2
+endef
+
  install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(bindir_SQ))
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 03409cc02117..a8db98649ea3 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -173,6 +173,8 @@ endif
  LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
  export LIBTRACEEVENT
  
+LDFLAGS += -Xlinker --dynamic-list=$(TE_PATH)libtraceevent-dynamic-list

+
  LIBAPI = $(LIB_PATH)libapi.a
  export LIBAPI
  
--

To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/





--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/2] tools lib traceevent: Export dynamic symbols used by traceevent plugins

2015-05-12 Thread hekuang

Hi, jirka

On 05/12/2015 08:37 PM, Jiri Olsa wrote:

On Tue, May 12, 2015 at 06:41:56AM +, He Kuang wrote:

SNIP


$(Q)$(MAKE) $(build)=perf
  
-$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN)

-   $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(PERF_IN) $(LIBS) -o $@
+LD_LIBTRACEEVENT_FLAGS += -Xlinker --dynamic-list=$(LIBTRACEEVENT_DYNAMIC_LIST)
+$(OUTPUT)perf: $(PERFLIBS) $(PERF_IN) $(LIBTRACEEVENT_DYNAMIC_LIST)
+   $(QUIET_LINK)$(CC) $(CFLAGS) $(LDFLAGS) $(LD_LIBTRACEEVENT_FLAGS) 
$(PERF_IN) $(LIBS) -o $@
  
  $(GTK_IN): FORCE

$(Q)$(MAKE) $(build)=gtk
@@ -373,7 +375,13 @@ $(LIB_FILE): $(LIBPERF_IN)
  LIBTRACEEVENT_FLAGS += plugin_dir=$(plugindir_SQ)
  
  $(LIBTRACEEVENT): FORCE

-   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent.a plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent.a
+
+libtraceevent_plugins: FORCE
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
plugins
+
+$(LIBTRACEEVENT_DYNAMIC_LIST): libtraceevent_plugins
+   $(Q)$(MAKE) -C $(TRACE_EVENT_DIR) $(LIBTRACEEVENT_FLAGS) O=$(OUTPUT) 
$(OUTPUT)libtraceevent-dynamic-list

I thought the idea of v2 was not to introduce new target,
something like in attached patch (not completely tested)

jirka


There is a problem, the target perf executable is dependent on
the dynamic-list-file, so we should add plugins or the
dynamic-list-file to perf's dependencies.

As your patch below, the dynamic-list-file is built implictly
when building plugins, so we should not add it directly to the
dependency list of perf.

It seems new targets are needed. In the v2 patch,
dynamic-list-file is extracted out from plugins as perf's
dependncy.


---
diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
index d410da335e3d..57eed0403e6a 100644
--- a/tools/lib/traceevent/Makefile
+++ b/tools/lib/traceevent/Makefile
@@ -23,6 +23,7 @@ endef
  # Allow setting CC and AR, or setting CROSS_COMPILE as a prefix.
  $(call allow-override,CC,$(CROSS_COMPILE)gcc)
  $(call allow-override,AR,$(CROSS_COMPILE)ar)
+$(call allow-override,NM,$(CROSS_COMPILE)nm)
  
  EXT = -std=gnu99

  INSTALL = install
@@ -169,7 +170,10 @@ $(OUTPUT)libtraceevent.so: $(TE_IN)
  $(OUTPUT)libtraceevent.a: $(TE_IN)
$(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
  
-plugins: $(PLUGINS)

+$(OUTPUT)libtraceevent-dynamic-list: $(PLUGINS)
+   $(QUIET_GEN)$(call do_generate_dynamic_list_file, $(PLUGINS), $@)
+
+plugins: $(OUTPUT)libtraceevent-dynamic-list
  
  __plugin_obj = $(notdir $@)

plugin_obj = $(__plugin_obj:-in.o=)
@@ -238,6 +242,13 @@ define do_install_plugins
done
  endef
  
+define do_generate_dynamic_list_file

+   (echo '{';  \
+   $(NM) -u -D $1 | awk 'NF1 {print \t$$2;}' | sort -u;\
+   echo '};';  \
+   )  $2
+endef
+
  install_lib: all_cmd install_plugins
$(call QUIET_INSTALL, $(LIB_FILE)) \
$(call do_install,$(LIB_FILE),$(bindir_SQ))
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 03409cc02117..a8db98649ea3 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -173,6 +173,8 @@ endif
  LIBTRACEEVENT = $(TE_PATH)libtraceevent.a
  export LIBTRACEEVENT
  
+LDFLAGS += -Xlinker --dynamic-list=$(TE_PATH)libtraceevent-dynamic-list

+
  LIBAPI = $(LIB_PATH)libapi.a
  export LIBAPI
  
--

To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/





--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v4 2/3] perf probe: Add --range option to show variable location range

2015-05-11 Thread hekuang

hi, Arnaldo

On 05/11/2015 09:58 PM, Arnaldo Carvalho de Melo wrote:

Em Mon, May 11, 2015 at 01:10:03PM +, He Kuang escreveu:

It is not easy for users to get the accurate byte offset or the line
number where a local variable can be probed. With '--range' option,
local variables in scope of the probe point are showed with byte offset
range, and can be added according to this range information.

Please, when you send a V4 patch like this, state what changed with
regards to the previous version, as I already applied this patch :-\

- Arnaldo
  


I will check next time.
Thanks.

For example, there are some variables in function
generic_perform_write():

   
   0  ssize_t generic_perform_write(struct file *file,
   1 struct iov_iter *i, loff_t pos)
   2  {
   3  struct address_space *mapping = file->f_mapping;
   4  const struct address_space_operations *a_ops = mapping->a_ops;
   ...
   42 status = a_ops->write_begin(file, mapping, pos, bytes, 
flags,
, );
   44 if (unlikely(status < 0))

But we got failed when we try to probe the variable 'a_ops' at line 42
or 44.

   $ perf probe --add 'generic_perform_write:42 a_ops'
   Failed to find the location of a_ops at this address.
 Perhaps, it has been optimized out.

This is because source code do not match assembly, so a variable may not
be available in the sourcecode line where it presents. After this patch,
we can lookup the accurate byte offset range of a variable, 'INV'
indicates that this variable is not valid at the given point, but
available in scope:

   $ perf probe --vars 'generic_perform_write:42' --range
   Available variables at generic_perform_write:42
 @
 [INV]   ssize_t written @
 [INV]   struct address_space_operations*a_ops   
@
 [VAL]   (unknown_type)  fsdata  
@
 [VAL]   loff_t  pos 
@
 [VAL]   long intstatus  
@
 [VAL]   long unsigned int   bytes   
@
 [VAL]   struct address_space*   mapping 
@
 [VAL]   struct iov_iter*i   
@
 [VAL]   struct page*page
@

Then it is more clearly for us to do the probe with this variable:

   $ perf probe --add 'generic_perform_write+170 a_ops'
   Added new event:
 probe:generic_perform_write (on generic_perform_write+170 with a_ops)

Signed-off-by: He Kuang 
Acked-by: Masami Hiramatsu 
---
  tools/perf/builtin-probe.c |   2 +
  tools/perf/util/dwarf-aux.c| 121 +
  tools/perf/util/dwarf-aux.h|   2 +
  tools/perf/util/probe-event.h  |   1 +
  tools/perf/util/probe-finder.c |  57 ++-
  5 files changed, 170 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 7fa2c7a..1272559 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -372,6 +372,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix 
__maybe_unused)
 "Show accessible variables on PROBEDEF", opt_show_vars),
OPT_BOOLEAN('\0', "externs", _conf.show_ext_vars,
"Show external variables too (with --vars only)"),
+   OPT_BOOLEAN('\0', "range", _conf.show_location_range,
+   "Show variables location range in scope (with --vars only)"),
OPT_STRING('k', "vmlinux", _conf.vmlinux_name,
   "file", "vmlinux pathname"),
OPT_STRING('s', "source", _conf.source_prefix,
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 737c9db..afa0971 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -913,3 +913,124 @@ int die_get_varname(Dwarf_Die *vr_die, struct strbuf *buf)
return 0;
  }
  
+/**

+ * die_get_var_innermost_scope - Get innermost scope range of given variable 
DIE
+ * @sp_die: a subprogram DIE
+ * @vr_die: a variable DIE
+ * @buf: a strbuf for variable byte offset range
+ *
+ * Get the innermost scope range of @vr_die and stores it in @buf as
+ * "@".
+ */
+static int die_get_var_innermost_scope(Dwarf_Die *sp_die, Dwarf_Die *vr_die,
+   struct strbuf *buf)
+{
+   Dwarf_Die *scopes;
+   int count;
+   size_t offset = 0;
+   Dwarf_Addr base;
+   Dwarf_Addr start, end;
+   Dwarf_Addr entry;
+   int ret;
+   bool first = true;
+   const char *name;
+
+   ret = dwarf_entrypc(sp_die, );
+   if (ret)
+   return ret;
+
+   name = dwarf_diename(sp_die);
+   if (!name)
+   return -ENOENT;
+
+   count = dwarf_getscopes_die(vr_die, );
+
+   /* (*SCOPES)[1] is the DIE for the scope containing that scope */
+   if (count <= 1) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   while ((offset = 

Re: [PATCH v4 2/3] perf probe: Add --range option to show variable location range

2015-05-11 Thread hekuang

hi, Arnaldo

On 05/11/2015 09:58 PM, Arnaldo Carvalho de Melo wrote:

Em Mon, May 11, 2015 at 01:10:03PM +, He Kuang escreveu:

It is not easy for users to get the accurate byte offset or the line
number where a local variable can be probed. With '--range' option,
local variables in scope of the probe point are showed with byte offset
range, and can be added according to this range information.

Please, when you send a V4 patch like this, state what changed with
regards to the previous version, as I already applied this patch :-\

- Arnaldo
  


I will check next time.
Thanks.

For example, there are some variables in function
generic_perform_write():

   generic_perform_write@mm/filemap.c:0
   0  ssize_t generic_perform_write(struct file *file,
   1 struct iov_iter *i, loff_t pos)
   2  {
   3  struct address_space *mapping = file-f_mapping;
   4  const struct address_space_operations *a_ops = mapping-a_ops;
   ...
   42 status = a_ops-write_begin(file, mapping, pos, bytes, 
flags,
page, fsdata);
   44 if (unlikely(status  0))

But we got failed when we try to probe the variable 'a_ops' at line 42
or 44.

   $ perf probe --add 'generic_perform_write:42 a_ops'
   Failed to find the location of a_ops at this address.
 Perhaps, it has been optimized out.

This is because source code do not match assembly, so a variable may not
be available in the sourcecode line where it presents. After this patch,
we can lookup the accurate byte offset range of a variable, 'INV'
indicates that this variable is not valid at the given point, but
available in scope:

   $ perf probe --vars 'generic_perform_write:42' --range
   Available variables at generic_perform_write:42
 @generic_perform_write+141
 [INV]   ssize_t written @generic_perform_write+[324-331]
 [INV]   struct address_space_operations*a_ops   
@generic_perform_write+[55-61,170-176,223-246]
 [VAL]   (unknown_type)  fsdata  
@generic_perform_write+[70-307,346-411]
 [VAL]   loff_t  pos 
@generic_perform_write+[0-286,286-336,346-411]
 [VAL]   long intstatus  
@generic_perform_write+[83-342,346-411]
 [VAL]   long unsigned int   bytes   
@generic_perform_write+[122-311,320-338,346-403,403-411]
 [VAL]   struct address_space*   mapping 
@generic_perform_write+[35-344,346-411]
 [VAL]   struct iov_iter*i   
@generic_perform_write+[0-340,346-411]
 [VAL]   struct page*page
@generic_perform_write+[70-307,346-411]

Then it is more clearly for us to do the probe with this variable:

   $ perf probe --add 'generic_perform_write+170 a_ops'
   Added new event:
 probe:generic_perform_write (on generic_perform_write+170 with a_ops)

Signed-off-by: He Kuang heku...@huawei.com
Acked-by: Masami Hiramatsu masami.hiramatsu...@hitachi.com
---
  tools/perf/builtin-probe.c |   2 +
  tools/perf/util/dwarf-aux.c| 121 +
  tools/perf/util/dwarf-aux.h|   2 +
  tools/perf/util/probe-event.h  |   1 +
  tools/perf/util/probe-finder.c |  57 ++-
  5 files changed, 170 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-probe.c b/tools/perf/builtin-probe.c
index 7fa2c7a..1272559 100644
--- a/tools/perf/builtin-probe.c
+++ b/tools/perf/builtin-probe.c
@@ -372,6 +372,8 @@ __cmd_probe(int argc, const char **argv, const char *prefix 
__maybe_unused)
 Show accessible variables on PROBEDEF, opt_show_vars),
OPT_BOOLEAN('\0', externs, probe_conf.show_ext_vars,
Show external variables too (with --vars only)),
+   OPT_BOOLEAN('\0', range, probe_conf.show_location_range,
+   Show variables location range in scope (with --vars only)),
OPT_STRING('k', vmlinux, symbol_conf.vmlinux_name,
   file, vmlinux pathname),
OPT_STRING('s', source, symbol_conf.source_prefix,
diff --git a/tools/perf/util/dwarf-aux.c b/tools/perf/util/dwarf-aux.c
index 737c9db..afa0971 100644
--- a/tools/perf/util/dwarf-aux.c
+++ b/tools/perf/util/dwarf-aux.c
@@ -913,3 +913,124 @@ int die_get_varname(Dwarf_Die *vr_die, struct strbuf *buf)
return 0;
  }
  
+/**

+ * die_get_var_innermost_scope - Get innermost scope range of given variable 
DIE
+ * @sp_die: a subprogram DIE
+ * @vr_die: a variable DIE
+ * @buf: a strbuf for variable byte offset range
+ *
+ * Get the innermost scope range of @vr_die and stores it in @buf as
+ * @function_name+[NN-NN,NN-NN].
+ */
+static int die_get_var_innermost_scope(Dwarf_Die *sp_die, Dwarf_Die *vr_die,
+   struct strbuf *buf)
+{
+   Dwarf_Die *scopes;
+   int count;
+   size_t offset = 0;
+   Dwarf_Addr base;
+   Dwarf_Addr start, end;
+   

Re: [PATCH] perf: fix building error in x86_64

2015-02-12 Thread Hekuang


在 2015/2/12 16:07, Namhyung Kim 写道:

Hi,

On Wed, Feb 11, 2015 at 10:01:08AM +0800, He Kuang wrote:

When build with ARCH=x86_64, perf failed to compile with following error:

tests/builtin-test.o:(.data+0x158): undefined reference to 
`test__perf_time_to_tsc'
collect2: error: ld returned 1 exit status
Makefile.perf:632: recipe for target 'perf' failed
...

Which is caused commit c6e5e9fbc3ea1 ("perf tools: Fix building error
in x86_64 when dwarf unwind is on"), ARCH test in Makefile.perf
conflicts with tests/builtin-test.c's __x86_64__.
To x86/x86_64 platform, ARCH should always override to x86 while
IS_64_BIT stands for the actual architecture.

Signed-off-by: He Kuang 
---
  tools/perf/config/Makefile.arch | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile.arch b/tools/perf/config/Makefile.arch
index ff95a68..8c6214d 100644
--- a/tools/perf/config/Makefile.arch
+++ b/tools/perf/config/Makefile.arch
@@ -14,7 +14,7 @@ ifeq ($(RAW_ARCH),i386)
  endif
  
  ifeq ($(RAW_ARCH),x86_64)

-  ARCH ?= x86
+  override ARCH := x86

Hmm.. wouldn't it (again) break cross build then?

Thanks,
Namhyung




I've tested both 'make ARCH=x86' and 'make ARCH=x86_64' cases after a
'make clean'.

The issue was first caused by IS_X86_64 flag wrongly cleared when
ARCH=x86, which is already fixed by separating IS_64_BIT and ARCH in
commit c6e5e9fbc3ea1 ("perf tools: Fix building error in x86_64 when
dwarf unwind is on").

The only problem here is we should let ARCH override to x86, to keep
compatible with 'ifeq ($(ARCH),x86)'.
  
ifneq (, $(findstring m32,$(CFLAGS)))

  RAW_ARCH := x86_32
--
2.2.0.33.gc18b867

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

.




--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH] perf: fix building error in x86_64

2015-02-12 Thread Hekuang


在 2015/2/12 16:07, Namhyung Kim 写道:

Hi,

On Wed, Feb 11, 2015 at 10:01:08AM +0800, He Kuang wrote:

When build with ARCH=x86_64, perf failed to compile with following error:

tests/builtin-test.o:(.data+0x158): undefined reference to 
`test__perf_time_to_tsc'
collect2: error: ld returned 1 exit status
Makefile.perf:632: recipe for target 'perf' failed
...

Which is caused commit c6e5e9fbc3ea1 (perf tools: Fix building error
in x86_64 when dwarf unwind is on), ARCH test in Makefile.perf
conflicts with tests/builtin-test.c's __x86_64__.
To x86/x86_64 platform, ARCH should always override to x86 while
IS_64_BIT stands for the actual architecture.

Signed-off-by: He Kuang heku...@huawei.com
---
  tools/perf/config/Makefile.arch | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/config/Makefile.arch b/tools/perf/config/Makefile.arch
index ff95a68..8c6214d 100644
--- a/tools/perf/config/Makefile.arch
+++ b/tools/perf/config/Makefile.arch
@@ -14,7 +14,7 @@ ifeq ($(RAW_ARCH),i386)
  endif
  
  ifeq ($(RAW_ARCH),x86_64)

-  ARCH ?= x86
+  override ARCH := x86

Hmm.. wouldn't it (again) break cross build then?

Thanks,
Namhyung




I've tested both 'make ARCH=x86' and 'make ARCH=x86_64' cases after a
'make clean'.

The issue was first caused by IS_X86_64 flag wrongly cleared when
ARCH=x86, which is already fixed by separating IS_64_BIT and ARCH in
commit c6e5e9fbc3ea1 (perf tools: Fix building error in x86_64 when
dwarf unwind is on).

The only problem here is we should let ARCH override to x86, to keep
compatible with 'ifeq ($(ARCH),x86)'.
  
ifneq (, $(findstring m32,$(CFLAGS)))

  RAW_ARCH := x86_32
--
2.2.0.33.gc18b867

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

.




--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 linux-trace 1/8] tracing: attach eBPF programs to tracepoints and syscalls

2015-02-11 Thread Hekuang



eBPF is very flexible, which means it is bound to have someone use it
in a way you never dreamed of, and that will be what bites you in the
end (pun intended).

understood :)
let's start slow then with bpf+syscall and bpf+kprobe only.



I think BPF + system calls/kprobes can meet our use case
(https://lkml.org/lkml/2015/2/6/44), but there're some issues to be
improved.

I suggest that you can improve bpf+kprobes when attached to function
headers(or TRACE_MARKERS), make it converts pt-regs to bpf_ctx->arg1,
arg2.., then top models and architectures can be separated by bpf.

BPF bytecode is cross-platform, but what we can get by using bpf+kprobes
is a 'regs->rdx' kind of information, such information is both
architecture and kernel version related.

We hope to establish some models for describing kernel procedures such
as IO and network, which requires that it does not rely on architecture
and does not rely to a specific kernel version as much as possible.


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 linux-trace 1/8] tracing: attach eBPF programs to tracepoints and syscalls

2015-02-11 Thread Hekuang



eBPF is very flexible, which means it is bound to have someone use it
in a way you never dreamed of, and that will be what bites you in the
end (pun intended).

understood :)
let's start slow then with bpf+syscall and bpf+kprobe only.



I think BPF + system calls/kprobes can meet our use case
(https://lkml.org/lkml/2015/2/6/44), but there're some issues to be
improved.

I suggest that you can improve bpf+kprobes when attached to function
headers(or TRACE_MARKERS), make it converts pt-regs to bpf_ctx-arg1,
arg2.., then top models and architectures can be separated by bpf.

BPF bytecode is cross-platform, but what we can get by using bpf+kprobes
is a 'regs-rdx' kind of information, such information is both
architecture and kernel version related.

We hope to establish some models for describing kernel procedures such
as IO and network, which requires that it does not rely on architecture
and does not rely to a specific kernel version as much as possible.


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


  1   2   >