Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/23/2018 10:19 PM, Jiri Olsa wrote: On Mon, Jan 22, 2018 at 01:10:31PM +0800, Jin, Yao wrote: On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Hi Jiri, I ever considered to move the operation of removing error thread to perf_evsel__fallback(). The perf_evsel__fallback() is common code and it's shared by perf report, perf stat and perf top. While finally I think it'd better let the caller decide to remove error thread and try again, or just return the warning message. perf_evsel__fallback() probably doesn't know what the caller want to do. That's my current thinking. Maybe there will be a better fix... ok, can't think of better fix atm.. looks good ;-) Acked-by: Jiri Olsa thanks, jirka Hi Arnaldo, Could this fix be accepted? Thanks Jin Yao
Re: [PATCH] perf annotate: Support to display the LBR data in tui mode
On 2/23/2018 11:29 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Feb 23, 2018 at 09:25:00AM +0100, Peter Zijlstra escreveu: On Fri, Feb 23, 2018 at 10:35:58PM +0800, Jin Yao wrote: Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. There is far more than IPC/cycle for the LBR data, so this Changelog is misleading. Also, I think that this patch goes the wrong way, we should reduce the divergence of the various modes, not make it worse. Right, Peter, what would you think if I made --stdio use the same routines used to format the TUI, i.e. stdio would be equal to the TUI modulo de navigation/jump arrows, etc. We would have switches to provide the TUI output options that make sense for non-interactive mode, like: J Toggle showing number of jump sources on targets o Toggle disassembler output/simplified view s Toggle source code view t Circulate percent, total period, samples view k Toggle line numbers Hi Arnaldo, looks your idea is very similar as my idea. In my understanding, for example, we may provide switch to tui routine like annotate_browser__write() and use fprintf() to replace ui_browser__printf()/ui_browser_write__xxx() if switch is on for stdio. Is that your idea? For this approach, I think, the benefit is we can reuse most of existing code but the disadvantage is we have to mix tui and stdio up. Thanks Jin Yao And would still have e --stdio-classic (deprecated), that we would keep for a while. I think that this new mode with "dissassembler output" would be the same as the current --stdio, I'll check. To further clarify, this wouldn't use any ncurses/slang TUI code, just plain printf with things formatted using what is used now for the TUI mode. This way there would never be any drift amongst the output modes and we would have less work to do when implementing new stuff like this LBR case. What do you think? - Arnaldo
Re: [PATCH] perf annotate: Support to display the LBR data in tui mode
On 2/23/2018 11:29 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Feb 23, 2018 at 09:25:00AM +0100, Peter Zijlstra escreveu: On Fri, Feb 23, 2018 at 10:35:58PM +0800, Jin Yao wrote: Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. There is far more than IPC/cycle for the LBR data, so this Changelog is misleading. Also, I think that this patch goes the wrong way, we should reduce the divergence of the various modes, not make it worse. Right, Peter, what would you think if I made --stdio use the same routines used to format the TUI, i.e. stdio would be equal to the TUI modulo de navigation/jump arrows, etc. We would have switches to provide the TUI output options that make sense for non-interactive mode, like: J Toggle showing number of jump sources on targets o Toggle disassembler output/simplified view s Toggle source code view t Circulate percent, total period, samples view k Toggle line numbers Hi Arnaldo, looks your idea is very similar as my idea. In my understanding, for example, we may provide switch to tui routine like annotate_browser__write() and use fprintf() to replace ui_browser__printf()/ui_browser_write__xxx() if switch is on for stdio. Is that your idea? For this approach, I think, the benefit is we can reuse most of existing code but the disadvantage is we have to mix tui and stdio up. Thanks Jin Yao And would still have e --stdio-classic (deprecated), that we would keep for a while. I think that this new mode with "dissassembler output" would be the same as the current --stdio, I'll check. To further clarify, this wouldn't use any ncurses/slang TUI code, just plain printf with things formatted using what is used now for the TUI mode. This way there would never be any drift amongst the output modes and we would have less work to do when implementing new stuff like this LBR case. What do you think? - Arnaldo
Re: [PATCH] perf annotate: Support to display the LBR data in tui mode
On 2/23/2018 4:25 PM, Peter Zijlstra wrote: On Fri, Feb 23, 2018 at 10:35:58PM +0800, Jin Yao wrote: Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. There is far more than IPC/cycle for the LBR data, so this Changelog is misleading. I will change the changelog to make it more clear. Also, I think that this patch goes the wrong way, we should reduce the divergence of the various modes, not make it worse. I do plan to support stdio mode. While stdio mode needs more changes than tui mode, so I plan to do it in a follow-up patch. Posting this patch now is because I want to listen from community first for this feature. If the tui patch could be accepted, then it's worth putting more efforts on stdio version. That's my thoughts. Thanks Jin Yao
Re: [PATCH] perf annotate: Support to display the LBR data in tui mode
On 2/23/2018 4:25 PM, Peter Zijlstra wrote: On Fri, Feb 23, 2018 at 10:35:58PM +0800, Jin Yao wrote: Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. There is far more than IPC/cycle for the LBR data, so this Changelog is misleading. I will change the changelog to make it more clear. Also, I think that this patch goes the wrong way, we should reduce the divergence of the various modes, not make it worse. I do plan to support stdio mode. While stdio mode needs more changes than tui mode, so I plan to do it in a follow-up patch. Posting this patch now is because I want to listen from community first for this feature. If the tui patch could be accepted, then it's worth putting more efforts on stdio version. That's my thoughts. Thanks Jin Yao
[PATCH] perf annotate: Support to display the LBR data in tui mode
Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. This patch lets perf annotate support the displaying of LBR data. For example, perf annotate compute_flag Percent│ IPC Cycle │ │ │Disassembly of section .text: │ │00400640 : │compute_flag(): │volatile int count; │static unsigned int s_randseed; │ │__attribute__((noinline)) │int compute_flag() │{ 22.96 │1.18 584sub$0x8,%rsp │int i; │ │i = rand() % 2; 23.02 │1.18 1 → callq rand@plt │ │return i; 27.05 │3.37 mov%eax,%edx │} │3.37 add$0x8,%rsp │{ │int i; │ │i = rand() % 2; │ │return i; │3.37 shr$0x1f,%edx │3.37 add%edx,%eax │3.37 and$0x1,%eax │3.37 sub%edx,%eax │} 26.97 │3.37 2 ← retq Note that, this patch only supports tui mode. For other mode like --stdio, it just keeps original behavior. perf annotate compute_flag --stdio Percent | Source code & Disassembly of div for cycles:ppp (7993 samples) -- : : : :Disassembly of section .text: : :00400640 : :compute_flag(): :volatile int count; :static unsigned int s_randseed; : :__attribute__((noinline)) :int compute_flag() :{ 0.29 : 400640: sub$0x8,%rsp # +100.00% :int i; : :i = rand() % 2; 42.93 : 400644: callq 400490 <rand@plt> # -100.00% (p:100.00%) : :return i; 0.10 : 400649: mov%eax,%edx # +100.00% :} 0.94 : 40064b: add$0x8,%rsp :{ :int i; : :i = rand() % 2; : :return i; 27.02 : 40064f: shr$0x1f,%edx 0.15 : 400652: add%edx,%eax 1.24 : 400654: and$0x1,%eax 2.08 : 400657: sub%edx,%eax :} 25.26 : 400659: retq # -100.00% (p:100.00%) Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-annotate.c | 88 --- 1 file changed, 82 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..ead6ae4 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -44,6 +44,7 @@ struct perf_annotate { bool full_paths; bool print_line; bool skip_missing; + bool has_br_stack; const char *sym_hist_filter; const char *cpu_list; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); @@ -146,16 +147,73 @@ static void process_branch_stack(struct branch_stack *bs, struct addr_location * free(bi); } +static int hist_iter__branch_callback(struct hist_entry_iter *iter, + struct addr_location *al __maybe_unused, + bool single __maybe_unused, + void *arg __maybe_unused) +{ + struct hist_entry *he = iter->he; + struct branch_info *bi; + struct perf_sample *sample = iter->sample; + struct perf_evsel *evsel = iter->evsel; + int err; + + hist__account_cycles(sample->branch_stack, al, sample, false); + + bi = he->branch_info; + err = addr_map_symbol__inc_samples(>from, sample, evsel->idx); + + if (err) + goto out; + + err = addr_map_symbol__inc_samples(>to, sample, evsel->idx); + +out: + return err; +} + +static int process_branch_callback(struct perf_evsel *evsel, + struct perf_sample *sample, + struct addr_location *al __maybe_unused, + struct perf_annotate *ann, + struct machine *machine) +{ + struct hist_entry_iter iter = { + .evse
[PATCH] perf annotate: Support to display the LBR data in tui mode
Unlike the perf report interactive annotate mode, the perf annotate doesn't display the LBR data. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. This patch lets perf annotate support the displaying of LBR data. For example, perf annotate compute_flag Percent│ IPC Cycle │ │ │Disassembly of section .text: │ │00400640 : │compute_flag(): │volatile int count; │static unsigned int s_randseed; │ │__attribute__((noinline)) │int compute_flag() │{ 22.96 │1.18 584sub$0x8,%rsp │int i; │ │i = rand() % 2; 23.02 │1.18 1 → callq rand@plt │ │return i; 27.05 │3.37 mov%eax,%edx │} │3.37 add$0x8,%rsp │{ │int i; │ │i = rand() % 2; │ │return i; │3.37 shr$0x1f,%edx │3.37 add%edx,%eax │3.37 and$0x1,%eax │3.37 sub%edx,%eax │} 26.97 │3.37 2 ← retq Note that, this patch only supports tui mode. For other mode like --stdio, it just keeps original behavior. perf annotate compute_flag --stdio Percent | Source code & Disassembly of div for cycles:ppp (7993 samples) -- : : : :Disassembly of section .text: : :00400640 : :compute_flag(): :volatile int count; :static unsigned int s_randseed; : :__attribute__((noinline)) :int compute_flag() :{ 0.29 : 400640: sub$0x8,%rsp # +100.00% :int i; : :i = rand() % 2; 42.93 : 400644: callq 400490 # -100.00% (p:100.00%) : :return i; 0.10 : 400649: mov%eax,%edx # +100.00% :} 0.94 : 40064b: add$0x8,%rsp :{ :int i; : :i = rand() % 2; : :return i; 27.02 : 40064f: shr$0x1f,%edx 0.15 : 400652: add%edx,%eax 1.24 : 400654: and$0x1,%eax 2.08 : 400657: sub%edx,%eax :} 25.26 : 400659: retq # -100.00% (p:100.00%) Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 88 --- 1 file changed, 82 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..ead6ae4 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -44,6 +44,7 @@ struct perf_annotate { bool full_paths; bool print_line; bool skip_missing; + bool has_br_stack; const char *sym_hist_filter; const char *cpu_list; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); @@ -146,16 +147,73 @@ static void process_branch_stack(struct branch_stack *bs, struct addr_location * free(bi); } +static int hist_iter__branch_callback(struct hist_entry_iter *iter, + struct addr_location *al __maybe_unused, + bool single __maybe_unused, + void *arg __maybe_unused) +{ + struct hist_entry *he = iter->he; + struct branch_info *bi; + struct perf_sample *sample = iter->sample; + struct perf_evsel *evsel = iter->evsel; + int err; + + hist__account_cycles(sample->branch_stack, al, sample, false); + + bi = he->branch_info; + err = addr_map_symbol__inc_samples(>from, sample, evsel->idx); + + if (err) + goto out; + + err = addr_map_symbol__inc_samples(>to, sample, evsel->idx); + +out: + return err; +} + +static int process_branch_callback(struct perf_evsel *evsel, + struct perf_sample *sample, + struct addr_location *al __maybe_unused, + struct perf_annotate *ann, + struct machine *machine) +{ + struct hist_entry_iter iter = { + .evsel = evsel, +
[tip:perf/core] perf report: Fix wrong jump arrow
Commit-ID: b40982e8468b46b8f7f5bba5a7e541ec04a29d7d Gitweb: https://git.kernel.org/tip/b40982e8468b46b8f7f5bba5a7e541ec04a29d7d Author: Jin Yao <yao@linux.intel.com> AuthorDate: Mon, 29 Jan 2018 18:57:53 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Fri, 16 Feb 2018 14:55:47 -0300 perf report: Fix wrong jump arrow When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; 0.00 │0.4│ 1310movsd (%rsp),%xmm0 0.00 │0.4│ 565movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4└─→ 82: sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │if (flag) 1.37 │0.48 1 ┌──je 82 │ │x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4882:└─→sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 Committer notes: Please note that only from LBRv5 (according to Jiri) onwards, i.e. >= Skylake is that we'll have the cycles counts in each branch record entry, so to see the Cycles and IPC columns, and be able to test this patch, one need a capable hardware. While applying this I first tested it on a Broadwell class machine and couldn't get those columns, will add code to the annotate browser to warn the user about that, i.e. you have branch records, but no cycles, use a more recent hardware to get the cycles and IPC columns. Signed-off-by: Jin Yao <yao@linux.intel.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Jin Yao <yao@intel.com> Cc: Jiri Olsa <jo...@kernel.org> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1517223473-14750-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/ui/browsers/annotate.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 2864279..e2f6663 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; u8 pcnt_width = annotate_browser__pcnt_width(ab); + int width = 0; /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } + if (ab->have_cycles) + width = IPC_WIDTH + CYCLES_WIDTH; + ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS); - __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, + __ui_browser__line_arrow(browser, +pcnt_width + 2 + ab->addr_width + width, from, to); if (is_fused(ab, cursor)) {
[tip:perf/core] perf report: Fix wrong jump arrow
Commit-ID: b40982e8468b46b8f7f5bba5a7e541ec04a29d7d Gitweb: https://git.kernel.org/tip/b40982e8468b46b8f7f5bba5a7e541ec04a29d7d Author: Jin Yao AuthorDate: Mon, 29 Jan 2018 18:57:53 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 16 Feb 2018 14:55:47 -0300 perf report: Fix wrong jump arrow When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; 0.00 │0.4│ 1310movsd (%rsp),%xmm0 0.00 │0.4│ 565movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4└─→ 82: sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │if (flag) 1.37 │0.48 1 ┌──je 82 │ │x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4882:└─→sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 Committer notes: Please note that only from LBRv5 (according to Jiri) onwards, i.e. >= Skylake is that we'll have the cycles counts in each branch record entry, so to see the Cycles and IPC columns, and be able to test this patch, one need a capable hardware. While applying this I first tested it on a Broadwell class machine and couldn't get those columns, will add code to the annotate browser to warn the user about that, i.e. you have branch records, but no cycles, use a more recent hardware to get the cycles and IPC columns. Signed-off-by: Jin Yao Cc: Alexander Shishkin Cc: Andi Kleen Cc: Jin Yao Cc: Jiri Olsa Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1517223473-14750-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/ui/browsers/annotate.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 2864279..e2f6663 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; u8 pcnt_width = annotate_browser__pcnt_width(ab); + int width = 0; /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } + if (ab->have_cycles) + width = IPC_WIDTH + CYCLES_WIDTH; + ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS); - __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, + __ui_browser__line_arrow(browser, +pcnt_width + 2 + ab->addr_width + width, from, to); if (is_fused(ab, cursor)) { ui_browser__mark_fused(browser, - pcnt_width + 3 + ab->addr_width, + pcnt_width + 3 + ab->addr_width + width, from - 1, to > from ? true : false); }
[tip:perf/core] perf tools: Use target->per_thread and target->system_wide flags
Commit-ID: 147c508f3004df6e2958f6c8867909531c2a15e2 Gitweb: https://git.kernel.org/tip/147c508f3004df6e2958f6c8867909531c2a15e2 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Mon, 12 Feb 2018 13:32:36 -0700 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Fri, 16 Feb 2018 14:55:40 -0300 perf tools: Use target->per_thread and target->system_wide flags Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao <yao@linux.intel.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Namhyung Kim <namhy...@kernel.org> Cc: Peter Zijlstra <pet...@infradead.org> Cc: linux-arm-ker...@lists.infradead.org Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc") Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poir...@linaro.org Signed-off-by: Mathieu Poirier <mathieu.poir...@linaro.org> [Fixed checkpatch warning about line over 80 characters] Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/evlist.c | 21 - tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index e5fc14e..7b7d535 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) { + bool all_threads = (target->per_thread && target->system_wide); struct cpu_map *cpus; struct thread_map *threads; + /* +* If specify '-a' and '--per-thread' to perf record, perf record +* will override '--per-thread'. target->per_thread = false and +* target->system_wide = true. +* +* If specify '--per-thread' only to perf record, +* target->per_thread = true and target->system_wide = false. +* +* So target->per_thread && target->system_wide is false. +* For perf record, thread_map__new_str doesn't call +* thread_map__new_all_cpus. That will keep perf record's +* current behavior. +* +* For perf stat, it allows the case that target->per_thread and +* target->system_wide are all true. It means to collect system-wide +* per-thread data. thread_map__new_str will call +* thread_map__new_all_cpus to enumerate all threads. +*/ threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + all_threads); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ out_free_threads: } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
[tip:perf/core] perf tools: Use target->per_thread and target->system_wide flags
Commit-ID: 147c508f3004df6e2958f6c8867909531c2a15e2 Gitweb: https://git.kernel.org/tip/147c508f3004df6e2958f6c8867909531c2a15e2 Author: Jin Yao AuthorDate: Mon, 12 Feb 2018 13:32:36 -0700 Committer: Arnaldo Carvalho de Melo CommitDate: Fri, 16 Feb 2018 14:55:40 -0300 perf tools: Use target->per_thread and target->system_wide flags Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao Cc: Alexander Shishkin Cc: Namhyung Kim Cc: Peter Zijlstra Cc: linux-arm-ker...@lists.infradead.org Fixes: 73c0ca1eee3d ("perf thread_map: Enumerate all threads from /proc") Link: http://lkml.kernel.org/r/1518467557-18505-3-git-send-email-mathieu.poir...@linaro.org Signed-off-by: Mathieu Poirier [Fixed checkpatch warning about line over 80 characters] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/evlist.c | 21 - tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 23 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index e5fc14e..7b7d535 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1086,11 +1086,30 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) { + bool all_threads = (target->per_thread && target->system_wide); struct cpu_map *cpus; struct thread_map *threads; + /* +* If specify '-a' and '--per-thread' to perf record, perf record +* will override '--per-thread'. target->per_thread = false and +* target->system_wide = true. +* +* If specify '--per-thread' only to perf record, +* target->per_thread = true and target->system_wide = false. +* +* So target->per_thread && target->system_wide is false. +* For perf record, thread_map__new_str doesn't call +* thread_map__new_all_cpus. That will keep perf record's +* current behavior. +* +* For perf stat, it allows the case that target->per_thread and +* target->system_wide are all true. It means to collect system-wide +* per-thread data. thread_map__new_str will call +* thread_map__new_all_cpus to enumerate all threads. +*/ threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + all_threads); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ out_free_threads: } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
Re: [PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
On 2/13/2018 10:00 PM, Jin, Yao wrote: On 2/13/2018 5:45 PM, Jiri Olsa wrote: On Tue, Feb 13, 2018 at 04:44:28PM +0800, Jin Yao wrote: Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). right, the max_stack does not say how many nodes end up in callchain_cursor at the end.. good catch, please mention that also in the changelog max_stack looks only to limit the number of calls but not for other branches. however we know the final count from callchain_cursor itself, the attached patch might do the same job, right? I think the attached patch is ok. also could we now get rid of iter->max_stack? From my opinion, the option '--max-stack' in perf report looks not very necessary. While it's just my personal opinion, need to hear from more people. :) Thanks Jin Yao thanks, jirka --- diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b6140950301e..b50b7b70dcca 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -879,7 +879,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter, * cumulated only one time to prevent entries more than 100% * overhead. */ - he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1)); + he_cache = malloc(sizeof(*he_cache) * (callchain_cursor.nr + 1)); if (he_cache == NULL) return -ENOMEM; Hi Jiri, I guess you will post this patch, right? Thanks Jin Yao
Re: [PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
On 2/13/2018 10:00 PM, Jin, Yao wrote: On 2/13/2018 5:45 PM, Jiri Olsa wrote: On Tue, Feb 13, 2018 at 04:44:28PM +0800, Jin Yao wrote: Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). right, the max_stack does not say how many nodes end up in callchain_cursor at the end.. good catch, please mention that also in the changelog max_stack looks only to limit the number of calls but not for other branches. however we know the final count from callchain_cursor itself, the attached patch might do the same job, right? I think the attached patch is ok. also could we now get rid of iter->max_stack? From my opinion, the option '--max-stack' in perf report looks not very necessary. While it's just my personal opinion, need to hear from more people. :) Thanks Jin Yao thanks, jirka --- diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b6140950301e..b50b7b70dcca 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -879,7 +879,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter, * cumulated only one time to prevent entries more than 100% * overhead. */ - he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1)); + he_cache = malloc(sizeof(*he_cache) * (callchain_cursor.nr + 1)); if (he_cache == NULL) return -ENOMEM; Hi Jiri, I guess you will post this patch, right? Thanks Jin Yao
Re: [PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
On 2/13/2018 5:45 PM, Jiri Olsa wrote: On Tue, Feb 13, 2018 at 04:44:28PM +0800, Jin Yao wrote: Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). right, the max_stack does not say how many nodes end up in callchain_cursor at the end.. good catch, please mention that also in the changelog max_stack looks only to limit the number of calls but not for other branches. however we know the final count from callchain_cursor itself, the attached patch might do the same job, right? I think the attached patch is ok. also could we now get rid of iter->max_stack? From my opinion, the option '--max-stack' in perf report looks not very necessary. While it's just my personal opinion, need to hear from more people. :) Thanks Jin Yao thanks, jirka --- diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b6140950301e..b50b7b70dcca 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -879,7 +879,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter, * cumulated only one time to prevent entries more than 100% * overhead. */ - he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1)); + he_cache = malloc(sizeof(*he_cache) * (callchain_cursor.nr + 1)); if (he_cache == NULL) return -ENOMEM;
Re: [PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
On 2/13/2018 5:45 PM, Jiri Olsa wrote: On Tue, Feb 13, 2018 at 04:44:28PM +0800, Jin Yao wrote: Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). right, the max_stack does not say how many nodes end up in callchain_cursor at the end.. good catch, please mention that also in the changelog max_stack looks only to limit the number of calls but not for other branches. however we know the final count from callchain_cursor itself, the attached patch might do the same job, right? I think the attached patch is ok. also could we now get rid of iter->max_stack? From my opinion, the option '--max-stack' in perf report looks not very necessary. While it's just my personal opinion, need to hear from more people. :) Thanks Jin Yao thanks, jirka --- diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b6140950301e..b50b7b70dcca 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -879,7 +879,7 @@ iter_prepare_cumulative_entry(struct hist_entry_iter *iter, * cumulated only one time to prevent entries more than 100% * overhead. */ - he_cache = malloc(sizeof(*he_cache) * (iter->max_stack + 1)); + he_cache = malloc(sizeof(*he_cache) * (callchain_cursor.nr + 1)); if (he_cache == NULL) return -ENOMEM;
[PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/hist.c | 21 + 1 file changed, 21 insertions(+) diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b614095..71f07d2 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -926,11 +926,32 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter, struct addr_location *al) { struct callchain_cursor_node *node; + struct hist_entry **tmp; + int i; node = callchain_cursor_current(_cursor); if (node == NULL) return 0; + /* +* If there are too many nodes in callchain, +* increase the size of he_cache[]. +*/ + if (iter->curr == iter->max_stack) { + i = 2 * iter->max_stack + 1; + tmp = realloc(iter->priv, sizeof(struct hist_entry *) * i); + if (tmp == NULL) { + /* +* No need to free iter->priv here. It will be +* freed in iter_finish_cumulative_entry. +*/ + return 0; + } + + iter->priv = tmp; + iter->max_stack = i; + } + return fill_callchain_info(al, node, iter->hide_unresolved); } -- 2.7.4
[PATCH] perf report: Fix a memory corrupton issue when enabling --branch-history
Following command lines will cause perf crash. perf record -j call -g -a perf report --branch-history *** Error in `perf': double free or corruption (!prev): 0x104aa040 *** === Backtrace: = /lib/x86_64-linux-gnu/libc.so.6(+0x77725)[0x7f6b37254725] /lib/x86_64-linux-gnu/libc.so.6(+0x7ff4a)[0x7f6b3725cf4a] /lib/x86_64-linux-gnu/libc.so.6(cfree+0x4c)[0x7f6b37260abc] perf[0x51b914] perf(hist_entry_iter__add+0x1e5)[0x51f305] perf[0x43cf01] perf[0x4fa3bf] perf[0x4fa923] perf[0x4fd396] perf[0x4f9614] perf(perf_session__process_events+0x89e)[0x4fc38e] perf(cmd_report+0x15d2)[0x43f202] perf[0x4a059f] perf(main+0x631)[0x427b71] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f6b371fd830] perf(_start+0x29)[0x427d89] The memory corruption happens at: iter_add_next_cumulative_entry() { ... for (i = 0; i < iter->curr; i++) { ... } Whatever in iter_next_cumulative_entry() or in iter_add_next_cumulative_entry(), they all don't check if iter->curr exceeds the array 'he_cache[]'. If there are too many nodes in callchain, it's possible that iter->curr > iter->max_stack, then memory corruption occurs. This patch will reallocate array 'he_cache[]' in iter_next_cumulative_entry() if necessary (the case of too many nodes in callchain). Signed-off-by: Jin Yao --- tools/perf/util/hist.c | 21 + 1 file changed, 21 insertions(+) diff --git a/tools/perf/util/hist.c b/tools/perf/util/hist.c index b614095..71f07d2 100644 --- a/tools/perf/util/hist.c +++ b/tools/perf/util/hist.c @@ -926,11 +926,32 @@ iter_next_cumulative_entry(struct hist_entry_iter *iter, struct addr_location *al) { struct callchain_cursor_node *node; + struct hist_entry **tmp; + int i; node = callchain_cursor_current(_cursor); if (node == NULL) return 0; + /* +* If there are too many nodes in callchain, +* increase the size of he_cache[]. +*/ + if (iter->curr == iter->max_stack) { + i = 2 * iter->max_stack + 1; + tmp = realloc(iter->priv, sizeof(struct hist_entry *) * i); + if (tmp == NULL) { + /* +* No need to free iter->priv here. It will be +* freed in iter_finish_cumulative_entry. +*/ + return 0; + } + + iter->priv = tmp; + iter->max_stack = i; + } + return fill_callchain_info(al, node, iter->hide_unresolved); } -- 2.7.4
Re: [PATCH] perf report: Fix wrong jump arrow
Hi Arnaldo, Thanks for applying the patch. Yes the issue only happens on skl+. The committer notes are very good and clear. Thanks Jin Yao On 2/9/2018 11:15 PM, Arnaldo Carvalho de Melo wrote: Em Mon, Jan 29, 2018 at 06:57:53PM +0800, Jin Yao escreveu: When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; Applied and added this to the cset log, please check: Committer notes: Please note that only from LBRv5 (according to Jiri) onwards, i.e. >= Skylake is that we'll have the cycles counts in each branch record entry, so to see the Cycles and IPC columns, and be able to test this patch, one need a capable hardware. While applying this I first tested it on a Broadwell class machine and couldn't get those columns, will add code to the annotate browser to warn the user about that, i.e. you have branch records, but no cycles, use a more recent hardware to get the cycles and IPC columns. - Arnaldo
Re: [PATCH] perf report: Fix wrong jump arrow
Hi Arnaldo, Thanks for applying the patch. Yes the issue only happens on skl+. The committer notes are very good and clear. Thanks Jin Yao On 2/9/2018 11:15 PM, Arnaldo Carvalho de Melo wrote: Em Mon, Jan 29, 2018 at 06:57:53PM +0800, Jin Yao escreveu: When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; Applied and added this to the cset log, please check: Committer notes: Please note that only from LBRv5 (according to Jiri) onwards, i.e. >= Skylake is that we'll have the cycles counts in each branch record entry, so to see the Cycles and IPC columns, and be able to test this patch, one need a capable hardware. While applying this I first tested it on a Broadwell class machine and couldn't get those columns, will add code to the annotate browser to warn the user about that, i.e. you have branch records, but no cycles, use a more recent hardware to get the cycles and IPC columns. - Arnaldo
[PATCH] perf report: Fix wrong jump arrow
When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; 0.00 │0.4│ 1310movsd (%rsp),%xmm0 0.00 │0.4│ 565movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4└─→ 82: sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │if (flag) 1.37 │0.48 1 ┌──je 82 │ │x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4882:└─→sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/ui/browsers/annotate.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 2864279..e2f6663 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; u8 pcnt_width = annotate_browser__pcnt_width(ab); + int width = 0; /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } + if (ab->have_cycles) + width = IPC_WIDTH + CYCLES_WIDTH; + ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS); - __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, + __ui_browser__line_arrow(browser, +pcnt_width + 2 + ab->addr_width + width, from, to); if (is_fused(ab, cursor)) { ui_browser__mark_fused(browser, - pcnt_width + 3 + ab->addr_width, + pcnt_width + 3 + ab->addr_width + width, from - 1, to > from ? true : false); } -- 2.7.4
[PATCH] perf report: Fix wrong jump arrow
When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; 0.00 │0.4│ 1310movsd (%rsp),%xmm0 0.00 │0.4│ 565movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4└─→ 82: sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │if (flag) 1.37 │0.48 1 ┌──je 82 │ │x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4882:└─→sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 Signed-off-by: Jin Yao --- tools/perf/ui/browsers/annotate.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 2864279..e2f6663 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; u8 pcnt_width = annotate_browser__pcnt_width(ab); + int width = 0; /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } + if (ab->have_cycles) + width = IPC_WIDTH + CYCLES_WIDTH; + ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS); - __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, + __ui_browser__line_arrow(browser, +pcnt_width + 2 + ab->addr_width + width, from, to); if (is_fused(ab, cursor)) { ui_browser__mark_fused(browser, - pcnt_width + 3 + ab->addr_width, + pcnt_width + 3 + ab->addr_width + width, from - 1, to > from ? true : false); } -- 2.7.4
[PATCH v2] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. v2: --- According to Jiri's comment, add description to explain why the patch uses 'target->per_thread && target->system_wide'. v2 doesn't have functional change. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/evlist.c | 20 +++- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index ac35cd2..3eea0f8 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,8 +1106,26 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct cpu_map *cpus; struct thread_map *threads; + /* +* If specify '-a' and '--per-thread' to perf record, perf record +* will override '--per-thread'. target->per_thread = false and +* target->system_wide = true. +* +* If specify '--per-thread' only to perf record, +* target->per_thread = true and target->system_wide = false. +* +* So target->per_thread && target->system_wide is false. +* For perf record, thread_map__new_str doesn't call +* thread_map__new_all_cpus. That will keep perf record's +* current behavior. +* +* For perf stat, it allows the case that target->per_thread and +* target->system_wide are all true. It means to collect system-wide +* per-thread data. thread_map__new_str will call +* thread_map__new_all_cpus to enumerate all threads. +*/ threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
[PATCH v2] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. v2: --- According to Jiri's comment, add description to explain why the patch uses 'target->per_thread && target->system_wide'. v2 doesn't have functional change. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 20 +++- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index ac35cd2..3eea0f8 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,8 +1106,26 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct cpu_map *cpus; struct thread_map *threads; + /* +* If specify '-a' and '--per-thread' to perf record, perf record +* will override '--per-thread'. target->per_thread = false and +* target->system_wide = true. +* +* If specify '--per-thread' only to perf record, +* target->per_thread = true and target->system_wide = false. +* +* So target->per_thread && target->system_wide is false. +* For perf record, thread_map__new_str doesn't call +* thread_map__new_all_cpus. That will keep perf record's +* current behavior. +* +* For perf stat, it allows the case that target->per_thread and +* target->system_wide are all true. It means to collect system-wide +* per-thread data. thread_map__new_str will call +* thread_map__new_all_cpus to enumerate all threads. +*/ threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 10:40 PM, Jiri Olsa wrote: On Tue, Jan 23, 2018 at 07:02:44AM +0800, Jin, Yao wrote: SNIP threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. could you please put this description into comment before the thread_map__new_str is called? thanks, jirka OK, I will put this comment in v2. I will send v2 soon. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 10:40 PM, Jiri Olsa wrote: On Tue, Jan 23, 2018 at 07:02:44AM +0800, Jin, Yao wrote: SNIP threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. could you please put this description into comment before the thread_map__new_str is called? thanks, jirka OK, I will put this comment in v2. I will send v2 soon. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 7:56 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao <yao@linux.intel.com> wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); Reviewed-and-tested-by: Mathieu Poirier "mathieu.poir...@linaro.org" Thanks a lot for reviewing the patch. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 7:56 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); Reviewed-and-tested-by: Mathieu Poirier "mathieu.poir...@linaro.org" Thanks a lot for reviewing the patch. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 5:10 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao <yao@linux.intel.com> wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. That's my current thinking. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 5:10 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. That's my current thinking. Thanks Jin Yao
[PATCH] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
[PATCH] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
Re: [PATCH 1/1] perf util: Do not reuse target->per_thread flag
Hi Mathieu, Sorry, I just see this patch today. I have tested this patch and it works. Another idea from me is it doesn't need to add a new target->all_threads flag. We just use target->per-thread && target->system_wide as a condition to check for all per-thread case. I just think for your perf record case, the target->system_wide will not be set. Instead, if target->per-thread and target->system_wide are both set, that means we needs to trace on all threads, right? Thanks Jin Yao On 1/13/2018 7:12 AM, Mathieu Poirier wrote: Commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") is using the target->per_thread flag to specify that all threads in a system should be taken into account. That is then used in function thread_map__new_str() where all threads are added to evlist->threads. Variable target->per_thread is also used by 'perf record' when handling trace sessions using the --per-thread command line option. Since the target->per_thread flag is set all threads will be added to evlist->threads, which has the effect of creating a kernel event for each thread in the system. This patch address the issue by creating a new target->all_threads flag that gets set from the 'stat' utility, avoiding any conflict with other utilities using target->per_thread. Signed-off-by: Mathieu Poirier <mathieu.poir...@linaro.org> --- tools/perf/builtin-stat.c| 2 +- tools/perf/util/evlist.c | 2 +- tools/perf/util/target.h | 1 + tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..87e156bfb45e 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2831,7 +2831,7 @@ int cmd_stat(int argc, const char **argv) target__validate(); if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) - target.per_thread = true; + target.per_thread = target.all_threads = true; if (perf_evlist__create_maps(evsel_list, ) < 0) { if (target__has_task()) { diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..c239eb895612 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->all_threads); if (!threads) return -1; diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h index 6ef01a83b24e..0da8ea2b6801 100644 --- a/tools/perf/util/target.h +++ b/tools/perf/util/target.h @@ -15,6 +15,7 @@ struct target { bool uses_mmap; bool default_per_cpu; bool per_thread; + bool all_threads; }; enum target_errno { diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f6491c..729dad8f412d 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b99e73c..5ec91cfd1869 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
Re: [PATCH 1/1] perf util: Do not reuse target->per_thread flag
Hi Mathieu, Sorry, I just see this patch today. I have tested this patch and it works. Another idea from me is it doesn't need to add a new target->all_threads flag. We just use target->per-thread && target->system_wide as a condition to check for all per-thread case. I just think for your perf record case, the target->system_wide will not be set. Instead, if target->per-thread and target->system_wide are both set, that means we needs to trace on all threads, right? Thanks Jin Yao On 1/13/2018 7:12 AM, Mathieu Poirier wrote: Commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") is using the target->per_thread flag to specify that all threads in a system should be taken into account. That is then used in function thread_map__new_str() where all threads are added to evlist->threads. Variable target->per_thread is also used by 'perf record' when handling trace sessions using the --per-thread command line option. Since the target->per_thread flag is set all threads will be added to evlist->threads, which has the effect of creating a kernel event for each thread in the system. This patch address the issue by creating a new target->all_threads flag that gets set from the 'stat' utility, avoiding any conflict with other utilities using target->per_thread. Signed-off-by: Mathieu Poirier --- tools/perf/builtin-stat.c| 2 +- tools/perf/util/evlist.c | 2 +- tools/perf/util/target.h | 1 + tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..87e156bfb45e 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2831,7 +2831,7 @@ int cmd_stat(int argc, const char **argv) target__validate(); if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) - target.per_thread = true; + target.per_thread = target.all_threads = true; if (perf_evlist__create_maps(evsel_list, ) < 0) { if (target__has_task()) { diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..c239eb895612 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->all_threads); if (!threads) return -1; diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h index 6ef01a83b24e..0da8ea2b6801 100644 --- a/tools/perf/util/target.h +++ b/tools/perf/util/target.h @@ -15,6 +15,7 @@ struct target { bool uses_mmap; bool default_per_cpu; bool per_thread; + bool all_threads; }; enum target_errno { diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f6491c..729dad8f412d 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b99e73c..5ec91cfd1869 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
Re: [PATCH] perf util: Fix evlist->threads when working with 'perf stat --per-thread'
On 1/13/2018 3:53 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Jan 12, 2018 at 09:44:53AM -0700, Mathieu Poirier escreveu: After commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") any condition where target->per_thread is set will see all the threads in a system added to evlist->threads. Since the 'record' utility also uses function thread_map__new_str(), any attempt to use the --perf-thread option will also end up accounting for all --perf-thread? Do you mean --per-thread, ok, so 'record' has this, uses the opts->per_thread that then Jin reused for his recent patches... Jin, can you try to untangle this, probably you can't reuse that opts->per-thread thing... Without looking further, I think that could be a better avenue to fix this issue, without having to add a 'stat' specific method to the core classes. - Arnaldo Hi, Very sorry, I just see this mail today. Please let me look at this issue first. Thanks Jin Yao threads in a system, resulting in new kernel events being created for all threads rather than just the thread of interest. This patch keeps the newly introduced functionality but make sure it doesn't affect other utilities outside of 'stat'. Signed-off-by: Mathieu Poirier <mathieu.poir...@linaro.org> --- tools/perf/builtin-stat.c | 2 +- tools/perf/util/evlist.c | 29 - tools/perf/util/evlist.h | 2 ++ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..82d16426b984 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2833,7 +2833,7 @@ int cmd_stat(int argc, const char **argv) if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) target.per_thread = true; - if (perf_evlist__create_maps(evsel_list, ) < 0) { + if (perf_evlist__create_stat_maps(evsel_list, ) < 0) { if (target__has_task()) { pr_err("Problems finding threads of monitor\n"); parse_options_usage(stat_usage, stat_options, "p", 1); diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..a4ae684f28de 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1100,13 +1100,11 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) return perf_evlist__mmap_ex(evlist, pages, 0, false); } -int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +static int _perf_evlist__create_maps(struct perf_evlist *evlist, +struct target *target, +struct thread_map *threads) { struct cpu_map *cpus; - struct thread_map *threads; - - threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); if (!threads) return -1; @@ -1130,6 +1128,27 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) return -1; } +int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + false); + + return _perf_evlist__create_maps(evlist, target, threads); +} + +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + target->per_thread); + + return _perf_evlist__create_maps(evlist, target, threads); +} + void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads) { diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 75160666d305..80c654990e19 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -188,6 +188,8 @@ void perf_evlist__set_selected(struct perf_evlist *evlist, void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads); int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target); +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target); int perf_evlist__apply_filters(struct perf_evlist *evlist, struct perf_evsel **err_evsel); void __perf_evlist__set_leader(struct list_head *list); -- 2.7.4
Re: [PATCH] perf util: Fix evlist->threads when working with 'perf stat --per-thread'
On 1/13/2018 3:53 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Jan 12, 2018 at 09:44:53AM -0700, Mathieu Poirier escreveu: After commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") any condition where target->per_thread is set will see all the threads in a system added to evlist->threads. Since the 'record' utility also uses function thread_map__new_str(), any attempt to use the --perf-thread option will also end up accounting for all --perf-thread? Do you mean --per-thread, ok, so 'record' has this, uses the opts->per_thread that then Jin reused for his recent patches... Jin, can you try to untangle this, probably you can't reuse that opts->per-thread thing... Without looking further, I think that could be a better avenue to fix this issue, without having to add a 'stat' specific method to the core classes. - Arnaldo Hi, Very sorry, I just see this mail today. Please let me look at this issue first. Thanks Jin Yao threads in a system, resulting in new kernel events being created for all threads rather than just the thread of interest. This patch keeps the newly introduced functionality but make sure it doesn't affect other utilities outside of 'stat'. Signed-off-by: Mathieu Poirier --- tools/perf/builtin-stat.c | 2 +- tools/perf/util/evlist.c | 29 - tools/perf/util/evlist.h | 2 ++ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..82d16426b984 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2833,7 +2833,7 @@ int cmd_stat(int argc, const char **argv) if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) target.per_thread = true; - if (perf_evlist__create_maps(evsel_list, ) < 0) { + if (perf_evlist__create_stat_maps(evsel_list, ) < 0) { if (target__has_task()) { pr_err("Problems finding threads of monitor\n"); parse_options_usage(stat_usage, stat_options, "p", 1); diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..a4ae684f28de 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1100,13 +1100,11 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) return perf_evlist__mmap_ex(evlist, pages, 0, false); } -int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +static int _perf_evlist__create_maps(struct perf_evlist *evlist, +struct target *target, +struct thread_map *threads) { struct cpu_map *cpus; - struct thread_map *threads; - - threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); if (!threads) return -1; @@ -1130,6 +1128,27 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) return -1; } +int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + false); + + return _perf_evlist__create_maps(evlist, target, threads); +} + +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + target->per_thread); + + return _perf_evlist__create_maps(evlist, target, threads); +} + void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads) { diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 75160666d305..80c654990e19 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -188,6 +188,8 @@ void perf_evlist__set_selected(struct perf_evlist *evlist, void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads); int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target); +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target); int perf_evlist__apply_filters(struct perf_evlist *evlist, struct perf_evsel **err_evsel); void __perf_evlist__set_leader(struct list_head *list); -- 2.7.4
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Hi Jiri, I ever considered to move the operation of removing error thread to perf_evsel__fallback(). The perf_evsel__fallback() is common code and it's shared by perf report, perf stat and perf top. While finally I think it'd better let the caller decide to remove error thread and try again, or just return the warning message. perf_evsel__fallback() probably doesn't know what the caller want to do. That's my current thinking. Maybe there will be a better fix... Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Hi Jiri, I ever considered to move the operation of removing error thread to perf_evsel__fallback(). The perf_evsel__fallback() is common code and it's shared by perf report, perf stat and perf top. While finally I think it'd better let the caller decide to remove error thread and try again, or just return the warning message. perf_evsel__fallback() probably doesn't know what the caller want to do. That's my current thinking. Maybe there will be a better fix... Thanks Jin Yao
[tip:perf/core] perf script: Remove the time slices number limitation
Commit-ID: cc2ef584a863b7c8033b78723cd253ca47e9a589 Gitweb: https://git.kernel.org/tip/cc2ef584a863b7c8033b78723cd253ca47e9a589 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:33 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:37 -0300 perf script: Remove the time slices number limitation Previously it was only allowed to use at most 10 time slices in 'perf script --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao <yao@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-9-git-send-email-yao@linux.intel.com [ No need to check for NULL to call free, use zfree ] Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf-script.txt | 10 +- tools/perf/builtin-script.c | 16 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 806ec63..7730c1d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -351,19 +351,19 @@ include::itrace.txt[] to end of file. Also support time percent with multipe time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: - Select the second 10% time slice + Select the second 10% time slice: perf script --time 10%/2 - Select from 0% to 10% time slice + Select from 0% to 10% time slice: perf script --time 0%-10% - Select the first and second 10% time slices + Select the first and second 10% time slices: perf script --time 10%/1,10%/2 - Select from 0% to 10% and 30% to 40% slices + Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% --max-blocks:: diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index ac78191..3499d68 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } -#define PTIME_RANGE_MAX10 - struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1496,7 +1494,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; }; @@ -3445,6 +3444,13 @@ int cmd_script(int argc, const char **argv) if (err < 0) goto out_delete; + script.ptime_range = perf_time__range_alloc(script.time_str, + _size); + if (!script.ptime_range) { + err = -ENOMEM; + goto out_delete; + } + /* needs to be parsed after looking up reference time */ if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && @@ -3457,7 +3463,7 @@ int cmd_script(int argc, const char **argv) } script.range_num = perf_time__percent_parse_str( - script.ptime_range, PTIME_RANGE_MAX, + script.ptime_range, script.range_size, script.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); @@ -3476,6 +3482,8 @@ int cmd_script(int argc, const char **argv) flush_scripting(); out_delete: + zfree(_range); + perf_evlist__free_stats(session->evlist); perf_session__delete(session);
[tip:perf/core] perf script: Remove the time slices number limitation
Commit-ID: cc2ef584a863b7c8033b78723cd253ca47e9a589 Gitweb: https://git.kernel.org/tip/cc2ef584a863b7c8033b78723cd253ca47e9a589 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:33 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:37 -0300 perf script: Remove the time slices number limitation Previously it was only allowed to use at most 10 time slices in 'perf script --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao Suggested-by: Arnaldo Carvalho de Melo Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-9-git-send-email-yao@linux.intel.com [ No need to check for NULL to call free, use zfree ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-script.txt | 10 +- tools/perf/builtin-script.c | 16 2 files changed, 17 insertions(+), 9 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 806ec63..7730c1d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -351,19 +351,19 @@ include::itrace.txt[] to end of file. Also support time percent with multipe time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: - Select the second 10% time slice + Select the second 10% time slice: perf script --time 10%/2 - Select from 0% to 10% time slice + Select from 0% to 10% time slice: perf script --time 0%-10% - Select the first and second 10% time slices + Select the first and second 10% time slices: perf script --time 10%/1,10%/2 - Select from 0% to 10% and 30% to 40% slices + Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% --max-blocks:: diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index ac78191..3499d68 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } -#define PTIME_RANGE_MAX10 - struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1496,7 +1494,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; }; @@ -3445,6 +3444,13 @@ int cmd_script(int argc, const char **argv) if (err < 0) goto out_delete; + script.ptime_range = perf_time__range_alloc(script.time_str, + _size); + if (!script.ptime_range) { + err = -ENOMEM; + goto out_delete; + } + /* needs to be parsed after looking up reference time */ if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && @@ -3457,7 +3463,7 @@ int cmd_script(int argc, const char **argv) } script.range_num = perf_time__percent_parse_str( - script.ptime_range, PTIME_RANGE_MAX, + script.ptime_range, script.range_size, script.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); @@ -3476,6 +3482,8 @@ int cmd_script(int argc, const char **argv) flush_scripting(); out_delete: + zfree(_range); + perf_evlist__free_stats(session->evlist); perf_session__delete(session);
[tip:perf/core] perf report: Remove the time slices number limitation
Commit-ID: 0a3cc3ae05c363dabd891ed5f918c62197de8c7f Gitweb: https://git.kernel.org/tip/0a3cc3ae05c363dabd891ed5f918c62197de8c7f Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:32 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:37 -0300 perf report: Remove the time slices number limitation Previously it was only allowed to use at most 10 time slices in 'perf report --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao <yao@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Jiri Olsa <jo...@kernel.org> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-8-git-send-email-yao@linux.intel.com [ No need to check for NULL to call free, use zfree ] Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/builtin-report.c | 22 -- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 63d0db3..907e505 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -403,7 +403,7 @@ OPTIONS to end of file. Also support time percent with multiple time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: Select the second 10% time slice: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 437..42a52dc 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -54,8 +54,6 @@ #include #include -#define PTIME_RANGE_MAX10 - struct report { struct perf_tooltool; struct perf_session *session; @@ -76,7 +74,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; float min_percent; u64 nr_entries; @@ -1300,24 +1299,33 @@ repeat: if (symbol__init(>header.env) < 0) goto error; + report.ptime_range = perf_time__range_alloc(report.time_str, + _size); + if (!report.ptime_range) { + ret = -ENOMEM; + goto error; + } + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { pr_err("HINT: no first/last sample time found in perf data.\n" "Please use latest perf binary to execute 'perf record'\n" "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } report.range_num = perf_time__percent_parse_str( - report.ptime_range, PTIME_RANGE_MAX, + report.ptime_range, report.range_size, report.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); if (report.range_num < 0) { pr_err("Invalid time string\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } } else { report.range_num = 1; @@ -1333,6 +1341,8 @@ repeat: ret = 0; error: + zfree(_range); + perf_session__delete(session); return ret; }
[tip:perf/core] perf report: Remove the time slices number limitation
Commit-ID: 0a3cc3ae05c363dabd891ed5f918c62197de8c7f Gitweb: https://git.kernel.org/tip/0a3cc3ae05c363dabd891ed5f918c62197de8c7f Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:32 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:37 -0300 perf report: Remove the time slices number limitation Previously it was only allowed to use at most 10 time slices in 'perf report --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao Suggested-by: Arnaldo Carvalho de Melo Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Jiri Olsa Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-8-git-send-email-yao@linux.intel.com [ No need to check for NULL to call free, use zfree ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/builtin-report.c | 22 -- 2 files changed, 17 insertions(+), 7 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 63d0db3..907e505 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -403,7 +403,7 @@ OPTIONS to end of file. Also support time percent with multiple time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: Select the second 10% time slice: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 437..42a52dc 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -54,8 +54,6 @@ #include #include -#define PTIME_RANGE_MAX10 - struct report { struct perf_tooltool; struct perf_session *session; @@ -76,7 +74,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; float min_percent; u64 nr_entries; @@ -1300,24 +1299,33 @@ repeat: if (symbol__init(>header.env) < 0) goto error; + report.ptime_range = perf_time__range_alloc(report.time_str, + _size); + if (!report.ptime_range) { + ret = -ENOMEM; + goto error; + } + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { pr_err("HINT: no first/last sample time found in perf data.\n" "Please use latest perf binary to execute 'perf record'\n" "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } report.range_num = perf_time__percent_parse_str( - report.ptime_range, PTIME_RANGE_MAX, + report.ptime_range, report.range_size, report.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); if (report.range_num < 0) { pr_err("Invalid time string\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } } else { report.range_num = 1; @@ -1333,6 +1341,8 @@ repeat: ret = 0; error: + zfree(_range); + perf_session__delete(session); return ret; }
[tip:perf/core] perf util: Allocate time slices buffer according to number of comma
Commit-ID: 5a031f887cb8d60fe87d21159c3cf82c38f55679 Gitweb: https://git.kernel.org/tip/5a031f887cb8d60fe87d21159c3cf82c38f55679 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:31 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:36 -0300 perf util: Allocate time slices buffer according to number of comma Previously we use a magic number 10 to limit the number of time slices. It's not very good. This patch creates a new function perf_time__range_alloc() to allocate time slices buffer. The number of buffer entries is determined by the number of comma in string but at least it will allocate one entry even if no comma is found. Signed-off-by: Jin Yao <yao@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-7-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5769f97..6193b46 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, return -1; } +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size) +{ + const char *p1, *p2; + int i = 1; + struct perf_time_interval *ptime; + + /* +* At least allocate one time range. +*/ + if (!ostr) + goto alloc; + + p1 = ostr; + while (p1 < ostr + strlen(ostr)) { + p2 = strchr(p1, ','); + if (!p2) + break; + + p1 = p2 + 1; + i++; + } + +alloc: + *size = i; + ptime = calloc(i, sizeof(*ptime)); + return ptime; +} + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) { /* if time is not set don't drop sample */ diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 34d5eba..70b177d 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr); int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end); +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size); + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf,
[tip:perf/core] perf util: Allocate time slices buffer according to number of comma
Commit-ID: 5a031f887cb8d60fe87d21159c3cf82c38f55679 Gitweb: https://git.kernel.org/tip/5a031f887cb8d60fe87d21159c3cf82c38f55679 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:31 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:36 -0300 perf util: Allocate time slices buffer according to number of comma Previously we use a magic number 10 to limit the number of time slices. It's not very good. This patch creates a new function perf_time__range_alloc() to allocate time slices buffer. The number of buffer entries is determined by the number of comma in string but at least it will allocate one entry even if no comma is found. Signed-off-by: Jin Yao Suggested-by: Arnaldo Carvalho de Melo Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-7-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5769f97..6193b46 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, return -1; } +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size) +{ + const char *p1, *p2; + int i = 1; + struct perf_time_interval *ptime; + + /* +* At least allocate one time range. +*/ + if (!ostr) + goto alloc; + + p1 = ostr; + while (p1 < ostr + strlen(ostr)) { + p2 = strchr(p1, ','); + if (!p2) + break; + + p1 = p2 + 1; + i++; + } + +alloc: + *size = i; + ptime = calloc(i, sizeof(*ptime)); + return ptime; +} + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) { /* if time is not set don't drop sample */ diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 34d5eba..70b177d 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr); int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end); +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size); + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf,
[tip:perf/core] perf report: Add an indication of what time slices are used
Commit-ID: 7425664bbd3174814500c7ab8740cbb9bb25396c Gitweb: https://git.kernel.org/tip/7425664bbd3174814500c7ab8740cbb9bb25396c Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:30 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:36 -0300 perf report: Add an indication of what time slices are used Add a time slices indication to the perf report header. For example, # perf report --stdio --time 10% # Total Lost Samples: 0 # # Samples: 9K of event 'cycles:ppp' (time slices: 10%) # Event count (approx.): 8951288803 Signed-off-by: Jin Yao <yao@linux.intel.com> Suggested--by: Arnaldo Carvalho de Melo <a...@redhat.com> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-6-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/builtin-report.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 7d4f0a5..437 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -404,6 +404,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report if (evname != NULL) ret += fprintf(fp, " of event '%s'", evname); + if (rep->time_str) + ret += fprintf(fp, " (time slices: %s)", rep->time_str); + if (symbol_conf.show_ref_callgraph && strstr(evname, "call-graph=no")) { ret += fprintf(fp, ", show reference callgraph");
[tip:perf/core] perf report: Add an indication of what time slices are used
Commit-ID: 7425664bbd3174814500c7ab8740cbb9bb25396c Gitweb: https://git.kernel.org/tip/7425664bbd3174814500c7ab8740cbb9bb25396c Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:30 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:36 -0300 perf report: Add an indication of what time slices are used Add a time slices indication to the perf report header. For example, # perf report --stdio --time 10% # Total Lost Samples: 0 # # Samples: 9K of event 'cycles:ppp' (time slices: 10%) # Event count (approx.): 8951288803 Signed-off-by: Jin Yao Suggested--by: Arnaldo Carvalho de Melo Tested-by: Arnaldo Carvalho de Melo Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-6-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-report.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 7d4f0a5..437 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -404,6 +404,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report if (evname != NULL) ret += fprintf(fp, " of event '%s'", evname); + if (rep->time_str) + ret += fprintf(fp, " (time slices: %s)", rep->time_str); + if (symbol_conf.show_ref_callgraph && strstr(evname, "call-graph=no")) { ret += fprintf(fp, ", show reference callgraph");
[tip:perf/core] perf util: Support no index time percent slice
Commit-ID: 3002812e602d3f991a5b8cdc0499e63e13ff65c4 Gitweb: https://git.kernel.org/tip/3002812e602d3f991a5b8cdc0499e63e13ff65c4 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:29 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:35 -0300 perf util: Support no index time percent slice Previously, the time percent slice needs an index to specify which one the user wants. It may be easier to use if the index can be omitted. So with this patch, for example, perf report --stdio --time 10%/1 should be equivalent to perf report --stdio --time 10% Signed-off-by: Jin Yao <yao@linux.intel.com> Suggested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-5-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/time-utils.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 88510ab..5769f97 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval *ptime_buf, int num, return i; } +static int one_percent_convert(struct perf_time_interval *ptime_buf, + const char *ostr, u64 start, u64 end, char *c) +{ + char *str; + int len = strlen(ostr), ret; + + /* +* c points to '%'. +* '%' should be the last character +*/ + if (ostr + len - 1 != c) + return -1; + + /* +* Construct a string like "xx%/1" +*/ + str = malloc(len + 3); + if (str == NULL) + return -ENOMEM; + + memcpy(str, ostr, len); + strcpy(str + len, "/1"); + + ret = percent_slash_split(str, ptime_buf, start, end); + if (ret == 0) + ret = 1; + + free(str); + return ret; +} + int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end) { @@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, * ostr example: * 10%/2,10%/3: select the second 10% slice and the third 10% slice * 0%-10%,30%-40%: multiple time range +* 50%: just one percent */ memset(ptime_buf, 0, sizeof(*ptime_buf) * num); @@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, end, percent_dash_split); } + c = strchr(ostr, '%'); + if (c) + return one_percent_convert(ptime_buf, ostr, start, end, c); + return -1; }
[tip:perf/core] perf util: Support no index time percent slice
Commit-ID: 3002812e602d3f991a5b8cdc0499e63e13ff65c4 Gitweb: https://git.kernel.org/tip/3002812e602d3f991a5b8cdc0499e63e13ff65c4 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:29 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:35 -0300 perf util: Support no index time percent slice Previously, the time percent slice needs an index to specify which one the user wants. It may be easier to use if the index can be omitted. So with this patch, for example, perf report --stdio --time 10%/1 should be equivalent to perf report --stdio --time 10% Signed-off-by: Jin Yao Suggested-by: Arnaldo Carvalho de Melo Tested-by: Arnaldo Carvalho de Melo Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-5-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/time-utils.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 88510ab..5769f97 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval *ptime_buf, int num, return i; } +static int one_percent_convert(struct perf_time_interval *ptime_buf, + const char *ostr, u64 start, u64 end, char *c) +{ + char *str; + int len = strlen(ostr), ret; + + /* +* c points to '%'. +* '%' should be the last character +*/ + if (ostr + len - 1 != c) + return -1; + + /* +* Construct a string like "xx%/1" +*/ + str = malloc(len + 3); + if (str == NULL) + return -ENOMEM; + + memcpy(str, ostr, len); + strcpy(str + len, "/1"); + + ret = percent_slash_split(str, ptime_buf, start, end); + if (ret == 0) + ret = 1; + + free(str); + return ret; +} + int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end) { @@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, * ostr example: * 10%/2,10%/3: select the second 10% slice and the third 10% slice * 0%-10%,30%-40%: multiple time range +* 50%: just one percent */ memset(ptime_buf, 0, sizeof(*ptime_buf) * num); @@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, end, percent_dash_split); } + c = strchr(ostr, '%'); + if (c) + return one_percent_convert(ptime_buf, ostr, start, end, c); + return -1; }
[tip:perf/core] perf util: Improve error checking for time percent input
Commit-ID: 6e761cbc9127fb8fc609aea2265ee8279b8d6c55 Gitweb: https://git.kernel.org/tip/6e761cbc9127fb8fc609aea2265ee8279b8d6c55 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:28 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:35 -0300 perf util: Improve error checking for time percent input The command line like 'perf report --stdio --time 1abc%/1' could be accepted by perf. It looks not very good. This patch uses strtod() to replace original atof() and check the entire string. Now for the same command line, it would return error message "Invalid time string". root@skl:/tmp# perf report --stdio --time 1abc%/1 Invalid time string Signed-off-by: Jin Yao <yao@linux.intel.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-4-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/time-utils.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 3f7f18f..88510ab 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) static int parse_percent(double *pcnt, char *str) { - char *c; + char *c, *endptr; + double d; c = strchr(str, '%'); if (c) @@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str) else return -1; - *pcnt = atof(str) / 100.0; + d = strtod(str, ); + if (endptr != str + strlen(str)) + return -1; + *pcnt = d / 100.0; return 0; }
[tip:perf/core] perf util: Improve error checking for time percent input
Commit-ID: 6e761cbc9127fb8fc609aea2265ee8279b8d6c55 Gitweb: https://git.kernel.org/tip/6e761cbc9127fb8fc609aea2265ee8279b8d6c55 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:28 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:35 -0300 perf util: Improve error checking for time percent input The command line like 'perf report --stdio --time 1abc%/1' could be accepted by perf. It looks not very good. This patch uses strtod() to replace original atof() and check the entire string. Now for the same command line, it would return error message "Invalid time string". root@skl:/tmp# perf report --stdio --time 1abc%/1 Invalid time string Signed-off-by: Jin Yao Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-4-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/time-utils.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 3f7f18f..88510ab 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) static int parse_percent(double *pcnt, char *str) { - char *c; + char *c, *endptr; + double d; c = strchr(str, '%'); if (c) @@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str) else return -1; - *pcnt = atof(str) / 100.0; + d = strtod(str, ); + if (endptr != str + strlen(str)) + return -1; + *pcnt = d / 100.0; return 0; }
[tip:perf/core] perf script: Improve error msg when no first/last sample time found
Commit-ID: 1e2778e91616086177a255f3fc8c72ecaa564ae6 Gitweb: https://git.kernel.org/tip/1e2778e91616086177a255f3fc8c72ecaa564ae6 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:27 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:34 -0300 perf script: Improve error msg when no first/last sample time found The following message will be returned to user when executing 'perf script --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao <yao@linux.intel.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-3-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/builtin-script.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 08bc818..ac78191 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3449,7 +3449,9 @@ int cmd_script(int argc, const char **argv) if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); err = -EINVAL; goto out_delete; }
[tip:perf/core] perf script: Improve error msg when no first/last sample time found
Commit-ID: 1e2778e91616086177a255f3fc8c72ecaa564ae6 Gitweb: https://git.kernel.org/tip/1e2778e91616086177a255f3fc8c72ecaa564ae6 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:27 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:34 -0300 perf script: Improve error msg when no first/last sample time found The following message will be returned to user when executing 'perf script --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-3-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-script.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 08bc818..ac78191 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3449,7 +3449,9 @@ int cmd_script(int argc, const char **argv) if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); err = -EINVAL; goto out_delete; }
[tip:perf/core] perf report: Improve error msg when no first/last sample time found
Commit-ID: eb0b419eff8cf51af8e16cc8c5d2a92d19824266 Gitweb: https://git.kernel.org/tip/eb0b419eff8cf51af8e16cc8c5d2a92d19824266 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Wed, 10 Jan 2018 23:00:26 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Wed, 17 Jan 2018 10:23:34 -0300 perf report: Improve error msg when no first/last sample time found The following message will be returned to user when executing 'perf report --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao <yao@linux.intel.com> Reviewed-by: Jiri Olsa <jo...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1515596433-24653-2-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/builtin-report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 6593779..7d4f0a5 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1300,7 +1300,9 @@ repeat: if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); return -EINVAL; }
[tip:perf/core] perf report: Improve error msg when no first/last sample time found
Commit-ID: eb0b419eff8cf51af8e16cc8c5d2a92d19824266 Gitweb: https://git.kernel.org/tip/eb0b419eff8cf51af8e16cc8c5d2a92d19824266 Author: Jin Yao AuthorDate: Wed, 10 Jan 2018 23:00:26 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Wed, 17 Jan 2018 10:23:34 -0300 perf report: Improve error msg when no first/last sample time found The following message will be returned to user when executing 'perf report --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao Reviewed-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1515596433-24653-2-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 6593779..7d4f0a5 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1300,7 +1300,9 @@ repeat: if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); return -EINVAL; }
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 10:48 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Jan 16, 2018 at 12:55:19PM +0100, Jiri Olsa escreveu: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used >>> 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa <jo...@redhat.com> Thanks, applied, now test building with 'make -C tools/perf build-test', containers, etc. - Arnaldo Thanks Arnaldo! Thanks Jin Yao
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 10:48 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Jan 16, 2018 at 12:55:19PM +0100, Jiri Olsa escreveu: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used >>> 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa Thanks, applied, now test building with 'make -C tools/perf build-test', containers, etc. - Arnaldo Thanks Arnaldo! Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Yes, I understand. It'd better we can put the code to more common places. Actually I used the similar patch as yours at first. While I found it didn't work for the case of system-wide '--per-thread' then I developed current one. Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Yes, I understand. It'd better we can put the code to more common places. Actually I used the similar patch as yours at first. While I found it didn't work for the case of system-wide '--per-thread' then I developed current one. Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
Just tested. But looks it's not OK for '--per-thread' case. jinyao@skl:~/$ ./perf stat --per-thread WARNING: Ignored open failure for pid 1 WARNING: Ignored open failure for pid 2 WARNING: Ignored open failure for pid 4 WARNING: Ignored open failure for pid 6 WARNING: Ignored open failure for pid 7 WARNING: Ignored open failure for pid 8 WARNING: Ignored open failure for pid 9 WARNING: Ignored open failure for pid 10 .. WARNING: Ignored open failure for pid 11783 WARNING: Ignored open failure for pid 12275 WARNING: Ignored open failure for pid 31733 WARNING: Ignored open failure for pid 31737 Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Thanks Jin Yao On 1/16/2018 8:51 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 11:43:08PM +0800, Jin Yao wrote: If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK = 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN = 1: Disallow CPU event access by users without CAP_SYS_ADMIN = 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed could we use the existing code like in attached patch? we might also swap following warning for some generic one: WARNING: Ignored open failure for pid 28392 jirka --- diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..f9d4d3ad6fc5 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -276,6 +276,9 @@ static int create_perf_stat_counter(struct perf_evsel *evsel) attr->enable_on_exec = 1; } + if (stat_config.aggr_mode == AGGR_THREAD) + evsel->ignore_missing_thread = true; + if (target__has_cpu() && !target__has_per_thread()) return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 8f971a2301d1..509ee175bc97 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1656,7 +1656,7 @@ static bool ignore_missing_thread(struct perf_evsel *evsel, return false; /* The -ESRCH is perf event syscall errno for pid's not found. */ -
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
Just tested. But looks it's not OK for '--per-thread' case. jinyao@skl:~/$ ./perf stat --per-thread WARNING: Ignored open failure for pid 1 WARNING: Ignored open failure for pid 2 WARNING: Ignored open failure for pid 4 WARNING: Ignored open failure for pid 6 WARNING: Ignored open failure for pid 7 WARNING: Ignored open failure for pid 8 WARNING: Ignored open failure for pid 9 WARNING: Ignored open failure for pid 10 .. WARNING: Ignored open failure for pid 11783 WARNING: Ignored open failure for pid 12275 WARNING: Ignored open failure for pid 31733 WARNING: Ignored open failure for pid 31737 Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Thanks Jin Yao On 1/16/2018 8:51 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 11:43:08PM +0800, Jin Yao wrote: If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK = 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN = 1: Disallow CPU event access by users without CAP_SYS_ADMIN = 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed could we use the existing code like in attached patch? we might also swap following warning for some generic one: WARNING: Ignored open failure for pid 28392 jirka --- diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..f9d4d3ad6fc5 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -276,6 +276,9 @@ static int create_perf_stat_counter(struct perf_evsel *evsel) attr->enable_on_exec = 1; } + if (stat_config.aggr_mode == AGGR_THREAD) + evsel->ignore_missing_thread = true; + if (target__has_cpu() && !target__has_per_thread()) return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 8f971a2301d1..509ee175bc97 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1656,7 +1656,7 @@ static bool ignore_missing_thread(struct perf_evsel *evsel, return false; /* The -ESRCH is perf event syscall errno for pid's not found. */ -
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 7:55 PM, Jiri Olsa wrote: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa <jo...@redhat.com> thanks, jirka Hi Jiri, Thanks so much for reviewing the patch. Thanks Jin Yao
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 7:55 PM, Jiri Olsa wrote: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa thanks, jirka Hi Jiri, Thanks so much for reviewing the patch. Thanks Jin Yao
[PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-stat.c| 14 +- tools/perf/util/evsel.c | 3 +++ tools/perf/util/thread_map.c | 1 + tools/perf/util/thread_map.h | 1 + 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d3..bcdb47c 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -632,7 +632,19 @@ static int __run_perf_stat(int argc, const char **argv) if (verbose > 0) ui__warning("%s\n", msg); goto try_again; -} + } else if (target__has_per_thread() && + evsel_list->threads && + evsel_list->threads->err_thread != -1) { + /* +* For global --per-thread case, skip current +* error thread. +*/ + if (!thread_map__remove(evsel_list->threads, + evsel_list->threads->err_thread)) { + evsel_list->threads->err_thread = -1; + goto try_again; + } + } perf_evsel__open_strerror(counter, , errno, msg, sizeof(msg)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index a4d256e..12f8733 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1899,6 +1899,9 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, goto fallback_missing_features; } out_close: + if (err) + threads->err_thread = thread; + do { while (--thread >= 0) { close(FD(evsel, cpu, thread)); diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..870cb0c 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -32,6 +32,7 @@ static void thread_map__reset(struct thread_map *map, int start, int nr) size_t size = (nr - start) * sizeof(map->map[0]); memset(>map[start], 0, size); + map->err_thread = -1; } static struct thread_map *thread_map__
[PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed Signed-off-by: Jin Yao --- tools/perf/builtin-stat.c| 14 +- tools/perf/util/evsel.c | 3 +++ tools/perf/util/thread_map.c | 1 + tools/perf/util/thread_map.h | 1 + 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d3..bcdb47c 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -632,7 +632,19 @@ static int __run_perf_stat(int argc, const char **argv) if (verbose > 0) ui__warning("%s\n", msg); goto try_again; -} + } else if (target__has_per_thread() && + evsel_list->threads && + evsel_list->threads->err_thread != -1) { + /* +* For global --per-thread case, skip current +* error thread. +*/ + if (!thread_map__remove(evsel_list->threads, + evsel_list->threads->err_thread)) { + evsel_list->threads->err_thread = -1; + goto try_again; + } + } perf_evsel__open_strerror(counter, , errno, msg, sizeof(msg)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index a4d256e..12f8733 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1899,6 +1899,9 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, goto fallback_missing_features; } out_close: + if (err) + threads->err_thread = thread; + do { while (--thread >= 0) { close(FD(evsel, cpu, thread)); diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..870cb0c 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -32,6 +32,7 @@ static void thread_map__reset(struct thread_map *map, int start, int nr) size_t size = (nr - start) * sizeof(map->map[0]); memset(>map[start], 0, size); + map->err_thread = -1; } static struct thread_map *thread_map__realloc(struct thread_
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 1/11/2018 11:30 PM, Jiri Olsa wrote: On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote: We keep having bug reports that when users build perf on their own, we already have same warnings during the build Yes, there will be warnings displayed during the build if some libraries are missing. While I do find the users (especially the new perf user) don't notice the warning. They only care about the success of build. Once they use this perf version and find something working strangely, they will say that perf may have bug. :( but they don't install some needed libraries like libelf, libbfd/libibery. how about saying that in the symbol column, instead of poluting report's output, like: $ perf report --stdio # Overhead Command Shared Object Symbol (disabled) # ... .. I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? Thanks Jin Yao
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 1/11/2018 11:30 PM, Jiri Olsa wrote: On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote: We keep having bug reports that when users build perf on their own, we already have same warnings during the build Yes, there will be warnings displayed during the build if some libraries are missing. While I do find the users (especially the new perf user) don't notice the warning. They only care about the success of build. Once they use this perf version and find something working strangely, they will say that perf may have bug. :( but they don't install some needed libraries like libelf, libbfd/libibery. how about saying that in the symbol column, instead of poluting report's output, like: $ perf report --stdio # Overhead Command Shared Object Symbol (disabled) # ... .. I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? Thanks Jin Yao
[tip:perf/core] perf script: Support time percent and multiple time ranges
Commit-ID: 2ab046cd01e33a854798a3e245c9e3f32b950a7d Gitweb: https://git.kernel.org/tip/2ab046cd01e33a854798a3e245c9e3f32b950a7d Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:46 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 12:07:06 -0300 perf script: Support time percent and multiple time ranges perf script has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf script --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-7-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf-script.txt | 16 +++ tools/perf/builtin-script.c | 34 ++-- 2 files changed, 44 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 974ceb1..7b622a8 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -329,6 +329,22 @@ include::itrace.txt[] stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multipe time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice + perf script --time 10%/2 + + Select from 0% to 10% time slice + perf script --time 0%-10% + + Select the first and second 10% time slices + perf script --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices + perf script --time 0%-10%,30%-40% + --max-blocks:: Set the maximum number of program blocks to print with brstackasm for each sample. diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 77e47cf..330dcd9 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1436,6 +1436,8 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } +#define PTIME_RANGE_MAX10 + struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1449,7 +1451,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; }; static int perf_evlist__max_name_len(struct perf_evlist *evlist) @@ -1734,8 +1737,10 @@ static int process_sample_event(struct perf_tool *tool, struct perf_script *scr = container_of(tool, struct perf_script, tool); struct addr_location al; - if (perf_time__skip_sample(>ptime, sample->time)) + if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num, + sample->time)) { return 0; + } if (debug_mode) { if (sample->time < last_timestamp) { @@ -3360,10 +3365,27 @@ int cmd_script(int argc, const char **argv) goto out_delete; /* needs to be parsed after looking up reference time */ - if (perf_time__parse_str(, script.time_str) != 0) { - pr_err("Invalid time string\n"); - err = -EINVAL; - goto out_delete; + if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + err = -EINVAL; +
[tip:perf/core] perf script: Support time percent and multiple time ranges
Commit-ID: 2ab046cd01e33a854798a3e245c9e3f32b950a7d Gitweb: https://git.kernel.org/tip/2ab046cd01e33a854798a3e245c9e3f32b950a7d Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:46 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 12:07:06 -0300 perf script: Support time percent and multiple time ranges perf script has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf script --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-7-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-script.txt | 16 +++ tools/perf/builtin-script.c | 34 ++-- 2 files changed, 44 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 974ceb1..7b622a8 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -329,6 +329,22 @@ include::itrace.txt[] stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multipe time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice + perf script --time 10%/2 + + Select from 0% to 10% time slice + perf script --time 0%-10% + + Select the first and second 10% time slices + perf script --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices + perf script --time 0%-10%,30%-40% + --max-blocks:: Set the maximum number of program blocks to print with brstackasm for each sample. diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 77e47cf..330dcd9 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1436,6 +1436,8 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } +#define PTIME_RANGE_MAX10 + struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1449,7 +1451,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; }; static int perf_evlist__max_name_len(struct perf_evlist *evlist) @@ -1734,8 +1737,10 @@ static int process_sample_event(struct perf_tool *tool, struct perf_script *scr = container_of(tool, struct perf_script, tool); struct addr_location al; - if (perf_time__skip_sample(>ptime, sample->time)) + if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num, + sample->time)) { return 0; + } if (debug_mode) { if (sample->time < last_timestamp) { @@ -3360,10 +3365,27 @@ int cmd_script(int argc, const char **argv) goto out_delete; /* needs to be parsed after looking up reference time */ - if (perf_time__parse_str(, script.time_str) != 0) { - pr_err("Invalid time string\n"); - err = -EINVAL; - goto out_delete; + if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + err = -EINVAL; + goto out_delete; + } + + script.range_num = perf_time__percent_parse_str( + script.ptime_range, PTIME_RANGE_MAX, + script.time_str, +
[tip:perf/core] perf report: Support time percent and multiple time ranges
Commit-ID: 5b969bc766807e5c2f184d1d6f97b8471de946f1 Gitweb: https://git.kernel.org/tip/5b969bc766807e5c2f184d1d6f97b8471de946f1 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:45 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 12:06:20 -0300 perf report: Support time percent and multiple time ranges perf report has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf report --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf report --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao@linux.intel.com [ Add missing colons at end of examples in the man page ] Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf-report.txt | 20 tools/perf/builtin-report.c | 31 ++- 2 files changed, 46 insertions(+), 5 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index ddde2b5..1e02c4e 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -402,6 +402,26 @@ OPTIONS stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multiple time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice: + + perf report --time 10%/2 + + Select from 0% to 10% time slice: + + perf report --time 0%-10% + + Select the first and second 10% time slices: + + perf report --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices: + + perf report --time 0%-10%,30%-40% + --itrace:: Options for decoding instruction tracing data. The options are: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 07827cd..770bf8a 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -52,6 +52,8 @@ #include #include +#define PTIME_RANGE_MAX10 + struct report { struct perf_tooltool; struct perf_session *session; @@ -69,7 +71,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; float min_percent; u64 nr_entries; u64 queue_size; @@ -202,8 +205,10 @@ static int process_sample_event(struct perf_tool *tool, }; int ret = 0; - if (perf_time__skip_sample(>ptime, sample->time)) + if (perf_time__ranges_skip_sample(rep->ptime_range, rep->range_num, + sample->time)) { return 0; + } if (machine__resolve(machine, , sample) < 0) { pr_debug("problem processing %d event, skipping it.\n", @@ -1093,9 +1098,25 @@ repeat: if (symbol__init(>header.env) < 0) goto error; - if (perf_time__parse_str(, report.time_str) != 0) { - pr_err("Invalid time string\n"); - return -EINVAL; + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + return -EINVAL; + } + + report.range_num = perf_time__percent_parse_str
[tip:perf/core] perf report: Support time percent and multiple time ranges
Commit-ID: 5b969bc766807e5c2f184d1d6f97b8471de946f1 Gitweb: https://git.kernel.org/tip/5b969bc766807e5c2f184d1d6f97b8471de946f1 Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:45 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 12:06:20 -0300 perf report: Support time percent and multiple time ranges perf report has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices: perf report --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices: perf report --time 0%-10%,30%-40% Changelog: v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-6-git-send-email-yao@linux.intel.com [ Add missing colons at end of examples in the man page ] Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-report.txt | 20 tools/perf/builtin-report.c | 31 ++- 2 files changed, 46 insertions(+), 5 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index ddde2b5..1e02c4e 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -402,6 +402,26 @@ OPTIONS stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multiple time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice: + + perf report --time 10%/2 + + Select from 0% to 10% time slice: + + perf report --time 0%-10% + + Select the first and second 10% time slices: + + perf report --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices: + + perf report --time 0%-10%,30%-40% + --itrace:: Options for decoding instruction tracing data. The options are: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 07827cd..770bf8a 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -52,6 +52,8 @@ #include #include +#define PTIME_RANGE_MAX10 + struct report { struct perf_tooltool; struct perf_session *session; @@ -69,7 +71,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; float min_percent; u64 nr_entries; u64 queue_size; @@ -202,8 +205,10 @@ static int process_sample_event(struct perf_tool *tool, }; int ret = 0; - if (perf_time__skip_sample(>ptime, sample->time)) + if (perf_time__ranges_skip_sample(rep->ptime_range, rep->range_num, + sample->time)) { return 0; + } if (machine__resolve(machine, , sample) < 0) { pr_debug("problem processing %d event, skipping it.\n", @@ -1093,9 +1098,25 @@ repeat: if (symbol__init(>header.env) < 0) goto error; - if (perf_time__parse_str(, report.time_str) != 0) { - pr_err("Invalid time string\n"); - return -EINVAL; + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + return -EINVAL; + } + + report.range_num = perf_time__percent_parse_str( + report.ptime_range, PTIME_RANGE_MAX, + report.time_str, + session->evlist->first_sample_time, + session->evlist->last_sample_time); + +
[tip:perf/core] perf tools: Create function to perform multiple time range checking
Commit-ID: 9a9b8b4b2271e763c1600311a3d4ecc2ac359b55 Gitweb: https://git.kernel.org/tip/9a9b8b4b2271e763c1600311a3d4ecc2ac359b55 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:44 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:41:06 -0300 perf tools: Create function to perform multiple time range checking Previous patch supports the multiple time range. For example, select the first and second 10% time slices. perf report --time 10%/1,10%/2 We need a function to check if a timestamp is in the ranges of [0, 10%) and [10%, 20%]. Note that it includes the last element in [10%, 20%] but it doesn't include the last element in [0, 10%). It's to avoid the overlap. This patch implments a new function perf_time__ranges_skip_sample for this checking. Change log: v4: Let perf_time__ranges_skip_sample be compatible with perf_time__skip_sample when only one time range. Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-5-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 3 +++ 2 files changed, 31 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 61c46022..3f7f18f 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -300,6 +300,34 @@ bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) return false; } +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp) +{ + struct perf_time_interval *ptime; + int i; + + if ((timestamp == 0) || (num == 0)) + return false; + + if (num == 1) + return perf_time__skip_sample(_buf[0], timestamp); + + /* +* start/end of multiple time ranges must be valid. +*/ + for (i = 0; i < num; i++) { + ptime = _buf[i]; + + if (timestamp >= ptime->start && + ((timestamp < ptime->end && i < num - 1) || +(timestamp <= ptime->end && i == num - 1))) { + break; + } + } + + return (i == num) ? true : false; +} + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz) { u64 sec = timestamp / NSEC_PER_SEC; diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 2308723..34d5eba 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -18,6 +18,9 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp); + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz); int fetch_current_timestamp(char *buf, size_t sz);
[tip:perf/core] perf tools: Create function to perform multiple time range checking
Commit-ID: 9a9b8b4b2271e763c1600311a3d4ecc2ac359b55 Gitweb: https://git.kernel.org/tip/9a9b8b4b2271e763c1600311a3d4ecc2ac359b55 Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:44 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:41:06 -0300 perf tools: Create function to perform multiple time range checking Previous patch supports the multiple time range. For example, select the first and second 10% time slices. perf report --time 10%/1,10%/2 We need a function to check if a timestamp is in the ranges of [0, 10%) and [10%, 20%]. Note that it includes the last element in [10%, 20%] but it doesn't include the last element in [0, 10%). It's to avoid the overlap. This patch implments a new function perf_time__ranges_skip_sample for this checking. Change log: v4: Let perf_time__ranges_skip_sample be compatible with perf_time__skip_sample when only one time range. Signed-off-by: Jin Yao Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-5-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 3 +++ 2 files changed, 31 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 61c46022..3f7f18f 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -300,6 +300,34 @@ bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) return false; } +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp) +{ + struct perf_time_interval *ptime; + int i; + + if ((timestamp == 0) || (num == 0)) + return false; + + if (num == 1) + return perf_time__skip_sample(_buf[0], timestamp); + + /* +* start/end of multiple time ranges must be valid. +*/ + for (i = 0; i < num; i++) { + ptime = _buf[i]; + + if (timestamp >= ptime->start && + ((timestamp < ptime->end && i < num - 1) || +(timestamp <= ptime->end && i == num - 1))) { + break; + } + } + + return (i == num) ? true : false; +} + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz) { u64 sec = timestamp / NSEC_PER_SEC; diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 2308723..34d5eba 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -18,6 +18,9 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp); + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz); int fetch_current_timestamp(char *buf, size_t sz);
[tip:perf/core] perf tools: Create function to parse time percent
Commit-ID: 13a70f350665580708ab11f725d3578eaacbf2d0 Gitweb: https://git.kernel.org/tip/13a70f350665580708ab11f725d3578eaacbf2d0 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:43 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:39:09 -0300 perf tools: Create function to parse time percent Current perf report/script/... have a --time option to limit the time range of output. But right now it only supports absolute time, add support for time percentage. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also support the multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Changelog: v4: An issue is found. Following passes. perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now it uses strtol to replace atoi. Committer notes: This just puts in place the infrastructure, so the examples in this cset comment will only work later, after more patches in this series are applied. Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-4-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/time-utils.c | 205 --- tools/perf/util/time-utils.h | 3 + 2 files changed, 196 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 81927d0..61c46022 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "perf.h" #include "debug.h" @@ -60,11 +61,10 @@ static int parse_timestr_sec_nsec(struct perf_time_interval *ptime, return 0; } -int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +static int split_start_end(char **start, char **end, const char *ostr, char ch) { char *start_str, *end_str; char *d, *str; - int rc = 0; if (ostr == NULL || *ostr == '\0') return 0; @@ -74,25 +74,35 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) if (str == NULL) return -ENOMEM; - ptime->start = 0; - ptime->end = 0; - - /* str has the format: , -* variations: , -* , -* , -*/ start_str = str; - d = strchr(start_str, ','); + d = strchr(start_str, ch); if (d) { *d = '\0'; ++d; } end_str = d; + *start = start_str; + *end = end_str; + + return 0; +} + +int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +{ + char *start_str = NULL, *end_str; + int rc; + + rc = split_start_end(_str, _str, ostr, ','); + if (rc || !start_str) + return rc; + + ptime->start = 0; + ptime->end = 0; + rc = parse_timestr_sec_nsec(ptime, start_str, end_str); - free(str); + free(start_str); /* make sure end time is after start time if it was given */ if (rc == 0 && ptime->end && ptime->end < ptime->start) @@ -104,6 +114,177 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) return rc; } +static int parse_percent(double *pcnt, char *str) +{ + char *c; + + c = strchr(str, '%'); + if (c) + *c = '\0'; + else + return -1; + + *pcnt = atof(str) / 100.0; + + return 0; +} + +static int percent_slash_split(char *str, struct perf_time_interval *ptime, + u64 start, u64 end) +{ + char *p, *end_str; + double pcnt, start_pcnt, end_pcnt; + u64 total = end - start; + int i; + + /* +* Example: +* 10%/2: select the second 10% slice and the third 10% slice +*/ + + /* We can modify this string since the original one is copied */ + p = strchr(str, '/'); + if (!p) + return -1; + + *p = '\0'; + if (parse_percent(, str) < 0) + return -1; + + p++; + i = (int)strtol(p, _str, 10); + if (*end_str) + return -1; + + if (pcnt <= 0.0) + return -1; + + start_pcnt = pcnt * (i - 1); + end_pcnt = pcnt * i; + + if (start_pcnt < 0.0
[tip:perf/core] perf tools: Create function to parse time percent
Commit-ID: 13a70f350665580708ab11f725d3578eaacbf2d0 Gitweb: https://git.kernel.org/tip/13a70f350665580708ab11f725d3578eaacbf2d0 Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:43 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:39:09 -0300 perf tools: Create function to parse time percent Current perf report/script/... have a --time option to limit the time range of output. But right now it only supports absolute time, add support for time percentage. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also support the multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Changelog: v4: An issue is found. Following passes. perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now it uses strtol to replace atoi. Committer notes: This just puts in place the infrastructure, so the examples in this cset comment will only work later, after more patches in this series are applied. Signed-off-by: Jin Yao Acked-by: Jiri Olsa Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-4-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/time-utils.c | 205 --- tools/perf/util/time-utils.h | 3 + 2 files changed, 196 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 81927d0..61c46022 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -6,6 +6,7 @@ #include #include #include +#include #include "perf.h" #include "debug.h" @@ -60,11 +61,10 @@ static int parse_timestr_sec_nsec(struct perf_time_interval *ptime, return 0; } -int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +static int split_start_end(char **start, char **end, const char *ostr, char ch) { char *start_str, *end_str; char *d, *str; - int rc = 0; if (ostr == NULL || *ostr == '\0') return 0; @@ -74,25 +74,35 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) if (str == NULL) return -ENOMEM; - ptime->start = 0; - ptime->end = 0; - - /* str has the format: , -* variations: , -* , -* , -*/ start_str = str; - d = strchr(start_str, ','); + d = strchr(start_str, ch); if (d) { *d = '\0'; ++d; } end_str = d; + *start = start_str; + *end = end_str; + + return 0; +} + +int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +{ + char *start_str = NULL, *end_str; + int rc; + + rc = split_start_end(_str, _str, ostr, ','); + if (rc || !start_str) + return rc; + + ptime->start = 0; + ptime->end = 0; + rc = parse_timestr_sec_nsec(ptime, start_str, end_str); - free(str); + free(start_str); /* make sure end time is after start time if it was given */ if (rc == 0 && ptime->end && ptime->end < ptime->start) @@ -104,6 +114,177 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) return rc; } +static int parse_percent(double *pcnt, char *str) +{ + char *c; + + c = strchr(str, '%'); + if (c) + *c = '\0'; + else + return -1; + + *pcnt = atof(str) / 100.0; + + return 0; +} + +static int percent_slash_split(char *str, struct perf_time_interval *ptime, + u64 start, u64 end) +{ + char *p, *end_str; + double pcnt, start_pcnt, end_pcnt; + u64 total = end - start; + int i; + + /* +* Example: +* 10%/2: select the second 10% slice and the third 10% slice +*/ + + /* We can modify this string since the original one is copied */ + p = strchr(str, '/'); + if (!p) + return -1; + + *p = '\0'; + if (parse_percent(, str) < 0) + return -1; + + p++; + i = (int)strtol(p, _str, 10); + if (*end_str) + return -1; + + if (pcnt <= 0.0) + return -1; + + start_pcnt = pcnt * (i - 1); + end_pcnt = pcnt * i; + + if (start_pcnt < 0.0 || start_pcnt > 1.0 || + end_pcnt < 0.0 || end_pcnt > 1.0) { + return -1; + } + + ptime->start = start + round(start_pcnt * total); + ptime->end = start + round(end_pcnt * total); + + retur
[tip:perf/core] perf header: Add infrastructure to record first and last sample time
Commit-ID: 6011518db3bd04c80cd3ce3e6aea1c399739adb4 Gitweb: https://git.kernel.org/tip/6011518db3bd04c80cd3ce3e6aea1c399739adb4 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:41 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:20:51 -0300 perf header: Add infrastructure to record first and last sample time perf report/script/... have a --time option to limit the time range of output. That's very useful to slice large traces, e.g. when processing the output of perf script for some analysis. But right now --time only supports absolute time. Also there is no fast way to get the start/end times of a given trace except for looking at it. This makes it hard to e.g. only decode the first half of the trace, which is useful for parallelization of scripts Another problem is that perf records are variable size and there is no synchronization mechanism. So the only way to find the last sample reliably would be to walk all samples. But we want to avoid that in perf report/... because it is already quite expensive. That is why storing the first sample time and last sample time in perf record is better. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time. Committer testing: After this patch is applied the header is written with zeroes, we need the next patch, for "perf record" to actually write the timestamps: # perf report -D | grep PERF_RECORD_SAMPLE\( 22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xa21be8c5 period: 1 addr: 0 22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xa21ffd50 period: 2828043 addr: 0 # perf report --header | grep "time of " # time of first sample : 0.00 # time of last sample : 0.00 # Changelog: v7: 1. Rebase to latest perf/core branch. 2. Add following clarification in patch description according to Arnaldo's suggestion. "That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time." v4: Use perf script time style for timestamp printing. Also add with the printing of sample duration. v3: Remove the definitions of first_sample_time/last_sample_time from perf_session. Just define them in perf_evlist Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-2-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf.data-file-format.txt | 4 ++ tools/perf/util/evlist.h | 2 + tools/perf/util/header.c | 60 ++ tools/perf/util/header.h | 1 + 4 files changed, 67 insertions(+) diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index 15e8b48..f7d85e8 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -261,6 +261,10 @@ struct { struct perf_header_string map; }[number_of_cache_levels]; + HEADER_SAMPLE_TIME = 21, + +Two uint64_t for the time of first sample and the time of last sample. + other bits are reserved and should ignored for now HEADER_FEAT_BITS= 256, diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 7516066..e7fbca6 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -50,6 +50,8 @@ struct perf_evlist { struct perf_evsel *selected; struct events_stats stats; struct perf_env *env; + u64 first_sample_time; + u64 last_sample_time; }; struct perf_evsel_str_handler { diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index ca73aa7..a326e0d 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -16,6 +16,7 @@ #include #include #include +#include #include &
[tip:perf/core] perf record: Record the first and last sample time in the header
Commit-ID: 68588baf8d01826673f2874f434123029e519052 Gitweb: https://git.kernel.org/tip/68588baf8d01826673f2874f434123029e519052 Author: Jin Yao <yao@linux.intel.com> AuthorDate: Fri, 8 Dec 2017 21:13:42 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:20:56 -0300 perf record: Record the first and last sample time in the header In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. Committer testing: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ] [root@jouet home]# perf report --header | grep "time of " # time of first sample : 22947.909226 # time of last sample : 22948.910704 # # perf report -D | grep PERF_RECORD_SAMPLE\( 0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa21b1af3 period: 1 addr: 0 0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa200d204 period: 1 addr: 0 3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xa22071d8 period: 169518 addr: 0 0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa2856816 period: 198807 addr: 0 2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa2856816 period: 88111 addr: 0 # Changelog: v7: Just update the patch description according to Arnaldo's suggestion. v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao <yao@linux.intel.com> Acked-by: Jiri Olsa <jo...@kernel.org> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/Documentation/perf-record.txt | 3 +++ tools/perf/builtin-record.c | 18 +++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 5a626ef..3eea6de 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -430,6 +430,9 @@ Configure all used events to run in user space. --timestamp-filename Append timestamp to output file name. +--timestamp-boundary:: +Record timestamp boundary (time of first/last samples). + --switch-output[=mode]:: Generate multiple perf.data files, timestamp prefixed, switching to a new one based on 'mode' value: diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 50385d8..65681a1 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -78,6 +78,7 @@ struct record { boolno_buildid_cache_set; boolbuildid_all; booltimestamp_filename; + booltimestamp_boundary; struct switch_outputswitch_output; unsigned long long samples; }; @@ -409,8 +410,15 @@ static int process_sample_event(struct perf_tool *tool, { struct record *rec = container_of(tool, struct record, tool); - rec->samples++; + if (rec->evlist->first_sample_time == 0) + rec->evlist->first_sample_time = sample->time; + + rec->evlist->last_sample_time = sample->time; + if (rec->bu
[tip:perf/core] perf header: Add infrastructure to record first and last sample time
Commit-ID: 6011518db3bd04c80cd3ce3e6aea1c399739adb4 Gitweb: https://git.kernel.org/tip/6011518db3bd04c80cd3ce3e6aea1c399739adb4 Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:41 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:20:51 -0300 perf header: Add infrastructure to record first and last sample time perf report/script/... have a --time option to limit the time range of output. That's very useful to slice large traces, e.g. when processing the output of perf script for some analysis. But right now --time only supports absolute time. Also there is no fast way to get the start/end times of a given trace except for looking at it. This makes it hard to e.g. only decode the first half of the trace, which is useful for parallelization of scripts Another problem is that perf records are variable size and there is no synchronization mechanism. So the only way to find the last sample reliably would be to walk all samples. But we want to avoid that in perf report/... because it is already quite expensive. That is why storing the first sample time and last sample time in perf record is better. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time. Committer testing: After this patch is applied the header is written with zeroes, we need the next patch, for "perf record" to actually write the timestamps: # perf report -D | grep PERF_RECORD_SAMPLE\( 22501155244406 0x44f0 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xa21be8c5 period: 1 addr: 0 22501155793625 0x4a30 [0x28]: PERF_RECORD_SAMPLE(IP, 0x4001): 25016/25016: 0xa21ffd50 period: 2828043 addr: 0 # perf report --header | grep "time of " # time of first sample : 0.00 # time of last sample : 0.00 # Changelog: v7: 1. Rebase to latest perf/core branch. 2. Add following clarification in patch description according to Arnaldo's suggestion. "That will be done when, for instance, processing build-ids, where we already have to process all samples to create the build-id table, take advantage of that to further amortize that processing by storing HEADER_SAMPLE_TIME to make 'perf report/script' faster when using --time." v4: Use perf script time style for timestamp printing. Also add with the printing of sample duration. v3: Remove the definitions of first_sample_time/last_sample_time from perf_session. Just define them in perf_evlist Signed-off-by: Jin Yao Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-2-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf.data-file-format.txt | 4 ++ tools/perf/util/evlist.h | 2 + tools/perf/util/header.c | 60 ++ tools/perf/util/header.h | 1 + 4 files changed, 67 insertions(+) diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index 15e8b48..f7d85e8 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -261,6 +261,10 @@ struct { struct perf_header_string map; }[number_of_cache_levels]; + HEADER_SAMPLE_TIME = 21, + +Two uint64_t for the time of first sample and the time of last sample. + other bits are reserved and should ignored for now HEADER_FEAT_BITS= 256, diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 7516066..e7fbca6 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -50,6 +50,8 @@ struct perf_evlist { struct perf_evsel *selected; struct events_stats stats; struct perf_env *env; + u64 first_sample_time; + u64 last_sample_time; }; struct perf_evsel_str_handler { diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index ca73aa7..a326e0d 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "evlist.h" #include "evsel.h" @@ -35,6 +36,7 @@ #include #include "asm/bug.h" #include "tool.h" +#include "time-utils.h" #include "sane_ctype.h" @@ -1180,6 +1182,20 @@ static int write_stat(struct feat_fd *
[tip:perf/core] perf record: Record the first and last sample time in the header
Commit-ID: 68588baf8d01826673f2874f434123029e519052 Gitweb: https://git.kernel.org/tip/68588baf8d01826673f2874f434123029e519052 Author: Jin Yao AuthorDate: Fri, 8 Dec 2017 21:13:42 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:20:56 -0300 perf record: Record the first and last sample time in the header In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. Committer testing: # perf record -a sleep 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 2.099 MB perf.data (1101 samples) ] [root@jouet home]# perf report --header | grep "time of " # time of first sample : 22947.909226 # time of last sample : 22948.910704 # # perf report -D | grep PERF_RECORD_SAMPLE\( 0 22947909226101 0x20bb68 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa21b1af3 period: 1 addr: 0 0 22947909229928 0x20bb98 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa200d204 period: 1 addr: 0 3 22948910397351 0x219360 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 28251/28251: 0xa22071d8 period: 169518 addr: 0 0 22948910652380 0x20f120 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa2856816 period: 198807 addr: 0 2 22948910704034 0x2172d0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x4001): 0/0: 0xa2856816 period: 88111 addr: 0 # Changelog: v7: Just update the patch description according to Arnaldo's suggestion. v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao Acked-by: Jiri Olsa Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1512738826-2628-3-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/Documentation/perf-record.txt | 3 +++ tools/perf/builtin-record.c | 18 +++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 5a626ef..3eea6de 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -430,6 +430,9 @@ Configure all used events to run in user space. --timestamp-filename Append timestamp to output file name. +--timestamp-boundary:: +Record timestamp boundary (time of first/last samples). + --switch-output[=mode]:: Generate multiple perf.data files, timestamp prefixed, switching to a new one based on 'mode' value: diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index 50385d8..65681a1 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -78,6 +78,7 @@ struct record { boolno_buildid_cache_set; boolbuildid_all; booltimestamp_filename; + booltimestamp_boundary; struct switch_outputswitch_output; unsigned long long samples; }; @@ -409,8 +410,15 @@ static int process_sample_event(struct perf_tool *tool, { struct record *rec = container_of(tool, struct record, tool); - rec->samples++; + if (rec->evlist->first_sample_time == 0) + rec->evlist->first_sample_time = sample->time; + + rec->evlist->last_sample_time = sample->time; + if (rec->buildid_all) + return 0; + + rec->samples++; return build_id__mark_dso_hit(tool, event, sample, evsel, machine); } @@ -435,9 +443,11 @@ static int process_buildids(struct record *rec) /* * If --buildid-all is given, it marks all D
[tip:perf/core] perf report: Fix a wrong offset issue when using /proc/kcore
Commit-ID: 935f5a9d4500020879858c9224c98dfabf16101d Gitweb: https://git.kernel.org/tip/935f5a9d4500020879858c9224c98dfabf16101d Author: Jin Yao <yao@linux.intel.com> AuthorDate: Sat, 30 Dec 2017 00:26:52 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:11:57 -0300 perf report: Fix a wrong offset issue when using /proc/kcore When a valid vmlinux is not found, 'perf report' falls back to look at /proc/kcore. In this case, it will report the impossible large offset. For example: # perf record -b -e cycles:k find /etc/ > /dev/null # perf report --stdio --branch-history 22.77% _vm_normal_page+18446603336221188162 | ---page_remove_rmap +18446603336221188324 page_remove_rmap +18446603336221188487 (cycles:5) unlock_page_memcg +18446603336221188096 page_remove_rmap +18446603336221188327 (cycles:1) The issue is the value which is passed to parameter 'addr' in __get_srcline() is the objdump address. It's not correct if we calculate the offset by using 'addr - sym->start'. This patch creates a new parameter 'ip' in __get_srcline(). It is not converted to objdump address. With this patch, the perf report output is: 22.77% _vm_normal_page+66 | ---page_remove_rmap +228 page_remove_rmap +391 (cycles:5) unlock_page_memcg +0 page_remove_rmap +231 (cycles:1) page_remove_rmap +236 Committer testing: Make sure you get any valid vmlinux out of the way, using '-v' on the 'perf report' case and deleting it from places where perf searches them, like your kernel build dir and the build-id cache, in ~/.debug/. Reported-by: Arnaldo Carvalho de Melo <a...@redhat.com> Signed-off-by: Jin Yao <yao@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Jiri Olsa <jo...@kernel.org> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1514564812-17344-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/util/annotate.c | 3 ++- tools/perf/util/machine.c | 2 +- tools/perf/util/map.c | 2 +- tools/perf/util/sort.c | 16 ++-- tools/perf/util/srcline.c | 9 + tools/perf/util/srcline.h | 5 +++-- 6 files changed, 22 insertions(+), 15 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 68e687d..28b233c 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1960,7 +1960,8 @@ static void annotation__calc_lines(struct annotation *notes, struct map *map, if (percent_max <= 0.5) continue; - al->path = get_srcline(map->dso, start + al->offset, NULL, false, true); + al->path = get_srcline(map->dso, start + al->offset, NULL, + false, true, start + al->offset); insert_source_line(_root, al); } diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 64d255f..b05a674 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1726,7 +1726,7 @@ static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip) bool show_addr = callchain_param.key == CCKEY_ADDRESS; srcline = get_srcline(map->dso, map__rip_2objdump(map, ip), - sym, show_sym, show_addr); + sym, show_sym, show_addr, ip); srcline__tree_insert(>dso->srclines, ip, srcline); } diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index 6d40efd..8fe5703 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -419,7 +419,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix, if (map && map->dso) { srcline = get_srcline(map->dso, map__rip_2objdump(map, addr), NULL, - true, true); + true, true, addr); if (srcline != SRCLINE_UNKNOWN) ret = fprintf(fp, "%s%s", prefix, srcline); free_srcline(srcline); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index a00eacd..211e7f3 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -336,7 +336,7 @@ char *hist_entry__get_srcline(struct hist_entry *he) return SRCLINE_UNKNOWN; return get_srcline(map->dso, map__rip_2objdump(map, he->ip), - he->ms.sym, true,
[tip:perf/core] perf report: Fix a no annotate browser displayed issue
Commit-ID: 40c39e3046411f84bab82f66783ff3593e2bcd9b Gitweb: https://git.kernel.org/tip/40c39e3046411f84bab82f66783ff3593e2bcd9b Author: Jin Yao <yao@linux.intel.com> AuthorDate: Tue, 26 Dec 2017 18:42:43 +0800 Committer: Arnaldo Carvalho de Melo <a...@redhat.com> CommitDate: Mon, 8 Jan 2018 11:11:57 -0300 perf report: Fix a no annotate browser displayed issue When enabling '-b' option in perf record, for example, perf record -b ... perf report and then browsing the annotate browser from perf report (press 'A'), it would fail (annotate browser can't be displayed). It's because the '.add_entry_cb' op of struct report is overwritten by hist_iter__branch_callback() in builtin-report.c. But this function doesn't do something like mapping symbols and sources. So next, do_annotate() will return directly. notes = symbol__annotation(act->ms.sym); if (!notes->src) return 0; This patch adds the lost code to hist_iter__branch_callback (refer to hist_iter__report_callback). v2: Fix a crash bug when perform 'perf report --stdio'. The reason is that we init the symbol annotation only in browser mode, it doesn't allocate/init resources for stdio mode. So now in hist_iter__branch_callback(), it will return directly if it's not in browser mode. Signed-off-by: Jin Yao <yao@linux.intel.com> Tested-by: Arnaldo Carvalho de Melo <a...@redhat.com> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: Andi Kleen <a...@linux.intel.com> Cc: Jiri Olsa <jo...@kernel.org> Cc: Kan Liang <kan.li...@intel.com> Cc: Peter Zijlstra <pet...@infradead.org> Link: http://lkml.kernel.org/r/1514284963-18587-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> --- tools/perf/builtin-report.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index eb9ce63..07827cd 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -162,12 +162,28 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct hist_entry *he = iter->he; struct report *rep = arg; struct branch_info *bi; + struct perf_sample *sample = iter->sample; + struct perf_evsel *evsel = iter->evsel; + int err; + + if (!ui__has_annotation()) + return 0; + + hist__account_cycles(sample->branch_stack, al, sample, +rep->nonany_branch_mode); bi = he->branch_info; + err = addr_map_symbol__inc_samples(>from, sample, evsel->idx); + if (err) + goto out; + + err = addr_map_symbol__inc_samples(>to, sample, evsel->idx); + branch_type_count(>brtype_stat, >flags, bi->from.addr, bi->to.addr); - return 0; +out: + return err; } static int process_sample_event(struct perf_tool *tool,
[tip:perf/core] perf report: Fix a no annotate browser displayed issue
Commit-ID: 40c39e3046411f84bab82f66783ff3593e2bcd9b Gitweb: https://git.kernel.org/tip/40c39e3046411f84bab82f66783ff3593e2bcd9b Author: Jin Yao AuthorDate: Tue, 26 Dec 2017 18:42:43 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:11:57 -0300 perf report: Fix a no annotate browser displayed issue When enabling '-b' option in perf record, for example, perf record -b ... perf report and then browsing the annotate browser from perf report (press 'A'), it would fail (annotate browser can't be displayed). It's because the '.add_entry_cb' op of struct report is overwritten by hist_iter__branch_callback() in builtin-report.c. But this function doesn't do something like mapping symbols and sources. So next, do_annotate() will return directly. notes = symbol__annotation(act->ms.sym); if (!notes->src) return 0; This patch adds the lost code to hist_iter__branch_callback (refer to hist_iter__report_callback). v2: Fix a crash bug when perform 'perf report --stdio'. The reason is that we init the symbol annotation only in browser mode, it doesn't allocate/init resources for stdio mode. So now in hist_iter__branch_callback(), it will return directly if it's not in browser mode. Signed-off-by: Jin Yao Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Jiri Olsa Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1514284963-18587-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/builtin-report.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index eb9ce63..07827cd 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -162,12 +162,28 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct hist_entry *he = iter->he; struct report *rep = arg; struct branch_info *bi; + struct perf_sample *sample = iter->sample; + struct perf_evsel *evsel = iter->evsel; + int err; + + if (!ui__has_annotation()) + return 0; + + hist__account_cycles(sample->branch_stack, al, sample, +rep->nonany_branch_mode); bi = he->branch_info; + err = addr_map_symbol__inc_samples(>from, sample, evsel->idx); + if (err) + goto out; + + err = addr_map_symbol__inc_samples(>to, sample, evsel->idx); + branch_type_count(>brtype_stat, >flags, bi->from.addr, bi->to.addr); - return 0; +out: + return err; } static int process_sample_event(struct perf_tool *tool,
[tip:perf/core] perf report: Fix a wrong offset issue when using /proc/kcore
Commit-ID: 935f5a9d4500020879858c9224c98dfabf16101d Gitweb: https://git.kernel.org/tip/935f5a9d4500020879858c9224c98dfabf16101d Author: Jin Yao AuthorDate: Sat, 30 Dec 2017 00:26:52 +0800 Committer: Arnaldo Carvalho de Melo CommitDate: Mon, 8 Jan 2018 11:11:57 -0300 perf report: Fix a wrong offset issue when using /proc/kcore When a valid vmlinux is not found, 'perf report' falls back to look at /proc/kcore. In this case, it will report the impossible large offset. For example: # perf record -b -e cycles:k find /etc/ > /dev/null # perf report --stdio --branch-history 22.77% _vm_normal_page+18446603336221188162 | ---page_remove_rmap +18446603336221188324 page_remove_rmap +18446603336221188487 (cycles:5) unlock_page_memcg +18446603336221188096 page_remove_rmap +18446603336221188327 (cycles:1) The issue is the value which is passed to parameter 'addr' in __get_srcline() is the objdump address. It's not correct if we calculate the offset by using 'addr - sym->start'. This patch creates a new parameter 'ip' in __get_srcline(). It is not converted to objdump address. With this patch, the perf report output is: 22.77% _vm_normal_page+66 | ---page_remove_rmap +228 page_remove_rmap +391 (cycles:5) unlock_page_memcg +0 page_remove_rmap +231 (cycles:1) page_remove_rmap +236 Committer testing: Make sure you get any valid vmlinux out of the way, using '-v' on the 'perf report' case and deleting it from places where perf searches them, like your kernel build dir and the build-id cache, in ~/.debug/. Reported-by: Arnaldo Carvalho de Melo Signed-off-by: Jin Yao Tested-by: Arnaldo Carvalho de Melo Cc: Alexander Shishkin Cc: Andi Kleen Cc: Jiri Olsa Cc: Kan Liang Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/1514564812-17344-1-git-send-email-yao@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo --- tools/perf/util/annotate.c | 3 ++- tools/perf/util/machine.c | 2 +- tools/perf/util/map.c | 2 +- tools/perf/util/sort.c | 16 ++-- tools/perf/util/srcline.c | 9 + tools/perf/util/srcline.h | 5 +++-- 6 files changed, 22 insertions(+), 15 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 68e687d..28b233c 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1960,7 +1960,8 @@ static void annotation__calc_lines(struct annotation *notes, struct map *map, if (percent_max <= 0.5) continue; - al->path = get_srcline(map->dso, start + al->offset, NULL, false, true); + al->path = get_srcline(map->dso, start + al->offset, NULL, + false, true, start + al->offset); insert_source_line(_root, al); } diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c index 64d255f..b05a674 100644 --- a/tools/perf/util/machine.c +++ b/tools/perf/util/machine.c @@ -1726,7 +1726,7 @@ static char *callchain_srcline(struct map *map, struct symbol *sym, u64 ip) bool show_addr = callchain_param.key == CCKEY_ADDRESS; srcline = get_srcline(map->dso, map__rip_2objdump(map, ip), - sym, show_sym, show_addr); + sym, show_sym, show_addr, ip); srcline__tree_insert(>dso->srclines, ip, srcline); } diff --git a/tools/perf/util/map.c b/tools/perf/util/map.c index 6d40efd..8fe5703 100644 --- a/tools/perf/util/map.c +++ b/tools/perf/util/map.c @@ -419,7 +419,7 @@ int map__fprintf_srcline(struct map *map, u64 addr, const char *prefix, if (map && map->dso) { srcline = get_srcline(map->dso, map__rip_2objdump(map, addr), NULL, - true, true); + true, true, addr); if (srcline != SRCLINE_UNKNOWN) ret = fprintf(fp, "%s%s", prefix, srcline); free_srcline(srcline); diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index a00eacd..211e7f3 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -336,7 +336,7 @@ char *hist_entry__get_srcline(struct hist_entry *he) return SRCLINE_UNKNOWN; return get_srcline(map->dso, map__rip_2objdump(map, he->ip), - he->ms.sym, true, true); + he->ms.sym, true, true, he->ip); } static int64_t @@ -380,7 +380,8 @@ sort__srcline_from_cmp(struct hist_entry *left, struct hist_entry *right) map__rip_2objdump(map,
[PATCH] perf util: Display warning when perf report/annotate is missing some libs
We keep having bug reports that when users build perf on their own, but they don't install some needed libraries like libelf, libbfd/libibery. The perf can build, but it is missing important functionality. For example, perf report doesn't display any symbols and perf annotate doesn't work. This patch displays warnings that these libraries are missing in build when perf report / perf annotate are used. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-annotate.c | 2 ++ tools/perf/builtin-report.c | 2 ++ tools/perf/util/symbol.c | 21 + tools/perf/util/symbol.h | 2 ++ 4 files changed, 27 insertions(+) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..609eb7a 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -511,6 +511,8 @@ int cmd_annotate(int argc, const char **argv) setup_browser(true); + build_lib_warning(); + ret = __cmd_annotate(); out_delete: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..550adb7 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1293,6 +1293,8 @@ int cmd_report(int argc, const char **argv) } } + build_lib_warning(); + if (symbol__init(>header.env) < 0) goto error; diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index cc065d4..4205ba8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -2224,3 +2224,24 @@ int symbol__config_symfs(const struct option *opt __maybe_unused, free(bf); return 0; } + +void build_lib_warning(void) +{ +#ifndef HAVE_LIBELF_SUPPORT + pr_warning("Symbols are disabled!\n" + "Please install libelf-dev, libelf-devel " + "or elfutils-libelf-devel before building perf.\n"); +#endif + +#ifndef HAVE_DWARF_SUPPORT + pr_warning("Unwind support is disabled!\n" + "Please install elfutils-devel/libdw-dev " + "before building perf.\n"); +#endif + +#ifndef HAVE_LIBBFD_SUPPORT + pr_warning("C++ demangling and line numbers are disabled!\n" + "Please install binutils-dev[el]/" + "libiberty-dev before building perf.\n"); +#endif +} diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 0563f33..d88e046 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -383,6 +383,8 @@ int get_sdt_note_list(struct list_head *head, const char *target); int cleanup_sdt_note_list(struct list_head *sdt_notes); int sdt_notes__get_count(struct list_head *start); +void build_lib_warning(void); + #define SDT_BASE_SCN ".stapsdt.base" #define SDT_NOTE_SCN ".note.stapsdt" #define SDT_NOTE_TYPE 3 -- 2.7.4
[PATCH] perf util: Display warning when perf report/annotate is missing some libs
We keep having bug reports that when users build perf on their own, but they don't install some needed libraries like libelf, libbfd/libibery. The perf can build, but it is missing important functionality. For example, perf report doesn't display any symbols and perf annotate doesn't work. This patch displays warnings that these libraries are missing in build when perf report / perf annotate are used. Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 2 ++ tools/perf/builtin-report.c | 2 ++ tools/perf/util/symbol.c | 21 + tools/perf/util/symbol.h | 2 ++ 4 files changed, 27 insertions(+) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..609eb7a 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -511,6 +511,8 @@ int cmd_annotate(int argc, const char **argv) setup_browser(true); + build_lib_warning(); + ret = __cmd_annotate(); out_delete: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..550adb7 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1293,6 +1293,8 @@ int cmd_report(int argc, const char **argv) } } + build_lib_warning(); + if (symbol__init(>header.env) < 0) goto error; diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index cc065d4..4205ba8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -2224,3 +2224,24 @@ int symbol__config_symfs(const struct option *opt __maybe_unused, free(bf); return 0; } + +void build_lib_warning(void) +{ +#ifndef HAVE_LIBELF_SUPPORT + pr_warning("Symbols are disabled!\n" + "Please install libelf-dev, libelf-devel " + "or elfutils-libelf-devel before building perf.\n"); +#endif + +#ifndef HAVE_DWARF_SUPPORT + pr_warning("Unwind support is disabled!\n" + "Please install elfutils-devel/libdw-dev " + "before building perf.\n"); +#endif + +#ifndef HAVE_LIBBFD_SUPPORT + pr_warning("C++ demangling and line numbers are disabled!\n" + "Please install binutils-dev[el]/" + "libiberty-dev before building perf.\n"); +#endif +} diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 0563f33..d88e046 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -383,6 +383,8 @@ int get_sdt_note_list(struct list_head *head, const char *target); int cleanup_sdt_note_list(struct list_head *sdt_notes); int sdt_notes__get_count(struct list_head *start); +void build_lib_warning(void); + #define SDT_BASE_SCN ".stapsdt.base" #define SDT_NOTE_SCN ".note.stapsdt" #define SDT_NOTE_TYPE 3 -- 2.7.4
[PATCH v1 6/8] perf util: Allocate time slices buffer according to number of comma
Previously we use a magic number 10 to limit the number of time slices. It's not very good. This patch creates a new function perf_time__range_alloc() to allocate time slices buffer. The number of buffer entries is determined by the number of comma in string but at least it will allocate one entry even if no comma is found. Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5769f97..6193b46 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, return -1; } +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size) +{ + const char *p1, *p2; + int i = 1; + struct perf_time_interval *ptime; + + /* +* At least allocate one time range. +*/ + if (!ostr) + goto alloc; + + p1 = ostr; + while (p1 < ostr + strlen(ostr)) { + p2 = strchr(p1, ','); + if (!p2) + break; + + p1 = p2 + 1; + i++; + } + +alloc: + *size = i; + ptime = calloc(i, sizeof(*ptime)); + return ptime; +} + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) { /* if time is not set don't drop sample */ diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 34d5eba..70b177d 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr); int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end); +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size); + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, -- 2.7.4
[PATCH v1 6/8] perf util: Allocate time slices buffer according to number of comma
Previously we use a magic number 10 to limit the number of time slices. It's not very good. This patch creates a new function perf_time__range_alloc() to allocate time slices buffer. The number of buffer entries is determined by the number of comma in string but at least it will allocate one entry even if no comma is found. Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5769f97..6193b46 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, return -1; } +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size) +{ + const char *p1, *p2; + int i = 1; + struct perf_time_interval *ptime; + + /* +* At least allocate one time range. +*/ + if (!ostr) + goto alloc; + + p1 = ostr; + while (p1 < ostr + strlen(ostr)) { + p2 = strchr(p1, ','); + if (!p2) + break; + + p1 = p2 + 1; + i++; + } + +alloc: + *size = i; + ptime = calloc(i, sizeof(*ptime)); + return ptime; +} + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) { /* if time is not set don't drop sample */ diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 34d5eba..70b177d 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr); int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end); +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size); + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, -- 2.7.4
[PATCH v1 4/8] perf util: Support no index time percent slice
Previously, the time percent slice needs an index to specify which one the user wants. While it may be easy for using if the index can be omitted. So with this patch, for example, perf report --stdio --time 10%/1 should be equivalent to perf report --stdio --time 10% Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/time-utils.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 88510ab..5769f97 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval *ptime_buf, int num, return i; } +static int one_percent_convert(struct perf_time_interval *ptime_buf, + const char *ostr, u64 start, u64 end, char *c) +{ + char *str; + int len = strlen(ostr), ret; + + /* +* c points to '%'. +* '%' should be the last character +*/ + if (ostr + len - 1 != c) + return -1; + + /* +* Construct a string like "xx%/1" +*/ + str = malloc(len + 3); + if (str == NULL) + return -ENOMEM; + + memcpy(str, ostr, len); + strcpy(str + len, "/1"); + + ret = percent_slash_split(str, ptime_buf, start, end); + if (ret == 0) + ret = 1; + + free(str); + return ret; +} + int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end) { @@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, * ostr example: * 10%/2,10%/3: select the second 10% slice and the third 10% slice * 0%-10%,30%-40%: multiple time range +* 50%: just one percent */ memset(ptime_buf, 0, sizeof(*ptime_buf) * num); @@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, end, percent_dash_split); } + c = strchr(ostr, '%'); + if (c) + return one_percent_convert(ptime_buf, ostr, start, end, c); + return -1; } -- 2.7.4
[PATCH v1 2/8] perf script: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf script --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-script.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c1cce47..4f691af 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3448,7 +3448,9 @@ int cmd_script(int argc, const char **argv) if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); err = -EINVAL; goto out_delete; } -- 2.7.4
[PATCH v1 4/8] perf util: Support no index time percent slice
Previously, the time percent slice needs an index to specify which one the user wants. While it may be easy for using if the index can be omitted. So with this patch, for example, perf report --stdio --time 10%/1 should be equivalent to perf report --stdio --time 10% Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 88510ab..5769f97 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval *ptime_buf, int num, return i; } +static int one_percent_convert(struct perf_time_interval *ptime_buf, + const char *ostr, u64 start, u64 end, char *c) +{ + char *str; + int len = strlen(ostr), ret; + + /* +* c points to '%'. +* '%' should be the last character +*/ + if (ostr + len - 1 != c) + return -1; + + /* +* Construct a string like "xx%/1" +*/ + str = malloc(len + 3); + if (str == NULL) + return -ENOMEM; + + memcpy(str, ostr, len); + strcpy(str + len, "/1"); + + ret = percent_slash_split(str, ptime_buf, start, end); + if (ret == 0) + ret = 1; + + free(str); + return ret; +} + int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end) { @@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, * ostr example: * 10%/2,10%/3: select the second 10% slice and the third 10% slice * 0%-10%,30%-40%: multiple time range +* 50%: just one percent */ memset(ptime_buf, 0, sizeof(*ptime_buf) * num); @@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, end, percent_dash_split); } + c = strchr(ostr, '%'); + if (c) + return one_percent_convert(ptime_buf, ostr, start, end, c); + return -1; } -- 2.7.4
[PATCH v1 2/8] perf script: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf script --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao --- tools/perf/builtin-script.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c1cce47..4f691af 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3448,7 +3448,9 @@ int cmd_script(int argc, const char **argv) if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); err = -EINVAL; goto out_delete; } -- 2.7.4
[PATCH v1 8/8] perf script: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf script --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/Documentation/perf-script.txt | 10 +- tools/perf/builtin-script.c | 17 + 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 806ec63..7730c1d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -351,19 +351,19 @@ include::itrace.txt[] to end of file. Also support time percent with multipe time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: - Select the second 10% time slice + Select the second 10% time slice: perf script --time 10%/2 - Select from 0% to 10% time slice + Select from 0% to 10% time slice: perf script --time 0%-10% - Select the first and second 10% time slices + Select the first and second 10% time slices: perf script --time 10%/1,10%/2 - Select from 0% to 10% and 30% to 40% slices + Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% --max-blocks:: diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 4f691af..f116a31 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } -#define PTIME_RANGE_MAX10 - struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1496,7 +1494,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; }; @@ -3444,6 +3443,13 @@ int cmd_script(int argc, const char **argv) if (err < 0) goto out_delete; + script.ptime_range = perf_time__range_alloc(script.time_str, + _size); + if (!script.ptime_range) { + err = -ENOMEM; + goto out_delete; + } + /* needs to be parsed after looking up reference time */ if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && @@ -3456,7 +3462,7 @@ int cmd_script(int argc, const char **argv) } script.range_num = perf_time__percent_parse_str( - script.ptime_range, PTIME_RANGE_MAX, + script.ptime_range, script.range_size, script.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); @@ -3475,6 +3481,9 @@ int cmd_script(int argc, const char **argv) flush_scripting(); out_delete: + if (script.ptime_range) + free(script.ptime_range); + perf_evlist__free_stats(session->evlist); perf_session__delete(session); -- 2.7.4
[PATCH v1 8/8] perf script: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf script --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-script.txt | 10 +- tools/perf/builtin-script.c | 17 + 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 806ec63..7730c1d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -351,19 +351,19 @@ include::itrace.txt[] to end of file. Also support time percent with multipe time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: - Select the second 10% time slice + Select the second 10% time slice: perf script --time 10%/2 - Select from 0% to 10% time slice + Select from 0% to 10% time slice: perf script --time 0%-10% - Select the first and second 10% time slices + Select the first and second 10% time slices: perf script --time 10%/1,10%/2 - Select from 0% to 10% and 30% to 40% slices + Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% --max-blocks:: diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 4f691af..f116a31 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } -#define PTIME_RANGE_MAX10 - struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1496,7 +1494,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; }; @@ -3444,6 +3443,13 @@ int cmd_script(int argc, const char **argv) if (err < 0) goto out_delete; + script.ptime_range = perf_time__range_alloc(script.time_str, + _size); + if (!script.ptime_range) { + err = -ENOMEM; + goto out_delete; + } + /* needs to be parsed after looking up reference time */ if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && @@ -3456,7 +3462,7 @@ int cmd_script(int argc, const char **argv) } script.range_num = perf_time__percent_parse_str( - script.ptime_range, PTIME_RANGE_MAX, + script.ptime_range, script.range_size, script.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); @@ -3475,6 +3481,9 @@ int cmd_script(int argc, const char **argv) flush_scripting(); out_delete: + if (script.ptime_range) + free(script.ptime_range); + perf_evlist__free_stats(session->evlist); perf_session__delete(session); -- 2.7.4
[PATCH v1 7/8] perf report: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf report --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/builtin-report.c | 23 +-- 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 5522ce0..1940c4f 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -403,7 +403,7 @@ OPTIONS to end of file. Also support time percent with multiple time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: Select the second 10% time slice: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 77c954c..fe89021 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -54,8 +54,6 @@ #include #include -#define PTIME_RANGE_MAX10 - struct report { struct perf_tooltool; struct perf_session *session; @@ -76,7 +74,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; float min_percent; u64 nr_entries; @@ -1299,24 +1298,33 @@ int cmd_report(int argc, const char **argv) if (symbol__init(>header.env) < 0) goto error; + report.ptime_range = perf_time__range_alloc(report.time_str, + _size); + if (!report.ptime_range) { + ret = -ENOMEM; + goto error; + } + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { pr_err("HINT: no first/last sample time found in perf data.\n" "Please use latest perf binary to execute 'perf record'\n" "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } report.range_num = perf_time__percent_parse_str( - report.ptime_range, PTIME_RANGE_MAX, + report.ptime_range, report.range_size, report.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); if (report.range_num < 0) { pr_err("Invalid time string\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } } else { report.range_num = 1; @@ -1332,6 +1340,9 @@ int cmd_report(int argc, const char **argv) ret = 0; error: + if (report.ptime_range) + free(report.ptime_range); + perf_session__delete(session); return ret; } -- 2.7.4
[PATCH v1 3/8] perf util: Improve error checking for time percent input
The command line like 'perf report --stdio --time 1abc%/1' could be accepted by perf. It looks not very good. This patch uses strtod() to replace original atof() and check the entire string. Now for the same command line, it would return error message "Invalid time string". root@skl:/tmp# perf report --stdio --time 1abc%/1 Invalid time string Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/util/time-utils.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 3f7f18f..88510ab 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) static int parse_percent(double *pcnt, char *str) { - char *c; + char *c, *endptr; + double d; c = strchr(str, '%'); if (c) @@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str) else return -1; - *pcnt = atof(str) / 100.0; + d = strtod(str, ); + if (endptr != str + strlen(str)) + return -1; + *pcnt = d / 100.0; return 0; } -- 2.7.4
[PATCH v1 7/8] perf report: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf report --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/builtin-report.c | 23 +-- 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 5522ce0..1940c4f 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -403,7 +403,7 @@ OPTIONS to end of file. Also support time percent with multiple time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: Select the second 10% time slice: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 77c954c..fe89021 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -54,8 +54,6 @@ #include #include -#define PTIME_RANGE_MAX10 - struct report { struct perf_tooltool; struct perf_session *session; @@ -76,7 +74,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; float min_percent; u64 nr_entries; @@ -1299,24 +1298,33 @@ int cmd_report(int argc, const char **argv) if (symbol__init(>header.env) < 0) goto error; + report.ptime_range = perf_time__range_alloc(report.time_str, + _size); + if (!report.ptime_range) { + ret = -ENOMEM; + goto error; + } + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { pr_err("HINT: no first/last sample time found in perf data.\n" "Please use latest perf binary to execute 'perf record'\n" "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } report.range_num = perf_time__percent_parse_str( - report.ptime_range, PTIME_RANGE_MAX, + report.ptime_range, report.range_size, report.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); if (report.range_num < 0) { pr_err("Invalid time string\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } } else { report.range_num = 1; @@ -1332,6 +1340,9 @@ int cmd_report(int argc, const char **argv) ret = 0; error: + if (report.ptime_range) + free(report.ptime_range); + perf_session__delete(session); return ret; } -- 2.7.4
[PATCH v1 3/8] perf util: Improve error checking for time percent input
The command line like 'perf report --stdio --time 1abc%/1' could be accepted by perf. It looks not very good. This patch uses strtod() to replace original atof() and check the entire string. Now for the same command line, it would return error message "Invalid time string". root@skl:/tmp# perf report --stdio --time 1abc%/1 Invalid time string Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 3f7f18f..88510ab 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) static int parse_percent(double *pcnt, char *str) { - char *c; + char *c, *endptr; + double d; c = strchr(str, '%'); if (c) @@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str) else return -1; - *pcnt = atof(str) / 100.0; + d = strtod(str, ); + if (endptr != str + strlen(str)) + return -1; + *pcnt = d / 100.0; return 0; } -- 2.7.4
[PATCH v1 5/8] perf report: Add an indication of what time slices are used
Add a time slices indication to the perf report header. For example, # perf report --stdio --time 10% # Total Lost Samples: 0 # # Samples: 9K of event 'cycles:ppp' (time slices: 10%) # Event count (approx.): 8951288803 Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-report.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index a6c5cf2..77c954c 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -403,6 +403,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report if (evname != NULL) ret += fprintf(fp, " of event '%s'", evname); + if (rep->time_str) + ret += fprintf(fp, " (time slices: %s)", rep->time_str); + if (symbol_conf.show_ref_callgraph && strstr(evname, "call-graph=no")) { ret += fprintf(fp, ", show reference callgraph"); -- 2.7.4
[PATCH v1 5/8] perf report: Add an indication of what time slices are used
Add a time slices indication to the perf report header. For example, # perf report --stdio --time 10% # Total Lost Samples: 0 # # Samples: 9K of event 'cycles:ppp' (time slices: 10%) # Event count (approx.): 8951288803 Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index a6c5cf2..77c954c 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -403,6 +403,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report if (evname != NULL) ret += fprintf(fp, " of event '%s'", evname); + if (rep->time_str) + ret += fprintf(fp, " (time slices: %s)", rep->time_str); + if (symbol_conf.show_ref_callgraph && strstr(evname, "call-graph=no")) { ret += fprintf(fp, ", show reference callgraph"); -- 2.7.4
[PATCH v1 1/8] perf report: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf report --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao <yao@linux.intel.com> --- tools/perf/builtin-report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..a6c5cf2 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1299,7 +1299,9 @@ int cmd_report(int argc, const char **argv) if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); return -EINVAL; } -- 2.7.4
[PATCH v1 1/8] perf report: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf report --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..a6c5cf2 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1299,7 +1299,9 @@ int cmd_report(int argc, const char **argv) if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); return -EINVAL; } -- 2.7.4
[PATCH v1 0/8] perf: Follow-up patches to improve time slice
It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/Documentation/perf-script.txt | 10 ++--- tools/perf/builtin-report.c | 30 + tools/perf/builtin-script.c | 21 +++--- tools/perf/util/time-utils.c | 72 +++- tools/perf/util/time-utils.h | 2 + 6 files changed, 117 insertions(+), 20 deletions(-) -- 2.7.4