RE: NumaTOP 1.0 launched
Hi, https://01.org/numatop/ is slightly faster to access. Or please access https://github.com/01org/numatop to get the source directly. Thanks Jin Yao -Original Message- From: Jin, Yao Sent: Wednesday, April 17, 2013 10:30 AM To: 'l...@lwn.net'; 'linux-kernel@vger.kernel.org'; 'linux-perf-us...@vger.kernel.org' Subject: NumaTOP 1.0 launched We are pleased to announce today that the NumaTOP project has been added to 01.org. Performance analysis engineers know that NUMA can seriously impact performance and that NUMA performance analysis can be challenging. We've realized that currently there isn't an easy-to-use tool that lets us easily observe whether NUMA-related issues exist and, if so, where the NUMA bottleneck(s) reside. It can be quite challenging, especially in complex server environments. We decided to create a tool that automatically performs the typical steps in NUMA analysis and provides a good starting point to dive in and fix NUMA-related bottlenecks. That's NumaTOP! NumaTOP is an observation tool for runtime memory locality characterization and analysis of processes and threads running on a NUMA system. It helps the user characterize the NUMA behavior of processes and threads and identify where the NUMA-related performance bottlenecks reside. It uses Intel performance counter sampling technologies and associates the performance data with system runtime information to provide real-time analysis for production systems. NumaTOP is a GUI tool. It can run on Linux kernel 3.8 with a perf load latency patch today. That patch is planned to be integrated into kernel 3.9 or a later release. To learn more about NumaTOP, visit: http://01.org/numatop/ Best Regards Jin Yao -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
RE: [RFC][PATCH 7/7] perf/annotate: Add branch stack / basic block information
Hi, The idea of my patch is a little bit different. It targets to associate the branch coverage percentage and branch mispredict rate with the source code then user can see them directly in perf annotate source code / assembly view. The screenshot to show the branch%. https://github.com/yaoj/perf/blob/master/2.png The screenshot to show the mispredict rate. https://github.com/yaoj/perf/blob/master/3.png The --stdio mode is tested working well now and will do for --tui mode in next. The internal review comments require the patch to keep the existing "Percent" column unchanged and add new columns "Branch%" and "Mispred%" if LBRs are recorded in perf.data. For example, following 3 columns will be shown at default if LBRs are in perf.data when executing perf annotate --stdio. Percent | Branch% | Mispred% | With this comment, the patch needs to be changed. Please let me know what do you think for this patch and if I should go ahead. Thanks Jin Yao -Original Message- From: Arnaldo Carvalho de Melo [mailto:a...@kernel.org] Sent: Friday, September 9, 2016 1:11 AM To: Andi Kleen Cc: Stephane Eranian ; Peter Zijlstra ; Ingo Molnar ; LKML ; Jiri Olsa ; Linus Torvalds ; David Carrillo-Cisneros ; Alexander Shishkin ; Namhyung Kim ; Liang, Kan ; Anshuman Khandual ; Jin, Yao Subject: Re: [RFC][PATCH 7/7] perf/annotate: Add branch stack / basic block information Em Thu, Sep 08, 2016 at 09:59:15AM -0700, Andi Kleen escreveu: > On Thu, Sep 08, 2016 at 09:43:53AM -0700, Stephane Eranian wrote: > > Hi, > > > > On Thu, Sep 8, 2016 at 9:18 AM, Arnaldo Carvalho de Melo > > wrote: > > > > > > Em Fri, Jul 08, 2016 at 06:36:32PM +0200, Peter Zijlstra escreveu: > > > > On Fri, Jul 08, 2016 at 06:27:33PM +0200, Peter Zijlstra wrote: > > > > > > > > > I've been thinking of filtering all targets and branches that > > > > > are smaller than 0.1% in order to avoid this, but so far I've > > > > > just been ignoring these things. > > > > > > > > Like so... seems to 'work'. > > > > > > So I merged this one with 7/7 and this is the result, screenshot > > > to capture the colors: > > > > > > http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png > > > > > > Please let me know if I should go ahead and push with the combined > > > patch, that is now at: > > > > > > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit > > > /?h=perf/annotate_basic_blocks&id=baf41a43fa439ac534d21e41882a7858 > > > d5cee1e5 > > > > > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git > > > perf/annotate_basic_blocks > > > > > > Is that ok? > > > > > I like the idea and yes, branch stack can be used for this, but I > > have a hard time understanding the colored output. > > What is the explanation for the color changes? > > How do I interpret the percentages in the comments of the assembly: > > -54.50% (p: 42%) > > Why not have dedicated columns before the assembly with proper column > > headers? > > Yes columns with headers are better. Jin Yao has been looking at this > and already has some patches. For --tui as well? - Arnaldo
[PATCH] perf symbols: Not using the ubuntu debuginfo when dso load
In dso__load, it iterates over *interesting" debug images which have symtab/dynsym/opd section. For example, for loading the libc-2.23.so on ubuntu 16.04, it will try 2 images in turn. Try image 1: /lib/x86_64-linux-gnu/libc-2.23.so. Since it doesn't have ".symtab" section, after symsrc__init return, runtime_ss = ss (ss is associated with /lib/x86_64-linux-gnu/libc-2.23.so), and syms_ss = NULL. Try image 2: /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so. Since it has ".symtab" section, after symsrc__init return, syms_ss = ss (ss is associated with /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so), and runtime_ss is not changed. Now at the dso__load_sym(dso, map, syms_ss, runtime_ss, filter, kmod), syms_ss is associated with /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so, But the dso->name is /lib/x86_64-linux-gnu/libc-2.23.so. In dso__load_sym, it needs to compute the dso->text_offset. if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL)) dso->text_offset = tshdr.sh_addr - tshdr.sh_offset; The issue is that tshdr is the ELF section header for /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so not the header for /lib/x86_64-linux-gnu/libc-2.23.so. So the result of dso->text_offset is not correct. In symbol__disassemble, it calls map__rip_2objdump to convert symbol start address to objdump address. The converted address is rip + map->dso->text_offset Since this text_offset is not correct, so finally objdump uses wrong start address to disassemble /lib/x86_64-linux-gnu/libc-2.23.so. This patch lets dso__load not try ubuntu debuginfo (/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so in this example), because 1: the image doesn't have valid assembly code (check output of objdump) 2: it can avoid above issue. Signed-off-by: Jin Yao --- tools/perf/util/symbol.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index 37e8d20..25a10a3 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -1324,7 +1324,6 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod, case DSO_BINARY_TYPE__DEBUGLINK: case DSO_BINARY_TYPE__SYSTEM_PATH_DSO: case DSO_BINARY_TYPE__FEDORA_DEBUGINFO: - case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO: case DSO_BINARY_TYPE__BUILDID_DEBUGINFO: case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO: return !kmod && dso->kernel == DSO_TYPE_USER; @@ -1353,6 +1352,7 @@ static bool dso__is_compatible_symtab_type(struct dso *dso, bool kmod, return true; case DSO_BINARY_TYPE__NOT_FOUND: + case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO: default: return false; } -- 2.7.4
[PATCH v1 0/4] perf annotate: Display multiple events on the left side of annotate view
perf record -e cycles,branches ... perf annotate main --stdio The result only shows cycles. It should show both cycles and branches on the left side of annotate view. It works with "--group", but need this to work even without groups. The patch series supports to display multiple events on the left side of annotate view for stdio, tui and gtk modes. Jin Yao (4): perf annotate: create a new hists to manage multiple events samples perf annotate: Display multiple events for stdio mode perf annotate: Display multiple events for tui mode perf annotate: Display multiple events for gtk mode tools/perf/builtin-annotate.c | 62 + tools/perf/builtin-top.c | 3 +- tools/perf/ui/browsers/annotate.c | 49 +++--- tools/perf/ui/browsers/hists.c| 2 +- tools/perf/ui/gtk/annotate.c | 35 --- tools/perf/util/annotate.c| 187 +- tools/perf/util/annotate.h| 16 +++- tools/perf/util/hist.h| 8 +- tools/perf/util/sort.c| 21 + tools/perf/util/sort.h| 13 +++ 10 files changed, 301 insertions(+), 95 deletions(-) -- 2.7.4
[PATCH v1 3/4] perf annotate: Display multiple events for tui mode
For example: perf record -e cycles,branches ./div perf annotate main │for (i = 0; i < 20; i++) { │flag = compute_flag(); 5.77 4.85 │38: xor%eax,%eax 0.01 0.01 │→ callq compute_flag │ │count++; 2.38 4.38 │ movcount,%edx 0.00 0.00 │ add$0x1,%edx │ │if (flag) │ test %eax,%eax │srand(s_randseed); │ │for (i = 0; i < 20; i++) { │flag = compute_flag(); │ │count++; 0.60 0.44 │ mov%edx,count │ │if (flag) 3.99 4.32 │↓ je 82 Signed-off-by: Jin Yao --- tools/perf/ui/browsers/annotate.c | 49 --- tools/perf/ui/browsers/hists.c| 2 +- tools/perf/util/annotate.h| 6 +++-- tools/perf/util/hist.h| 8 --- 4 files changed, 45 insertions(+), 20 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 80f38da..b76c238 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -428,13 +428,15 @@ static void annotate_browser__set_rb_top(struct annotate_browser *browser, } static void annotate_browser__calc_percent(struct annotate_browser *browser, - struct perf_evsel *evsel) + struct perf_evsel *evsel, + struct hist_entry *he) { struct map_symbol *ms = browser->b.priv; struct symbol *sym = ms->sym; struct annotation *notes = symbol__annotation(sym); struct disasm_line *pos, *next; s64 len = symbol__size(sym); + struct hist_event *hevt; browser->entries = RB_ROOT; @@ -455,13 +457,25 @@ static void annotate_browser__calc_percent(struct annotate_browser *browser, for (i = 0; i < browser->nr_events; i++) { struct sym_hist_entry sample; - - bpos->samples[i].percent = disasm__calc_percent(notes, + if (evsel) { + bpos->samples[i].percent = + disasm__calc_percent(notes, evsel->idx + i, pos->offset, next ? next->offset : len, &path, &sample); - bpos->samples[i].he = sample; + bpos->samples[i].he = sample; + } else if (he) { + hevt = &he->events[i]; + notes = symbol__annotation(hevt->ms.sym); + bpos->samples[i].percent = + disasm__calc_percent(notes, + hevt->idx, + pos->offset, + next ? next->offset : len, + &path, &sample); + bpos->samples[i].he = sample; + } if (max_percent < bpos->samples[i].percent) max_percent = bpos->samples[i].percent; @@ -567,7 +581,7 @@ static bool annotate_browser__callq(struct annotate_browser *browser, } pthread_mutex_unlock(¬es->lock); - symbol__tui_annotate(target.sym, target.map, evsel, hbt); + symbol__tui_annotate(target.sym, target.map, evsel, hbt, NULL); sym_title(ms->sym, ms->map, title, sizeof(title)); ui_browser__show_title(&browser->b, title); return true; @@ -755,7 +769,8 @@ static void annotate_browser__update_addr_width(struct annotate_browser *browser static int annotate_browser__run(struct annotate_browser *browser, struct perf_evsel *evsel, -struct hist_browser_timer *hbt) +struct hist_browser_timer *hbt, +struct hist_entry *he) { struct rb_node *nd = NULL; struct map_symbol *ms = browser->b.priv; @@ -769,7 +784,7 @@ static int annotate_browser__run(struct annotate_browser *browser, if (ui_browser__show(&browser->b, title, help) < 0) return -1; - annotate_browser__calc_percent(browser, evsel); + annotate_browser__
[PATCH v1 1/4] perf annotate: create a new hists to manage multiple events samples
An issue is found during using perf annotate. perf record -e cycles,branches ... perf annotate main --stdio The result only shows cycles. It should show both cycles and branches on the left side. It works with "--group", but need this to work even without groups. In current design, the hists is per event. So we need a new hists to manage the samples for multiple events and use a new hist_event data structure to save the map/symbol information for per event. Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 60 +++ tools/perf/util/annotate.c| 8 ++ tools/perf/util/annotate.h| 4 +++ tools/perf/util/sort.c| 21 +++ tools/perf/util/sort.h| 13 ++ 5 files changed, 89 insertions(+), 17 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 658c920..833866c 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -45,6 +45,7 @@ struct perf_annotate { bool skip_missing; const char *sym_hist_filter; const char *cpu_list; + struct hists events_hists; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); }; @@ -153,6 +154,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel, struct hists *hists = evsel__hists(evsel); struct hist_entry *he; int ret; + struct hist_event *hevt; if (ann->sym_hist_filter != NULL && (al->sym == NULL || @@ -177,12 +179,30 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel, */ process_branch_stack(sample->branch_stack, al, sample); - he = hists__add_entry(hists, al, NULL, NULL, NULL, sample, true); - if (he == NULL) - return -ENOMEM; + if (symbol_conf.nr_events == 1) { + he = hists__add_entry(hists, al, NULL, NULL, NULL, + sample, true); + if (he == NULL) + return -ENOMEM; + + ret = hist_entry__inc_addr_samples(he, sample, evsel->idx, + al->addr); + hists__inc_nr_samples(hists, true); + } else { + he = hists__add_entry(&ann->events_hists, al, NULL, + NULL, NULL, sample, true); + if (he == NULL) + return -ENOMEM; + + hevt = hist_entry__event_add(he, evsel); + if (hevt == NULL) + return -ENOMEM; + + ret = hist_event__inc_addr_samples(hevt, sample, hevt->idx, + al->addr); + hists__inc_nr_samples(&ann->events_hists, true); + } - ret = hist_entry__inc_addr_samples(he, sample, evsel->idx, al->addr); - hists__inc_nr_samples(hists, true); return ret; } @@ -304,7 +324,8 @@ static int __cmd_annotate(struct perf_annotate *ann) int ret; struct perf_session *session = ann->session; struct perf_evsel *pos; - u64 total_nr_samples; + u64 total_nr_samples = 0; + struct hists *hists; if (ann->cpu_list) { ret = perf_session__cpu_bitmap(session, ann->cpu_list, @@ -335,23 +356,26 @@ static int __cmd_annotate(struct perf_annotate *ann) if (verbose > 2) perf_session__fprintf_dsos(session, stdout); - total_nr_samples = 0; - evlist__for_each_entry(session->evlist, pos) { - struct hists *hists = evsel__hists(pos); - u32 nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE]; + if (symbol_conf.nr_events > 1) { + hists = &ann->events_hists; + total_nr_samples += + hists->stats.nr_events[PERF_RECORD_SAMPLE]; + + hists__collapse_resort(hists, NULL); + hists__output_resort(hists, NULL); + hists__find_annotations(hists, NULL, ann); + } else { + evlist__for_each_entry(session->evlist, pos) { + hists = evsel__hists(pos); + total_nr_samples += + hists->stats.nr_events[PERF_RECORD_SAMPLE]; - if (nr_samples > 0) { - total_nr_samples += nr_samples; hists__collapse_resort(hists, NULL); /* Don't sort callchain */ perf_evsel__reset_sample_bit(pos, CALLCHAIN); perf_evsel__output_resort(pos, NULL); - - if (symbol_conf.event_group && - !perf_evsel__is_group_leader(pos)) - continue; - hists__find_annotations(hists, pos, ann); +
[PATCH v1 4/4] perf annotate: Display multiple events for gtk mode
For example: perf record -e cycles,branches ./div perf annotate main --gtk Both the cycles and branches are displayed at the left column in gtk window. Signed-off-by: Jin Yao --- tools/perf/ui/gtk/annotate.c | 35 ++- 1 file changed, 22 insertions(+), 13 deletions(-) diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c index 0217619..f314654 100644 --- a/tools/perf/ui/gtk/annotate.c +++ b/tools/perf/ui/gtk/annotate.c @@ -88,7 +88,8 @@ static int perf_gtk__get_line(char *buf, size_t size, struct disasm_line *dl) static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym, struct map *map, struct perf_evsel *evsel, - struct hist_browser_timer *hbt __maybe_unused) + struct hist_browser_timer *hbt __maybe_unused, + struct hist_entry *he) { struct disasm_line *pos, *n; struct annotation *notes; @@ -98,6 +99,7 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym, GtkWidget *view; int i; char s[512]; + struct hist_event *hevt; notes = symbol__annotation(sym); @@ -124,17 +126,18 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym, gtk_list_store_append(store, &iter); - if (perf_evsel__is_group_event(evsel)) { - for (i = 0; i < evsel->nr_members; i++) { + if (evsel) { + ret = perf_gtk__get_percent(s, sizeof(s), sym, pos, + evsel->idx); + } else if (he) { + for (i = 0; i < he->event_nr; i++) { + hevt = &he->events[i]; ret += perf_gtk__get_percent(s + ret, sizeof(s) - ret, -sym, pos, -evsel->idx + i); +hevt->ms.sym, pos, +hevt->idx); ret += scnprintf(s + ret, sizeof(s) - ret, " "); } - } else { - ret = perf_gtk__get_percent(s, sizeof(s), sym, pos, - evsel->idx); } if (ret) @@ -157,19 +160,25 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym, static int symbol__gtk_annotate(struct symbol *sym, struct map *map, struct perf_evsel *evsel, - struct hist_browser_timer *hbt) + struct hist_browser_timer *hbt, + struct hist_entry *he) { GtkWidget *window; GtkWidget *notebook; GtkWidget *scrolled_window; GtkWidget *tab_label; int err; + char *arch; + + if (!evsel) + arch = perf_evsel__env_arch(he->events[0].evsel); + else + arch = perf_evsel__env_arch(evsel); if (map->dso->annotate_warned) return -1; - err = symbol__disassemble(sym, map, perf_evsel__env_arch(evsel), - 0, NULL, NULL); + err = symbol__disassemble(sym, map, arch, 0, NULL, NULL); if (err) { char msg[BUFSIZ]; symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg)); @@ -228,7 +237,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct map *map, gtk_notebook_append_page(GTK_NOTEBOOK(notebook), scrolled_window, tab_label); - perf_gtk__annotate_symbol(scrolled_window, sym, map, evsel, hbt); + perf_gtk__annotate_symbol(scrolled_window, sym, map, evsel, hbt, he); return 0; } @@ -236,7 +245,7 @@ int hist_entry__gtk_annotate(struct hist_entry *he, struct perf_evsel *evsel, struct hist_browser_timer *hbt) { - return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt); + return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt, he); } void perf_gtk__show_annotations(void) -- 2.7.4
[PATCH v1 2/4] perf annotate: Display multiple events for stdio mode
For example: perf record -e cycles,branches ./div perf annotate main --stdio Percent | Source code & Disassembly of div for branches,cycles (90966 samples) .. : for (i = 0; i < 20; i++) { : flag = compute_flag(); 5.774.85 :4004e8: xor%eax,%eax 0.010.01 :4004ea: callq 400640 : : count++; 2.384.38 :4004ef: mov0x200b57(%rip),%edx 0.000.00 :4004f5: add$0x1,%edx : : if (flag) 0.000.00 :4004f8: test %eax,%eax : srand(s_randseed); : : for (i = 0; i < 20; i++) { : flag = compute_flag(); : : count++; 0.600.44 :4004fa: mov%edx,0x200b4c(%rip) : : if (flag) 3.994.32 :400500: je 400532 Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 2 +- tools/perf/builtin-top.c | 3 +- tools/perf/util/annotate.c| 179 -- tools/perf/util/annotate.h| 6 +- 4 files changed, 145 insertions(+), 45 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 833866c..98663bd 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -240,7 +240,7 @@ static int hist_entry__tty_annotate(struct hist_entry *he, struct perf_annotate *ann) { return symbol__tty_annotate(he->ms.sym, he->ms.map, evsel, - ann->print_line, ann->full_paths, 0, 0); + ann->print_line, ann->full_paths, 0, 0, he); } static void hists__find_annotations(struct hists *hists, diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index ee954bd..2287667 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -245,7 +245,8 @@ static void perf_top__show_details(struct perf_top *top) printf(" Events Pcnt (>=%d%%)\n", top->sym_pcnt_filter); more = symbol__annotate_printf(symbol, he->ms.map, top->sym_evsel, - 0, top->sym_pcnt_filter, top->print_entries, 4); + 0, top->sym_pcnt_filter, + top->print_entries, 4, NULL); if (top->evlist->enabled) { if (top->zero) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 16ec881..8630108 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1071,7 +1071,8 @@ static void annotate__branch_printf(struct block_range *br, u64 addr) static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 start, struct perf_evsel *evsel, u64 len, int min_pcnt, int printed, - int max_lines, struct disasm_line *queue) + int max_lines, struct disasm_line *queue, + struct hist_entry *he) { static const char *prev_line; static const char *prev_color; @@ -1082,31 +1083,39 @@ static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 st double *ppercents = &percent; struct sym_hist_entry sample; struct sym_hist_entry *psamples = &sample; - int i, nr_percent = 1; + int i, nr_percent; const char *color; struct annotation *notes = symbol__annotation(sym); s64 offset = dl->offset; const u64 addr = start + offset; struct disasm_line *next; struct block_range *br; + struct hist_event *hevt; next = disasm__get_next_ip_line(¬es->src->source, dl); - - if (perf_evsel__is_group_event(evsel)) { - nr_percent = evsel->nr_members; - ppercents = calloc(nr_percent, sizeof(double)); - psamples = calloc(nr_percent, sizeof(struct sym_hist_entry)); - if (ppercents == NULL || psamples == NULL) { - return -1; - } + nr_percent = (evsel) ? 1 : he->event_nr; + ppercents = calloc(nr_percent, sizeof(double)); + psamples = calloc(nr_percent, sizeof(struct sym_hist_entry));
[PATCH v3 0/2] perf report: Implement visual marker for macro fusion in annotate
Macro fusion merges two instructions to a single micro-op. Intel core platform performs this hardware optimization under limited circumstances. For example, CMP + JCC can be "fused" and executed /retired together. While with sampling this can result in the sample sometimes being on the JCC and sometimes on the CMP. So for the fused instruction pair, they could be considered together. On Nehalem, fused instruction pairs: cmp/test + jcc. On other new CPU: cmp/test/add/sub/and/inc/dec + jcc. This patch series marks the case clearly by joining the fused instruction pair in the arrow of the jump. For example: │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne29 │ │↓ jmp43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) Change-log: --- v3: 1. Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC). 2. Use Arnaldo's fix to let the display be better v2: According to Arnaldo's comments, remove the weak function and use an arch-specific function instead to check fused instruction pair. v1: Inital post Jin Yao (2): perf util: Check for fused instruction perf report: Implement visual marker for macro fusion in annotate tools/perf/arch/x86/annotate/instructions.c | 37 + tools/perf/ui/browser.c | 29 ++ tools/perf/ui/browser.h | 2 ++ tools/perf/ui/browsers/annotate.c | 32 + tools/perf/util/annotate.c | 17 + tools/perf/util/annotate.h | 3 +++ 6 files changed, 120 insertions(+) -- 2.7.4
[PATCH v3 2/2] perf report: Implement visual marker for macro fusion in annotate
For marking the fused instructions clearly, This patch adds a line before the first instruction of pair and joins it with the arrow of the jump. For example, when je is selected in annotate view, the line before cmpl is displayed and joins the arrow of je. │ ┌──cmpl $0x0,argp_program_version_hook 81.93 │ ├──je 20 │ │ lock cmpxchg %esi,0x38a9a4(%rip) │ │↓ jne29 │ │↓ jmp43 11.47 │20:└─→cmpxch %esi,0x38a999(%rip) That means the cmpl+je is fused instruction pair and they should be considered together. Change-log: --- v3: Use Arnaldo's fix to let the display be better. To get the evsel->evlist->env->cpuid, save the evsel in annotate_browser. v2: No more changes, just uses a new function "ins__is_fused" to check if the instructions are fused. v1: Initial post Signed-off-by: Jin Yao --- tools/perf/ui/browser.c | 29 + tools/perf/ui/browser.h | 2 ++ tools/perf/ui/browsers/annotate.c | 32 tools/perf/util/annotate.c| 5 + tools/perf/util/annotate.h| 1 + 5 files changed, 69 insertions(+) diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c index a4d3762..9ef7677 100644 --- a/tools/perf/ui/browser.c +++ b/tools/perf/ui/browser.c @@ -738,6 +738,35 @@ void __ui_browser__line_arrow(struct ui_browser *browser, unsigned int column, __ui_browser__line_arrow_down(browser, column, start, end); } +void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column, + unsigned int row, bool arrow_down) +{ + unsigned int end_row; + + if (row >= browser->top_idx) + end_row = row - browser->top_idx; + else + return; + + SLsmg_set_char_set(1); + + if (arrow_down) { + ui_browser__gotorc(browser, end_row, column - 1); + SLsmg_write_char(SLSMG_ULCORN_CHAR); + ui_browser__gotorc(browser, end_row, column); + SLsmg_draw_hline(2); + ui_browser__gotorc(browser, end_row + 1, column - 1); + SLsmg_write_char(SLSMG_LTEE_CHAR); + } else { + ui_browser__gotorc(browser, end_row, column - 1); + SLsmg_write_char(SLSMG_LTEE_CHAR); + ui_browser__gotorc(browser, end_row, column); + SLsmg_draw_hline(2); + } + + SLsmg_set_char_set(0); +} + void ui_browser__init(void) { int i = 0; diff --git a/tools/perf/ui/browser.h b/tools/perf/ui/browser.h index be3b70e..a12eff7 100644 --- a/tools/perf/ui/browser.h +++ b/tools/perf/ui/browser.h @@ -43,6 +43,8 @@ void ui_browser__printf(struct ui_browser *browser, const char *fmt, ...); void ui_browser__write_graph(struct ui_browser *browser, int graph); void __ui_browser__line_arrow(struct ui_browser *browser, unsigned int column, u64 start, u64 end); +void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column, + unsigned int row, bool arrow_down); void __ui_browser__show_title(struct ui_browser *browser, const char *title); void ui_browser__show_title(struct ui_browser *browser, const char *title); int ui_browser__show(struct ui_browser *browser, const char *title, diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 27f41f2..d3c7109 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -9,6 +9,7 @@ #include "../../util/symbol.h" #include "../../util/evsel.h" #include "../../util/config.h" +#include "../../util/evlist.h" #include #include #include @@ -55,6 +56,7 @@ struct annotate_browser { struct disasm_line *selection; struct disasm_line **offsets; struct arch *arch; + struct perf_evsel *evsel; int nr_events; u64 start; int nr_asm_entries; @@ -272,6 +274,28 @@ static bool disasm_line__is_valid_jump(struct disasm_line *dl, struct symbol *sy return true; } +static bool is_fused(struct annotate_browser *ab, struct disasm_line *cursor) +{ + struct disasm_line *pos = list_prev_entry(cursor, node); + struct perf_evsel *evsel = ab->evsel; + const char *name; + + if (!pos || !evsel || !evsel->evlist || !evsel->evlist->env || + !evsel->evlist->env->cpuid) + return false; + + if (ins__is_lock(&pos->ins)) + name = pos->ops.locked.ins.name; + else + name = pos->ins.name; + + if (!name || !cursor->ins.name) + return false; + + return ins__is_fused(ab->arch, evsel->evlist->env->cpuid, name, +cursor->ins.na
[PATCH v3 1/2] perf util: Check for fused instruction
Macro fusion merges two instructions to a single micro-op. Intel core platform performs this hardware optimization under limited circumstances. For example, CMP + JCC can be "fused" and executed /retired together. While with sampling this can result in the sample sometimes being on the JCC and sometimes on the CMP. So for the fused instruction pair, they could be considered together. On Nehalem, fused instruction pairs: cmp/test + jcc. On other new CPU: cmp/test/add/sub/and/inc/dec + jcc. This patch adds an x86-specific function which checks if 2 instructions are in a "fused" pair. For non-x86 arch, the function is just NULL. Change-log: --- v3: Add checking for Nehalem (CMP, TEST). For other newer Intel CPUs just check it by default (CMP, TEST, ADD, SUB, AND, INC, DEC). v2: Remove the origial weak function. Arnaldo points out that doing it as a weak function that will be overridden by the host arch doesn't work. So now it's implemented as an arch-specific function. v1: Initial post Signed-off-by: Jin Yao --- tools/perf/arch/x86/annotate/instructions.c | 37 + tools/perf/util/annotate.c | 12 ++ tools/perf/util/annotate.h | 2 ++ 3 files changed, 51 insertions(+) diff --git a/tools/perf/arch/x86/annotate/instructions.c b/tools/perf/arch/x86/annotate/instructions.c index c1625f2..8d06dd4 100644 --- a/tools/perf/arch/x86/annotate/instructions.c +++ b/tools/perf/arch/x86/annotate/instructions.c @@ -76,3 +76,40 @@ static struct ins x86__instructions[] = { { .name = "xbeginq",.ops = &jump_ops, }, { .name = "retq", .ops = &ret_ops, }, }; + +static bool x86__ins_is_fused(char *cpuid, const char *ins1, const char *ins2) +{ + unsigned int family, model, stepping; + int ret; + + /* +* cpuid = "GenuineIntel,family,model,stepping" +*/ + ret = sscanf(cpuid, "%*[^,],%u,%u,%u", &family, &model, &stepping); + + if ((ret != 3) || (family != 6) || (model < 0x1e) || +strstr(ins2, "jmp")) { + return false; + } + + if (model == 0x1e) { + /* Nehalem */ + if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) || +strstr(ins1, "test")) { + return true; + } + } else { + /* Newer platform */ + if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) || +strstr(ins1, "test") || +strstr(ins1, "add") || +strstr(ins1, "sub") || +strstr(ins1, "and") || +strstr(ins1, "inc") || +strstr(ins1, "dec")) { + return true; + } + } + + return false; +} diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index be1caab..8cf6025 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -48,6 +48,8 @@ struct arch { boolinitialized; void*priv; int (*init)(struct arch *arch); + bool(*ins_is_fused)(char *cpuid, const char *ins1, + const char *ins2); struct { char comment_char; char skip_functions_char; @@ -129,6 +131,7 @@ static struct arch architectures[] = { .name = "x86", .instructions = x86__instructions, .nr_instructions = ARRAY_SIZE(x86__instructions), + .ins_is_fused = x86__ins_is_fused, .objdump = { .comment_char = '#', }, @@ -171,6 +174,15 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size, return ins__raw_scnprintf(ins, bf, size, ops); } +bool ins__is_fused(struct arch *arch, char *cpuid, const char *ins1, + const char *ins2) +{ + if (!arch || !arch->ins_is_fused || !cpuid) + return false; + + return arch->ins_is_fused(cpuid, ins1, ins2); +} + static int call__parse(struct arch *arch, struct ins_operands *ops, struct map *map) { char *endptr, *tok, *name; diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h index 2105503..35cb7b5 100644 --- a/tools/perf/util/annotate.h +++ b/tools/perf/util/annotate.h @@ -53,6 +53,8 @@ bool ins__is_jump(const struct ins *ins); bool ins__is_call(const struct ins *ins); bool ins__is_ret(const struct ins *ins); int ins__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands *ops); +bool ins__is_fused(struct arch *arch, char *cpuid, const char *ins1, + const char *ins2); struct annotation; -- 2.7.4
Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time
On 10/24/2017 3:16 PM, Jiri Olsa wrote: On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote: SNIP hum, could you still unset the sample if there's no time given? and keep the speed in this case.. jirka Hi Jiri, I check this question again. The '--time' option is for perf report but not for perf record. For perf record, we have to always walk on all samples to get the time of first sample and the time of last sample whatever buildid_all is enabled or not enabled. So 'rec->tool.sample = NULL' is removed. Sorry, the previous mail was replied at midnight, I was drowsy. :( If my answer is correct, I will not send v6. If my understanding is still not correct, please let me know. right, I did not realize we store this unconditionaly.. then yes, it's ok I think I've already acked this, anyway for the patchset: Acked-by: Jiri Olsa thanks, jirka Hi Arnaldo, Is this patch-set OK for merging or anything I should improve? Thanks Jin Yao
Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time
On 11/4/2017 12:29 AM, Arnaldo Carvalho de Melo wrote: Em Tue, Oct 24, 2017 at 09:16:59AM +0200, Jiri Olsa escreveu: On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote: SNIP hum, could you still unset the sample if there's no time given? and keep the speed in this case.. jirka Hi Jiri, I check this question again. The '--time' option is for perf report but not for perf record. For perf record, we have to always walk on all samples to get the time of first sample and the time of last sample whatever buildid_all is enabled or not enabled. So 'rec->tool.sample = NULL' is removed. Sorry, the previous mail was replied at midnight, I was drowsy. :( If my answer is correct, I will not send v6. If my understanding is still not correct, please let me know. right, I did not realize we store this unconditionaly.. then yes, it's ok And should we store this unconditionally? What this patch is doing is making 'perf record' unconditionally slower so that the generated perf.data file becomes useful for some usecases, but not for all, only people interested in using 'perf report/script --time' will benefit, right? Yes, that makes sense. I thought that we could get this sorted out in a different fashion, i.e. getting the first timestamp is easy, even if we don't process build-ids, right? To get the last one we could ask the kernel to insert an extra dummy sample at the end, one that we know the size and thus can to to the end of the file, rewind that size, get the event and parse the sample, agreed? Yes, agree. So I suggest that first make this conditional, i.e. 'perf record --timestamps' will enable the logic you implemented, and as a followup, if you agree, add the dummy, known size event at the end, and then even when build-ids are not processed, the cost for getting the timestamps will be next to zero. So we will use 2 steps for the implementation. Step1: Upgrade this patch series to v6. The change is to add "--timestamps" option in perf record. Step2: As a followup patch series, let the kernel generate a dummy sample at the end. That maybe needs more discussions on what the format we should use in dummy sample and how to generate the dummy sample in kernel. Let me complete the step1 first. Thanks Jin Yao - Arnaldo - Arnaldo I think I've already acked this, anyway for the patchset: Acked-by: Jiri Olsa thanks, jirka
Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time
On 11/4/2017 6:24 PM, Jiri Olsa wrote: On Fri, Nov 03, 2017 at 01:29:42PM -0300, Arnaldo Carvalho de Melo wrote: Em Tue, Oct 24, 2017 at 09:16:59AM +0200, Jiri Olsa escreveu: On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote: SNIP hum, could you still unset the sample if there's no time given? and keep the speed in this case.. jirka Hi Jiri, I check this question again. The '--time' option is for perf report but not for perf record. For perf record, we have to always walk on all samples to get the time of first sample and the time of last sample whatever buildid_all is enabled or not enabled. So 'rec->tool.sample = NULL' is removed. Sorry, the previous mail was replied at midnight, I was drowsy. :( If my answer is correct, I will not send v6. If my understanding is still not correct, please let me know. right, I did not realize we store this unconditionaly.. then yes, it's ok And should we store this unconditionally? What this patch is doing is making 'perf record' unconditionally slower so that the generated perf.data file becomes useful for some usecases, but not for all, only people interested in using 'perf report/script --time' will benefit, right? maybe we can also silently enable that when processing buildids, (which is set by default), there's no big performance hit once we already go through samples jirka It's a good idea. Default enabling --timestamps in perf record since buildids is enabled by default as well. But if buildids is not enabled, then it needs to check if --timestamps is enabled. I will follow this rule in v6. Thanks Jin Yao I thought that we could get this sorted out in a different fashion, i.e. getting the first timestamp is easy, even if we don't process build-ids, right? To get the last one we could ask the kernel to insert an extra dummy sample at the end, one that we know the size and thus can to to the end of the file, rewind that size, get the event and parse the sample, agreed? So I suggest that first make this conditional, i.e. 'perf record --timestamps' will enable the logic you implemented, and as a followup, if you agree, add the dummy, known size event at the end, and then even when build-ids are not processed, the cost for getting the timestamps will be next to zero. - Arnaldo - Arnaldo I think I've already acked this, anyway for the patchset: Acked-by: Jiri Olsa thanks, jirka
[PATCH v6 3/6] perf util: Create function to parse time percent
Current perf report/script/... have a --time option to limit the time range of output. But right now it only supports absolute time. For easy using, now it can support a percent of time usage. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also support the multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Change log: --- v4: An issue is found. Following passes. perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now it uses strtol to replace atoi. Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 205 --- tools/perf/util/time-utils.h | 3 + 2 files changed, 196 insertions(+), 12 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5b5d021..79e4281 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -5,6 +5,7 @@ #include #include #include +#include #include "perf.h" #include "debug.h" @@ -59,11 +60,10 @@ static int parse_timestr_sec_nsec(struct perf_time_interval *ptime, return 0; } -int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +static int split_start_end(char **start, char **end, const char *ostr, char ch) { char *start_str, *end_str; char *d, *str; - int rc = 0; if (ostr == NULL || *ostr == '\0') return 0; @@ -73,25 +73,35 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) if (str == NULL) return -ENOMEM; - ptime->start = 0; - ptime->end = 0; - - /* str has the format: , -* variations: , -* , -* , -*/ start_str = str; - d = strchr(start_str, ','); + d = strchr(start_str, ch); if (d) { *d = '\0'; ++d; } end_str = d; + *start = start_str; + *end = end_str; + + return 0; +} + +int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) +{ + char *start_str = NULL, *end_str; + int rc; + + rc = split_start_end(&start_str, &end_str, ostr, ','); + if (rc || !start_str) + return rc; + + ptime->start = 0; + ptime->end = 0; + rc = parse_timestr_sec_nsec(ptime, start_str, end_str); - free(str); + free(start_str); /* make sure end time is after start time if it was given */ if (rc == 0 && ptime->end && ptime->end < ptime->start) @@ -103,6 +113,177 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) return rc; } +static int parse_percent(double *pcnt, char *str) +{ + char *c; + + c = strchr(str, '%'); + if (c) + *c = '\0'; + else + return -1; + + *pcnt = atof(str) / 100.0; + + return 0; +} + +static int percent_slash_split(char *str, struct perf_time_interval *ptime, + u64 start, u64 end) +{ + char *p, *end_str; + double pcnt, start_pcnt, end_pcnt; + u64 total = end - start; + int i; + + /* +* Example: +* 10%/2: select the second 10% slice and the third 10% slice +*/ + + /* We can modify this string since the original one is copied */ + p = strchr(str, '/'); + if (!p) + return -1; + + *p = '\0'; + if (parse_percent(&pcnt, str) < 0) + return -1; + + p++; + i = (int)strtol(p, &end_str, 10); + if (*end_str) + return -1; + + if (pcnt <= 0.0) + return -1; + + start_pcnt = pcnt * (i - 1); + end_pcnt = pcnt * i; + + if (start_pcnt < 0.0 || start_pcnt > 1.0 || + end_pcnt < 0.0 || end_pcnt > 1.0) { + return -1; + } + + ptime->start = start + round(start_pcnt * total); + ptime->end = start + round(end_pcnt * total); + + return 0; +} + +static int percent_dash_split(char *str, struct perf_time_interval *ptime, + u64 start, u64 end) +{ + char *start_str = NULL, *end_str; + double start_pcnt, end_pcnt; + u64 total = end - start; + int ret; + + /* +* Example: 0%-10% +*/ + + ret = split_start_end(&start_str, &end_str, str, '-'); + if (ret || !start_str) + return ret; + + if ((parse_percent(&start_pcnt, start_str) != 0) || + (parse_percent(&end_pcnt, end_str
[PATCH v6 5/6] perf report: support time percent and multiple time ranges
perf report has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices perf report --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Change log: --- v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 16 tools/perf/builtin-report.c | 31 ++- 2 files changed, 42 insertions(+), 5 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index ddde2b5..ed36553 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -402,6 +402,22 @@ OPTIONS stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multipe time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice + perf report --time 10%/2 + + Select from 0% to 10% time slice + perf report --time 0%-10% + + Select the first and second 10% time slices + perf report --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices + perf report --time 0%-10%,30%-40% + --itrace:: Options for decoding instruction tracing data. The options are: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 3c2d9d4..c43181d 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -51,6 +51,8 @@ #include #include +#define PTIME_RANGE_MAX10 + struct report { struct perf_tooltool; struct perf_session *session; @@ -68,7 +70,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; float min_percent; u64 nr_entries; u64 queue_size; @@ -185,8 +188,10 @@ static int process_sample_event(struct perf_tool *tool, }; int ret = 0; - if (perf_time__skip_sample(&rep->ptime, sample->time)) + if (perf_time__ranges_skip_sample(rep->ptime_range, rep->range_num, + sample->time)) { return 0; + } if (machine__resolve(machine, &al, sample) < 0) { pr_debug("problem processing %d event, skipping it.\n", @@ -1073,9 +1078,25 @@ int cmd_report(int argc, const char **argv) if (symbol__init(&session->header.env) < 0) goto error; - if (perf_time__parse_str(&report.ptime, report.time_str) != 0) { - pr_err("Invalid time string\n"); - return -EINVAL; + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + return -EINVAL; + } + + report.range_num = perf_time__percent_parse_str( + report.ptime_range, PTIME_RANGE_MAX, + report.time_str, + session->evlist->first_sample_time, + session->evlist->last_sample_time); + + if (report.range_num < 0) { + pr_err("Invalid time string\n"); + return -EINVAL; + } + } else { + report.range_num = 1; } sort__setup_elide(stdout); -- 2.7.4
[PATCH v6 1/6] perf header: Record first sample time and last sample time in perf file header
perf report/script/... have a --time option to limit the time range of output. That's very useful to slice large traces, e.g. when processing the output of perf script for some analysis. But right now --time only supports absolute time. Also there is no fast way to get the start/end times of a given trace except for looking at it. This makes it hard to e.g. only decode the first half of the trace, which is useful for parallelization of scripts Another problem is that perf records are variable size and there is no synchronization mechanism. So the only way to find the last sample reliably would be to walk all samples. But we want to avoid that in perf report/... because it is already quite expensive. That is why storing the first sample time and last sample time in perf record is better. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. Change log: --- v4: Use perf script time style for timestamp printing. Also add with the printing of sample duration. v3: Remove the definitions of first_sample_time/last_sample_time from perf_session. Just define them in perf_evlist Signed-off-by: Jin Yao --- tools/perf/Documentation/perf.data-file-format.txt | 4 ++ tools/perf/util/evlist.h | 2 + tools/perf/util/header.c | 60 ++ tools/perf/util/header.h | 1 + 4 files changed, 67 insertions(+) diff --git a/tools/perf/Documentation/perf.data-file-format.txt b/tools/perf/Documentation/perf.data-file-format.txt index e90c59c..d5e3043 100644 --- a/tools/perf/Documentation/perf.data-file-format.txt +++ b/tools/perf/Documentation/perf.data-file-format.txt @@ -238,6 +238,10 @@ struct auxtrace_index { struct auxtrace_index_entry entries[PERF_AUXTRACE_INDEX_ENTRY_COUNT]; }; + HEADER_SAMPLE_TIME = 21, + +Two uint64_t for the time of first sample and the time of last sample. + other bits are reserved and should ignored for now HEADER_FEAT_BITS= 256, diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 8c433e9..a08f407 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -50,6 +50,8 @@ struct perf_evlist { struct perf_evsel *selected; struct events_stats stats; struct perf_env *env; + u64 first_sample_time; + u64 last_sample_time; }; struct perf_evsel_str_handler { diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c index 6e59dcc..fd81c5b1 100644 --- a/tools/perf/util/header.c +++ b/tools/perf/util/header.c @@ -17,6 +17,7 @@ #include #include #include +#include #include "evlist.h" #include "evsel.h" @@ -36,6 +37,7 @@ #include #include "asm/bug.h" #include "tool.h" +#include "time-utils.h" #include "sane_ctype.h" @@ -1181,6 +1183,20 @@ static int write_stat(struct feat_fd *ff __maybe_unused, return 0; } +static int write_sample_time(struct feat_fd *ff, +struct perf_evlist *evlist) +{ + int ret; + + ret = do_write(ff, &evlist->first_sample_time, + sizeof(evlist->first_sample_time)); + if (ret < 0) + return ret; + + return do_write(ff, &evlist->last_sample_time, + sizeof(evlist->last_sample_time)); +} + static void print_hostname(struct feat_fd *ff, FILE *fp) { fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname); @@ -1506,6 +1522,28 @@ static void print_group_desc(struct feat_fd *ff, FILE *fp) } } +static void print_sample_time(struct feat_fd *ff, FILE *fp) +{ + struct perf_session *session; + char time_buf[32]; + double d; + + session = container_of(ff->ph, struct perf_session, header); + + timestamp__scnprintf_usec(session->evlist->first_sample_time, + time_buf, sizeof(time_buf)); + fprintf(fp, "# time of first sample : %s\n", time_buf); + + timestamp__scnprintf_usec(session->evlist->last_sample_time, + time_buf, sizeof(time_buf)); + fprintf(fp, "# time of last sample : %s\n", time_buf); + + d = (double)(session->evlist->last_sample_time - + session->evlist->first_sample_time) / NSEC_PER_MSEC; + + fprintf(fp, "# sample duration : %10.3f ms\n", d); +} + static int __event_process_build_id(struct build_id_event *bev, char *filename, struct perf_session *session) @@ -2147,6 +2185,27 @@ static int process_cache(struct feat_fd *ff, void *data __maybe_unused) return -1; } +static int process_sam
[PATCH v6 6/6] perf script: support time percent and multiple time ranges
perf script has a --time option to limit the time range of output. It only supports absolute time. Now this option is extended to support multiple time ranges and support the percent of time. For example: 1. Select the first and second 10% time slices perf script --time 10%/1,10%/2 2. Select from 0% to 10% and 30% to 40% slices perf script --time 0%-10%,30%-40% Change log: --- v6: Fix the merge issue with latest perf/core branch. No functional changes. v5: Add checking of first/last sample time to detect if it's recorded in perf.data. If it's not recorded, returns error message to user. v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample v3: Since the definitions of first_sample_time/last_sample_time are moved from perf_session to perf_evlist so change the related code. Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-script.txt | 16 +++ tools/perf/builtin-script.c | 34 ++-- 2 files changed, 44 insertions(+), 6 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 2811fcf..5313fd1 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -321,6 +321,22 @@ include::itrace.txt[] stop time is not given (i.e, time string is 'x.y,') then analysis goes to end of file. + Also support time percent with multipe time range. Time string is + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + + For example: + Select the second 10% time slice + perf script --time 10%/2 + + Select from 0% to 10% time slice + perf script --time 0%-10% + + Select the first and second 10% time slices + perf script --time 10%/1,10%/2 + + Select from 0% to 10% and 30% to 40% slices + perf script --time 0%-10%,30%-40% + --max-blocks:: Set the maximum number of program blocks to print with brstackasm for each sample. diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 89975e3..dad548f 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1429,6 +1429,8 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } +#define PTIME_RANGE_MAX10 + struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1442,7 +1444,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime; + struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + int range_num; }; static int perf_evlist__max_name_len(struct perf_evlist *evlist) @@ -1642,8 +1645,10 @@ static int process_sample_event(struct perf_tool *tool, struct perf_script *scr = container_of(tool, struct perf_script, tool); struct addr_location al; - if (perf_time__skip_sample(&scr->ptime, sample->time)) + if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num, + sample->time)) { return 0; + } if (debug_mode) { if (sample->time < last_timestamp) { @@ -3252,10 +3257,27 @@ int cmd_script(int argc, const char **argv) goto out_delete; /* needs to be parsed after looking up reference time */ - if (perf_time__parse_str(&script.ptime, script.time_str) != 0) { - pr_err("Invalid time string\n"); - err = -EINVAL; - goto out_delete; + if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { + if (session->evlist->first_sample_time == 0 && + session->evlist->last_sample_time == 0) { + pr_err("No first/last sample time in perf data\n"); + err = -EINVAL; + goto out_delete; + } + + script.range_num = perf_time__percent_parse_str( + script.ptime_range, PTIME_RANGE_MAX, + script.time_str, + session->evlist->first_sample_time, + session->evlist->last_sample_time); + + if (script.range_num < 0) { + pr_err("Invalid time string\n"); + err = -EINVAL; + goto out_delete; + } + } else { + script.range_num = 1; } err = __cmd_script(&script); -- 2.7.4
[PATCH v6 0/6] perf report/script: Support percent and multiple range in --time option
v6: --- 1. Create a new option "--timestamp-boundary" in perf record. Currently '--buildid-all' is not enabled by default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. Impacted patch: --- perf record: Get the first sample time and last sample time 2. Fix the merge issue with the latest perf/core branch. No functional changes. Impacted patch: --- perf report: support time percent and multiple time ranges perf script: support time percent and multiple time ranges v5: --- 1. There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. 2. Check if first/last sample time is recorded in perf data file. If it's not recorded, return error message to user. Patched modified in v5: perf record: Get the first sample time and last sample time perf report: support time percent and multiple time ranges perf script: support time percent and multiple time ranges v4: --- 1. Use perf script time style for timestamp printing. Also add with the printing of sample duration. For example: perf report --header time of first sample : 5276531.323099 time of last sample : 5276555.345625 sample duration : 24022.526 ms 2. Fix an invalid time string issue. For example, perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now in code, it uses strtol to replace atoi. 3. Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample in perf report/perf script. v3: --- 1. Move the definitions of first_sample_time/last_sample_time from perf_session and struct record to perf_evlist and update the related code. v2: --- 1. This patch creates a new header feature type HEADER_SAMPLE_TIME and related ops. Save the first sample time and the last sample time to the feature section in perf file header. 2. Add checking for last element in time range. For example, select the first and second 10% time slices. perf report --time 10%/1,10%/2 Note that now it includes the last element in [10%, 20%] but it doesn't include the last element in [0, 10%). It's to avoid the overlap. Following patches are changed: perf header: Record first sample time and last sample time in perf file header perf record: Get the first sample time and last sample time perf util: Create function to perform multiple time range checking v1: initial post Current perf report/script/... have a --time option to limit the time range of output. But it only supports the absolute time. The patch series extend this option to let it support percent of time and support the multiple time ranges. For example: 1. Select the second 10% time slice perf report --time 10%/2 2. Select from 0% to 10% time slice perf report --time 0%-10% It also support the multiple time ranges. 3. Select the first and second 10% time slices perf report --time 10%/1,10%/2 4. Select from 0% to 10% and 30% to 40% slices perf report --time 0%-10%,30%-40% Jin Yao (6): perf header: Record first sample time and last sample time in perf file header perf record: Get the first sample time and last sample time perf util: Create function to parse time percent perf util: Create function to perform multiple time range checking perf report: support time percent and multiple time ranges perf script: support time percent and multiple time ranges tools/perf/Documentation/perf-record.txt | 3 + tools/perf/Documentation/perf-report.txt | 16 ++ tools/perf/Documentation/perf-script.txt | 16 ++ tools/perf/Documentation/perf.data-file-format.txt | 4 + tools/perf/builtin-record.c| 18 +- tools/perf/builtin-report.c| 31 ++- tools/perf/builtin-script.c| 34 ++- tools/perf/util/evlist.h | 2 + tools/perf/util/header.c | 60 ++ tools/perf/util/header.h | 1 + tools/perf/util/time-utils.c | 233 +++-- tools/perf/util/time-utils.h | 6 + 12 files changed, 398 insertions(+), 26 deletions(-) -- 2.7.4
[PATCH v6 2/6] perf record: Get the first sample time and last sample time
In perf record, it's walked on all samples yet. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). In later, perf report/script will fetch the time from perf file header. Change log: --- v6: Currently '--buildid-all' is not enabled at default. So the walking on all samples is the default operation. There is no big overhead to calculate the timestamp boundary in process_sample_event handler once we already go through all samples. So the timestamp boundary calculation is enabled by default when '--buildid-all' is not enabled. While if '--buildid-all' is enabled, we creates a new option "--timestamp-boundary" for user to decide if it enables the timestamp boundary calculation. v5: There is an issue that the sample walking can only work when '--buildid-all' is not enabled. So we need to let the walking be able to work even if '--buildid-all' is enabled and let the processing skips the dso hit marking for this case. At first, I want to provide a new option "--record-time-boundaries". While after consideration, I think a new option is not very necessary. v3: Remove the definitions of first_sample_time and last_sample_time from struct record and directly save them in perf_evlist. Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-record.txt | 3 +++ tools/perf/builtin-record.c | 18 +++--- 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt index 5a626ef..3eea6de 100644 --- a/tools/perf/Documentation/perf-record.txt +++ b/tools/perf/Documentation/perf-record.txt @@ -430,6 +430,9 @@ Configure all used events to run in user space. --timestamp-filename Append timestamp to output file name. +--timestamp-boundary:: +Record timestamp boundary (time of first/last samples). + --switch-output[=mode]:: Generate multiple perf.data files, timestamp prefixed, switching to a new one based on 'mode' value: diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c index f4d9fc5..082a0cb 100644 --- a/tools/perf/builtin-record.c +++ b/tools/perf/builtin-record.c @@ -78,6 +78,7 @@ struct record { boolno_buildid_cache_set; boolbuildid_all; booltimestamp_filename; + booltimestamp_boundary; struct switch_outputswitch_output; unsigned long long samples; }; @@ -391,8 +392,15 @@ static int process_sample_event(struct perf_tool *tool, { struct record *rec = container_of(tool, struct record, tool); - rec->samples++; + if (rec->evlist->first_sample_time == 0) + rec->evlist->first_sample_time = sample->time; + + rec->evlist->last_sample_time = sample->time; + if (rec->buildid_all) + return 0; + + rec->samples++; return build_id__mark_dso_hit(tool, event, sample, evsel, machine); } @@ -417,9 +425,11 @@ static int process_buildids(struct record *rec) /* * If --buildid-all is given, it marks all DSO regardless of hits, -* so no need to process samples. +* so no need to process samples. But if timestamp_boundary is enabled, +* it still needs to walk on all samples to get the timestamps of +* first/last samples. */ - if (rec->buildid_all) + if (rec->buildid_all && !rec->timestamp_boundary) rec->tool.sample = NULL; return perf_session__process_events(session); @@ -1579,6 +1589,8 @@ static struct option __record_options[] = { "Record build-id of all DSOs regardless of hits"), OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename, "append timestamp to output filename"), + OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary, + "Record timestamp boundary (time of first/last samples)"), OPT_STRING_OPTARG_SET(0, "switch-output", &record.switch_output.str, &record.switch_output.set, "signal,size,time", "Switch output when receive SIGUSR2 or cross size,time threshold", -- 2.7.4
[PATCH v6 4/6] perf util: Create function to perform multiple time range checking
Previous patch supports the multiple time range. For example, select the first and second 10% time slices. perf report --time 10%/1,10%/2 We need a function to check if a timestamp is in the ranges of [0, 10%) and [10%, 20%]. Note that it includes the last element in [10%, 20%] but it doesn't include the last element in [0, 10%). It's to avoid the overlap. This patch implments a new function perf_time__ranges_skip_sample for this checking. Change log: --- v4: Let perf_time__ranges_skip_sample be compatible with perf_time__skip_sample when only one time range. Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 3 +++ 2 files changed, 31 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 79e4281..b380356 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -299,6 +299,34 @@ bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) return false; } +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp) +{ + struct perf_time_interval *ptime; + int i; + + if ((timestamp == 0) || (num == 0)) + return false; + + if (num == 1) + return perf_time__skip_sample(&ptime_buf[0], timestamp); + + /* +* start/end of multiple time ranges must be valid. +*/ + for (i = 0; i < num; i++) { + ptime = &ptime_buf[i]; + + if (timestamp >= ptime->start && + ((timestamp < ptime->end && i < num - 1) || +(timestamp <= ptime->end && i == num - 1))) { + break; + } + } + + return (i == num) ? true : false; +} + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz) { u64 sec = timestamp / NSEC_PER_SEC; diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index fd018e2..de279ea 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -17,6 +17,9 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); +bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, + int num, u64 timestamp); + int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz); int fetch_current_timestamp(char *buf, size_t sz); -- 2.7.4
[PATCH] perf util: Fix wrong processing when closing evsel fd
In current xyarray code, xyarray__max_x() returns max_y, and xyarray__max_y() returns max_x. It's confusing and for code logic it looks not correct. Error happens when closing evsel fd. Let's see this scenario: 1. Allocate an fd (pseudo-code) --- perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads) { evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int)); } xyarray__new(int xlen, int ylen, size_t entry_size) { size_t row_size = ylen * entry_size; struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size); xy->entry_size = entry_size; xy->row_size = row_size; xy->entries= xlen * ylen; xy->max_x = xlen; xy->max_y = ylen; .. } So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4. 2. Use perf syscall and get the fd -- int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, struct thread_map *threads) { for (cpu = 0; cpu < cpus->nr; cpu++) { for (thread = 0; thread < nthreads; thread++) { int fd, group_fd; fd = sys_perf_event_open(&evsel->attr, pid, cpus->map[cpu], group_fd, flags); FD(evsel, cpu, thread) = fd; } } static inline void *xyarray__entry(struct xyarray *xy, int x, int y) { return &xy->contents[x * xy->row_size + y * xy->entry_size]; } These codes don't have issues. The issue happens in the closing of fd. 3. Close fd. --- void perf_evsel__close_fd(struct perf_evsel *evsel) { int cpu, thread; for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++) for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } } Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y() returns max_x (ncpus), so above code is actually to be: for (cpu = 0; cpu < nthreads; cpu++) for (thread = 0; thread < ncpus; ++thread) { close(FD(evsel, cpu, thread)); FD(evsel, cpu, thread) = -1; } It's not correct! This change is introduced by "475fb533fb7d" ("perf evsel: Fix buffer overflow while freeing events") This fix is to let xyarray__max_x() return max_x (ncpus) and let xyarry__max_y() return max_y (nthreads) Signed-off-by: Jin Yao --- tools/perf/util/xyarray.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/xyarray.h b/tools/perf/util/xyarray.h index 4ba726c..54af6046 100644 --- a/tools/perf/util/xyarray.h +++ b/tools/perf/util/xyarray.h @@ -23,12 +23,12 @@ static inline void *xyarray__entry(struct xyarray *xy, int x, int y) static inline int xyarray__max_y(struct xyarray *xy) { - return xy->max_x; + return xy->max_y; } static inline int xyarray__max_x(struct xyarray *xy) { - return xy->max_y; + return xy->max_x; } #endif /* _PERF_XYARRAY_H_ */ -- 2.7.4
Re: [PATCH v4 0/6] perf report/script: Support percent and multiple range in --time option
On 10/5/2017 4:50 PM, Jiri Olsa wrote: On Tue, Oct 03, 2017 at 10:22:32PM +0800, Jin Yao wrote: v4: --- 1. Use perf script time style for timestamp printing. Also add with the printing of sample duration. For example: perf report --header time of first sample : 5276531.323099 time of last sample : 5276555.345625 sample duration : 24022.526 ms 2. Fix an invalid time string issue. For example, perf script --time 10%/10x12321xsdfdasfdsafdsafdsa Now in code, it uses strtol to replace atoi. for the patchset: Acked-by: Jiri Olsa thanks, jirka Thanks Jiri! Hi Arnaldo, Could this patchset be merged? Thanks Jin Yao
Re: [PATCH v4 2/6] perf record: Get the first sample time and last sample time
On 10/20/2017 4:25 AM, Arnaldo Carvalho de Melo wrote: Em Thu, Oct 19, 2017 at 05:21:27PM -0300, Arnaldo Carvalho de Melo escreveu: Em Tue, Oct 03, 2017 at 10:22:34PM +0800, Jin Yao escreveu: In perf record, it's walked on all samples yet. So it's very easy to get You're saying that perf record walks all samples always? That only happens when we generate the build-id table, right? And people disable that to speed up the process, knowing that some limitations will come from that, for doing analysis right after running it is mostly OK to disable the build-id processing. So either you add a new option that processes all events without doing build-id processing (and all the locking, struct thread, map, etc processing it entails) and just looks at the sample->time, and when build id processing is enabled, just do as you're doing in this patch, then, at perf report --time you should look to see if those start/end times were filled in and if not tell that to the user, i.e. that either --record-time-boundaries (or a better name :-) ) has to be used, or, that build-id process, with a short explanation that --record-time-boundaries is a bit cheaper. - Arnaldo Hi Arnaldo, Thanks so much for reminding me that the walking only happens when build-id processing is enabled. Yes, the step is same as what you said so there will be an issue when the build-id processing is disabled. According to your suggestion, I will provide a new option "--record-time-boundaries" in perf record. If user disables the build-id processing, the perf record will ask user to set the "--record-time-boundaries". If user enables the build-id processing, the perf record will not ask user to set the "--record-time-boundaries". Also in perf report, if it doesn't see start/end time filled in the perf file header, it will show some information to let user know he should set "--record-time-boundaries" or enable the build-id processing in perf record. I will provide v5 patch series, maybe some days later. Thanks Jin Yao
[PATCH] perf/x86/intel/uncore: Provide alias for IIO free-running boxes on SKX
For Skylake Server, Linux has supported a number of free running counters that collect counts of IO clocks/Bandwidth/Utilization. For example, to collect the inbound bandwidth, root@skx /sys/devices# ls | grep uncore_iio uncore_iio_0 uncore_iio_1 uncore_iio_2 uncore_iio_3 uncore_iio_4 uncore_iio_5 uncore_iio_free_running_0 uncore_iio_free_running_1 uncore_iio_free_running_2 uncore_iio_free_running_3 uncore_iio_free_running_4 uncore_iio_free_running_5 root@skx /sys/devices# perf stat -a -e uncore_iio_free_running_2/bw_in_port0/ ^C Performance counter stats for 'system wide': 153.19 MiB uncore_iio_free_running_2/bw_in_port0/ 8.037701069 seconds time elapsed While it's hard for user to understanding what the box the uncore_iio_free_running_N means. This patch provides aliases for the boxes. With this patch, for example, root@skx /sys/devices# ls | grep uncore_iio uncore_iio_0 uncore_iio_1 uncore_iio_2 uncore_iio_3 uncore_iio_4 uncore_iio_5 uncore_iio_cbdma uncore_iio_mcp0 uncore_iio_mcp1 uncore_iio_pcie0 uncore_iio_pcie1 uncore_iio_pcie2 root@skx ~# perf stat -a -e uncore_iio_pcie1/bw_in_port0/ ^C Performance counter stats for 'system wide': 153.12 MiB uncore_iio_pcie1/bw_in_port0/ 8.469790720 seconds time elapsed Signed-off-by: Jin Yao --- arch/x86/events/intel/uncore.c | 9 +++-- arch/x86/events/intel/uncore.h | 1 + arch/x86/events/intel/uncore_snbep.c | 10 ++ 3 files changed, 18 insertions(+), 2 deletions(-) diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c index 27a4614..6c6615f 100644 --- a/arch/x86/events/intel/uncore.c +++ b/arch/x86/events/intel/uncore.c @@ -812,8 +812,13 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu) else sprintf(pmu->name, "uncore"); } else { - sprintf(pmu->name, "uncore_%s_%d", pmu->type->name, - pmu->pmu_idx); + if (pmu->type->alias && pmu->pmu_idx < pmu->type->num_boxes) { + sprintf(pmu->name, "uncore_%s", + pmu->type->alias[pmu->pmu_idx]); + } else { + sprintf(pmu->name, "uncore_%s_%d", pmu->type->name, + pmu->pmu_idx); + } } ret = perf_pmu_register(&pmu->pmu, pmu->name, -1); diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h index e17ab88..db9fd9a 100644 --- a/arch/x86/events/intel/uncore.h +++ b/arch/x86/events/intel/uncore.h @@ -44,6 +44,7 @@ struct freerunning_counters; struct intel_uncore_type { const char *name; + const char **alias; int num_counters; int num_boxes; int perf_ctr_bits; diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 51d7c11..9cd7e3d 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -3596,8 +3596,18 @@ static const struct attribute_group skx_uncore_iio_freerunning_format_group = { .attrs = skx_uncore_iio_freerunning_formats_attr, }; +static const char * const skx_uncore_iio_free_running_aliases[] = { + { "iio_cbdma" }, + { "iio_pcie0" }, + { "iio_pcie1" }, + { "iio_pcie2" }, + { "iio_mcp0" }, + { "iio_mcp1" }, +}; + static struct intel_uncore_type skx_uncore_iio_free_running = { .name = "iio_free_running", + .alias = skx_uncore_iio_free_running_aliases, .num_counters = 17, .num_boxes = 6, .num_freerunning_types = SKX_IIO_FREERUNNING_TYPE_MAX, -- 2.7.4
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 10:40 PM, Jiri Olsa wrote: On Tue, Jan 23, 2018 at 07:02:44AM +0800, Jin, Yao wrote: SNIP threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. could you please put this description into comment before the thread_map__new_str is called? thanks, jirka OK, I will put this comment in v2. I will send v2 soon. Thanks Jin Yao
[PATCH v2] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. v2: --- According to Jiri's comment, add description to explain why the patch uses 'target->per_thread && target->system_wide'. v2 doesn't have functional change. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 20 +++- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 22 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index ac35cd2..3eea0f8 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,8 +1106,26 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct cpu_map *cpus; struct thread_map *threads; + /* +* If specify '-a' and '--per-thread' to perf record, perf record +* will override '--per-thread'. target->per_thread = false and +* target->system_wide = true. +* +* If specify '--per-thread' only to perf record, +* target->per_thread = true and target->system_wide = false. +* +* So target->per_thread && target->system_wide is false. +* For perf record, thread_map__new_str doesn't call +* thread_map__new_all_cpus. That will keep perf record's +* current behavior. +* +* For perf stat, it allows the case that target->per_thread and +* target->system_wide are all true. It means to collect system-wide +* per-thread data. thread_map__new_str will call +* thread_map__new_all_cpus to enumerate all threads. +*/ threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
[PATCH] perf util: Display warning when perf report/annotate is missing some libs
We keep having bug reports that when users build perf on their own, but they don't install some needed libraries like libelf, libbfd/libibery. The perf can build, but it is missing important functionality. For example, perf report doesn't display any symbols and perf annotate doesn't work. This patch displays warnings that these libraries are missing in build when perf report / perf annotate are used. Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 2 ++ tools/perf/builtin-report.c | 2 ++ tools/perf/util/symbol.c | 21 + tools/perf/util/symbol.h | 2 ++ 4 files changed, 27 insertions(+) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..609eb7a 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -511,6 +511,8 @@ int cmd_annotate(int argc, const char **argv) setup_browser(true); + build_lib_warning(); + ret = __cmd_annotate(&annotate); out_delete: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..550adb7 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1293,6 +1293,8 @@ int cmd_report(int argc, const char **argv) } } + build_lib_warning(); + if (symbol__init(&session->header.env) < 0) goto error; diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c index cc065d4..4205ba8 100644 --- a/tools/perf/util/symbol.c +++ b/tools/perf/util/symbol.c @@ -2224,3 +2224,24 @@ int symbol__config_symfs(const struct option *opt __maybe_unused, free(bf); return 0; } + +void build_lib_warning(void) +{ +#ifndef HAVE_LIBELF_SUPPORT + pr_warning("Symbols are disabled!\n" + "Please install libelf-dev, libelf-devel " + "or elfutils-libelf-devel before building perf.\n"); +#endif + +#ifndef HAVE_DWARF_SUPPORT + pr_warning("Unwind support is disabled!\n" + "Please install elfutils-devel/libdw-dev " + "before building perf.\n"); +#endif + +#ifndef HAVE_LIBBFD_SUPPORT + pr_warning("C++ demangling and line numbers are disabled!\n" + "Please install binutils-dev[el]/" + "libiberty-dev before building perf.\n"); +#endif +} diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index 0563f33..d88e046 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -383,6 +383,8 @@ int get_sdt_note_list(struct list_head *head, const char *target); int cleanup_sdt_note_list(struct list_head *sdt_notes); int sdt_notes__get_count(struct list_head *start); +void build_lib_warning(void); + #define SDT_BASE_SCN ".stapsdt.base" #define SDT_NOTE_SCN ".note.stapsdt" #define SDT_NOTE_TYPE 3 -- 2.7.4
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 1/11/2018 11:30 PM, Jiri Olsa wrote: On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote: We keep having bug reports that when users build perf on their own, we already have same warnings during the build Yes, there will be warnings displayed during the build if some libraries are missing. While I do find the users (especially the new perf user) don't notice the warning. They only care about the success of build. Once they use this perf version and find something working strangely, they will say that perf may have bug. :( but they don't install some needed libraries like libelf, libbfd/libibery. how about saying that in the symbol column, instead of poluting report's output, like: $ perf report --stdio # Overhead Command Shared Object Symbol (disabled) # ... .. I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Hi Jiri, I ever considered to move the operation of removing error thread to perf_evsel__fallback(). The perf_evsel__fallback() is common code and it's shared by perf report, perf stat and perf top. While finally I think it'd better let the caller decide to remove error thread and try again, or just return the warning message. perf_evsel__fallback() probably doesn't know what the caller want to do. That's my current thinking. Maybe there will be a better fix... Thanks Jin Yao
Re: [PATCH] perf util: Fix evlist->threads when working with 'perf stat --per-thread'
On 1/13/2018 3:53 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Jan 12, 2018 at 09:44:53AM -0700, Mathieu Poirier escreveu: After commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") any condition where target->per_thread is set will see all the threads in a system added to evlist->threads. Since the 'record' utility also uses function thread_map__new_str(), any attempt to use the --perf-thread option will also end up accounting for all --perf-thread? Do you mean --per-thread, ok, so 'record' has this, uses the opts->per_thread that then Jin reused for his recent patches... Jin, can you try to untangle this, probably you can't reuse that opts->per-thread thing... Without looking further, I think that could be a better avenue to fix this issue, without having to add a 'stat' specific method to the core classes. - Arnaldo Hi, Very sorry, I just see this mail today. Please let me look at this issue first. Thanks Jin Yao threads in a system, resulting in new kernel events being created for all threads rather than just the thread of interest. This patch keeps the newly introduced functionality but make sure it doesn't affect other utilities outside of 'stat'. Signed-off-by: Mathieu Poirier --- tools/perf/builtin-stat.c | 2 +- tools/perf/util/evlist.c | 29 - tools/perf/util/evlist.h | 2 ++ 3 files changed, 27 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..82d16426b984 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2833,7 +2833,7 @@ int cmd_stat(int argc, const char **argv) if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) target.per_thread = true; - if (perf_evlist__create_maps(evsel_list, &target) < 0) { + if (perf_evlist__create_stat_maps(evsel_list, &target) < 0) { if (target__has_task(&target)) { pr_err("Problems finding threads of monitor\n"); parse_options_usage(stat_usage, stat_options, "p", 1); diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..a4ae684f28de 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1100,13 +1100,11 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages) return perf_evlist__mmap_ex(evlist, pages, 0, false); } -int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +static int _perf_evlist__create_maps(struct perf_evlist *evlist, +struct target *target, +struct thread_map *threads) { struct cpu_map *cpus; - struct thread_map *threads; - - threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); if (!threads) return -1; @@ -1130,6 +1128,27 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) return -1; } +int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + false); + + return _perf_evlist__create_maps(evlist, target, threads); +} + +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target) +{ + struct thread_map *threads; + + threads = thread_map__new_str(target->pid, target->tid, target->uid, + target->per_thread); + + return _perf_evlist__create_maps(evlist, target, threads); +} + void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads) { diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h index 75160666d305..80c654990e19 100644 --- a/tools/perf/util/evlist.h +++ b/tools/perf/util/evlist.h @@ -188,6 +188,8 @@ void perf_evlist__set_selected(struct perf_evlist *evlist, void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus, struct thread_map *threads); int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target); +int perf_evlist__create_stat_maps(struct perf_evlist *evlist, + struct target *target); int perf_evlist__apply_filters(struct perf_evlist *evlist, struct perf_evsel **err_evsel); void __perf_evlist__set_leader(struct list_head *list); -- 2.7.4
Re: [PATCH 1/1] perf util: Do not reuse target->per_thread flag
Hi Mathieu, Sorry, I just see this patch today. I have tested this patch and it works. Another idea from me is it doesn't need to add a new target->all_threads flag. We just use target->per-thread && target->system_wide as a condition to check for all per-thread case. I just think for your perf record case, the target->system_wide will not be set. Instead, if target->per-thread and target->system_wide are both set, that means we needs to trace on all threads, right? Thanks Jin Yao On 1/13/2018 7:12 AM, Mathieu Poirier wrote: Commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") is using the target->per_thread flag to specify that all threads in a system should be taken into account. That is then used in function thread_map__new_str() where all threads are added to evlist->threads. Variable target->per_thread is also used by 'perf record' when handling trace sessions using the --per-thread command line option. Since the target->per_thread flag is set all threads will be added to evlist->threads, which has the effect of creating a kernel event for each thread in the system. This patch address the issue by creating a new target->all_threads flag that gets set from the 'stat' utility, avoiding any conflict with other utilities using target->per_thread. Signed-off-by: Mathieu Poirier --- tools/perf/builtin-stat.c| 2 +- tools/perf/util/evlist.c | 2 +- tools/perf/util/target.h | 1 + tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 5 files changed, 6 insertions(+), 5 deletions(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..87e156bfb45e 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -2831,7 +2831,7 @@ int cmd_stat(int argc, const char **argv) target__validate(&target); if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide)) - target.per_thread = true; + target.per_thread = target.all_threads = true; if (perf_evlist__create_maps(evsel_list, &target) < 0) { if (target__has_task(&target)) { diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index f0a5e09c4071..c239eb895612 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->all_threads); if (!threads) return -1; diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h index 6ef01a83b24e..0da8ea2b6801 100644 --- a/tools/perf/util/target.h +++ b/tools/perf/util/target.h @@ -15,6 +15,7 @@ struct target { bool uses_mmap; bool default_per_cpu; bool per_thread; + bool all_threads; }; enum target_errno { diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f6491c..729dad8f412d 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b99e73c..5ec91cfd1869 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
[PATCH] perf util: Use target->per_thread and target->system_wide flags
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); -- 2.7.4
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 5:10 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); At first glance I thought your solution would do the trick but perf record does use target->system_wide when the '-a' switch is used. Moreover specifying the '-a' switch doesn't prevent the '--per-thread' option from being used as well, making both target->perf_thread and target_system_wide equal to true (and that is not good). Although not a fan of adding more to struct target, the advantage of having target->all_threads is that we are guaranteed that it isn't used anywhere else. Let me know what you think, Mathieu If we specify both '-a' and '--per-thread' to perf record, perf record will override'--per-thread'. So now target->per_thread = false, and target->system_wide = true. If we specify '--per-thread' only to perf record, target->per_thread = true, and target->system_wide = false. So whatever for any case, target->per_thread && target->system_wide is false. Since the parameter is false, in thread_map__new_str(), it will not execute the thread_map__new_all_cpus(). So that will not change perf record previous behavior. In perf stat, it allows the case that target->per_thread and target->system_wide are all true. That means we want to collect system-wide per-thread metrics. That's my current thinking. Thanks Jin Yao
Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags
On 1/23/2018 7:56 AM, Mathieu Poirier wrote: On 22 January 2018 at 15:15, Jin Yao wrote: Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc") that it has negative impact on 'perf record --per-thread'. It has the effect of creating a kernel event for each thread in the system for 'perf record --per-thread'. Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag") can fix this issue by creating a new target->all_threads flag. This patch is based on Mathieu Poirier's patch but it doesn't use a new target->all_threads flag. This patch just uses 'target->per_thread && target->system_wide' as a condition to check for all threads case. Signed-off-by: Jin Yao --- tools/perf/util/evlist.c | 2 +- tools/perf/util/thread_map.c | 4 ++-- tools/perf/util/thread_map.h | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c index 120efd8..9dff74a 100644 --- a/tools/perf/util/evlist.c +++ b/tools/perf/util/evlist.c @@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target) struct thread_map *threads; threads = thread_map__new_str(target->pid, target->tid, target->uid, - target->per_thread); + target->per_thread && target->system_wide); if (!threads) return -1; diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..729dad8 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char *tid_str) } struct thread_map *thread_map__new_str(const char *pid, const char *tid, - uid_t uid, bool per_thread) + uid_t uid, bool all_threads) { if (pid) return thread_map__new_by_pid_str(pid); @@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, const char *tid, if (!tid && uid != UINT_MAX) return thread_map__new_by_uid(uid); - if (per_thread) + if (all_threads) return thread_map__new_all_cpus(); return thread_map__new_by_tid_str(tid); diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h index 0a806b9..5ec91cf 100644 --- a/tools/perf/util/thread_map.h +++ b/tools/perf/util/thread_map.h @@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map); void thread_map__put(struct thread_map *map); struct thread_map *thread_map__new_str(const char *pid, - const char *tid, uid_t uid, bool per_thread); + const char *tid, uid_t uid, bool all_threads); struct thread_map *thread_map__new_by_tid_str(const char *tid_str); Reviewed-and-tested-by: Mathieu Poirier "mathieu.poir...@linaro.org" Thanks a lot for reviewing the patch. Thanks Jin Yao
Re: [PATCH v7 2/6] perf record: Get the first sample time and last sample time
On 1/5/2018 3:09 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Dec 08, 2017 at 09:13:42PM +0800, Jin Yao escreveu: In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. So, at this point I was expecting that that record would be present on the perf.data file: [acme@jouet perf]$ perf record --timestamp-boundary sleep 1 Cannot read kernel map [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (7 samples) ] [acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE | wc -l 7 [acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE_TIME [acme@jouet perf]$ What am I doing wrong? To clarify, this is with just the first two patches in this series applied. - Arnaldo Hi Arnaldo, The timestamp boundary information is saved in perf file header. So if we want to look at them, we need to add '--header' in perf report. For example, root@skl:/tmp# perf report -D --header | grep 'time of' # time of first sample : 248333.706656 # time of last sample : 248357.215328 Thanks Jin Yao
Re: [PATCH v7 2/6] perf record: Get the first sample time and last sample time
On 1/5/2018 8:53 PM, Arnaldo Carvalho de Melo wrote: Em Fri, Jan 05, 2018 at 09:15:03AM +0800, Jin, Yao escreveu: On 1/5/2018 3:09 AM, Arnaldo Carvalho de Melo wrote: Em Fri, Dec 08, 2017 at 09:13:42PM +0800, Jin Yao escreveu: In the default 'perf record' configuration, all samples are processed, to create the HEADER_BUILD_ID table. So it's very easy to get the first/last samples and save the time to perf file header via the function write_sample_time(). Later, at post processing time, perf report/script will fetch the time from perf file header. So, at this point I was expecting that that record would be present on the perf.data file: [acme@jouet perf]$ perf record --timestamp-boundary sleep 1 Cannot read kernel map [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.001 MB perf.data (7 samples) ] [acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE | wc -l 7 [acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE_TIME [acme@jouet perf]$ What am I doing wrong? To clarify, this is with just the first two patches in this series applied. - Arnaldo Hi Arnaldo, The timestamp boundary information is saved in perf file header. Right, my bad, I somehow thought it would be as PERF_RECORD_SAMPLE_TIME, duh. Will continue reviewing, sorry about that. - Arnaldo Thanks for reviewing the patch. If you see anything I should improve, please let me know. Thanks Jin Yao So if we want to look at them, we need to add '--header' in perf report. For example, root@skl:/tmp# perf report -D --header | grep 'time of' # time of first sample : 248333.706656 # time of last sample : 248357.215328 Thanks Jin Yao
[PATCH] perf report: Fix wrong jump arrow
When we use perf report interactive annotate view, we can see the position of jump arrow is not correct. For example, 1. perf record -b ... 2. perf report 3. In interactive mode, select Annotate 'function' Percent│ IPC Cycle │if (flag) 1.37 │0.4┌── 1 ↓ je 82 │ │x += x / y + y / x; 0.00 │0.4│ 1310movsd (%rsp),%xmm0 0.00 │0.4│ 565movsd 0x8(%rsp),%xmm4 │0.4│ movsd 0x8(%rsp),%xmm1 │0.4│ movsd (%rsp),%xmm3 │0.4│ divsd %xmm4,%xmm0 0.00 │0.4│ 579divsd %xmm3,%xmm1 │0.4│ movsd (%rsp),%xmm2 │0.4│ addsd %xmm1,%xmm0 │0.4│ addsd %xmm2,%xmm0 0.00 │0.4│ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4└─→ 82: sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 The jump arrow in above example is not correct. It should add the width of IPC and Cycle. With this patch, the result is: Percent│ IPC Cycle │if (flag) 1.37 │0.48 1 ┌──je 82 │ │x += x / y + y / x; 0.00 │0.48 1310 │ movsd (%rsp),%xmm0 0.00 │0.48 565 │ movsd 0x8(%rsp),%xmm4 │0.48 │ movsd 0x8(%rsp),%xmm1 │0.48 │ movsd (%rsp),%xmm3 │0.48 │ divsd %xmm4,%xmm0 0.00 │0.48 579 │ divsd %xmm3,%xmm1 │0.48 │ movsd (%rsp),%xmm2 │0.48 │ addsd %xmm1,%xmm0 │0.48 │ addsd %xmm2,%xmm0 0.00 │0.48 │ movsd %xmm0,(%rsp) │ │volatile double x = 1212121212, y = 121212; │ │ │ │s_randseed = time(0); │ │srand(s_randseed); │ │ │ │for (i = 0; i < 20; i++) { 1.37 │0.4882:└─→sub$0x1,%ebx 28.21 │0.4817 ↑ jne38 Signed-off-by: Jin Yao --- tools/perf/ui/browsers/annotate.c | 9 +++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 2864279..e2f6663 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) struct map_symbol *ms = ab->b.priv; struct symbol *sym = ms->sym; u8 pcnt_width = annotate_browser__pcnt_width(ab); + int width = 0; /* PLT symbols contain external offsets */ if (strstr(sym->name, "@plt")) @@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct ui_browser *browser) to = (u64)btarget->idx; } + if (ab->have_cycles) + width = IPC_WIDTH + CYCLES_WIDTH; + ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS); - __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width, + __ui_browser__line_arrow(browser, +pcnt_width + 2 + ab->addr_width + width, from, to); if (is_fused(ab, cursor)) { ui_browser__mark_fused(browser, - pcnt_width + 3 + ab->addr_width, + pcnt_width + 3 + ab->addr_width + width, from - 1, to > from ? true : false); } -- 2.7.4
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 10:48 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Jan 16, 2018 at 12:55:19PM +0100, Jiri Olsa escreveu: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used >>> 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa Thanks, applied, now test building with 'make -C tools/perf build-test', containers, etc. - Arnaldo Thanks Arnaldo! Thanks Jin Yao
Re: [PATCH v7 3/6] perf util: Create function to parse time percent
On 1/8/2018 10:38 PM, Arnaldo Carvalho de Melo wrote: Em Mon, Jan 08, 2018 at 11:31:49AM -0300, Arnaldo Carvalho de Melo escreveu: Em Fri, Dec 08, 2017 at 09:13:43PM +0800, Jin Yao escreveu: Current perf report/script/... have a --time option to limit the time range of output. But right now it only supports absolute time. For easy using, now it can support a percent of time usage. For example: 1. Select the second 10% time slice perf report --time 10%/2 After applying this patch I'm not being able to get any of these examples to work: [root@jouet home]# perf report --header | grep "time of" # time of first sample : 22947.909226 # time of last sample : 22948.910704 [root@jouet home]# Then, when I try the first example: [root@jouet home]# perf report --stdio --time 1%-20% Invalid time string # To display the perf.data header info, please use --header/--header-only options. # [root@jouet home]# What am I doing wrong? Oh well, the way you worded this cset it looked like after applying it I would be able to use percents, etc, which is not the case, I'll probably need to apply the next patches to _then_ this work as advertised in this cset comment. Please try to make it clear in the commit messages these details, to ease reviewing. - Arnaldo Hi Arnaldo, Sorry about that. I will take care next time to try my best to make the patch description more clear. Thanks Jin Yao
Re: [PATCH v7 5/6] perf report: support time percent and multiple time ranges
On 1/8/2018 11:04 PM, Arnaldo Carvalho de Melo wrote: One more thing to consider: When you use: perf report --time 10%/1 it will do what is asked but there is no indication of what percentage ranges are in place, it would be nice to have this in the first line in the TUI, right after this: Samples: 128 of event 'cycles:ppp', Event count (approx.): 21386169 (10%/1) - Arnaldo Hi Arnaldo, Thanks for merging this feature and thanks so much for providing comments for how to improve this feature in next step. Please let me summarize what the follow-up patch needs to do: 1. Improve the error message, something like: "HINT: use 'perf record --foobar' to record the first/last sample timestamps in the perf.data file header or enable build-id processing." 2. Currently it uses magic number 10 to limit the number of time slices. But it's not good. The best thing would be to have no such limitations, i.e. to lift this limitation from the code and docs. 3. 'perf report --time 10%' should be equivalent to 'perf report --time 10%/1'. Or at least say something like: "percent slices need an index to specify which one is wanted" when you notice a % in the --time string, etc. 4. Add an indication of what percentage ranges are being used. For example, Samples: 128 of event 'cycles:ppp', Event count (approx.): 21386169 (10%/1) I will develop a follow-up patchset to improve this feature. Thanks Jin Yao
[PATCH v1 0/8] perf: Follow-up patches to improve time slice
It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/Documentation/perf-script.txt | 10 ++--- tools/perf/builtin-report.c | 30 + tools/perf/builtin-script.c | 21 +++--- tools/perf/util/time-utils.c | 72 +++- tools/perf/util/time-utils.h | 2 + 6 files changed, 117 insertions(+), 20 deletions(-) -- 2.7.4
[PATCH v1 1/8] perf report: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf report --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index dd4df9a..a6c5cf2 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1299,7 +1299,9 @@ int cmd_report(int argc, const char **argv) if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); return -EINVAL; } -- 2.7.4
[PATCH v1 5/8] perf report: Add an indication of what time slices are used
Add a time slices indication to the perf report header. For example, # perf report --stdio --time 10% # Total Lost Samples: 0 # # Samples: 9K of event 'cycles:ppp' (time slices: 10%) # Event count (approx.): 8951288803 Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index a6c5cf2..77c954c 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -403,6 +403,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists *hists, struct report if (evname != NULL) ret += fprintf(fp, " of event '%s'", evname); + if (rep->time_str) + ret += fprintf(fp, " (time slices: %s)", rep->time_str); + if (symbol_conf.show_ref_callgraph && strstr(evname, "call-graph=no")) { ret += fprintf(fp, ", show reference callgraph"); -- 2.7.4
[PATCH v1 7/8] perf report: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf report --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf report --stdio --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 2 +- tools/perf/builtin-report.c | 23 +-- 2 files changed, 18 insertions(+), 7 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 5522ce0..1940c4f 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -403,7 +403,7 @@ OPTIONS to end of file. Also support time percent with multiple time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: Select the second 10% time slice: diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 77c954c..fe89021 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -54,8 +54,6 @@ #include #include -#define PTIME_RANGE_MAX10 - struct report { struct perf_tooltool; struct perf_session *session; @@ -76,7 +74,8 @@ struct report { const char *cpu_list; const char *symbol_filter_str; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; float min_percent; u64 nr_entries; @@ -1299,24 +1298,33 @@ int cmd_report(int argc, const char **argv) if (symbol__init(&session->header.env) < 0) goto error; + report.ptime_range = perf_time__range_alloc(report.time_str, + &report.range_size); + if (!report.ptime_range) { + ret = -ENOMEM; + goto error; + } + if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { pr_err("HINT: no first/last sample time found in perf data.\n" "Please use latest perf binary to execute 'perf record'\n" "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } report.range_num = perf_time__percent_parse_str( - report.ptime_range, PTIME_RANGE_MAX, + report.ptime_range, report.range_size, report.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); if (report.range_num < 0) { pr_err("Invalid time string\n"); - return -EINVAL; + ret = -EINVAL; + goto error; } } else { report.range_num = 1; @@ -1332,6 +1340,9 @@ int cmd_report(int argc, const char **argv) ret = 0; error: + if (report.ptime_range) + free(report.ptime_range); + perf_session__delete(session); return ret; } -- 2.7.4
[PATCH v1 3/8] perf util: Improve error checking for time percent input
The command line like 'perf report --stdio --time 1abc%/1' could be accepted by perf. It looks not very good. This patch uses strtod() to replace original atof() and check the entire string. Now for the same command line, it would return error message "Invalid time string". root@skl:/tmp# perf report --stdio --time 1abc%/1 Invalid time string Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 8 ++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 3f7f18f..88510ab 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr) static int parse_percent(double *pcnt, char *str) { - char *c; + char *c, *endptr; + double d; c = strchr(str, '%'); if (c) @@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str) else return -1; - *pcnt = atof(str) / 100.0; + d = strtod(str, &endptr); + if (endptr != str + strlen(str)) + return -1; + *pcnt = d / 100.0; return 0; } -- 2.7.4
[PATCH v1 8/8] perf script: Remove the time slices number limitation
Previously it was only allowed to use at most 10 time slices in 'perf script --time'. This patch removes this limitation. For example, following command line is OK (12 time slices) perf script --time 1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12 Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-script.txt | 10 +- tools/perf/builtin-script.c | 17 + 2 files changed, 18 insertions(+), 9 deletions(-) diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index 806ec63..7730c1d 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -351,19 +351,19 @@ include::itrace.txt[] to end of file. Also support time percent with multipe time range. Time string is - 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 10. + 'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. For example: - Select the second 10% time slice + Select the second 10% time slice: perf script --time 10%/2 - Select from 0% to 10% time slice + Select from 0% to 10% time slice: perf script --time 0%-10% - Select the first and second 10% time slices + Select the first and second 10% time slices: perf script --time 10%/1,10%/2 - Select from 0% to 10% and 30% to 40% slices + Select from 0% to 10% and 30% to 40% slices: perf script --time 0%-10%,30%-40% --max-blocks:: diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 4f691af..f116a31 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample *sample, return 0; } -#define PTIME_RANGE_MAX10 - struct perf_script { struct perf_tooltool; struct perf_session *session; @@ -1496,7 +1494,8 @@ struct perf_script { struct thread_map *threads; int name_width; const char *time_str; - struct perf_time_interval ptime_range[PTIME_RANGE_MAX]; + struct perf_time_interval *ptime_range; + int range_size; int range_num; }; @@ -3444,6 +3443,13 @@ int cmd_script(int argc, const char **argv) if (err < 0) goto out_delete; + script.ptime_range = perf_time__range_alloc(script.time_str, + &script.range_size); + if (!script.ptime_range) { + err = -ENOMEM; + goto out_delete; + } + /* needs to be parsed after looking up reference time */ if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && @@ -3456,7 +3462,7 @@ int cmd_script(int argc, const char **argv) } script.range_num = perf_time__percent_parse_str( - script.ptime_range, PTIME_RANGE_MAX, + script.ptime_range, script.range_size, script.time_str, session->evlist->first_sample_time, session->evlist->last_sample_time); @@ -3475,6 +3481,9 @@ int cmd_script(int argc, const char **argv) flush_scripting(); out_delete: + if (script.ptime_range) + free(script.ptime_range); + perf_evlist__free_stats(session->evlist); perf_session__delete(session); -- 2.7.4
[PATCH v1 4/8] perf util: Support no index time percent slice
Previously, the time percent slice needs an index to specify which one the user wants. While it may be easy for using if the index can be omitted. So with this patch, for example, perf report --stdio --time 10%/1 should be equivalent to perf report --stdio --time 10% Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 36 1 file changed, 36 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 88510ab..5769f97 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval *ptime_buf, int num, return i; } +static int one_percent_convert(struct perf_time_interval *ptime_buf, + const char *ostr, u64 start, u64 end, char *c) +{ + char *str; + int len = strlen(ostr), ret; + + /* +* c points to '%'. +* '%' should be the last character +*/ + if (ostr + len - 1 != c) + return -1; + + /* +* Construct a string like "xx%/1" +*/ + str = malloc(len + 3); + if (str == NULL) + return -ENOMEM; + + memcpy(str, ostr, len); + strcpy(str + len, "/1"); + + ret = percent_slash_split(str, ptime_buf, start, end); + if (ret == 0) + ret = 1; + + free(str); + return ret; +} + int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end) { @@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, * ostr example: * 10%/2,10%/3: select the second 10% slice and the third 10% slice * 0%-10%,30%-40%: multiple time range +* 50%: just one percent */ memset(ptime_buf, 0, sizeof(*ptime_buf) * num); @@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, end, percent_dash_split); } + c = strchr(ostr, '%'); + if (c) + return one_percent_convert(ptime_buf, ostr, start, end, c); + return -1; } -- 2.7.4
[PATCH v1 2/8] perf script: Improve error msg when no first/last sample time found
The following message will be returned to user when executing 'perf script --time' if perf data file doesn't contain the first/last sample time. "HINT: no first/last sample time found in perf data. Please use latest perf binary to execute 'perf record' (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')." Signed-off-by: Jin Yao --- tools/perf/builtin-script.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index c1cce47..4f691af 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -3448,7 +3448,9 @@ int cmd_script(int argc, const char **argv) if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) { if (session->evlist->first_sample_time == 0 && session->evlist->last_sample_time == 0) { - pr_err("No first/last sample time in perf data\n"); + pr_err("HINT: no first/last sample time found in perf data.\n" + "Please use latest perf binary to execute 'perf record'\n" + "(if '--buildid-all' is enabled, please set '--timestamp-boundary').\n"); err = -EINVAL; goto out_delete; } -- 2.7.4
[PATCH v1 6/8] perf util: Allocate time slices buffer according to number of comma
Previously we use a magic number 10 to limit the number of time slices. It's not very good. This patch creates a new function perf_time__range_alloc() to allocate time slices buffer. The number of buffer entries is determined by the number of comma in string but at least it will allocate one entry even if no comma is found. Signed-off-by: Jin Yao --- tools/perf/util/time-utils.c | 28 tools/perf/util/time-utils.h | 2 ++ 2 files changed, 30 insertions(+) diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c index 5769f97..6193b46 100644 --- a/tools/perf/util/time-utils.c +++ b/tools/perf/util/time-utils.c @@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, return -1; } +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size) +{ + const char *p1, *p2; + int i = 1; + struct perf_time_interval *ptime; + + /* +* At least allocate one time range. +*/ + if (!ostr) + goto alloc; + + p1 = ostr; + while (p1 < ostr + strlen(ostr)) { + p2 = strchr(p1, ','); + if (!p2) + break; + + p1 = p2 + 1; + i++; + } + +alloc: + *size = i; + ptime = calloc(i, sizeof(*ptime)); + return ptime; +} + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp) { /* if time is not set don't drop sample */ diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h index 34d5eba..70b177d 100644 --- a/tools/perf/util/time-utils.h +++ b/tools/perf/util/time-utils.h @@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr); int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num, const char *ostr, u64 start, u64 end); +struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size); + bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp); bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf, -- 2.7.4
[PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed Signed-off-by: Jin Yao --- tools/perf/builtin-stat.c| 14 +- tools/perf/util/evsel.c | 3 +++ tools/perf/util/thread_map.c | 1 + tools/perf/util/thread_map.h | 1 + 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d3..bcdb47c 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -632,7 +632,19 @@ static int __run_perf_stat(int argc, const char **argv) if (verbose > 0) ui__warning("%s\n", msg); goto try_again; -} + } else if (target__has_per_thread(&target) && + evsel_list->threads && + evsel_list->threads->err_thread != -1) { + /* +* For global --per-thread case, skip current +* error thread. +*/ + if (!thread_map__remove(evsel_list->threads, + evsel_list->threads->err_thread)) { + evsel_list->threads->err_thread = -1; + goto try_again; + } + } perf_evsel__open_strerror(counter, &target, errno, msg, sizeof(msg)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index a4d256e..12f8733 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1899,6 +1899,9 @@ int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus, goto fallback_missing_features; } out_close: + if (err) + threads->err_thread = thread; + do { while (--thread >= 0) { close(FD(evsel, cpu, thread)); diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c index 3e1038f..870cb0c 100644 --- a/tools/perf/util/thread_map.c +++ b/tools/perf/util/thread_map.c @@ -32,6 +32,7 @@ static void thread_map__reset(struct thread_map *map, int start, int nr) size_t size = (nr - start) * sizeof(map->map[0]); memset(&map->map[start], 0, size); + map->err_thread = -1; } static s
Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice
On 1/16/2018 7:55 PM, Jiri Olsa wrote: On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote: It's follow-up patches to improve the perf time slice feature (perf report/script --time xxx) 1. Improve the error message perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found 2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%) perf util: Improve error checking for time percent input 3. Omit the slice index if possible. For example, perf report --stdio --time 10%/1 is equivalent to perf report --stdio --time 10% perf util: Support no index time percent slice 4. Add indication of time slices in perf report header. perf report: Add an indication of what time slices are used 5. Remove the time slices number limitation in perf report/script perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation Jin Yao (8): perf report: Improve error msg when no first/last sample time found perf script: Improve error msg when no first/last sample time found perf util: Improve error checking for time percent input perf util: Support no index time percent slice perf report: Add an indication of what time slices are used perf util: Allocate time slices buffer according to number of comma perf report: Remove the time slices number limitation perf script: Remove the time slices number limitation from quick look it looks ok Reviewed-by: Jiri Olsa thanks, jirka Hi Jiri, Thanks so much for reviewing the patch. Thanks Jin Yao
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
Just tested. But looks it's not OK for '--per-thread' case. jinyao@skl:~/$ ./perf stat --per-thread WARNING: Ignored open failure for pid 1 WARNING: Ignored open failure for pid 2 WARNING: Ignored open failure for pid 4 WARNING: Ignored open failure for pid 6 WARNING: Ignored open failure for pid 7 WARNING: Ignored open failure for pid 8 WARNING: Ignored open failure for pid 9 WARNING: Ignored open failure for pid 10 .. WARNING: Ignored open failure for pid 11783 WARNING: Ignored open failure for pid 12275 WARNING: Ignored open failure for pid 31733 WARNING: Ignored open failure for pid 31737 Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK >= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN >= 1: Disallow CPU event access by users without CAP_SYS_ADMIN >= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Thanks Jin Yao On 1/16/2018 8:51 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 11:43:08PM +0800, Jin Yao wrote: If we execute 'perf stat --per-thread' with non-root account (even set kernel.perf_event_paranoid = -1 yet), it reports the error: jinyao@skl:~$ perf stat --per-thread Error: You may not have permission to collect system-wide stats. Consider tweaking /proc/sys/kernel/perf_event_paranoid, which controls use of the performance events system by unprivileged users (without CAP_SYS_ADMIN). The current value is 2: -1: Allow use of (almost) all events by all users Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK = 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN Disallow raw tracepoint access by users without CAP_SYS_ADMIN = 1: Disallow CPU event access by users without CAP_SYS_ADMIN = 2: Disallow kernel profiling by users without CAP_SYS_ADMIN To make this setting permanent, edit /etc/sysctl.conf too, e.g.: kernel.perf_event_paranoid = -1 Perhaps the ptrace rule doesn't allow to trace some processes. But anyway the global --per-thread mode had better ignore such errors and continue working on other threads. This patch will record the index of error thread in perf_evsel__open() and remove this thread before retrying. For example (run with non-root, kernel.perf_event_paranoid isn't set): jinyao@skl:~$ perf stat --per-thread ^C Performance counter stats for 'system wide': vmstat-3458 6.171984 cpu-clock:u (msec)# 0.000 CPUs utilized perf-3670 0.515599 cpu-clock:u (msec)# 0.000 CPUs utilized vmstat-3458 1,163,643 cycles:u # 0.189 GHz perf-3670 40,881 cycles:u # 0.079 GHz vmstat-3458 1,410,238 instructions:u# 1.21 insn per cycle perf-3670 3,536 instructions:u# 0.09 insn per cycle vmstat-3458288,937 branches:u# 46.814 M/sec perf-3670936 branches:u# 1.815 M/sec vmstat-3458 15,195 branch-misses:u # 5.26% of all branches perf-3670 76 branch-misses:u # 8.12% of all branches 12.651675247 seconds time elapsed could we use the existing code like in attached patch? we might also swap following warning for some generic one: WARNING: Ignored open failure for pid 28392 jirka --- diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index 98bf9d32f222..f9d4d3ad6fc5 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -276,6 +276,9 @@ static int create_perf_stat_counter(struct perf_evsel *evsel) attr->enable_on_exec = 1; } + if (stat_config.aggr_mode == AGGR_THREAD) + evsel->ignore_missing_thread = true; + if (target__has_cpu(&target) && !target__has_per_thread(&target)) return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel)); diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c index 8f971a2301d1..509ee175bc97 100644 --- a/tools/perf/util/evsel.c +++ b/tools/perf/util/evsel.c @@ -1656,7 +1656,7 @@ static bool ignore_missing_thread(struct perf_evsel *evsel, return false; /* The
Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread
On 1/16/2018 9:17 PM, Jiri Olsa wrote: On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote: Just tested. But looks it's not OK for '--per-thread' case. yea, I haven't tested much.. might need soem tweaking, but my point was that it could be doable on one place instead of introducing another if possible jirka Yes, I understand. It'd better we can put the code to more common places. Actually I used the similar patch as yours at first. While I found it didn't work for the case of system-wide '--per-thread' then I developed current one. Thanks Jin Yao
[PATCH v1 3/4] perf annotate: Process the new switch flag tui_dump
The tui_dump is created as a parameter and it will be finally passed to symbol__tui_annotate() and be saved to browser.dump. It's a switch for TUI routines to indicate if TUI output needs to be dumped to stdio. Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 2 +- tools/perf/ui/browsers/annotate.c | 41 +++ tools/perf/ui/browsers/hists.c| 2 +- tools/perf/util/annotate.h| 6 -- tools/perf/util/hist.h| 11 +++ 5 files changed, 42 insertions(+), 20 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 2db5b50..8ba8e2c 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -334,7 +334,7 @@ static void hists__find_annotations(struct hists *hists, /* skip missing symbols */ nd = rb_next(nd); } else if (use_browser == 1) { - key = hist_entry__tui_annotate(he, evsel, NULL); + key = hist_entry__tui_annotate(he, evsel, NULL, false); switch (key) { case -1: diff --git a/tools/perf/ui/browsers/annotate.c b/tools/perf/ui/browsers/annotate.c index 618edf9..bb7229d 100644 --- a/tools/perf/ui/browsers/annotate.c +++ b/tools/perf/ui/browsers/annotate.c @@ -122,7 +122,9 @@ static void disasm_line__write(struct disasm_line *dl, struct ui_browser *browse char *bf, size_t size) { if (dl->ins.ops && dl->ins.ops->scnprintf) { - if (ins__is_jump(&dl->ins)) { + if (browser->dump) { + ui_browser__write_nstring(browser, " ", 2); + } else if (ins__is_jump(&dl->ins)) { bool fwd = dl->ops.target.offset > dl->al.offset; ui_browser__write_graph(browser, fwd ? SLSMG_DARROW_CHAR : @@ -217,7 +219,10 @@ static void annotate_browser__write(struct ui_browser *browser, void *entry, int ui_browser__printf(browser, "%*s ", CYCLES_WIDTH - 1, "Cycle"); } - SLsmg_write_char(' '); + if (browser->dump) + putchar(' '); + else + SLsmg_write_char(' '); /* The scroll bar isn't being used */ if (!browser->navkeypressed) @@ -388,6 +393,9 @@ static unsigned int annotate_browser__refresh(struct ui_browser *browser) int ret = ui_browser__list_head_refresh(browser); int pcnt_width = annotate_browser__pcnt_width(ab); + if (browser->dump) + return ret; + if (annotate_browser__opts.jump_arrows) annotate_browser__draw_current_jump(browser); @@ -589,7 +597,7 @@ static bool annotate_browser__callq(struct annotate_browser *browser, } pthread_mutex_unlock(¬es->lock); - symbol__tui_annotate(dl->ops.target.sym, ms->map, evsel, hbt); + symbol__tui_annotate(dl->ops.target.sym, ms->map, evsel, hbt, false); sym_title(ms->sym, ms->map, title, sizeof(title)); ui_browser__show_title(&browser->b, title); return true; @@ -961,7 +969,7 @@ static int annotate_browser__run(struct annotate_browser *browser, } int map_symbol__tui_annotate(struct map_symbol *ms, struct perf_evsel *evsel, -struct hist_browser_timer *hbt) +struct hist_browser_timer *hbt, bool tui_dump) { /* Set default value for show_total_period and show_nr_samples */ annotate_browser__opts.show_total_period = @@ -969,17 +977,19 @@ int map_symbol__tui_annotate(struct map_symbol *ms, struct perf_evsel *evsel, annotate_browser__opts.show_nr_samples = symbol_conf.show_nr_samples; - return symbol__tui_annotate(ms->sym, ms->map, evsel, hbt); + return symbol__tui_annotate(ms->sym, ms->map, evsel, hbt, tui_dump); } int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel, -struct hist_browser_timer *hbt) +struct hist_browser_timer *hbt, bool tui_dump) { /* reset abort key so that it can get Ctrl-C as a key */ - SLang_reset_tty(); - SLang_init_tty(0, 0, 0); + if (!tui_dump) { + SLang_reset_tty(); + SLang_init_tty(0, 0, 0); + } - return map_symbol__tui_annotate(&he->ms, evsel, hbt); + return map_symbol__tui_annotate(&he->ms, evsel, hbt, tui_dump); } @@ -1100,7 +1110,8 @@ static inline int width_jumps(int n) int symbol__tui_annotate(struct symbol *sym, struct map *map, struct perf_evsel *evsel, -struct hist_browser_timer *hbt) +
[PATCH v1 4/4] perf annotate: Enable the '--tui-dump' mode
It creates a new option '--tui-dump' in perf annotate command line. With this option, for example, the perf annotate output: $ perf annotate compute_flag --tui-dump Percent IPC Cycle Disassembly of section .text: 00400640 : compute_flag(): volatile int count; static unsigned int s_randseed; __attribute__((noinline)) int compute_flag() { 23.00 1.16 sub$0x8,%rsp int i; i = rand() % 2; 23.06 1.16 1callq rand@plt return i; 27.01 3.38 mov%eax,%edx } 3.38 add$0x8,%rsp { int i; i = rand() % 2; return i; 3.38 shr$0x1f,%edx 3.38 add%edx,%eax 3.38 and$0x1,%eax 3.38 sub%edx,%eax } 26.93 3.38 2retq Compared to TUI output, $ perf annotate compute_flag │Disassembly of section .text: │ │00400640 │compute_flag(): │volatile int count; │static unsigned int s_randseed; │ │__attribute__((noinline)) │int compute_flag() │{ 23.00 │1.16 sub$0x8,%rsp │int i; │ │i = rand() % 2; 23.06 │1.16 1 → callq rand@plt │ │return i; 27.01 │3.38 mov%eax,%edx │} │3.38 add$0x8,%rsp │{ │int i; │ │i = rand() % 2; │ │return i; │3.38 shr$0x1f,%edx │3.38 add%edx,%eax │3.38 and$0x1,%eax │3.38 sub%edx,%eax │} 26.93 │3.38 2 ← retq Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-annotate.txt | 3 +++ tools/perf/builtin-annotate.c | 12 ++-- 2 files changed, 13 insertions(+), 2 deletions(-) diff --git a/tools/perf/Documentation/perf-annotate.txt b/tools/perf/Documentation/perf-annotate.txt index 292809c3..d52ae47 100644 --- a/tools/perf/Documentation/perf-annotate.txt +++ b/tools/perf/Documentation/perf-annotate.txt @@ -83,6 +83,9 @@ OPTIONS --gtk:: Use the GTK interface. +--tui-dump: Use the TUI interface. The TUI output is dumped to stdio + interface. + -C:: --cpu=:: Only report samples for the list of CPUs provided. Multiple CPUs can be provided as a comma-separated list with no space: 0,1. Ranges of diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index 8ba8e2c..5110160 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -41,6 +41,7 @@ struct perf_annotate { struct perf_tool tool; struct perf_session *session; bool use_tui, use_stdio, use_gtk; + bool tui_dump; bool full_paths; bool print_line; bool skip_missing; @@ -334,7 +335,10 @@ static void hists__find_annotations(struct hists *hists, /* skip missing symbols */ nd = rb_next(nd); } else if (use_browser == 1) { - key = hist_entry__tui_annotate(he, evsel, NULL, false); + key = hist_entry__tui_annotate(he, evsel, NULL, + ann->tui_dump); + if (ann->tui_dump) + return; switch (key) { case -1: @@ -486,6 +490,7 @@ int cmd_annotate(int argc, const char **argv) "dump raw trace in ASCII"), OPT_BOOLEAN(0, "gtk", &annotate.use_gtk, "Use the GTK interface"), OPT_BOOLEAN(0, "tui", &annotate.use_tui, "Use the TUI interface"), + OPT_BOOLEAN(0, "tui-dump", &annotate.tui_dump, "Dump TUI content"), OPT_BOOLEAN(0, "stdio", &annotate.use_stdio, "Use the stdio interface"), OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name, "file", "vmlinux pathname"), @@ -576,7 +581,10 @@ int cmd_annotat
[PATCH v1 0/4] perf annotate: Create a new '--tui-dump' option
There is a requirement to let perf annotate support displaying the IPC/Cycle. In previous patch, this is supported in TUI mode. While it's not convenient for users since they have to take screen shots and copy/paste data. This patch series introduces a new option '--tui-dump' in perf annotate to dump the TUI output to stdio. User can easily use the command line like: 'perf annotate --tui-dump > /tmp/log.txt' For example: $ perf annotate compute_flag --tui-dump Percent IPC Cycle Disassembly of section .text: 00400640 : compute_flag(): volatile int count; static unsigned int s_randseed; __attribute__((noinline)) int compute_flag() { 23.00 1.16 sub$0x8,%rsp int i; i = rand() % 2; 23.06 1.16 1callq rand@plt return i; 27.01 3.38 mov%eax,%edx } 3.38 add$0x8,%rsp { int i; i = rand() % 2; return i; 3.38 shr$0x1f,%edx 3.38 add%edx,%eax 3.38 and$0x1,%eax 3.38 sub%edx,%eax } 26.93 3.38 2retq The '--stdio' option is still kept now. Maybe in future, we can only maintain the TUI routines and drop the lagacy stdio code. But right now we'd better keep it until the '--tui-dump' option is good enough. Jin Yao (4): perf browser: Add a new 'dump' flag perf browser: bypass ui_init if in tui dump mode perf annotate: Process the new switch flag tui_dump perf annotate: Enable the '--tui-dump' mode tools/perf/Documentation/perf-annotate.txt | 3 +++ tools/perf/builtin-annotate.c | 12 +++-- tools/perf/builtin-c2c.c | 2 +- tools/perf/builtin-report.c| 2 +- tools/perf/builtin-top.c | 2 +- tools/perf/ui/browser.c| 38 +++ tools/perf/ui/browser.h| 1 + tools/perf/ui/browsers/annotate.c | 41 +- tools/perf/ui/browsers/hists.c | 2 +- tools/perf/ui/setup.c | 9 +-- tools/perf/ui/ui.h | 2 +- tools/perf/util/annotate.h | 6 +++-- tools/perf/util/hist.h | 11 +--- 13 files changed, 99 insertions(+), 32 deletions(-) -- 2.7.4
[PATCH v1 1/4] perf browser: Add a new 'dump' flag
We have a new requirement to provide the TUI output option for non-interactive mode. For example, write the TUI output to stdio directly. This patch creates a new flag 'dump' in struct ui_browser. Once it's on, for the formatted buffer, we just print it on stdio. Signed-off-by: Jin Yao --- tools/perf/ui/browser.c | 38 +- tools/perf/ui/browser.h | 1 + 2 files changed, 34 insertions(+), 5 deletions(-) diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c index 63399af..3534c6c 100644 --- a/tools/perf/ui/browser.c +++ b/tools/perf/ui/browser.c @@ -41,8 +41,13 @@ int ui_browser__set_color(struct ui_browser *browser, int color) void ui_browser__set_percent_color(struct ui_browser *browser, double percent, bool current) { -int color = ui_browser__percent_color(browser, percent, current); -ui_browser__set_color(browser, color); + int color; + + if (browser->dump) + return; + + color = ui_browser__percent_color(browser, percent, current); + ui_browser__set_color(browser, color); } void ui_browser__gotorc(struct ui_browser *browser, int y, int x) @@ -50,10 +55,27 @@ void ui_browser__gotorc(struct ui_browser *browser, int y, int x) SLsmg_gotorc(browser->y + y, browser->x + x); } +static void write_nstring(const char *msg, unsigned int width) +{ + unsigned int len = strlen(msg); + unsigned int i = 0; + + while (i < len && i < width) { + putchar(msg[i]); + i++; + } + + while (i++ < width) + putchar(' '); +} + void ui_browser__write_nstring(struct ui_browser *browser __maybe_unused, const char *msg, unsigned int width) { - slsmg_write_nstring(msg, width); + if (browser->dump) + write_nstring(msg, width); + else + slsmg_write_nstring(msg, width); } void ui_browser__printf(struct ui_browser *browser __maybe_unused, const char *fmt, ...) @@ -61,7 +83,10 @@ void ui_browser__printf(struct ui_browser *browser __maybe_unused, const char *f va_list args; va_start(args, fmt); - slsmg_vprintf(fmt, args); + if (browser->dump) + vprintf(fmt, args); + else + slsmg_vprintf(fmt, args); va_end(args); } @@ -496,7 +521,10 @@ unsigned int ui_browser__list_head_refresh(struct ui_browser *browser) list_for_each_from(pos, head) { if (!browser->filter || !browser->filter(browser, pos)) { - ui_browser__gotorc(browser, row, 0); + if (!browser->dump) + ui_browser__gotorc(browser, row, 0); + else if (row > 0) + printf("\n"); browser->write(browser, pos, row); if (++row == browser->rows) break; diff --git a/tools/perf/ui/browser.h b/tools/perf/ui/browser.h index 03e1734..66d9405 100644 --- a/tools/perf/ui/browser.h +++ b/tools/perf/ui/browser.h @@ -28,6 +28,7 @@ struct ui_browser { u32 nr_entries; bool navkeypressed; bool use_navkeypressed; + bool dump; }; int ui_browser__set_color(struct ui_browser *browser, int color); -- 2.7.4
[PATCH v1 2/4] perf browser: bypass ui_init if in tui dump mode
We create a tui dump mode in previous patch. In tui dump mode, the output is written to stdio. We need to bypass ui_init() in setup_browser(). Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 2 +- tools/perf/builtin-c2c.c | 2 +- tools/perf/builtin-report.c | 2 +- tools/perf/builtin-top.c | 2 +- tools/perf/ui/setup.c | 9 +++-- tools/perf/ui/ui.h| 2 +- 6 files changed, 12 insertions(+), 7 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index ead6ae4..2db5b50 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -576,7 +576,7 @@ int cmd_annotate(int argc, const char **argv) else if (annotate.use_gtk) use_browser = 2; - setup_browser(true); + setup_browser(true, false); if (use_browser == 1 && annotate.has_br_stack) { sort__mode = SORT_MODE__BRANCH; diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c index 98d243f..6abc13b 100644 --- a/tools/perf/builtin-c2c.c +++ b/tools/perf/builtin-c2c.c @@ -2618,7 +2618,7 @@ static int perf_c2c__report(int argc, const char **argv) else use_browser = 1; - setup_browser(false); + setup_browser(false, false); err = perf_session__process_events(session); if (err) { diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 971ccba..99277b7 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -1261,7 +1261,7 @@ int cmd_report(int argc, const char **argv) } if (strcmp(input_name, "-") != 0) - setup_browser(true); + setup_browser(true, false); else use_browser = 0; diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c index 0a26b56..b7db01a 100644 --- a/tools/perf/builtin-top.c +++ b/tools/perf/builtin-top.c @@ -1430,7 +1430,7 @@ int cmd_top(int argc, const char **argv) else if (top.use_tui) use_browser = 1; - setup_browser(false); + setup_browser(false, false); if (setup_sorting(top.evlist) < 0) { if (sort_order) diff --git a/tools/perf/ui/setup.c b/tools/perf/ui/setup.c index 44fe824..8219b2e 100644 --- a/tools/perf/ui/setup.c +++ b/tools/perf/ui/setup.c @@ -73,9 +73,9 @@ int stdio__config_color(const struct option *opt __maybe_unused, return 0; } -void setup_browser(bool fallback_to_pager) +void setup_browser(bool fallback_to_pager, bool tui_dump) { - if (use_browser < 2 && (!isatty(1) || dump_trace)) + if (use_browser < 2 && (!isatty(1) || dump_trace) && !tui_dump) use_browser = 0; /* default to TUI */ @@ -92,6 +92,11 @@ void setup_browser(bool fallback_to_pager) /* fall through */ case 1: use_browser = 1; + if (tui_dump) { + setup_pager(); + break; + } + if (ui__init() == 0) break; /* fall through */ diff --git a/tools/perf/ui/ui.h b/tools/perf/ui/ui.h index 9b6fdf0..020a85b 100644 --- a/tools/perf/ui/ui.h +++ b/tools/perf/ui/ui.h @@ -11,7 +11,7 @@ extern void *perf_gtk_handle; extern int use_browser; -void setup_browser(bool fallback_to_pager); +void setup_browser(bool fallback_to_pager, bool tui_dump); void exit_browser(bool wait_for_ok); #ifdef HAVE_SLANG_SUPPORT -- 2.7.4
[PATCH v1 0/3] Support perf -vv
We keep having bug reports that when users build perf on their own, but they don't install some needed libraries such as libelf, libbfd/libibery. The perf can build, but it is missing important functionality. And users may complain that perf has issue or bug. This patch-set support 'perf -vv' which will print the compiled-in status of libraries. Once users think perf missing some functionality, it should be very easy for them to check the libraries status. For example: $ ./perf -vv perf version 4.13.rc5.g9b7a81b dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ off ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] Jin Yao (3): perf config: Add -DNO_GLIBC to CFLAGS perf version: Print the status of compiled-in libraries perf: Support perf -vv tools/perf/Makefile.config | 2 + tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 6 +++ 4 files changed, 134 insertions(+) -- 2.7.4
[PATCH v1 1/3] perf config: Add -DNO_GLIBC to CFLAGS
For most of libraries, in perf.config, they can be recorded with -DHAVE_XXX or -DNO_XXX in CFLAGS according to if they are compiled-in. Then C code could know if the library is compiled-in or not. While for glibc, no existing -DHAVE_XXX or -DNO_XXX. This patch adds -DNO_GLIBC to CFLAGS. Signed-off-by: Jin Yao --- tools/perf/Makefile.config | 2 ++ 1 file changed, 2 insertions(+) diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index 98ff736..5883dd6 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -324,6 +324,8 @@ else NO_LIBBPF := 1 NO_JVMTI := 1 else + CFLAGS += -DNO_GLIBC + ifneq ($(filter s% -static%,$(LDFLAGS),),) msg := $(error No static glibc found, please install glibc-static); else -- 2.7.4
[PATCH v1 3/3] perf: Support perf -vv
We keep having bug reports that when users build perf on their own, but they don't install some needed libraries such as libelf, libbfd/libibery. The perf can build, but it is missing important functionality. This patch provides a new option '-vv' which will print the compiled-in status of libraries. For example: $ ./perf -vv perf version 4.13.rc5.g9b7a81b dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ off ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] Signed-off-by: Jin Yao --- tools/perf/perf.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 1b3fc8e..300c83d 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -64,6 +64,7 @@ static struct cmd_struct commands[] = { { "top",cmd_top,0 }, { "annotate", cmd_annotate, 0 }, { "version",cmd_version,0 }, + { "version2", cmd_version2, 0 }, { "script", cmd_script, 0 }, { "sched", cmd_sched, 0 }, #ifdef HAVE_LIBELF_SUPPORT @@ -190,6 +191,11 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) break; } + if (!strcmp(cmd, "-vv")) { + (*argv)[0] = "--version2"; + break; + } + /* * Check remaining flags. */ -- 2.7.4
[PATCH v1 2/3] perf version: Print the status of compiled-in libraries
This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif + +#ifdef HAVE_DWARF_GETLOCATIONS + status_print("dwarf_getlocations", "on"); +#else + status_print("dwarf_getlocations", "off"); +#endif + +#ifdef NO_GLIBC + status_print("glibc", "off"); +#else + status_print("glibc", "on"); +#endif + +#ifdef HAVE_GTK2_SUPPORT + status_print("gtk2", "on"); +#else + status_print("gtk2", "off"); +#endif + +#ifdef HAVE_LIBAUDIT_SUPPORT + status_print("libaudit", "on"); +#else + status_print("libaudit", "off"); +#endif + +#ifdef HAVE_LIBBFD_SUPPORT + status_print("libbfd", "on"); +#else + status_print("libbfd", "off"); +#endif + +#ifdef HAVE_LIBELF_SUPPORT + status_print("libelf", "on"); +#else + status_print("libelf", "off"); +#endif + +#ifdef HAVE_LIBNUMA_SUPPORT + status_print("libnuma", "on"); + status_print("numa_num_possible_cpus", "on"); +#else + status_print("libnuma", "off"); + status_print("numa_num_possible_cpus", "off"); +#endif + +#ifdef NO_LIBPERL + status_print("libperl", "off"); +#else + status_print("libperl", "on"); +#endif + +#ifdef NO_LIBPYTHON + status_print("libpython", "off"); +#else + status_print("libpython", "on"); +#endif + +#ifdef HAVE_SLANG_SUPPORT + status_print("libslang", "on"); +#else + status_print("libslang", "off"); +#endif + +#ifdef HAVE_LIBCRYPTO_SUPPORT + status_print("libcrypto", "on"); +#else + status_print("libcrypto", "off"); +#endif + +#ifdef HAVE_LIBUNWIND_SUPPORT + status_print("libunwind", "on"); +#else + status_print("libunwind", "off"); +#endif + +#ifdef HAVE_DWARF_SUPPORT + status_print("libdw-dwarf-unwind", "on"); +#else + status_print("libdw-dwarf-unwind", "off"); +#endif + +#ifdef HAVE_ZLIB_SUPPORT + status_print("zlib", "on"); +#else + status_print("zlib", "off"); +#endif + +#ifdef HAVE_LZMA_SUPPORT + status_print("lzma", "on"); +#else + status_print("lzma", "off"); +#endif + +#ifdef HAVE_AUXTRACE_SUPPORT + status_print("get_cpuid", "on"); +#else + status_print("get_cpuid", "off"); +#endif + +#ifdef HAVE_LIBBPF_SUPPORT + status_print("bpf", "on"); +#else + status_print("bpf", "off"); +#endif +} + +int cmd_version2(int argc, const char **argv) +{ + cmd_version(argc, argv); + library_status(); + return 0; +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index 05745f3..c7508ee 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv); int cmd_top(int argc, const char **argv); int cmd_script(int argc, const char **argv); int cmd_version(int argc, const char **argv); +int cmd_version2(int argc, const char **argv); int cmd_probe(int argc, const char **argv); int cmd_kmem(int argc, const char **argv); int cmd_lock(int argc, const char **argv); -- 2.7.4
Re: [PATCH v1 0/3] Support perf -vv
On 3/26/2018 5:07 PM, Jiri Olsa wrote: On Mon, Mar 26, 2018 at 02:00:31AM -0700, Andi Kleen wrote: On Tue, Mar 27, 2018 at 12:07:01AM +0800, Jin Yao wrote: We keep having bug reports that when users build perf on their own, but they don't install some needed libraries such as libelf, libbfd/libibery. The perf can build, but it is missing important functionality. And users may complain that perf has issue or bug. This patch-set support 'perf -vv' which will print the compiled-in status of libraries. Once users think perf missing some functionality, it should be very easy for them to check the libraries status. I don't think this solves the problem. How should the user know that they need to run perf -vv. Also normal users don't know that libelf is needed for symbols for example. We need a warning that is visible together with the symbols and that clearly describes the problem. IIRC we decided to go *with* the message in the perf report or other affected command that would suggest to run 'perf -vv' for more details jirka Hi Andi, For how the users know they should run 'perf -vv', as the discussion with Jiri and Arnaldo, the message will be displayed if necessary when users perform perf report or other perf commands. That could be implemented in a follow-up patch (not in this patch-set). For let user know what the library they need, maybe we can add more information in 'perf -vv', for example, add something like, "Please install libelf-dev, libelf-devel or elfutils-libelf-devel before building perf" Thanks Jin Yao -Andi For example: $ ./perf -vv perf version 4.13.rc5.g9b7a81b dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ off ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] Jin Yao (3): perf config: Add -DNO_GLIBC to CFLAGS perf version: Print the status of compiled-in libraries perf: Support perf -vv tools/perf/Makefile.config | 2 + tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 6 +++ 4 files changed, 134 insertions(+) -- 2.7.4
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/26/2018 5:39 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote: This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif could this and all those below be in some generic macro? #define STATUS(__d, __m)\ #ifdef __d \ status_print(#__m, "on"); \ #else \ status_print(#__m, "OFF");\ #endif STATUS(HAVE_DWARF_SUPPORT, dwarf) also please print the 'OFF' and use colors as in the build message Fine, thanks. I will update the code. + +#ifdef HAVE_DWARF_GETLOCATIONS + status_print("dwarf_getlocations", "on"); +#else + status_print("dwarf_getlocations", "off"); +#endif + +#ifdef NO_GLIBC + status_print("glibc", "off"); +#else + status_print("glibc", "on"); +#endif + +#ifdef HAVE_GTK2_SUPPORT + status_print("gtk2", "on"); +#else + status_print("gtk2", "off"); +#endif + SNIP +} + +int cmd_version2(int argc, const char **argv) +{ + cmd_version(argc, argv); + library_status(); + return 0; +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index 05745f3..c7508ee 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv); int cmd_top(int argc, const char **argv); int cmd_script(int argc, const char **argv); int cmd_version(int argc, const char **argv); +int cmd_version2(int argc, const char **argv); please don't add new command for this, handle this in cmd_version Yeah, I know that, adding a new command is not a good idea. But how can I pass a new parameter to cmd_version to indicate it's the -vv case? Use argv[1]? But looks not good since argc is 1. Any idea about that? Thanks Jin Yao thanks, jirka
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/26/2018 5:39 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote: This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif could this and all those below be in some generic macro? #define STATUS(__d, __m)\ #ifdef __d \ status_print(#__m, "on"); \ #else \ status_print(#__m, "OFF");\ #endif STATUS(HAVE_DWARF_SUPPORT, dwarf) Hi Jiri, I have tried this macro definition, but unfortunately the compilation is failed. error: '#' is not followed by a macro parameter #define STATUS(__d, __m) \ I just guess the '#' in #ifdef confuses the gcc. Looks we can't define #ifdef/#endif block in a macro. Thanks Jin Yao also please print the 'OFF' and use colors as in the build message + +#ifdef HAVE_DWARF_GETLOCATIONS + status_print("dwarf_getlocations", "on"); +#else + status_print("dwarf_getlocations", "off"); +#endif + +#ifdef NO_GLIBC + status_print("glibc", "off"); +#else + status_print("glibc", "on"); +#endif + +#ifdef HAVE_GTK2_SUPPORT + status_print("gtk2", "on"); +#else + status_print("gtk2", "off"); +#endif + SNIP +} + +int cmd_version2(int argc, const char **argv) +{ + cmd_version(argc, argv); + library_status(); + return 0; +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index 05745f3..c7508ee 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv); int cmd_top(int argc, const char **argv); int cmd_script(int argc, const char **argv); int cmd_version(int argc, const char **argv); +int cmd_version2(int argc, const char **argv); please don't add new command for this, handle this in cmd_version thanks, jirka
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/26/2018 9:51 PM, Jin, Yao wrote: On 3/26/2018 5:39 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote: This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif could this and all those below be in some generic macro? #define STATUS(__d, __m) \ #ifdef __d \ status_print(#__m, "on"); \ #else \ status_print(#__m, "OFF"); \ #endif STATUS(HAVE_DWARF_SUPPORT, dwarf) also please print the 'OFF' and use colors as in the build message Fine, thanks. I will update the code. + +#ifdef HAVE_DWARF_GETLOCATIONS + status_print("dwarf_getlocations", "on"); +#else + status_print("dwarf_getlocations", "off"); +#endif + +#ifdef NO_GLIBC + status_print("glibc", "off"); +#else + status_print("glibc", "on"); +#endif + +#ifdef HAVE_GTK2_SUPPORT + status_print("gtk2", "on"); +#else + status_print("gtk2", "off"); +#endif + SNIP +} + +int cmd_version2(int argc, const char **argv) +{ + cmd_version(argc, argv); + library_status(); + return 0; +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index 05745f3..c7508ee 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv); int cmd_top(int argc, const char **argv); int cmd_script(int argc, const char **argv); int cmd_version(int argc, const char **argv); +int cmd_version2(int argc, const char **argv); please don't add new command for this, handle this in cmd_version Yeah, I know that, adding a new command is not a good idea. But how can I pass a new parameter to cmd_version to indicate it's the -vv case? Use argv[1]? But looks not good since argc is 1. Any idea about that? Thanks Jin Yao Hi Jiri, Maybe could we just add the information to 'perf -v'? I mean we don't need the 'perf -vv', directly add the information to 'perf -v'. Is it OK? Thanks Jin Yao thanks, jirka
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/27/2018 1:58 PM, Ingo Molnar wrote: * Jin Yao wrote: +#ifdef HAVE_DWARF_SUPPORT +#ifdef HAVE_DWARF_GETLOCATIONS +#ifdef NO_GLIBC +#ifdef HAVE_GTK2_SUPPORT +#ifdef HAVE_LIBAUDIT_SUPPORT +#ifdef HAVE_LIBBFD_SUPPORT +#ifdef HAVE_LIBELF_SUPPORT +#ifdef HAVE_LIBNUMA_SUPPORT +#ifdef NO_LIBPERL +#ifdef NO_LIBPYTHON +#ifdef HAVE_SLANG_SUPPORT +#ifdef HAVE_LIBCRYPTO_SUPPORT +#ifdef HAVE_LIBUNWIND_SUPPORT +#ifdef HAVE_DWARF_SUPPORT +#ifdef HAVE_ZLIB_SUPPORT +#ifdef HAVE_LZMA_SUPPORT +#ifdef HAVE_AUXTRACE_SUPPORT +#ifdef HAVE_LIBBPF_SUPPORT BTW., it would be nice at this point to fix those 3 outliers that have a negation in their library support status macro: +#ifdef NO_GLIBC +#ifdef NO_LIBPERL +#ifdef NO_LIBPYTHON ... and invert them back to the HAVE_* side of the logic: +#ifdef HAVE_GLIBC +#ifdef HAVE_LIBPERL +#ifdef HAVE_LIBPYTHON That should make all related code more consistent and more readable. Thanks, Ingo Yes, it's better, thanks! I will try to convert them to the HAVE_*. Thanks Jin Yao
Re: [PATCH v1 3/3] perf: Support perf -vv
On 3/27/2018 2:03 PM, Ingo Molnar wrote: * Jin Yao wrote: +++ b/tools/perf/perf.c @@ -64,6 +64,7 @@ static struct cmd_struct commands[] = { { "top", cmd_top,0 }, { "annotate", cmd_annotate, 0 }, { "version", cmd_version,0 }, + { "version2", cmd_version2, 0 }, { "script", cmd_script, 0 }, { "sched",cmd_sched, 0 }, #ifdef HAVE_LIBELF_SUPPORT @@ -190,6 +191,11 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) break; } + if (!strcmp(cmd, "-vv")) { + (*argv)[0] = "--version2"; + break; + } That is just lazy hackery, please use the regular option library to add options and document them! Also, '--version2' is a sloppy name, the right solution is to do what Git does, it has a --build-options sub-option to 'perf version': triton:~/tip> git version -h usage: git version [] --build-options also print build options triton:~/tip> git version --build-options git version 2.14.1 sizeof-long: 8 This should be done for perf as well, and _then_ maybe can we add a helper alias that maps '-vv' to 'version --build-options'. Thanks, Ingo Something like 'perf version --build-options' and that will be mapped to '-vv'. It's a good idea, thanks! Thanks Jin Yao
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/27/2018 8:56 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 09:44:23AM +0800, Jin, Yao wrote: On 3/26/2018 5:39 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote: This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif could this and all those below be in some generic macro? #define STATUS(__d, __m)\ #ifdef __d \ status_print(#__m, "on"); \ #else \ status_print(#__m, "OFF");\ #endif STATUS(HAVE_DWARF_SUPPORT, dwarf) Hi Jiri, I have tried this macro definition, but unfortunately the compilation is failed. error: '#' is not followed by a macro parameter #define STATUS(__d, __m) \ I just guess the '#' in #ifdef confuses the gcc. Looks we can't define #ifdef/#endif block in a macro. ah crap.. right ;-) how about we take the IS_BUILTIN thingie from include/linux/kconfig.h and use it as in attached test change.. it gives me: [jolsa@krava perf]$ ./perf version perf version 4.16.rc6.g1ee9a60 dwarf: [ on ] krava: [ OFF ] we could put those macros into tools/include/tools/config.h, Arnaldo? together with the comments from kconfig.h that I cut out.. thanks, jirka --- diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 1fe458792cc1..b392f9e80fd0 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -6,8 +6,33 @@ int version_verbose; + +#define __ARG_PLACEHOLDER_1 0, +#define __take_second_arg(__ignored, val, ...) val + +#define __is_defined(x) ___is_defined(x) +#define ___is_defined(val) is_defined(__ARG_PLACEHOLDER_##val) +#define is_defined(arg1_or_junk)__take_second_arg(arg1_or_junk 1, 0) + +#define IS_BUILTIN(option) __is_defined(option) + + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) { printf("perf version %s\n", perf_version_string); + +#define STATUS(__d, __m) \ + if (IS_BUILTIN(__d))\ + status_print(#__m, "on"); \ + else\ + status_print(#__m, "OFF"); + + STATUS(HAVE_DWARF_SUPPORT, dwarf) + STATUS(HAVE_KRAVA_SUPPORT, krava) return 0; } Hi Jiri, It looks we need to copy the IS_BUILTIN and associated macro definitions to a common header file. I guess you will post this patch if Arnaldo agree with this idea, right? I plan to post v2 of this patch series after this 'IS_BUILTIN' patch gets merged. Thanks Jin Yao
Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries
On 3/27/2018 8:38 PM, Jiri Olsa wrote: On Mon, Mar 26, 2018 at 09:51:03PM +0800, Jin, Yao wrote: On 3/26/2018 5:39 PM, Jiri Olsa wrote: On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote: This patch checks the values passed by CFLAGS (-DXXX) and then print the status of libraries. For example, if HAVE_DWARF_SUPPORT is defined, that means the library "dwarf" is compiled-in. The patch will print the status "on" for this library. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 125 +++ tools/perf/builtin.h | 1 + 2 files changed, 126 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5..90a0a7f 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) printf("perf version %s\n", perf_version_string); return 0; } + +static void status_print(const char *name, const char *status) +{ + printf("%22s: [ %3s ]\n", name, status); +} + +static void library_status(void) +{ +#ifdef HAVE_DWARF_SUPPORT + status_print("dwarf", "on"); +#else + status_print("dwarf", "off"); +#endif could this and all those below be in some generic macro? #define STATUS(__d, __m)\ #ifdef __d \ status_print(#__m, "on"); \ #else \ status_print(#__m, "OFF");\ #endif STATUS(HAVE_DWARF_SUPPORT, dwarf) also please print the 'OFF' and use colors as in the build message Fine, thanks. I will update the code. + +#ifdef HAVE_DWARF_GETLOCATIONS + status_print("dwarf_getlocations", "on"); +#else + status_print("dwarf_getlocations", "off"); +#endif + +#ifdef NO_GLIBC + status_print("glibc", "off"); +#else + status_print("glibc", "on"); +#endif + +#ifdef HAVE_GTK2_SUPPORT + status_print("gtk2", "on"); +#else + status_print("gtk2", "off"); +#endif + SNIP +} + +int cmd_version2(int argc, const char **argv) +{ + cmd_version(argc, argv); + library_status(); + return 0; +} diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h index 05745f3..c7508ee 100644 --- a/tools/perf/builtin.h +++ b/tools/perf/builtin.h @@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv); int cmd_top(int argc, const char **argv); int cmd_script(int argc, const char **argv); int cmd_version(int argc, const char **argv); +int cmd_version2(int argc, const char **argv); please don't add new command for this, handle this in cmd_version Yeah, I know that, adding a new command is not a good idea. But how can I pass a new parameter to cmd_version to indicate it's the -vv case? Use argv[1]? But looks not good since argc is 1. Any idea about that? I guess the based would be to have 'perf version -v', where in cmd_version you'd call standard option processing.. the 'perf -vv' shortcut could for example be done through some global var like in attached patch... untested jirka --- diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 37019c5d675f..1fe458792cc1 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -4,6 +4,8 @@ #include #include +int version_verbose; + int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused) { printf("perf version %s\n", perf_version_string); diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 1b3fc8ec0fa2..a5c95c6c0de7 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -185,7 +185,14 @@ static int handle_options(const char ***argv, int *argc, int *envchanged) break; } - if (!strcmp(cmd, "-v")) { + if (strstarts(cmd, "-v")) { + int i; + + for (i = 2; cmd[i]; i++) { + if (cmd[i] == 'v') + version_verbose++; + } + (*argv)[0] = "--version"; break; } diff --git a/tools/perf/perf.h b/tools/perf/perf.h index 8fec1abd0f1f..9c45ba179635 100644 --- a/tools/perf/perf.h +++ b/tools/perf/perf.h @@ -13,6 +13,8 @@ void test_attr__init(void); void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu, int fd, int group_fd, unsigned long flags); +extern int version_verbose; + #define HAVE_ATTR_TEST #include "perf-sys.h" Using global variable, hmm, that's fine, looks no better way. :) I will follow Ingo's suggestion. Use 'perf -v --build-options' and map it to 'perf -vv'. Thanks Jin Yao
[PATCH v1 3/3] perf version: Print status for syscall_table
This patch doesn't print "libaudit" line if HAVE_SYSCALL_TABLE_SUPPORT is available and add a line for HAVE_SYSCALL_TABLE_SUPPORT. For example, $ ./perf -vv perf version 4.13.rc5.gc2f8af9 dwarf: [ on ] # HAVE_DWARF_SUPPORT dwarf_getlocations: [ on ] # HAVE_DWARF_GETLOCATIONS_SUPPORT glibc: [ on ] # HAVE_GLIBC_SUPPORT gtk2: [ on ] # HAVE_GTK2_SUPPORT syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT libbfd: [ on ] # HAVE_LIBBFD_SUPPORT libelf: [ on ] # HAVE_LIBELF_SUPPORT libnuma: [ on ] # HAVE_LIBNUMA_SUPPORT numa_num_possible_cpus: [ on ] # HAVE_LIBNUMA_SUPPORT libperl: [ on ] # HAVE_LIBPERL_SUPPORT libpython: [ on ] # HAVE_LIBPYTHON_SUPPORT libslang: [ on ] # HAVE_SLANG_SUPPORT libcrypto: [ on ] # HAVE_LIBCRYPTO_SUPPORT libunwind: [ on ] # HAVE_LIBUNWIND_SUPPORT libdw-dwarf-unwind: [ on ] # HAVE_DWARF_SUPPORT zlib: [ on ] # HAVE_ZLIB_SUPPORT lzma: [ on ] # HAVE_LZMA_SUPPORT get_cpuid: [ on ] # HAVE_AUXTRACE_SUPPORT bpf: [ on ] # HAVE_LIBBPF_SUPPORT The line "syscall_table: [ on ] # HAVE_SYSCALL_TABLE_SUPPORT" is new created. Signed-off-by: Jin Yao --- tools/perf/builtin-version.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c index 2abe391..50df168 100644 --- a/tools/perf/builtin-version.c +++ b/tools/perf/builtin-version.c @@ -60,7 +60,10 @@ static void library_status(void) STATUS(HAVE_DWARF_GETLOCATIONS_SUPPORT, dwarf_getlocations); STATUS(HAVE_GLIBC_SUPPORT, glibc); STATUS(HAVE_GTK2_SUPPORT, gtk2); +#ifndef HAVE_SYSCALL_TABLE_SUPPORT STATUS(HAVE_LIBAUDIT_SUPPORT, libaudit); +#endif + STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table); STATUS(HAVE_LIBBFD_SUPPORT, libbfd); STATUS(HAVE_LIBELF_SUPPORT, libelf); STATUS(HAVE_LIBNUMA_SUPPORT, libnuma); -- 2.7.4
[PATCH v1 0/3] perf version: some follow-up updates
1. Remove -DNO_LIBPERL and use -DHAVE_LIBPERL_SUPPORT in C code. 2. Remove -DNO_LIBPYTHON and use -DHAVE_LIBPYTHON_SUPPORT in C code. 3. Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT. 4. Don't print "libaudit" line if HAVE_SYSCALL_TABLE_SUPPORT is available and add a line for HAVE_SYSCALL_TABLE_SUPPORT. Jin Yao (3): perf script: Use HAVE_LIBXXX_SUPPORT to replace NO_LIBXXX perf: Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT perf version: Print status for syscall_table tools/perf/Makefile.config | 2 +- tools/perf/builtin-help.c | 2 +- tools/perf/builtin-script.c | 4 ++-- tools/perf/builtin-version.c| 3 +++ tools/perf/perf.c | 4 ++-- tools/perf/util/generate-cmdlist.sh | 2 +- tools/perf/util/syscalltbl.c| 6 +++--- tools/perf/util/trace-event-scripting.c | 4 ++-- 8 files changed, 15 insertions(+), 12 deletions(-) -- 2.7.4
[PATCH v1 2/3] perf: Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT
To make consistent with other HAVE_XXX_SUPPORT in Makefile.config, this patch renames HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT and update the C code accordingly. Signed-off-by: Jin Yao --- tools/perf/Makefile.config | 2 +- tools/perf/builtin-help.c | 2 +- tools/perf/perf.c | 4 ++-- tools/perf/util/generate-cmdlist.sh | 2 +- tools/perf/util/syscalltbl.c| 6 +++--- 5 files changed, 8 insertions(+), 8 deletions(-) diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config index c7abd83..9ea9b06 100644 --- a/tools/perf/Makefile.config +++ b/tools/perf/Makefile.config @@ -68,7 +68,7 @@ ifeq ($(NO_PERF_REGS),0) endif ifneq ($(NO_SYSCALL_TABLE),1) - CFLAGS += -DHAVE_SYSCALL_TABLE + CFLAGS += -DHAVE_SYSCALL_TABLE_SUPPORT endif # So far there's only x86 and arm libdw unwind support merged in perf. diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c index 4aca13f..1c41b4e 100644 --- a/tools/perf/builtin-help.c +++ b/tools/perf/builtin-help.c @@ -439,7 +439,7 @@ int cmd_help(int argc, const char **argv) #ifdef HAVE_LIBELF_SUPPORT "probe", #endif -#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE) +#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT) "trace", #endif NULL }; diff --git a/tools/perf/perf.c b/tools/perf/perf.c index 1659029..20a08cb 100644 --- a/tools/perf/perf.c +++ b/tools/perf/perf.c @@ -73,7 +73,7 @@ static struct cmd_struct commands[] = { { "lock", cmd_lock, 0 }, { "kvm",cmd_kvm,0 }, { "test", cmd_test, 0 }, -#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE) +#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT) { "trace", cmd_trace, 0 }, #endif { "inject", cmd_inject, 0 }, @@ -491,7 +491,7 @@ int main(int argc, const char **argv) argv[0] = cmd; } if (strstarts(cmd, "trace")) { -#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE) +#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT) setup_path(); argv[0] = "trace"; return cmd_trace(argc, argv); diff --git a/tools/perf/util/generate-cmdlist.sh b/tools/perf/util/generate-cmdlist.sh index ff17920..c3cef36 100755 --- a/tools/perf/util/generate-cmdlist.sh +++ b/tools/perf/util/generate-cmdlist.sh @@ -38,7 +38,7 @@ do done echo "#endif /* HAVE_LIBELF_SUPPORT */" -echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)" +echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)" sed -n -e 's/^perf-\([^]*\)[ ].* audit*/\1/p' command-list.txt | sort | while read cmd diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c index 895122d..0ee7f56 100644 --- a/tools/perf/util/syscalltbl.c +++ b/tools/perf/util/syscalltbl.c @@ -17,7 +17,7 @@ #include #include -#ifdef HAVE_SYSCALL_TABLE +#ifdef HAVE_SYSCALL_TABLE_SUPPORT #include #include "string2.h" #include "util.h" @@ -139,7 +139,7 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, const char *syscall_g return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx); } -#else /* HAVE_SYSCALL_TABLE */ +#else /* HAVE_SYSCALL_TABLE_SUPPORT */ #include @@ -176,4 +176,4 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, const char *syscall_g { return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx); } -#endif /* HAVE_SYSCALL_TABLE */ +#endif /* HAVE_SYSCALL_TABLE_SUPPORT */ -- 2.7.4
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
Hi Jiri, I'm still thinking it's worth displaying the warning when perf missing some libraries. Somebody just told me that perf didn't work well. While after some investigations, I found it's just missing some libraries when building the perf. But I have spent some time on getting the root cause. If with this patch, it should be very easily to know that. Thanks Jin Yao On 1/12/2018 10:22 AM, Jin, Yao wrote: On 1/11/2018 11:30 PM, Jiri Olsa wrote: On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote: We keep having bug reports that when users build perf on their own, we already have same warnings during the build Yes, there will be warnings displayed during the build if some libraries are missing. While I do find the users (especially the new perf user) don't notice the warning. They only care about the success of build. Once they use this perf version and find something working strangely, they will say that perf may have bug. :( but they don't install some needed libraries like libelf, libbfd/libibery. how about saying that in the symbol column, instead of poluting report's output, like: $ perf report --stdio # Overhead Command Shared Object Symbol (disabled) # ... .. I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? Thanks Jin Yao
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 3/21/2018 11:38 PM, Jiri Olsa wrote: On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote: Hi Jiri, I'm still thinking it's worth displaying the warning when perf missing some libraries. Somebody just told me that perf didn't work well. While after some investigations, I found it's just missing some libraries when building the perf. But I have spent some time on getting the root cause. If with this patch, it should be very easily to know that. true.. Arnaldo, any feedback on this one? I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? how about displaying libraries separately with -vv output, that would mimic the build message, like: $ ./perf -vv perf version 4.16.rc6.g18fd48 dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ on ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] and perf -vvv could display the 'make VF=1' info jirka I'm just afraid that the newbie will not check the -vv on his own when he gets trouble in using perf. In other words, if a user is experienced and he knows -vv yet, I may assume that he should know installing all libraries before building the perf. This patch is specific for the perf newbie. It will directly shows the error/warning when the user launches the perf binary. It will have a little bit helps, I guess. :) Thanks Jin Yao
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 3/22/2018 2:52 AM, Arnaldo Carvalho de Melo wrote: Em Wed, Mar 21, 2018 at 05:04:46PM +0100, Jiri Olsa escreveu: On Wed, Mar 21, 2018 at 12:43:15PM -0300, Arnaldo Carvalho de Melo wrote: Em Wed, Mar 21, 2018 at 12:40:35PM -0300, Arnaldo Carvalho de Melo escreveu: Em Wed, Mar 21, 2018 at 04:38:07PM +0100, Jiri Olsa escreveu: On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote: Hi Jiri, I'm still thinking it's worth displaying the warning when perf missing some libraries. Somebody just told me that perf didn't work well. While after some investigations, I found it's just missing some libraries when building the perf. But I have spent some time on getting the root cause. If with this patch, it should be very easily to know that. true.. Arnaldo, any feedback on this one? Lemme re-read the thread... Well, how about we make it harder to build without key libraries? I.e. if we detect that what we consider a core set of libraries isn't found in the system, then we stop the build, warn about it and ask the user to confirm that the build should proceed by passing some explicit -DI_KNOW_WHAT_I_AM_DOING___PROCEED=doit hum, not sure we want to complicate the build even more than it is now :-\ and IMO it still won't help much in Jin's problem, if user forces the build anyway Well, if a user _forces_ a build, not taking into consideration a warning that _is_ emitted and _stops_ the build, about the functionality it will lose by doing forcing the build, then comes back and complains that that functionality is not present, then it becomes difficult to help this user... :-) On the other hand, if the user forgets to install an important library, the warning is emitted but the build proceeds, no explicit action was performed, just a warning wasn't noticed, and the user complains, then I'd say: "hey, are you sure library foo devel files were present when you build it?", i.e. the support back and forth Jin is trying to avoid. And for users that _saw_ the warning, _knew_ they _didn't_ want that functionality, to be reminded while running, say 'perf report' that something they _decided not to have_ isn't present, then that could be annoying, no? Lemme try another idea: what if we do something like gcc does and print the features present when showing the version? I.e.: [acme@jouet perf]$ gcc -v Using built-in specs. COLLECT_GCC=/usr/bin/gcc COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper OFFLOAD_TARGET_NAMES=nvptx-none OFFLOAD_TARGET_DEFAULT=1 Target: x86_64-redhat-linux Configured with: ../configure --enable-bootstrap --enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared --enable-threads=posix --enable-checking=release --enable-multilib --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-gnu-unique-object --enable-linker-build-id --with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin --enable-initfini-array --with-isl --enable-libmpx --enable-offload-targets=nvptx-none --without-cuda-driver --enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 --build=x86_64-redhat-linux Thread model: posix gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC) [acme@jouet perf]$ - Arnaldo Is this too complicated for perf newbie to understand? For my problem, the mistake only occurs on perf newbie. I just think, it'd better return a direct and clear message to them. Maybe they don't know or don't use -v or -vv to do more investigation by themselves. Thanks Jin Yao
Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs
On 3/22/2018 4:51 PM, Jiri Olsa wrote: On Thu, Mar 22, 2018 at 09:04:10AM +0800, Jin, Yao wrote: On 3/21/2018 11:38 PM, Jiri Olsa wrote: On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote: Hi Jiri, I'm still thinking it's worth displaying the warning when perf missing some libraries. Somebody just told me that perf didn't work well. While after some investigations, I found it's just missing some libraries when building the perf. But I have spent some time on getting the root cause. If with this patch, it should be very easily to know that. true.. Arnaldo, any feedback on this one? I just think it'd better provide some hints to user. For example, "symbol is disabled and you need to install libelf/xxx", say something like that. But it looks the column can't contain too much information (i.e. no more space to contain the entire hints). Any idea? Or just add this warning in verbose mode? also your change does not affect tui mode annotation for some reason does not start at all.. could be little more verbose ;-) jirka Yes, it doesn't affect tui mode. Or we just add this warning in verbose mode? e.g. perf report -v? how about displaying libraries separately with -vv output, that would mimic the build message, like: $ ./perf -vv perf version 4.16.rc6.g18fd48 dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ on ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] and perf -vvv could display the 'make VF=1' info jirka I'm just afraid that the newbie will not check the -vv on his own when he gets trouble in using perf. In other words, if a user is experienced and he knows -vv yet, I may assume that he should know installing all libraries before building the perf. This patch is specific for the perf newbie. It will directly shows the error/warning when the user launches the perf binary. It will have a little bit helps, I guess. :) I just don't like the idea that when you run perf report, or annotate it spits out lines for every missing feature maybe we could detect missing features for given command and display line about missing features and say something like: 'Warning: symbol,dwarf support not compiled in (for more details run perf -vv)' or somwthing like that.. ;-) jirka Hi Jiri, I think your idea is very good! I guess following it's just an example copied from perf building process, right? $ ./perf -vv perf version 4.16.rc6.g18fd48 dwarf: [ on ] dwarf_getlocations: [ on ] glibc: [ on ] gtk2: [ on ] libaudit: [ on ] libbfd: [ on ] libelf: [ on ] libnuma: [ on ] numa_num_possible_cpus: [ on ] libperl: [ on ] libpython: [ on ] libslang: [ on ] libcrypto: [ on ] libunwind: [ on ] libdw-dwarf-unwind: [ on ] zlib: [ on ] lzma: [ on ] get_cpuid: [ on ] bpf: [ on ] We can check some CFLAGS like "#ifdef HAVE_XXX" in perf code to determine if some libraries are compiled in. For example, #ifdef HAVE_LIBNUMA_SUPPORT printf("libnuma: [ on ]"); #endif For some features, such as "numa_num_possible_cpus", which doesn't have CFLAGS variables. Maybe we can ignore them in report? I'd like to upgrade my patch to support perf -vv. Thanks Jin Yao
Re: [PATCH v2] perf annotate: Support to display the IPC/Cycle in tui mode
Hi, Could this patch be accepted? I'm working on the supporting for stdio mode. I just wish to post the patch series after this patch being merged. Thanks Jin Yao On 2/27/2018 5:38 PM, Jin Yao wrote: Unlike the perf report interactive annotate mode, the perf annotate doesn't display the IPC/Cycle even if branch info is recorded in perf data file. perf record -b ... perf annotate function It should show IPC/cycle, but it doesn't. This patch lets perf annotate support the displaying of IPC/Cycle if branch info is in perf data. For example, perf annotate compute_flag Percent│ IPC Cycle │ │ │Disassembly of section .text: │ │00400640 : │compute_flag(): │volatile int count; │static unsigned int s_randseed; │ │__attribute__((noinline)) │int compute_flag() │{ 22.96 │1.18 584sub$0x8,%rsp │int i; │ │i = rand() % 2; 23.02 │1.18 1 → callq rand@plt │ │return i; 27.05 │3.37 mov%eax,%edx │} │3.37 add$0x8,%rsp │{ │int i; │ │i = rand() % 2; │ │return i; │3.37 shr$0x1f,%edx │3.37 add%edx,%eax │3.37 and$0x1,%eax │3.37 sub%edx,%eax │} 26.97 │3.37 2 ← retq Note that, this patch only supports tui mode. For stdio, now it just keeps original behavior. Will support it in a follow-up patch. perf annotate compute_flag --stdio Percent | Source code & Disassembly of div for cycles:ppp (7993 samples) -- : : : :Disassembly of section .text: : :00400640 : :compute_flag(): :volatile int count; :static unsigned int s_randseed; : :__attribute__((noinline)) :int compute_flag() :{ 0.29 : 400640: sub$0x8,%rsp # +100.00% :int i; : :i = rand() % 2; 42.93 : 400644: callq 400490 # -100.00% (p:100.00%) : :return i; 0.10 : 400649: mov%eax,%edx # +100.00% :} 0.94 : 40064b: add$0x8,%rsp :{ :int i; : :i = rand() % 2; : :return i; 27.02 : 40064f: shr$0x1f,%edx 0.15 : 400652: add%edx,%eax 1.24 : 400654: and$0x1,%eax 2.08 : 400657: sub%edx,%eax :} 25.26 : 400659: retq # -100.00% (p:100.00%) Change-log: --- v2: --- No code change. Just change the patch description to make it more clear. Signed-off-by: Jin Yao --- tools/perf/builtin-annotate.c | 88 --- 1 file changed, 82 insertions(+), 6 deletions(-) diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c index f15731a..ead6ae4 100644 --- a/tools/perf/builtin-annotate.c +++ b/tools/perf/builtin-annotate.c @@ -44,6 +44,7 @@ struct perf_annotate { bool full_paths; bool print_line; bool skip_missing; + bool has_br_stack; const char *sym_hist_filter; const char *cpu_list; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); @@ -146,16 +147,73 @@ static void process_branch_stack(struct branch_stack *bs, struct addr_location * free(bi); } +static int hist_iter__branch_callback(struct hist_entry_iter *iter, + struct addr_location *al __maybe_unused, + bool single __maybe_unused, + void *arg __maybe_unused) +{ + struct hist_entry *he = iter->he; + struct branch_info *bi; + struct perf_sample *sample = iter->sample; + struct perf_evsel *evsel = iter->evsel; + int err; + + hist__account_cycles(sample->branch_stack, al, sample, false); + + bi = he->branch_info; + err = addr_map_symbol__inc_samples(&bi->from, sample, evsel->idx); + + if (err) + goto out; + + err = addr_map_symbol__inc_
Re: [PATCH v1 0/4] perf annotate: Create a new '--tui-dump' option
On 3/13/2018 11:20 PM, Arnaldo Carvalho de Melo wrote: Em Tue, Mar 13, 2018 at 10:16:50PM +0800, Jin Yao escreveu: There is a requirement to let perf annotate support displaying the IPC/Cycle. In previous patch, this is supported in TUI mode. While it's not convenient for users since they have to take screen shots and copy/paste data. This patch series introduces a new option '--tui-dump' in perf annotate to dump the TUI output to stdio. User can easily use the command line like: 'perf annotate --tui-dump > /tmp/log.txt' My first impression is that this pollutes the code with way too many ifs, I was thinking more of a: while (read parsed objdump line) { ins__fprintf(); } Yes, the issue in my patch is that it uses many 'if' to check if it's tui_dump(or called 'stdio2') or tui mode. Going from the refresh routine, I started doing the conversion, but haven't completed it, there are opportunities for more __scnprintf like routines, also one to find the percent_max, etc then those would be used both in these two for --stdio2, that eventually would become --stdio with the old one becoming --stdio1, till we're satisfied with the new default. I have some questions for the following code. Please correct me if I misunderstand anything. static void annotation_line__fprintf(struct annotate_line *al, FILE *fp) { struct browser_line *bl = browser_line(al); int i; double percent_max = 0.0; char bf[256]; for (i = 0; i < browser->nr_events; i++) { if (al->samples[i].percent > percent_max) percent_max = al->samples[i].percent; } /* the following if/else block should be transformed into a __scnprintf routine that formats a buffer and then the TUI and --stdio2 use it */ if (al->offset != -1 && percent_max != 0.0) { for (i = 0; i < ab->nr_events; i++) { if (annotate_browser__opts.show_total_period) { fprintf(fp, browser, "%11" PRIu64 " ", al->samples[i].he.period); } else if (annotate_browser__opts.show_nr_samples) { fprintf(fp, browser, "%6" PRIu64 " ", al->samples[i].he.nr_samples); } else { fprintf(fp, "%6.2f ", al->samples[i].percent); } } } else { ui_browser__printf(browser, "%*s", pcnt_width, annotate_browser__opts.show_total_period ? "Period" : annotate_browser__opts.show_nr_samples ? "Samples" : "Percent"); } I guess the above code has not been completed yet. My understanding for Arnaldo's idea is that the output should be written to a buffer via scnprintf and the buffer will be passed to TUI or stdio2 and printed out later. One potential issue is how to process the color for TUI? For example, the call to ui_browser__set_percent_color. If we need to set color for TUI, it looks we still have to check if it's a TUI mode. For example, Suppose the bf[] has been written with the Percent string yet, next we need, if (--tui) { ui_browser__set_percent_color(...); ui_browser__printf(bf, ...); } else { printf(..., bf); } Is my understanding correct? /* The ab->have_cycles should go to a separate struct, outside * annotation_browser, and the rest should go to something that just does scnprintf on a buffer * that then is printed on the TUI or with fprintf */ if (ab->have_cycles) { if (al->ipc) fprintf(fp, "%*.2f ", IPC_WIDTH - 1, al->ipc) > else if (show_title) ui_browser__printf(browser, "%*s ", IPC_WIDTH - 1, "IPC"); if (al->cycles) ui_browser__printf(browser, "%*" PRIu64 " ", CYCLES_WIDTH - 1, al->cycles); else if (!show_title) ui_browser__write_nstring(browser, " ", CYCLES_WIDTH); else ui_browser__printf(browser, "%*s ", CYCLES_WIDTH - 1, "Cycle"); } SLsmg_write_char(' '); /* The scroll bar isn't being used */ if (!browser->navkeypressed) width += 1; if (!*al->line) fprintf(fp, "\n"); else if (al->offset == -1) { if (al->line_nr && annotat
[PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage for symbol in perf report TUI browser. We create a new sort-key 'ipc' for that. For example, $ perf record -g -b ... $ perf report -s symbol,ipc or perf report -s ipc Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] Note that, stdio mode doesn't support this feature. Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 1 + tools/perf/builtin-report.c | 11 tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 43 tools/perf/util/sort.h | 1 + tools/perf/util/symbol.h | 1 + 6 files changed, 58 insertions(+) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 474a494..e7b8974 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -122,6 +122,7 @@ OPTIONS - in_tx: branch in TSX transaction - abort: TSX transaction abort. - cycles: Cycles in basic block + - ipc: average ipc (instruction per cycle) of function And default sort keys are changed to comm, dso_from, symbol_from, dso_to and symbol_to, see '--branch-stack'. diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 257c9c1..9b75e11 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -954,6 +954,7 @@ int cmd_report(int argc, const char **argv) bool has_br_stack = false; int branch_mode = -1; bool branch_call_mode = false; + bool symbol_ipc = false; #define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent" const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n" CALLCHAIN_REPORT_HELP @@ -1284,6 +1285,13 @@ int cmd_report(int argc, const char **argv) else use_browser = 0; + if ((sort__mode == SORT_MODE__BRANCH) && sort_order && + strstr(sort_order, "ipc")) { + if (!strstr(sort_order, "symbol")) + sort_order = "symbol,ipc"; + symbol_ipc = true; + } + if (setup_sorting(session->evlist) < 0) { if (sort_order) parse_options_usage(report_usage, options, "s", 1); @@ -1331,6 +1339,9 @@ int cmd_report(int argc, const char **argv) symbol_conf.sort_by_name = true; } annotation_config__init(); + } else if (symbol_ipc) { + pr_err("Only TUI mode supports sort-key ipc\n"); + goto error; } if (symbol__init(&session->header.env) < 0) diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 3badd7f..664b5ed 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -62,6 +62,7 @@ enum hist_column { HISTC_TRACE, HISTC_SYM_SIZE, HISTC_DSO_SIZE, + HISTC_SYMBOL_IPC, HISTC_NR_COLS, /* Last entry */ }; diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index f96c005..94f62c8 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -13,6 +13,7 @@ #include "strlist.h" #include #include "mem-events.h" +#include "annotate.h" #include regex_tparent_regex; @@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = { .se_width_idx = HISTC_SRCLINE_TO, }; +static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + + struct symbol *sym = he->ms.sym; + struct map *map = he->ms.map; + struct perf_evsel *evsel = hists_to_evsel(he->hists); + struct annotation *notes; + double ipc = 0.0, coverage = 0.0; + char tmp[64]; + + if (!sym) + return repsep_snprintf(bf, size, "%-*s", width, "-"); + + if (!sym->annotated && + symbol__annotate2(sym, map, evsel, &annotation__default_options, + NULL) < 0) { + return 0; + } + + sym->annotated = true; + notes = symbol__annotation(sym); + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) +
[PATCH 0/2] perf report/annotate: Support average IPC and IPC coverage for function
Add supporting of displaying the average IPC and IPC coverage percentage per function. For example, $ perf record -g -b ... $ perf report -s symbol,ipc Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] ... $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ... Jin Yao (2): perf annotate: Compute average IPC and IPC coverage per symbol perf report: Display average IPC and IPC coverage per symbol tools/perf/Documentation/perf-report.txt | 1 + tools/perf/builtin-report.c | 11 tools/perf/util/annotate.c | 41 +++--- tools/perf/util/annotate.h | 5 tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 43 tools/perf/util/sort.h | 1 + tools/perf/util/symbol.h | 1 + 8 files changed, 101 insertions(+), 3 deletions(-) -- 2.7.4
[PATCH 1/2] perf annotate: Compute average IPC and IPC coverage per symbol
Add support to perf report annotate view or perf annotate --stdio2 to aggregate the IPC derived from timed LBRs per symbol. We compute the average IPC and the IPC coverage percentage. For example, $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 0.00 1.10 1 ↓ je 43 29: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_lock_wait_private add$0x80,%rsp 0.00 3.0043: lea__ctype_b@GLIBC_2.2.5+0x38,%rdi 3.00 lea0xc(%rsp),%rsi 8.49 3.00 1 → callq __random_r 7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 0.00 1.94 1 ↓ je 68 lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne70 ↓ jmp8a 0.00 2.0068: decl __abort_msg@@GLIBC_PRIVATE+0x8a0 21.56 2.00 1 ↓ je 8a 70: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_unlock_wake_private add$0x80,%rsp 21.56 2.908a: movslq 0xc(%rsp),%rax 2.90 add$0x18,%rsp 9.03 2.90 1 ← retq It shows for this symbol the average IPC is 2.30 and the IPC coverage is 54.8%. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 41 ++--- tools/perf/util/annotate.h | 5 + 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 6936daf..4b2b1b0 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation *notes, u64 start, u64 static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 end, struct cyc_hist *ch) { unsigned n_insn; + unsigned int cover_insn = 0; u64 offset; n_insn = annotation__count_insn(notes, start, end); @@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 for (offset = start; offset <= end; offset++) { struct annotation_line *al = notes->offsets[offset]; - if (al) + if (al && al->ipc == 0.0) { al->ipc = ipc; + cover_insn++; + } + } + + if (cover_insn) { + notes->hit_cycles += ch->cycles; + notes->hit_insn += n_insn * ch->num; + notes->cover_insn += cover_insn; } } } void annotation__compute_ipc(struct annotation *notes, size_t size) { - u64 offset; + s64 offset; if (!notes->src || !notes->src->cycles_hist) return; + notes->total_insn = annotation__count_insn(notes, 0, size - 1); + notes->hit_cycles = 0; + notes->hit_insn = 0; + notes->cover_insn = 0; + pthread_mutex_lock(¬es->lock); - for (offset = 0; offset < size; ++offset) { + for (offset = size - 1; offset >= 0; --offset) { struct cyc_hist *ch; ch = ¬es->src->cycles_hist[offset]; @@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, struct annotation *notes, disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset); } +static void ipc_coverage_string(char *bf, int size, struct annotation *notes) +{ + double ipc = 0.0, coverage = 0.0; + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) { + coverage = notes->cover_insn * 100.0 / + ((double)notes->total_insn); + } + + scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)", + ipc, coverage); +} + static void __annotation_line__write(struct annotation_line *al, struct annotation *notes
Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol
On 11/26/2018 5:55 PM, Jiri Olsa wrote: On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote: SNIP diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index f96c005..94f62c8 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -13,6 +13,7 @@ #include "strlist.h" #include #include "mem-events.h" +#include "annotate.h" #include regex_t parent_regex; @@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = { .se_width_idx = HISTC_SRCLINE_TO, }; +static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + + struct symbol *sym = he->ms.sym; + struct map *map = he->ms.map; + struct perf_evsel *evsel = hists_to_evsel(he->hists); + struct annotation *notes; + double ipc = 0.0, coverage = 0.0; + char tmp[64]; + + if (!sym) + return repsep_snprintf(bf, size, "%-*s", width, "-"); + + if (!sym->annotated && + symbol__annotate2(sym, map, evsel, &annotation__default_options, + NULL) < 0) { + return 0; + } this seems to make the sorting extra long, would you please consider progress bar update for this? should be added somewhere around hists sorting code Hi Jiri, Sorting doesn't take long time in my test but the session event processing takes some time. I just think maybe we need a new ops for stdio progress bar like what the tui_progress__ops does now. That should be benefit for all stdio usages. That may be another separate patch-set. Thanks Jin Yao thanks, jirka
Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol
On 11/26/2018 5:52 PM, Jiri Olsa wrote: On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote: Support displaying the average IPC and IPC coverage for symbol in perf report TUI browser. We create a new sort-key 'ipc' for that. For example, $ perf record -g -b ... $ perf report -s symbol,ipc or perf report -s ipc Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] Note that, stdio mode doesn't support this feature. the patch below allowed this for stdio please merge it in, and feel free to change it as you see fit jirka Thanks so much! I have merged following code in v2. Thanks Jin Yao --- diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 9b75e118f609..a6756dc13285 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -85,6 +85,7 @@ struct report { int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; + boolsymbol_ipc; }; static int report__config(const char *var, const char *value, void *cb) @@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, struct mem_info *mi; struct branch_info *bi; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct perf_evsel *evsel = iter->evsel; int err; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -954,7 +955,6 @@ int cmd_report(int argc, const char **argv) bool has_br_stack = false; int branch_mode = -1; bool branch_call_mode = false; - bool symbol_ipc = false; #define CALLCHAIN_DEFAULT_OPT "graph,0.5,caller,function,percent" const char report_callchain_help[] = "Display call graph (stack chain/backtrace):\n\n" CALLCHAIN_REPORT_HELP @@ -1289,7 +1289,7 @@ int cmd_report(int argc, const char **argv) strstr(sort_order, "ipc")) { if (!strstr(sort_order, "symbol")) sort_order = "symbol,ipc"; - symbol_ipc = true; + report.symbol_ipc = true; } if (setup_sorting(session->evlist) < 0) { @@ -1319,7 +1319,7 @@ int cmd_report(int argc, const char **argv) * so don't allocate extra space that won't be used in the stdio * implementation. */ - if (ui__has_annotation()) { + if (ui__has_annotation() || report.symbol_ipc) { ret = symbol__annotation_init(); if (ret < 0) goto error; @@ -1339,9 +1339,6 @@ int cmd_report(int argc, const char **argv) symbol_conf.sort_by_name = true; } annotation_config__init(); - } else if (symbol_ipc) { - pr_err("Only TUI mode supports sort-key ipc\n"); - goto error; } if (symbol__init(&session->header.env) < 0)
Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol
On 11/26/2018 5:53 PM, Jiri Olsa wrote: On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote: SNIP diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index f96c005..94f62c8 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -13,6 +13,7 @@ #include "strlist.h" #include #include "mem-events.h" +#include "annotate.h" #include regex_t parent_regex; @@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = { .se_width_idx = HISTC_SRCLINE_TO, }; +static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf, + size_t size, unsigned int width) +{ + + struct symbol *sym = he->ms.sym; + struct map *map = he->ms.map; + struct perf_evsel *evsel = hists_to_evsel(he->hists); + struct annotation *notes; + double ipc = 0.0, coverage = 0.0; + char tmp[64]; + + if (!sym) + return repsep_snprintf(bf, size, "%-*s", width, "-"); + + if (!sym->annotated && + symbol__annotate2(sym, map, evsel, &annotation__default_options, + NULL) < 0) { + return 0; + } + + sym->annotated = true; I don't like this being set in here.. please move it to symbol__annotate2 or symbol__annotate, not sure which one of these is the best fit I have set this flag in symbol__annotate2 in v2. + notes = symbol__annotation(sym); + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) + coverage = notes->cover_insn * 100.0 / + ((double)notes->total_insn); missing { } for multiline code in 'if' condition Fixed, thanks! Thanks Jin Yao thanks, jirka
[PATCH v2 1/3] perf annotate: Compute average IPC and IPC coverage per symbol
Add support to perf report annotate view or perf annotate --stdio2 to aggregate the IPC derived from timed LBRs per symbol. We compute the average IPC and the IPC coverage percentage. For example, $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 0.00 1.10 1 ↓ je 43 29: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_lock_wait_private add$0x80,%rsp 0.00 3.0043: lea__ctype_b@GLIBC_2.2.5+0x38,%rdi 3.00 lea0xc(%rsp),%rsi 8.49 3.00 1 → callq __random_r 7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 0.00 1.94 1 ↓ je 68 lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne70 ↓ jmp8a 0.00 2.0068: decl __abort_msg@@GLIBC_PRIVATE+0x8a0 21.56 2.00 1 ↓ je 8a 70: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_unlock_wake_private add$0x80,%rsp 21.56 2.908a: movslq 0xc(%rsp),%rax 2.90 add$0x18,%rsp 9.03 2.90 1 ← retq It shows for this symbol the average IPC is 2.30 and the IPC coverage is 54.8%. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 41 ++--- tools/perf/util/annotate.h | 5 + 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 6936daf..4b2b1b0 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation *notes, u64 start, u64 static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 end, struct cyc_hist *ch) { unsigned n_insn; + unsigned int cover_insn = 0; u64 offset; n_insn = annotation__count_insn(notes, start, end); @@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 for (offset = start; offset <= end; offset++) { struct annotation_line *al = notes->offsets[offset]; - if (al) + if (al && al->ipc == 0.0) { al->ipc = ipc; + cover_insn++; + } + } + + if (cover_insn) { + notes->hit_cycles += ch->cycles; + notes->hit_insn += n_insn * ch->num; + notes->cover_insn += cover_insn; } } } void annotation__compute_ipc(struct annotation *notes, size_t size) { - u64 offset; + s64 offset; if (!notes->src || !notes->src->cycles_hist) return; + notes->total_insn = annotation__count_insn(notes, 0, size - 1); + notes->hit_cycles = 0; + notes->hit_insn = 0; + notes->cover_insn = 0; + pthread_mutex_lock(¬es->lock); - for (offset = 0; offset < size; ++offset) { + for (offset = size - 1; offset >= 0; --offset) { struct cyc_hist *ch; ch = ¬es->src->cycles_hist[offset]; @@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, struct annotation *notes, disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset); } +static void ipc_coverage_string(char *bf, int size, struct annotation *notes) +{ + double ipc = 0.0, coverage = 0.0; + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) { + coverage = notes->cover_insn * 100.0 / + ((double)notes->total_insn); + } + + scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)", + ipc, coverage); +} + static void __annotation_line__write(struct annotation_line *al, struct annotation *notes
[PATCH v2 3/3] perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage for symbol in perf report TUI browser. We create a new sort-key 'ipc' for that. For example, $ perf record -g -b ... $ perf report -s symbol,ipc or perf report -s ipc Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] Jiri Olsa provided the patch to support the stdio mode. I merged Jiri's code in this patch. $ perf report -s ipc --stdio # Overhead Symbol IPC [IPC Coverage] # ... # 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] 0.02% [k] run_timer_softirq 1.60 [ 57.2%] v2: --- Merge in Jiri's patch to support stdio mode Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 1 + tools/perf/builtin-report.c | 14 --- tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 42 tools/perf/util/sort.h | 1 + 5 files changed, 56 insertions(+), 3 deletions(-) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 474a494..e7b8974 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -122,6 +122,7 @@ OPTIONS - in_tx: branch in TSX transaction - abort: TSX transaction abort. - cycles: Cycles in basic block + - ipc: average ipc (instruction per cycle) of function And default sort keys are changed to comm, dso_from, symbol_from, dso_to and symbol_to, see '--branch-stack'. diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 257c9c1..a6756dc 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -85,6 +85,7 @@ struct report { int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; + boolsymbol_ipc; }; static int report__config(const char *var, const char *value, void *cb) @@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, struct mem_info *mi; struct branch_info *bi; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct perf_evsel *evsel = iter->evsel; int err; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -1284,6 +1285,13 @@ int cmd_report(int argc, const char **argv) else use_browser = 0; + if ((sort__mode == SORT_MODE__BRANCH) && sort_order && + strstr(sort_order, "ipc")) { + if (!strstr(sort_order, "symbol")) + sort_order = "symbol,ipc"; + report.symbol_ipc = true; + } + if (setup_sorting(session->evlist) < 0) { if (sort_order) parse_options_usage(report_usage, options, "s", 1); @@ -1311,7 +1319,7 @@ int cmd_report(int argc, const char **argv) * so don't allocate extra space that won't be used in the stdio * implementation. */ - if (ui__has_annotation()) { + if (ui__has_annotation() || report.symbol_ipc) { ret = symbol__annotation_init(); if (ret < 0) goto error; diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h index 3badd7f..664b5ed 100644 --- a/tools/perf/util/hist.h +++ b/tools/perf/util/hist.h @@ -62,6 +62,7 @@ enum hist_column { HISTC_TRACE, HISTC_SYM_SIZE, HISTC_DSO_SIZE, + HISTC_SYMBOL_IPC, HISTC_NR_COLS, /* Last entry */ }; diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c index f96c005..cc8ada3 100644 --- a/tools/perf/util/sort.c +++ b/tools/perf/util/sort.c @@ -13,6 +13,7 @@ #include "strlist
[PATCH v2 2/3] perf annotate: Create a annotate2 flag in struct symbol
We often use the symbol__annotate2() to annotate a specified symbol. While annotating may take some time, so in order to avoid annotating the same symbol repeatedly, the patch creates a new flag to indicate the symbol has been annotated. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 1 + tools/perf/util/symbol.h | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 4b2b1b0..f69d8e1 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -2798,6 +2798,7 @@ int symbol__annotate2(struct symbol *sym, struct map *map, struct perf_evsel *ev notes->nr_events = nr_pcnt; annotation__update_column_widths(notes); + sym->annotate2 = true; return 0; diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index d026d21..14d9d43 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -63,6 +63,7 @@ struct symbol { u8 ignore:1; u8 inlined:1; u8 arch_sym; + boolannotate2; charname[0]; }; -- 2.7.4
[PATCH v2 0/3] perf report/annotate: Support average IPC and IPC coverage for function
Add supporting of displaying the average IPC and IPC coverage percentage per function. For example, $ perf record -g -b ... $ perf report -s symbol,ipc or perf report -s symbol,ipc --stdio Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] ... $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ... v2: --- 1. Merge in Jiri's patch to support stdio mode 2. Add a new patch "perf annotate: Create a annotate2 flag in struct symbol" which records if the symbol has been annotated yet. 3. Minor update such as adding { } for multiline code in 'if' condition. Jin Yao (3): perf annotate: Compute average IPC and IPC coverage per symbol perf annotate: Create a annotate2 flag in struct symbol perf report: Display average IPC and IPC coverage per symbol tools/perf/Documentation/perf-report.txt | 1 + tools/perf/builtin-report.c | 14 --- tools/perf/util/annotate.c | 42 +--- tools/perf/util/annotate.h | 5 tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 42 tools/perf/util/sort.h | 1 + tools/perf/util/symbol.h | 1 + 8 files changed, 101 insertions(+), 6 deletions(-) -- 2.7.4
Re: [PATCH v2 3/3] perf report: Display average IPC and IPC coverage per symbol
On 11/28/2018 2:33 AM, Andi Kleen wrote: On Tue, Nov 27, 2018 at 08:09:48PM +0800, Jin Yao wrote: Support displaying the average IPC and IPC coverage for symbol in perf report TUI browser. We create a new sort-key 'ipc' for that. Another thing that would be nice would be to automatically enable these columns perf report when -b was enabled in the perf.data. We can't check for timed LBR being supported without looking into samples, but at least can check for -b. Otherwise a lot of people won't realize it's supported. It would show always 0.0 when timed LBR is not supported, but that might be ok. Perhaps make both columns empty in this case instead of 0.0. -Andi Hi Andi, Yes, I agree, people will not realize a new option 'ipc' is created. So I will enable 2 columns ("IPC" and "IPC columns") even when timed LBR is not supported. Just show empty or '-' in these columns. Thanks Jin Yao
[PATCH v3 2/3] perf annotate: Create a annotate2 flag in struct symbol
We often use the symbol__annotate2() to annotate a specified symbol. While annotating may take some time, so in order to avoid annotating the same symbol repeatedly, the patch creates a new flag to indicate the symbol has been annotated. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 1 + tools/perf/util/symbol.h | 1 + 2 files changed, 2 insertions(+) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 4b2b1b0..f69d8e1 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -2798,6 +2798,7 @@ int symbol__annotate2(struct symbol *sym, struct map *map, struct perf_evsel *ev notes->nr_events = nr_pcnt; annotation__update_column_widths(notes); + sym->annotate2 = true; return 0; diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h index d026d21..14d9d43 100644 --- a/tools/perf/util/symbol.h +++ b/tools/perf/util/symbol.h @@ -63,6 +63,7 @@ struct symbol { u8 ignore:1; u8 inlined:1; u8 arch_sym; + boolannotate2; charname[0]; }; -- 2.7.4
[PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function
Add supporting of displaying the average IPC and IPC coverage percentage per function. For example, $ perf record -b ... $ perf report -s symbol or perf report -s symbol --stdio Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] ... $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ... v3: --- Remove the sortkey "ipc" from command-line. The columns "IPC" and "[IPC Coverage]" are automatically enabled when "symbol" is specified. Patch "perf report: Display average IPC and IPC coverage per symbol" is impacted. v2: --- 1. Merge in Jiri's patch to support stdio mode 2. Add a new patch "perf annotate: Create a annotate2 flag in struct symbol" which records if the symbol has been annotated yet. 3. Minor update such as adding { } for multiline code in 'if' condition. Jin Yao (3): perf annotate: Compute average IPC and IPC coverage per symbol perf annotate: Create a annotate2 flag in struct symbol perf report: Display average IPC and IPC coverage per symbol tools/perf/builtin-report.c | 26 --- tools/perf/util/annotate.c | 42 --- tools/perf/util/annotate.h | 5 tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 61 + tools/perf/util/sort.h | 2 ++ tools/perf/util/symbol.h| 1 + 7 files changed, 132 insertions(+), 6 deletions(-) -- 2.7.4
[PATCH v3 3/3] perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage per symbol in perf report TUI and stdio modes. For example, $ perf record -b ... $ perf report -s symbol Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] Jiri Olsa provides the patch to support the stdio mode. I merge Jiri's code in this patch. $ perf report -s symbol --stdio # Overhead Symbol IPC [IPC Coverage] # ... # 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] 0.02% [k] run_timer_softirq 1.60 [ 57.2%] The columns "IPC" and "[IPC Coverage]" are automatically enabled when the sort-key "symbol" is specified. If the perf.data doesn't contain timed LBR information, columns are filled with "-". For example, # Overhead Symbol IPC [IPC Coverage] # ... # 46.57% [.] main - - 17.60% [.] rand - - 15.84% [.] __random_r - - 11.90% [.] __random - - 6.50% [.] compute_flag - - 1.59% [.] rand@plt - - 0.00% [.] _dl_relocate_object - - 0.00% [k] tlb_flush_mmu- - 0.00% [k] perf_event_mmap - - 0.00% [k] native_sched_clock - - 0.00% [k] intel_pmu_handle_irq_v4 - - 0.00% [k] native_write_msr - - v3: --- Removed the sortkey 'ipc' from command-line. The columns "IPC" and "[IPC Coverage]" are automatically enabled when "symbol" is specified. v2: --- Merge in Jiri's patch to support stdio mode Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 26 --- tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 61 + tools/perf/util/sort.h | 2 ++ 4 files changed, 87 insertions(+), 3 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 257c9c1..4958095 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -85,6 +85,7 @@ struct report { int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; + boolsymbol_ipc; }; static int report__config(const char *var, const char *value, void *cb) @@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, struct mem_info *mi; struct branch_info *bi; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct perf_evsel *evsel = iter->evsel; int err; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -1133,6 +1134,7 @@ int cmd_report(int argc, const char **argv) .mode = PERF_DATA_MODE_READ, }; int ret = hists__init(); + char sort_tmp[128]; if (ret < 0) return ret; @@ -1284,6 +1286,24 @@ int cmd_report(int argc, const char **argv) else use_browser = 0; + if (sort_order && strstr(sort_order, "ipc")) { + parse_options_usage(report_usage, options, "s", 1); + goto error; + } + + if (sort_order && strstr(sort_order, "symbol")) { + if (sort__mode == SORT_MODE__BRANCH) { + snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s", +sort_order, "ipc_lbr"); + report.symbol_ipc = true; + } else { + snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s", +sort_order, "ipc_null&qu
[PATCH v3 1/3] perf annotate: Compute average IPC and IPC coverage per symbol
Add support to perf report annotate view or perf annotate --stdio2 to aggregate the IPC derived from timed LBRs per symbol. We compute the average IPC and the IPC coverage percentage. For example, $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 0.00 1.10 1 ↓ je 43 29: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_lock_wait_private add$0x80,%rsp 0.00 3.0043: lea__ctype_b@GLIBC_2.2.5+0x38,%rdi 3.00 lea0xc(%rsp),%rsi 8.49 3.00 1 → callq __random_r 7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 0.00 1.94 1 ↓ je 68 lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne70 ↓ jmp8a 0.00 2.0068: decl __abort_msg@@GLIBC_PRIVATE+0x8a0 21.56 2.00 1 ↓ je 8a 70: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_unlock_wake_private add$0x80,%rsp 21.56 2.908a: movslq 0xc(%rsp),%rax 2.90 add$0x18,%rsp 9.03 2.90 1 ← retq It shows for this symbol the average IPC is 2.30 and the IPC coverage is 54.8%. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 41 ++--- tools/perf/util/annotate.h | 5 + 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 6936daf..4b2b1b0 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation *notes, u64 start, u64 static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 end, struct cyc_hist *ch) { unsigned n_insn; + unsigned int cover_insn = 0; u64 offset; n_insn = annotation__count_insn(notes, start, end); @@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 for (offset = start; offset <= end; offset++) { struct annotation_line *al = notes->offsets[offset]; - if (al) + if (al && al->ipc == 0.0) { al->ipc = ipc; + cover_insn++; + } + } + + if (cover_insn) { + notes->hit_cycles += ch->cycles; + notes->hit_insn += n_insn * ch->num; + notes->cover_insn += cover_insn; } } } void annotation__compute_ipc(struct annotation *notes, size_t size) { - u64 offset; + s64 offset; if (!notes->src || !notes->src->cycles_hist) return; + notes->total_insn = annotation__count_insn(notes, 0, size - 1); + notes->hit_cycles = 0; + notes->hit_insn = 0; + notes->cover_insn = 0; + pthread_mutex_lock(¬es->lock); - for (offset = 0; offset < size; ++offset) { + for (offset = size - 1; offset >= 0; --offset) { struct cyc_hist *ch; ch = ¬es->src->cycles_hist[offset]; @@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, struct annotation *notes, disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset); } +static void ipc_coverage_string(char *bf, int size, struct annotation *notes) +{ + double ipc = 0.0, coverage = 0.0; + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) { + coverage = notes->cover_insn * 100.0 / + ((double)notes->total_insn); + } + + scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)", + ipc, coverage); +} + static void __annotation_line__write(struct annotation_line *al, struct annotation *notes
Re: [PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function
On 11/28/2018 5:10 PM, Ingo Molnar wrote: * Jin Yao wrote: Add supporting of displaying the average IPC and IPC coverage percentage per function. For example, $ perf record -b ... $ perf report -s symbol or perf report -s symbol --stdio Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] ... $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 That's a nice feature: please add meaningful documentation, accessible via the perf help system preferably, that outlines how the IPC metrics should be interpreted and how they are useful when optimizing programs. Thanks, Ingo Hi Ingo, Thanks so much for your comments! I think I will add some explanations in perf/Documentation/perf-report.txt, maybe somewhere around the sort_key section (-s::). Thanks Jin Yao
Re: [PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function
On 11/28/2018 6:18 PM, Jiri Olsa wrote: On Wed, Nov 28, 2018 at 11:17:57AM +0100, Jiri Olsa wrote: On Wed, Nov 28, 2018 at 11:14:55PM +0800, Jin Yao wrote: Add supporting of displaying the average IPC and IPC coverage percentage per function. For example, $ perf record -b ... $ perf report -s symbol or perf report -s symbol --stdio Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] ... $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ... v3: --- Remove the sortkey "ipc" from command-line. The columns "IPC" and "[IPC Coverage]" are automatically enabled when "symbol" is specified. Patch "perf report: Display average IPC and IPC coverage per symbol" is impacted. v2: --- 1. Merge in Jiri's patch to support stdio mode 2. Add a new patch "perf annotate: Create a annotate2 flag in struct symbol" which records if the symbol has been annotated yet. 3. Minor update such as adding { } for multiline code in 'if' condition. Jin Yao (3): perf annotate: Compute average IPC and IPC coverage per symbol perf annotate: Create a annotate2 flag in struct symbol perf report: Display average IPC and IPC coverage per symbol hi, I took he liberty and moved the annotation retrieval into resort phase under progress bar scope. It's currently on top of my perf/fixes branch, could you please check it? git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git commits: 7f3ffdb9783f perf tools: Move symbol annotation to resort e87f7d3c4f10 perf tools: Add perf_evsel__output_resort_cb function 40012b422108 perf tools: Add argument to hists__resort_cb_t callback jirka Hi Jiri, Thanks for your patches. I have tested with your repo. Now I can see 2 progress bars. One is displayed at the events processing phase, the other is displayed at resorting phase. I have only one concern that is, in my test, much of time is consumed by the event processing phase, for example, 90% of time. Only 10% of time is consumed at resorting phase. So do we really need the second progress bar? Thanks Jin Yao
[PATCH v4 1/4] perf annotate: Compute average IPC and IPC coverage per symbol
Add support to perf report annotate view or perf annotate --stdio2 to aggregate the IPC derived from timed LBRs per symbol. We compute the average IPC and the IPC coverage percentage. For example, $ perf annotate --stdio2 Percent IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%) Disassembly of section .text: 0003aac0 : 8.32 3.28 sub$0x18,%rsp 3.28 mov$0x1,%esi 3.28 xor%eax,%eax 3.28 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 11.57 3.28 1 ↓ je 20 lock cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne29 ↓ jmp43 11.57 1.1020: cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0 0.00 1.10 1 ↓ je 43 29: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_lock_wait_private add$0x80,%rsp 0.00 3.0043: lea__ctype_b@GLIBC_2.2.5+0x38,%rdi 3.00 lea0xc(%rsp),%rsi 8.49 3.00 1 → callq __random_r 7.91 1.94 cmpl $0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0 0.00 1.94 1 ↓ je 68 lock decl __abort_msg@@GLIBC_PRIVATE+0x8a0 ↓ jne70 ↓ jmp8a 0.00 2.0068: decl __abort_msg@@GLIBC_PRIVATE+0x8a0 21.56 2.00 1 ↓ je 8a 70: lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi sub$0x80,%rsp → callq __lll_unlock_wake_private add$0x80,%rsp 21.56 2.908a: movslq 0xc(%rsp),%rax 2.90 add$0x18,%rsp 9.03 2.90 1 ← retq It shows for this symbol the average IPC is 2.30 and the IPC coverage is 54.8%. Signed-off-by: Jin Yao --- tools/perf/util/annotate.c | 41 ++--- tools/perf/util/annotate.h | 5 + 2 files changed, 43 insertions(+), 3 deletions(-) diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c index 6936daf..4b2b1b0 100644 --- a/tools/perf/util/annotate.c +++ b/tools/perf/util/annotate.c @@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation *notes, u64 start, u64 static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 end, struct cyc_hist *ch) { unsigned n_insn; + unsigned int cover_insn = 0; u64 offset; n_insn = annotation__count_insn(notes, start, end); @@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct annotation *notes, u64 start, u64 for (offset = start; offset <= end; offset++) { struct annotation_line *al = notes->offsets[offset]; - if (al) + if (al && al->ipc == 0.0) { al->ipc = ipc; + cover_insn++; + } + } + + if (cover_insn) { + notes->hit_cycles += ch->cycles; + notes->hit_insn += n_insn * ch->num; + notes->cover_insn += cover_insn; } } } void annotation__compute_ipc(struct annotation *notes, size_t size) { - u64 offset; + s64 offset; if (!notes->src || !notes->src->cycles_hist) return; + notes->total_insn = annotation__count_insn(notes, 0, size - 1); + notes->hit_cycles = 0; + notes->hit_insn = 0; + notes->cover_insn = 0; + pthread_mutex_lock(¬es->lock); - for (offset = 0; offset < size; ++offset) { + for (offset = size - 1; offset >= 0; --offset) { struct cyc_hist *ch; ch = ¬es->src->cycles_hist[offset]; @@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, struct annotation *notes, disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset); } +static void ipc_coverage_string(char *bf, int size, struct annotation *notes) +{ + double ipc = 0.0, coverage = 0.0; + + if (notes->hit_cycles) + ipc = notes->hit_insn / ((double)notes->hit_cycles); + + if (notes->total_insn) { + coverage = notes->cover_insn * 100.0 / + ((double)notes->total_insn); + } + + scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)", + ipc, coverage); +} + static void __annotation_line__write(struct annotation_line *al, struct annotation *notes
[PATCH v4 4/4] perf report: Documentation average IPC and IPC coverage
Add explanations for new columns "IPC" and "IPC coverage" in perf documentation. Signed-off-by: Jin Yao --- tools/perf/Documentation/perf-report.txt | 8 1 file changed, 8 insertions(+) diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt index 474a494..e5a32f3 100644 --- a/tools/perf/Documentation/perf-report.txt +++ b/tools/perf/Documentation/perf-report.txt @@ -126,6 +126,14 @@ OPTIONS And default sort keys are changed to comm, dso_from, symbol_from, dso_to and symbol_to, see '--branch-stack'. + When the sort key symbol is specified, columns "IPC" and "IPC Coverage" + are enabled automatically. Column "IPC" reports the average IPC per function + and column "IPC coverage" reports the percentage of instructions with + sampled IPC in this function. IPC means Instruction Per Cycle. If it's low, + it indicates there may be performance bottleneck when the function is + executed, such as, memory access bottleneck. If a function has high overhead + and low IPC, it's worth further analysis for performance optimization. + If the --mem-mode option is used, the following sort keys are also available (incompatible with --branch-stack): symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline. -- 2.7.4
[PATCH v4 3/4] perf report: Display average IPC and IPC coverage per symbol
Support displaying the average IPC and IPC coverage per symbol in perf report TUI and stdio modes. For example, $ perf record -b ... $ perf report -s symbol Overhead Symbol IPC [IPC Coverage] 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] Jiri Olsa provides the patch to support the stdio mode. I merge Jiri's code in this patch. $ perf report -s symbol --stdio # Overhead Symbol IPC [IPC Coverage] # ... # 39.60% [.] __random 2.30 [ 54.8%] 18.02% [.] main 0.43 [ 54.3%] 14.21% [.] compute_flag 2.29 [100.0%] 14.16% [.] rand 0.36 [100.0%] 7.06% [.] __random_r 2.57 [ 70.5%] 6.85% [.] rand@plt 0.00 [ 0.0%] 0.02% [k] run_timer_softirq 1.60 [ 57.2%] The columns "IPC" and "[IPC Coverage]" are automatically enabled when the sort-key "symbol" is specified. If the perf.data doesn't contain timed LBR information, columns are filled with "-". For example, # Overhead Symbol IPC [IPC Coverage] # ... # 46.57% [.] main - - 17.60% [.] rand - - 15.84% [.] __random_r - - 11.90% [.] __random - - 6.50% [.] compute_flag - - 1.59% [.] rand@plt - - 0.00% [.] _dl_relocate_object - - 0.00% [k] tlb_flush_mmu- - 0.00% [k] perf_event_mmap - - 0.00% [k] native_sched_clock - - 0.00% [k] intel_pmu_handle_irq_v4 - - 0.00% [k] native_write_msr - - v3: --- Removed the sortkey 'ipc' from command-line. The columns "IPC" and "[IPC Coverage]" are automatically enabled when "symbol" is specified. v2: --- Merge in Jiri's patch to support stdio mode Signed-off-by: Jin Yao --- tools/perf/builtin-report.c | 26 --- tools/perf/util/hist.h | 1 + tools/perf/util/sort.c | 61 + tools/perf/util/sort.h | 2 ++ 4 files changed, 87 insertions(+), 3 deletions(-) diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c index 257c9c1..4958095 100644 --- a/tools/perf/builtin-report.c +++ b/tools/perf/builtin-report.c @@ -85,6 +85,7 @@ struct report { int socket_filter; DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); struct branch_type_stat brtype_stat; + boolsymbol_ipc; }; static int report__config(const char *var, const char *value, void *cb) @@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct hist_entry_iter *iter, struct mem_info *mi; struct branch_info *bi; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct hist_entry_iter *iter, struct perf_evsel *evsel = iter->evsel; int err; - if (!ui__has_annotation()) + if (!ui__has_annotation() && !rep->symbol_ipc) return 0; hist__account_cycles(sample->branch_stack, al, sample, @@ -1133,6 +1134,7 @@ int cmd_report(int argc, const char **argv) .mode = PERF_DATA_MODE_READ, }; int ret = hists__init(); + char sort_tmp[128]; if (ret < 0) return ret; @@ -1284,6 +1286,24 @@ int cmd_report(int argc, const char **argv) else use_browser = 0; + if (sort_order && strstr(sort_order, "ipc")) { + parse_options_usage(report_usage, options, "s", 1); + goto error; + } + + if (sort_order && strstr(sort_order, "symbol")) { + if (sort__mode == SORT_MODE__BRANCH) { + snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s", +sort_order, "ipc_lbr"); + report.symbol_ipc = true; + } else { + snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s", +sort_order, "ipc_null&qu