RE: NumaTOP 1.0 launched

2013-04-17 Thread Jin, Yao
Hi, 

https://01.org/numatop/ is slightly faster to access.
Or please access https://github.com/01org/numatop to get the source directly. 

Thanks
Jin Yao

-Original Message-
From: Jin, Yao 
Sent: Wednesday, April 17, 2013 10:30 AM
To: 'l...@lwn.net'; 'linux-kernel@vger.kernel.org'; 
'linux-perf-us...@vger.kernel.org'
Subject: NumaTOP 1.0 launched

We are pleased to announce today that the NumaTOP project has been added to 
01.org. 

Performance analysis engineers know that NUMA can seriously impact performance 
and that NUMA performance analysis can be challenging. We've realized that 
currently there isn't an easy-to-use tool that lets us easily observe whether 
NUMA-related issues exist and, if so, where the NUMA bottleneck(s) reside. It 
can be quite challenging, especially in complex server environments. 

We decided to create a tool that automatically performs the typical steps in 
NUMA analysis and provides a good starting point to dive in and fix 
NUMA-related bottlenecks. That's NumaTOP!

NumaTOP is an observation tool for runtime memory locality characterization and 
analysis of processes and threads running on a NUMA system. It helps the user 
characterize the NUMA behavior of processes and threads and identify where the 
NUMA-related performance bottlenecks reside. It uses Intel performance counter 
sampling technologies and associates the performance data with system runtime 
information to provide real-time analysis for production systems.

NumaTOP is a GUI tool. It can run on Linux kernel 3.8 with a perf load latency 
patch today. That patch is planned to be integrated into kernel 3.9 or a later 
release.

To learn more about NumaTOP, visit:
http://01.org/numatop/

Best Regards
Jin Yao

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [RFC][PATCH 7/7] perf/annotate: Add branch stack / basic block information

2016-09-08 Thread Jin, Yao
Hi, 

The idea of my patch is a little bit different.  It targets to associate the 
branch
coverage percentage and branch mispredict rate with the source code then user
can see them directly in perf annotate source code / assembly view.

The screenshot to show the branch%. 
https://github.com/yaoj/perf/blob/master/2.png

The screenshot to show the mispredict rate.
https://github.com/yaoj/perf/blob/master/3.png

The --stdio mode is tested working well now and will do for --tui mode in next.

The internal review comments require the patch to keep the existing "Percent"
column unchanged and add new columns "Branch%" and "Mispred%" if LBRs
are recorded in perf.data. For example, following 3 columns will be shown at
default if LBRs are in perf.data when executing perf annotate --stdio.

Percent | Branch% | Mispred% |

With this comment, the patch needs to be changed. 

Please let me know what do you think for this patch and  if I should go ahead. 

Thanks
Jin Yao

-Original Message-
From: Arnaldo Carvalho de Melo [mailto:a...@kernel.org] 
Sent: Friday, September 9, 2016 1:11 AM
To: Andi Kleen 
Cc: Stephane Eranian ; Peter Zijlstra 
; Ingo Molnar ; LKML 
; Jiri Olsa ; Linus Torvalds 
; David Carrillo-Cisneros ; 
Alexander Shishkin ; Namhyung Kim 
; Liang, Kan ; Anshuman Khandual 
; Jin, Yao 
Subject: Re: [RFC][PATCH 7/7] perf/annotate: Add branch stack / basic block 
information

Em Thu, Sep 08, 2016 at 09:59:15AM -0700, Andi Kleen escreveu:
> On Thu, Sep 08, 2016 at 09:43:53AM -0700, Stephane Eranian wrote:
> > Hi,
> > 
> > On Thu, Sep 8, 2016 at 9:18 AM, Arnaldo Carvalho de Melo 
> >  wrote:
> > >
> > > Em Fri, Jul 08, 2016 at 06:36:32PM +0200, Peter Zijlstra escreveu:
> > > > On Fri, Jul 08, 2016 at 06:27:33PM +0200, Peter Zijlstra wrote:
> > > >
> > > > > I've been thinking of filtering all targets and branches that 
> > > > > are smaller than 0.1% in order to avoid this, but so far I've 
> > > > > just been ignoring these things.
> > > >
> > > > Like so... seems to 'work'.
> > >
> > > So I merged this one with 7/7 and this is the result, screenshot 
> > > to capture the colors:
> > >
> > >   http://vger.kernel.org/~acme/perf/annotate_basic_blocks.png
> > >
> > > Please let me know if I should go ahead and push with the combined 
> > > patch, that is now at:
> > >
> > > https://git.kernel.org/cgit/linux/kernel/git/acme/linux.git/commit
> > > /?h=perf/annotate_basic_blocks&id=baf41a43fa439ac534d21e41882a7858
> > > d5cee1e5
> > >
> > > git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git 
> > > perf/annotate_basic_blocks
> > >
> > > Is that ok?
> > >
> > I like the idea and yes, branch stack can be used for this, but I 
> > have a hard time understanding the colored output.
> > What is the explanation for the color changes?
> > How do I interpret the percentages in the comments of the assembly:
> > -54.50% (p: 42%)
> > Why not have dedicated columns before the assembly with proper column 
> > headers?
> 
> Yes columns with headers are better. Jin Yao has been looking at this 
> and already has some patches.

For --tui as well?

- Arnaldo


[PATCH] perf symbols: Not using the ubuntu debuginfo when dso load

2016-09-01 Thread Jin Yao
In dso__load, it iterates over *interesting" debug images which
have symtab/dynsym/opd section. For example, for loading the
libc-2.23.so on ubuntu 16.04, it will try 2 images in turn.

Try image 1: /lib/x86_64-linux-gnu/libc-2.23.so. Since it doesn't
have ".symtab" section, after symsrc__init return, runtime_ss = ss
(ss is associated with /lib/x86_64-linux-gnu/libc-2.23.so), and
syms_ss = NULL.

Try image 2: /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so.
Since it has ".symtab" section, after symsrc__init return,
syms_ss = ss (ss is associated with 
/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so),
and runtime_ss is not changed.

Now at the dso__load_sym(dso, map, syms_ss, runtime_ss, filter, kmod),
syms_ss is associated with /usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so,
But the dso->name is /lib/x86_64-linux-gnu/libc-2.23.so.

In dso__load_sym, it needs to compute the dso->text_offset.

if (elf_section_by_name(elf, &ehdr, &tshdr, ".text", NULL))
dso->text_offset = tshdr.sh_addr - tshdr.sh_offset;

The issue is that tshdr is the ELF section header for
/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so not the header for
/lib/x86_64-linux-gnu/libc-2.23.so. So the result of dso->text_offset
is not correct.

In symbol__disassemble, it calls map__rip_2objdump to convert symbol start
address to objdump address. The converted address is
rip + map->dso->text_offset

Since this text_offset is not correct, so finally objdump uses wrong
start address to disassemble /lib/x86_64-linux-gnu/libc-2.23.so.

This patch lets dso__load not try ubuntu debuginfo
(/usr/lib/debug/lib/x86_64-linux-gnu/libc-2.23.so in this example), because
1: the image doesn't have valid assembly code (check output of objdump)
2: it can avoid above issue.

Signed-off-by: Jin Yao 
---
 tools/perf/util/symbol.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index 37e8d20..25a10a3 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1324,7 +1324,6 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
case DSO_BINARY_TYPE__DEBUGLINK:
case DSO_BINARY_TYPE__SYSTEM_PATH_DSO:
case DSO_BINARY_TYPE__FEDORA_DEBUGINFO:
-   case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
case DSO_BINARY_TYPE__BUILDID_DEBUGINFO:
case DSO_BINARY_TYPE__OPENEMBEDDED_DEBUGINFO:
return !kmod && dso->kernel == DSO_TYPE_USER;
@@ -1353,6 +1352,7 @@ static bool dso__is_compatible_symtab_type(struct dso 
*dso, bool kmod,
return true;
 
case DSO_BINARY_TYPE__NOT_FOUND:
+   case DSO_BINARY_TYPE__UBUNTU_DEBUGINFO:
default:
return false;
}
-- 
2.7.4



[PATCH v1 0/4] perf annotate: Display multiple events on the left side of annotate view

2017-08-15 Thread Jin Yao
perf record -e cycles,branches ...
perf annotate main --stdio

The result only shows cycles. It should show both cycles and
branches on the left side of annotate view. It works with
"--group", but need this to work even without groups.

The patch series supports to display multiple events on the
left side of annotate view for stdio, tui and gtk modes.

Jin Yao (4):
  perf annotate: create a new hists to manage multiple events samples
  perf annotate: Display multiple events for stdio mode
  perf annotate: Display multiple events for tui mode
  perf annotate: Display multiple events for gtk mode

 tools/perf/builtin-annotate.c |  62 +
 tools/perf/builtin-top.c  |   3 +-
 tools/perf/ui/browsers/annotate.c |  49 +++---
 tools/perf/ui/browsers/hists.c|   2 +-
 tools/perf/ui/gtk/annotate.c  |  35 ---
 tools/perf/util/annotate.c| 187 +-
 tools/perf/util/annotate.h|  16 +++-
 tools/perf/util/hist.h|   8 +-
 tools/perf/util/sort.c|  21 +
 tools/perf/util/sort.h|  13 +++
 10 files changed, 301 insertions(+), 95 deletions(-)

-- 
2.7.4



[PATCH v1 3/4] perf annotate: Display multiple events for tui mode

2017-08-15 Thread Jin Yao
For example:
perf record -e cycles,branches ./div
perf annotate main

  │for (i = 0; i < 20; i++) {
  │flag = compute_flag();
  5.77   4.85 │38:   xor%eax,%eax
  0.01   0.01 │→ callq  compute_flag
  │
  │count++;
  2.38   4.38 │  movcount,%edx
  0.00   0.00 │  add$0x1,%edx
  │
  │if (flag)
  │  test   %eax,%eax
  │srand(s_randseed);
  │
  │for (i = 0; i < 20; i++) {
  │flag = compute_flag();
  │
  │count++;
  0.60   0.44 │  mov%edx,count
  │
  │if (flag)
  3.99   4.32 │↓ je 82

Signed-off-by: Jin Yao 
---
 tools/perf/ui/browsers/annotate.c | 49 ---
 tools/perf/ui/browsers/hists.c|  2 +-
 tools/perf/util/annotate.h|  6 +++--
 tools/perf/util/hist.h|  8 ---
 4 files changed, 45 insertions(+), 20 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 80f38da..b76c238 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -428,13 +428,15 @@ static void annotate_browser__set_rb_top(struct 
annotate_browser *browser,
 }
 
 static void annotate_browser__calc_percent(struct annotate_browser *browser,
-  struct perf_evsel *evsel)
+  struct perf_evsel *evsel,
+  struct hist_entry *he)
 {
struct map_symbol *ms = browser->b.priv;
struct symbol *sym = ms->sym;
struct annotation *notes = symbol__annotation(sym);
struct disasm_line *pos, *next;
s64 len = symbol__size(sym);
+   struct hist_event *hevt;
 
browser->entries = RB_ROOT;
 
@@ -455,13 +457,25 @@ static void annotate_browser__calc_percent(struct 
annotate_browser *browser,
 
for (i = 0; i < browser->nr_events; i++) {
struct sym_hist_entry sample;
-
-   bpos->samples[i].percent = disasm__calc_percent(notes,
+   if (evsel) {
+   bpos->samples[i].percent =
+   disasm__calc_percent(notes,
evsel->idx + i,
pos->offset,
next ? next->offset : len,
&path, &sample);
-   bpos->samples[i].he = sample;
+   bpos->samples[i].he = sample;
+   } else if (he) {
+   hevt = &he->events[i];
+   notes = symbol__annotation(hevt->ms.sym);
+   bpos->samples[i].percent =
+   disasm__calc_percent(notes,
+   hevt->idx,
+   pos->offset,
+   next ? next->offset : len,
+   &path, &sample);
+   bpos->samples[i].he = sample;
+   }
 
if (max_percent < bpos->samples[i].percent)
max_percent = bpos->samples[i].percent;
@@ -567,7 +581,7 @@ static bool annotate_browser__callq(struct annotate_browser 
*browser,
}
 
pthread_mutex_unlock(¬es->lock);
-   symbol__tui_annotate(target.sym, target.map, evsel, hbt);
+   symbol__tui_annotate(target.sym, target.map, evsel, hbt, NULL);
sym_title(ms->sym, ms->map, title, sizeof(title));
ui_browser__show_title(&browser->b, title);
return true;
@@ -755,7 +769,8 @@ static void annotate_browser__update_addr_width(struct 
annotate_browser *browser
 
 static int annotate_browser__run(struct annotate_browser *browser,
 struct perf_evsel *evsel,
-struct hist_browser_timer *hbt)
+struct hist_browser_timer *hbt,
+struct hist_entry *he)
 {
struct rb_node *nd = NULL;
struct map_symbol *ms = browser->b.priv;
@@ -769,7 +784,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
if (ui_browser__show(&browser->b, title, help) < 0)
return -1;
 
-   annotate_browser__calc_percent(browser, evsel);
+   annotate_browser__

[PATCH v1 1/4] perf annotate: create a new hists to manage multiple events samples

2017-08-15 Thread Jin Yao
An issue is found during using perf annotate.

perf record -e cycles,branches ...
perf annotate main --stdio

The result only shows cycles. It should show both cycles and
branches on the left side. It works with "--group", but need
this to work even without groups.

In current design, the hists is per event. So we need a new
hists to manage the samples for multiple events and use a new
hist_event data structure to save the map/symbol information
for per event.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-annotate.c | 60 +++
 tools/perf/util/annotate.c|  8 ++
 tools/perf/util/annotate.h|  4 +++
 tools/perf/util/sort.c| 21 +++
 tools/perf/util/sort.h| 13 ++
 5 files changed, 89 insertions(+), 17 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 658c920..833866c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -45,6 +45,7 @@ struct perf_annotate {
bool   skip_missing;
const char *sym_hist_filter;
const char *cpu_list;
+   struct hists events_hists;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
 };
 
@@ -153,6 +154,7 @@ static int perf_evsel__add_sample(struct perf_evsel *evsel,
struct hists *hists = evsel__hists(evsel);
struct hist_entry *he;
int ret;
+   struct hist_event *hevt;
 
if (ann->sym_hist_filter != NULL &&
(al->sym == NULL ||
@@ -177,12 +179,30 @@ static int perf_evsel__add_sample(struct perf_evsel 
*evsel,
 */
process_branch_stack(sample->branch_stack, al, sample);
 
-   he = hists__add_entry(hists, al, NULL, NULL, NULL, sample, true);
-   if (he == NULL)
-   return -ENOMEM;
+   if (symbol_conf.nr_events == 1) {
+   he = hists__add_entry(hists, al, NULL, NULL, NULL,
+ sample, true);
+   if (he == NULL)
+   return -ENOMEM;
+
+   ret = hist_entry__inc_addr_samples(he, sample, evsel->idx,
+  al->addr);
+   hists__inc_nr_samples(hists, true);
+   } else {
+   he = hists__add_entry(&ann->events_hists, al, NULL,
+ NULL, NULL, sample, true);
+   if (he == NULL)
+   return -ENOMEM;
+
+   hevt = hist_entry__event_add(he, evsel);
+   if (hevt == NULL)
+   return -ENOMEM;
+
+   ret = hist_event__inc_addr_samples(hevt, sample, hevt->idx,
+  al->addr);
+   hists__inc_nr_samples(&ann->events_hists, true);
+   }
 
-   ret = hist_entry__inc_addr_samples(he, sample, evsel->idx, al->addr);
-   hists__inc_nr_samples(hists, true);
return ret;
 }
 
@@ -304,7 +324,8 @@ static int __cmd_annotate(struct perf_annotate *ann)
int ret;
struct perf_session *session = ann->session;
struct perf_evsel *pos;
-   u64 total_nr_samples;
+   u64 total_nr_samples = 0;
+   struct hists *hists;
 
if (ann->cpu_list) {
ret = perf_session__cpu_bitmap(session, ann->cpu_list,
@@ -335,23 +356,26 @@ static int __cmd_annotate(struct perf_annotate *ann)
if (verbose > 2)
perf_session__fprintf_dsos(session, stdout);
 
-   total_nr_samples = 0;
-   evlist__for_each_entry(session->evlist, pos) {
-   struct hists *hists = evsel__hists(pos);
-   u32 nr_samples = hists->stats.nr_events[PERF_RECORD_SAMPLE];
+   if (symbol_conf.nr_events > 1) {
+   hists = &ann->events_hists;
+   total_nr_samples +=
+   hists->stats.nr_events[PERF_RECORD_SAMPLE];
+
+   hists__collapse_resort(hists, NULL);
+   hists__output_resort(hists, NULL);
+   hists__find_annotations(hists, NULL, ann);
+   } else {
+   evlist__for_each_entry(session->evlist, pos) {
+   hists = evsel__hists(pos);
+   total_nr_samples +=
+   hists->stats.nr_events[PERF_RECORD_SAMPLE];
 
-   if (nr_samples > 0) {
-   total_nr_samples += nr_samples;
hists__collapse_resort(hists, NULL);
/* Don't sort callchain */
perf_evsel__reset_sample_bit(pos, CALLCHAIN);
perf_evsel__output_resort(pos, NULL);
-
-   if (symbol_conf.event_group &&
-   !perf_evsel__is_group_leader(pos))
-   continue;
-
hists__find_annotations(hists, pos, ann);
+

[PATCH v1 4/4] perf annotate: Display multiple events for gtk mode

2017-08-15 Thread Jin Yao
For example:
perf record -e cycles,branches ./div
perf annotate main --gtk

Both the cycles and branches are displayed at the left column in
gtk window.

Signed-off-by: Jin Yao 
---
 tools/perf/ui/gtk/annotate.c | 35 ++-
 1 file changed, 22 insertions(+), 13 deletions(-)

diff --git a/tools/perf/ui/gtk/annotate.c b/tools/perf/ui/gtk/annotate.c
index 0217619..f314654 100644
--- a/tools/perf/ui/gtk/annotate.c
+++ b/tools/perf/ui/gtk/annotate.c
@@ -88,7 +88,8 @@ static int perf_gtk__get_line(char *buf, size_t size, struct 
disasm_line *dl)
 
 static int perf_gtk__annotate_symbol(GtkWidget *window, struct symbol *sym,
struct map *map, struct perf_evsel *evsel,
-   struct hist_browser_timer *hbt __maybe_unused)
+   struct hist_browser_timer *hbt __maybe_unused,
+   struct hist_entry *he)
 {
struct disasm_line *pos, *n;
struct annotation *notes;
@@ -98,6 +99,7 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, 
struct symbol *sym,
GtkWidget *view;
int i;
char s[512];
+   struct hist_event *hevt;
 
notes = symbol__annotation(sym);
 
@@ -124,17 +126,18 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, 
struct symbol *sym,
 
gtk_list_store_append(store, &iter);
 
-   if (perf_evsel__is_group_event(evsel)) {
-   for (i = 0; i < evsel->nr_members; i++) {
+   if (evsel) {
+   ret = perf_gtk__get_percent(s, sizeof(s), sym, pos,
+   evsel->idx);
+   } else if (he) {
+   for (i = 0; i < he->event_nr; i++) {
+   hevt = &he->events[i];
ret += perf_gtk__get_percent(s + ret,
 sizeof(s) - ret,
-sym, pos,
-evsel->idx + i);
+hevt->ms.sym, pos,
+hevt->idx);
ret += scnprintf(s + ret, sizeof(s) - ret, " ");
}
-   } else {
-   ret = perf_gtk__get_percent(s, sizeof(s), sym, pos,
-   evsel->idx);
}
 
if (ret)
@@ -157,19 +160,25 @@ static int perf_gtk__annotate_symbol(GtkWidget *window, 
struct symbol *sym,
 
 static int symbol__gtk_annotate(struct symbol *sym, struct map *map,
struct perf_evsel *evsel,
-   struct hist_browser_timer *hbt)
+   struct hist_browser_timer *hbt,
+   struct hist_entry *he)
 {
GtkWidget *window;
GtkWidget *notebook;
GtkWidget *scrolled_window;
GtkWidget *tab_label;
int err;
+   char *arch;
+
+   if (!evsel)
+   arch = perf_evsel__env_arch(he->events[0].evsel);
+   else
+   arch = perf_evsel__env_arch(evsel);
 
if (map->dso->annotate_warned)
return -1;
 
-   err = symbol__disassemble(sym, map, perf_evsel__env_arch(evsel),
- 0, NULL, NULL);
+   err = symbol__disassemble(sym, map, arch, 0, NULL, NULL);
if (err) {
char msg[BUFSIZ];
symbol__strerror_disassemble(sym, map, err, msg, sizeof(msg));
@@ -228,7 +237,7 @@ static int symbol__gtk_annotate(struct symbol *sym, struct 
map *map,
gtk_notebook_append_page(GTK_NOTEBOOK(notebook), scrolled_window,
 tab_label);
 
-   perf_gtk__annotate_symbol(scrolled_window, sym, map, evsel, hbt);
+   perf_gtk__annotate_symbol(scrolled_window, sym, map, evsel, hbt, he);
return 0;
 }
 
@@ -236,7 +245,7 @@ int hist_entry__gtk_annotate(struct hist_entry *he,
 struct perf_evsel *evsel,
 struct hist_browser_timer *hbt)
 {
-   return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt);
+   return symbol__gtk_annotate(he->ms.sym, he->ms.map, evsel, hbt, he);
 }
 
 void perf_gtk__show_annotations(void)
-- 
2.7.4



[PATCH v1 2/4] perf annotate: Display multiple events for stdio mode

2017-08-15 Thread Jin Yao
For example:
perf record -e cycles,branches ./div
perf annotate main --stdio

 Percent |  Source code & Disassembly of div for branches,cycles 
(90966 samples)

..
 :  for (i = 0; i < 20; i++) {
 :  flag = compute_flag();
5.774.85 :4004e8:   xor%eax,%eax
0.010.01 :4004ea:   callq  400640 
 :
 :  count++;
2.384.38 :4004ef:   mov0x200b57(%rip),%edx
0.000.00 :4004f5:   add$0x1,%edx
 :
 :  if (flag)
0.000.00 :4004f8:   test   %eax,%eax
 :  srand(s_randseed);
 :
 :  for (i = 0; i < 20; i++) {
 :  flag = compute_flag();
 :
 :  count++;
0.600.44 :4004fa:   mov%edx,0x200b4c(%rip)
 :
 :  if (flag)
3.994.32 :400500:   je 400532 

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-annotate.c |   2 +-
 tools/perf/builtin-top.c  |   3 +-
 tools/perf/util/annotate.c| 179 --
 tools/perf/util/annotate.h|   6 +-
 4 files changed, 145 insertions(+), 45 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 833866c..98663bd 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -240,7 +240,7 @@ static int hist_entry__tty_annotate(struct hist_entry *he,
struct perf_annotate *ann)
 {
return symbol__tty_annotate(he->ms.sym, he->ms.map, evsel,
-   ann->print_line, ann->full_paths, 0, 0);
+   ann->print_line, ann->full_paths, 0, 0, he);
 }
 
 static void hists__find_annotations(struct hists *hists,
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index ee954bd..2287667 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -245,7 +245,8 @@ static void perf_top__show_details(struct perf_top *top)
printf("  Events  Pcnt (>=%d%%)\n", top->sym_pcnt_filter);
 
more = symbol__annotate_printf(symbol, he->ms.map, top->sym_evsel,
-  0, top->sym_pcnt_filter, 
top->print_entries, 4);
+  0, top->sym_pcnt_filter,
+  top->print_entries, 4, NULL);
 
if (top->evlist->enabled) {
if (top->zero)
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 16ec881..8630108 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1071,7 +1071,8 @@ static void annotate__branch_printf(struct block_range 
*br, u64 addr)
 
 static int disasm_line__print(struct disasm_line *dl, struct symbol *sym, u64 
start,
  struct perf_evsel *evsel, u64 len, int min_pcnt, int 
printed,
- int max_lines, struct disasm_line *queue)
+ int max_lines, struct disasm_line *queue,
+ struct hist_entry *he)
 {
static const char *prev_line;
static const char *prev_color;
@@ -1082,31 +1083,39 @@ static int disasm_line__print(struct disasm_line *dl, 
struct symbol *sym, u64 st
double *ppercents = &percent;
struct sym_hist_entry sample;
struct sym_hist_entry *psamples = &sample;
-   int i, nr_percent = 1;
+   int i, nr_percent;
const char *color;
struct annotation *notes = symbol__annotation(sym);
s64 offset = dl->offset;
const u64 addr = start + offset;
struct disasm_line *next;
struct block_range *br;
+   struct hist_event *hevt;
 
next = disasm__get_next_ip_line(¬es->src->source, dl);
-
-   if (perf_evsel__is_group_event(evsel)) {
-   nr_percent = evsel->nr_members;
-   ppercents = calloc(nr_percent, sizeof(double));
-   psamples = calloc(nr_percent, sizeof(struct 
sym_hist_entry));
-   if (ppercents == NULL || psamples == NULL) {
-   return -1;
-   }
+   nr_percent = (evsel) ? 1 : he->event_nr;
+   ppercents = calloc(nr_percent, sizeof(double));
+   psamples = calloc(nr_percent, sizeof(struct sym_hist_entry));

[PATCH v3 0/2] perf report: Implement visual marker for macro fusion in annotate

2017-06-20 Thread Jin Yao
Macro fusion merges two instructions to a single micro-op. Intel
core platform performs this hardware optimization under limited
circumstances. For example, CMP + JCC can be "fused" and executed
/retired together. While with sampling this can result in the
sample sometimes being on the JCC and sometimes on the CMP.
So for the fused instruction pair, they could be considered
together.

On Nehalem, fused instruction pairs:
cmp/test + jcc.

On other new CPU:
cmp/test/add/sub/and/inc/dec + jcc.

This patch series marks the case clearly by joining the fused
instruction pair in the arrow of the jump.

For example:

   │   ┌──cmpl   $0x0,argp_program_version_hook
 81.93 │   ├──je 20
   │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
   │   │↓ jne29
   │   │↓ jmp43
 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

Change-log:
---
v3: 1.  Add checking for Nehalem (CMP, TEST). For other newer
Intel CPUs just check it by default (CMP, TEST, ADD,
SUB, AND, INC, DEC).

2.  Use Arnaldo's fix to let the display be better

v2: According to Arnaldo's comments, remove the weak function and
use an arch-specific function instead to check fused instruction
pair.

v1: Inital post

Jin Yao (2):
  perf util: Check for fused instruction
  perf report: Implement visual marker for macro fusion in annotate

 tools/perf/arch/x86/annotate/instructions.c | 37 +
 tools/perf/ui/browser.c | 29 ++
 tools/perf/ui/browser.h |  2 ++
 tools/perf/ui/browsers/annotate.c   | 32 +
 tools/perf/util/annotate.c  | 17 +
 tools/perf/util/annotate.h  |  3 +++
 6 files changed, 120 insertions(+)

-- 
2.7.4



[PATCH v3 2/2] perf report: Implement visual marker for macro fusion in annotate

2017-06-20 Thread Jin Yao
For marking the fused instructions clearly, This patch adds a
line before the first instruction of pair and joins it with the
arrow of the jump.

For example, when je is selected in annotate view, the line
before cmpl is displayed and joins the arrow of je.

   │   ┌──cmpl   $0x0,argp_program_version_hook
 81.93 │   ├──je 20
   │   │  lock   cmpxchg %esi,0x38a9a4(%rip)
   │   │↓ jne29
   │   │↓ jmp43
 11.47 │20:└─→cmpxch %esi,0x38a999(%rip)

That means the cmpl+je is fused instruction pair and they should
be considered together.

Change-log:
---
v3: Use Arnaldo's fix to let the display be better.
To get the evsel->evlist->env->cpuid, save the evsel in
annotate_browser.

v2: No more changes, just uses a new function "ins__is_fused"
to check if the instructions are fused.

v1: Initial post

Signed-off-by: Jin Yao 
---
 tools/perf/ui/browser.c   | 29 +
 tools/perf/ui/browser.h   |  2 ++
 tools/perf/ui/browsers/annotate.c | 32 
 tools/perf/util/annotate.c|  5 +
 tools/perf/util/annotate.h|  1 +
 5 files changed, 69 insertions(+)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index a4d3762..9ef7677 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -738,6 +738,35 @@ void __ui_browser__line_arrow(struct ui_browser *browser, 
unsigned int column,
__ui_browser__line_arrow_down(browser, column, start, end);
 }
 
+void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
+   unsigned int row, bool arrow_down)
+{
+   unsigned int end_row;
+
+   if (row >= browser->top_idx)
+   end_row = row - browser->top_idx;
+   else
+   return;
+
+   SLsmg_set_char_set(1);
+
+   if (arrow_down) {
+   ui_browser__gotorc(browser, end_row, column - 1);
+   SLsmg_write_char(SLSMG_ULCORN_CHAR);
+   ui_browser__gotorc(browser, end_row, column);
+   SLsmg_draw_hline(2);
+   ui_browser__gotorc(browser, end_row + 1, column - 1);
+   SLsmg_write_char(SLSMG_LTEE_CHAR);
+   } else {
+   ui_browser__gotorc(browser, end_row, column - 1);
+   SLsmg_write_char(SLSMG_LTEE_CHAR);
+   ui_browser__gotorc(browser, end_row, column);
+   SLsmg_draw_hline(2);
+   }
+
+   SLsmg_set_char_set(0);
+}
+
 void ui_browser__init(void)
 {
int i = 0;
diff --git a/tools/perf/ui/browser.h b/tools/perf/ui/browser.h
index be3b70e..a12eff7 100644
--- a/tools/perf/ui/browser.h
+++ b/tools/perf/ui/browser.h
@@ -43,6 +43,8 @@ void ui_browser__printf(struct ui_browser *browser, const 
char *fmt, ...);
 void ui_browser__write_graph(struct ui_browser *browser, int graph);
 void __ui_browser__line_arrow(struct ui_browser *browser, unsigned int column,
  u64 start, u64 end);
+void ui_browser__mark_fused(struct ui_browser *browser, unsigned int column,
+   unsigned int row, bool arrow_down);
 void __ui_browser__show_title(struct ui_browser *browser, const char *title);
 void ui_browser__show_title(struct ui_browser *browser, const char *title);
 int ui_browser__show(struct ui_browser *browser, const char *title,
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 27f41f2..d3c7109 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -9,6 +9,7 @@
 #include "../../util/symbol.h"
 #include "../../util/evsel.h"
 #include "../../util/config.h"
+#include "../../util/evlist.h"
 #include 
 #include 
 #include 
@@ -55,6 +56,7 @@ struct annotate_browser {
struct disasm_line  *selection;
struct disasm_line  **offsets;
struct arch *arch;
+   struct perf_evsel   *evsel;
int nr_events;
u64 start;
int nr_asm_entries;
@@ -272,6 +274,28 @@ static bool disasm_line__is_valid_jump(struct disasm_line 
*dl, struct symbol *sy
return true;
 }
 
+static bool is_fused(struct annotate_browser *ab, struct disasm_line *cursor)
+{
+   struct disasm_line *pos = list_prev_entry(cursor, node);
+   struct perf_evsel *evsel = ab->evsel;
+   const char *name;
+
+   if (!pos || !evsel || !evsel->evlist || !evsel->evlist->env ||
+   !evsel->evlist->env->cpuid)
+   return false;
+
+   if (ins__is_lock(&pos->ins))
+   name = pos->ops.locked.ins.name;
+   else
+   name = pos->ins.name;
+
+   if (!name || !cursor->ins.name)
+   return false;
+
+   return ins__is_fused(ab->arch, evsel->evlist->env->cpuid, name,
+cursor->ins.na

[PATCH v3 1/2] perf util: Check for fused instruction

2017-06-20 Thread Jin Yao
Macro fusion merges two instructions to a single micro-op. Intel
core platform performs this hardware optimization under limited
circumstances.

For example, CMP + JCC can be "fused" and executed /retired
together. While with sampling this can result in the sample
sometimes being on the JCC and sometimes on the CMP.
So for the fused instruction pair, they could be considered
together.

On Nehalem, fused instruction pairs:
cmp/test + jcc.

On other new CPU:
cmp/test/add/sub/and/inc/dec + jcc.

This patch adds an x86-specific function which checks if 2
instructions are in a "fused" pair. For non-x86 arch, the
function is just NULL.

Change-log:
---
v3: Add checking for Nehalem (CMP, TEST). For other newer
Intel CPUs just check it by default (CMP, TEST, ADD,
SUB, AND, INC, DEC).

v2: Remove the origial weak function. Arnaldo points out
that doing it as a weak function that will be overridden
by the host arch doesn't work. So now it's implemented
as an arch-specific function.

v1: Initial post

Signed-off-by: Jin Yao 
---
 tools/perf/arch/x86/annotate/instructions.c | 37 +
 tools/perf/util/annotate.c  | 12 ++
 tools/perf/util/annotate.h  |  2 ++
 3 files changed, 51 insertions(+)

diff --git a/tools/perf/arch/x86/annotate/instructions.c 
b/tools/perf/arch/x86/annotate/instructions.c
index c1625f2..8d06dd4 100644
--- a/tools/perf/arch/x86/annotate/instructions.c
+++ b/tools/perf/arch/x86/annotate/instructions.c
@@ -76,3 +76,40 @@ static struct ins x86__instructions[] = {
{ .name = "xbeginq",.ops = &jump_ops, },
{ .name = "retq",   .ops = &ret_ops,  },
 };
+
+static bool x86__ins_is_fused(char *cpuid, const char *ins1, const char *ins2)
+{
+   unsigned int family, model, stepping;
+   int ret;
+
+   /*
+* cpuid = "GenuineIntel,family,model,stepping"
+*/
+   ret = sscanf(cpuid, "%*[^,],%u,%u,%u", &family, &model, &stepping);
+
+   if ((ret != 3) || (family != 6) || (model < 0x1e) ||
+strstr(ins2, "jmp")) {
+   return false;
+   }
+
+   if (model == 0x1e) {
+   /* Nehalem */
+   if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) ||
+strstr(ins1, "test")) {
+   return true;
+   }
+   } else {
+   /* Newer platform */
+   if ((strstr(ins1, "cmp") && !strstr(ins1, "xchg")) ||
+strstr(ins1, "test") ||
+strstr(ins1, "add") ||
+strstr(ins1, "sub") ||
+strstr(ins1, "and") ||
+strstr(ins1, "inc") ||
+strstr(ins1, "dec")) {
+   return true;
+   }
+   }
+
+   return false;
+}
diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index be1caab..8cf6025 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -48,6 +48,8 @@ struct arch {
boolinitialized;
void*priv;
int (*init)(struct arch *arch);
+   bool(*ins_is_fused)(char *cpuid, const char *ins1,
+   const char *ins2);
struct  {
char comment_char;
char skip_functions_char;
@@ -129,6 +131,7 @@ static struct arch architectures[] = {
.name = "x86",
.instructions = x86__instructions,
.nr_instructions = ARRAY_SIZE(x86__instructions),
+   .ins_is_fused = x86__ins_is_fused,
.objdump =  {
.comment_char = '#',
},
@@ -171,6 +174,15 @@ int ins__scnprintf(struct ins *ins, char *bf, size_t size,
return ins__raw_scnprintf(ins, bf, size, ops);
 }
 
+bool ins__is_fused(struct arch *arch, char *cpuid, const char *ins1,
+  const char *ins2)
+{
+   if (!arch || !arch->ins_is_fused || !cpuid)
+   return false;
+
+   return arch->ins_is_fused(cpuid, ins1, ins2);
+}
+
 static int call__parse(struct arch *arch, struct ins_operands *ops, struct map 
*map)
 {
char *endptr, *tok, *name;
diff --git a/tools/perf/util/annotate.h b/tools/perf/util/annotate.h
index 2105503..35cb7b5 100644
--- a/tools/perf/util/annotate.h
+++ b/tools/perf/util/annotate.h
@@ -53,6 +53,8 @@ bool ins__is_jump(const struct ins *ins);
 bool ins__is_call(const struct ins *ins);
 bool ins__is_ret(const struct ins *ins);
 int ins__scnprintf(struct ins *ins, char *bf, size_t size, struct ins_operands 
*ops);
+bool ins__is_fused(struct arch *arch, char *cpuid, const char *ins1,
+  const char *ins2);
 
 struct annotation;
 
-- 
2.7.4



Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time

2017-11-03 Thread Jin, Yao



On 10/24/2017 3:16 PM, Jiri Olsa wrote:

On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote:

SNIP


hum, could you still unset the sample if there's no time given?
and keep the speed in this case..

jirka



Hi Jiri,

I check this question again. The '--time' option is for perf report but not
for perf record.

For perf record, we have to always walk on all samples to get the time of
first sample and the time of last sample whatever buildid_all is enabled or
not enabled. So 'rec->tool.sample = NULL' is removed.

Sorry, the previous mail was replied at midnight, I was drowsy. :(

If my answer is correct, I will not send v6. If my understanding is still
not correct, please let me know.


right, I did not realize we store this unconditionaly.. then yes, it's ok

I think I've already acked this, anyway for the patchset:

Acked-by: Jiri Olsa 

thanks,
jirka



Hi Arnaldo,

Is this patch-set OK for merging or anything I should improve?

Thanks
Jin Yao



Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time

2017-11-03 Thread Jin, Yao



On 11/4/2017 12:29 AM, Arnaldo Carvalho de Melo wrote:

Em Tue, Oct 24, 2017 at 09:16:59AM +0200, Jiri Olsa escreveu:

On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote:

SNIP


hum, could you still unset the sample if there's no time given?
and keep the speed in this case..

jirka



Hi Jiri,

I check this question again. The '--time' option is for perf report but not
for perf record.

For perf record, we have to always walk on all samples to get the time of
first sample and the time of last sample whatever buildid_all is enabled or
not enabled. So 'rec->tool.sample = NULL' is removed.

Sorry, the previous mail was replied at midnight, I was drowsy. :(

If my answer is correct, I will not send v6. If my understanding is still
not correct, please let me know.


right, I did not realize we store this unconditionaly.. then yes, it's ok


And should we store this unconditionally? What this patch is doing is
making 'perf record' unconditionally slower so that the generated
perf.data file becomes useful for some usecases, but not for all, only
people interested in using 'perf report/script --time' will benefit,
right?



Yes, that makes sense.


I thought that we could get this sorted out in a different fashion, i.e.
getting the first timestamp is easy, even if we don't process build-ids,
right? To get the last one we could ask the kernel to insert an extra
dummy sample at the end, one that we know the size and thus can to to
the end of the file, rewind that size, get the event and parse the
sample, agreed?



Yes, agree.


So I suggest that first make this conditional, i.e. 'perf record
--timestamps' will enable the logic you implemented, and as a followup,
if you agree, add the dummy, known size event at the end, and then even
when build-ids are not processed, the cost for getting the timestamps
will be next to zero.



So we will use 2 steps for the implementation.

Step1:
Upgrade this patch series to v6. The change is to add "--timestamps" 
option in perf record.


Step2:
As a followup patch series, let the kernel generate a dummy sample at 
the end. That maybe needs more discussions on what the format we should 
use in dummy sample and how to generate the dummy sample in kernel.


Let me complete the step1 first.

Thanks
Jin Yao


- Arnaldo

- Arnaldo
  

I think I've already acked this, anyway for the patchset:

Acked-by: Jiri Olsa 

thanks,
jirka


Re: [PATCH v5 2/6] perf record: Get the first sample time and last sample time

2017-11-05 Thread Jin, Yao



On 11/4/2017 6:24 PM, Jiri Olsa wrote:

On Fri, Nov 03, 2017 at 01:29:42PM -0300, Arnaldo Carvalho de Melo wrote:

Em Tue, Oct 24, 2017 at 09:16:59AM +0200, Jiri Olsa escreveu:

On Tue, Oct 24, 2017 at 10:03:05AM +0800, Jin, Yao wrote:

SNIP


hum, could you still unset the sample if there's no time given?
and keep the speed in this case..

jirka



Hi Jiri,

I check this question again. The '--time' option is for perf report but not
for perf record.

For perf record, we have to always walk on all samples to get the time of
first sample and the time of last sample whatever buildid_all is enabled or
not enabled. So 'rec->tool.sample = NULL' is removed.

Sorry, the previous mail was replied at midnight, I was drowsy. :(

If my answer is correct, I will not send v6. If my understanding is still
not correct, please let me know.


right, I did not realize we store this unconditionaly.. then yes, it's ok


And should we store this unconditionally? What this patch is doing is
making 'perf record' unconditionally slower so that the generated
perf.data file becomes useful for some usecases, but not for all, only
people interested in using 'perf report/script --time' will benefit,
right?


maybe we can also silently enable that when processing buildids,
(which is set by default), there's no big performance hit once
we already go through samples

jirka



It's a good idea. Default enabling --timestamps in perf record since 
buildids is enabled by default as well.


But if buildids is not enabled, then it needs to check if --timestamps 
is enabled. I will follow this rule in v6.


Thanks
Jin Yao



I thought that we could get this sorted out in a different fashion, i.e.
getting the first timestamp is easy, even if we don't process build-ids,
right? To get the last one we could ask the kernel to insert an extra
dummy sample at the end, one that we know the size and thus can to to
the end of the file, rewind that size, get the event and parse the
sample, agreed?

So I suggest that first make this conditional, i.e. 'perf record
--timestamps' will enable the logic you implemented, and as a followup,
if you agree, add the dummy, known size event at the end, and then even
when build-ids are not processed, the cost for getting the timestamps
will be next to zero.

- Arnaldo

- Arnaldo
  

I think I've already acked this, anyway for the patchset:

Acked-by: Jiri Olsa 

thanks,
jirka


[PATCH v6 3/6] perf util: Create function to parse time percent

2017-11-05 Thread Jin Yao
Current perf report/script/... have a --time option to limit the time
range of output. But right now it only supports absolute time.

For easy using, now it can support a percent of time usage.

For example:

1. Select the second 10% time slice
   perf report --time 10%/2

2. Select from 0% to 10% time slice
   perf report --time 0%-10%

It also support the multiple time ranges.

3. Select the first and second 10% time slices
   perf report --time 10%/1,10%/2

4. Select from 0% to 10% and 30% to 40% slices
   perf report --time 0%-10%,30%-40%

Change log:
---
v4: An issue is found. Following passes.
perf script --time 10%/10x12321xsdfdasfdsafdsafdsa

Now it uses strtol to replace atoi.

Signed-off-by: Jin Yao 
---
 tools/perf/util/time-utils.c | 205 ---
 tools/perf/util/time-utils.h |   3 +
 2 files changed, 196 insertions(+), 12 deletions(-)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 5b5d021..79e4281 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -5,6 +5,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "perf.h"
 #include "debug.h"
@@ -59,11 +60,10 @@ static int parse_timestr_sec_nsec(struct perf_time_interval 
*ptime,
return 0;
 }
 
-int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr)
+static int split_start_end(char **start, char **end, const char *ostr, char ch)
 {
char *start_str, *end_str;
char *d, *str;
-   int rc = 0;
 
if (ostr == NULL || *ostr == '\0')
return 0;
@@ -73,25 +73,35 @@ int perf_time__parse_str(struct perf_time_interval *ptime, 
const char *ostr)
if (str == NULL)
return -ENOMEM;
 
-   ptime->start = 0;
-   ptime->end = 0;
-
-   /* str has the format: ,
-* variations: ,
-* ,
-* ,
-*/
start_str = str;
-   d = strchr(start_str, ',');
+   d = strchr(start_str, ch);
if (d) {
*d = '\0';
++d;
}
end_str = d;
 
+   *start = start_str;
+   *end = end_str;
+
+   return 0;
+}
+
+int perf_time__parse_str(struct perf_time_interval *ptime, const char *ostr)
+{
+   char *start_str = NULL, *end_str;
+   int rc;
+
+   rc = split_start_end(&start_str, &end_str, ostr, ',');
+   if (rc || !start_str)
+   return rc;
+
+   ptime->start = 0;
+   ptime->end = 0;
+
rc = parse_timestr_sec_nsec(ptime, start_str, end_str);
 
-   free(str);
+   free(start_str);
 
/* make sure end time is after start time if it was given */
if (rc == 0 && ptime->end && ptime->end < ptime->start)
@@ -103,6 +113,177 @@ int perf_time__parse_str(struct perf_time_interval 
*ptime, const char *ostr)
return rc;
 }
 
+static int parse_percent(double *pcnt, char *str)
+{
+   char *c;
+
+   c = strchr(str, '%');
+   if (c)
+   *c = '\0';
+   else
+   return -1;
+
+   *pcnt = atof(str) / 100.0;
+
+   return 0;
+}
+
+static int percent_slash_split(char *str, struct perf_time_interval *ptime,
+  u64 start, u64 end)
+{
+   char *p, *end_str;
+   double pcnt, start_pcnt, end_pcnt;
+   u64 total = end - start;
+   int i;
+
+   /*
+* Example:
+* 10%/2: select the second 10% slice and the third 10% slice
+*/
+
+   /* We can modify this string since the original one is copied */
+   p = strchr(str, '/');
+   if (!p)
+   return -1;
+
+   *p = '\0';
+   if (parse_percent(&pcnt, str) < 0)
+   return -1;
+
+   p++;
+   i = (int)strtol(p, &end_str, 10);
+   if (*end_str)
+   return -1;
+
+   if (pcnt <= 0.0)
+   return -1;
+
+   start_pcnt = pcnt * (i - 1);
+   end_pcnt = pcnt * i;
+
+   if (start_pcnt < 0.0 || start_pcnt > 1.0 ||
+   end_pcnt < 0.0 || end_pcnt > 1.0) {
+   return -1;
+   }
+
+   ptime->start = start + round(start_pcnt * total);
+   ptime->end = start + round(end_pcnt * total);
+
+   return 0;
+}
+
+static int percent_dash_split(char *str, struct perf_time_interval *ptime,
+ u64 start, u64 end)
+{
+   char *start_str = NULL, *end_str;
+   double start_pcnt, end_pcnt;
+   u64 total = end - start;
+   int ret;
+
+   /*
+* Example: 0%-10%
+*/
+
+   ret = split_start_end(&start_str, &end_str, str, '-');
+   if (ret || !start_str)
+   return ret;
+
+   if ((parse_percent(&start_pcnt, start_str) != 0) ||
+   (parse_percent(&end_pcnt, end_str

[PATCH v6 5/6] perf report: support time percent and multiple time ranges

2017-11-05 Thread Jin Yao
perf report has a --time option to limit the time range of output.
It only supports absolute time.

Now this option is extended to support multiple time ranges and
support the percent of time.

For example:

1. Select the first and second 10% time slices
perf report --time 10%/1,10%/2

2. Select from 0% to 10% and 30% to 40% slices
perf report --time 0%-10%,30%-40%

Change log:
---
v6: Fix the merge issue with latest perf/core branch.
No functional changes.

v5: Add checking of first/last sample time to detect if it's recorded
in perf.data. If it's not recorded, returns error message to user.

v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample

v3: Since the definitions of first_sample_time/last_sample_time
are moved from perf_session to perf_evlist so change the
related code.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt | 16 
 tools/perf/builtin-report.c  | 31 ++-
 2 files changed, 42 insertions(+), 5 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index ddde2b5..ed36553 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -402,6 +402,22 @@ OPTIONS
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
 
+   Also support time percent with multipe time range. Time string is
+   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 
10.
+
+   For example:
+   Select the second 10% time slice
+   perf report --time 10%/2
+
+   Select from 0% to 10% time slice
+   perf report --time 0%-10%
+
+   Select the first and second 10% time slices
+   perf report --time 10%/1,10%/2
+
+   Select from 0% to 10% and 30% to 40% slices
+   perf report --time 0%-10%,30%-40%
+
 --itrace::
Options for decoding instruction tracing data. The options are:
 
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 3c2d9d4..c43181d 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -51,6 +51,8 @@
 #include 
 #include 
 
+#define PTIME_RANGE_MAX10
+
 struct report {
struct perf_tooltool;
struct perf_session *session;
@@ -68,7 +70,8 @@ struct report {
const char  *cpu_list;
const char  *symbol_filter_str;
const char  *time_str;
-   struct perf_time_interval ptime;
+   struct perf_time_interval ptime_range[PTIME_RANGE_MAX];
+   int range_num;
float   min_percent;
u64 nr_entries;
u64 queue_size;
@@ -185,8 +188,10 @@ static int process_sample_event(struct perf_tool *tool,
};
int ret = 0;
 
-   if (perf_time__skip_sample(&rep->ptime, sample->time))
+   if (perf_time__ranges_skip_sample(rep->ptime_range, rep->range_num,
+ sample->time)) {
return 0;
+   }
 
if (machine__resolve(machine, &al, sample) < 0) {
pr_debug("problem processing %d event, skipping it.\n",
@@ -1073,9 +1078,25 @@ int cmd_report(int argc, const char **argv)
if (symbol__init(&session->header.env) < 0)
goto error;
 
-   if (perf_time__parse_str(&report.ptime, report.time_str) != 0) {
-   pr_err("Invalid time string\n");
-   return -EINVAL;
+   if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) {
+   if (session->evlist->first_sample_time == 0 &&
+   session->evlist->last_sample_time == 0) {
+   pr_err("No first/last sample time in perf data\n");
+   return -EINVAL;
+   }
+
+   report.range_num = perf_time__percent_parse_str(
+   report.ptime_range, PTIME_RANGE_MAX,
+   report.time_str,
+   session->evlist->first_sample_time,
+   session->evlist->last_sample_time);
+
+   if (report.range_num < 0) {
+   pr_err("Invalid time string\n");
+   return -EINVAL;
+   }
+   } else {
+   report.range_num = 1;
}
 
sort__setup_elide(stdout);
-- 
2.7.4



[PATCH v6 1/6] perf header: Record first sample time and last sample time in perf file header

2017-11-05 Thread Jin Yao
perf report/script/... have a --time option to limit the time range
of output. That's very useful to slice large traces, e.g. when processing
the output of perf script for some analysis.

But right now --time only supports absolute time. Also there is no fast
way to get the start/end times of a given trace except for looking at it.
This makes it hard to e.g. only decode the first half of the trace, which
is useful for parallelization of scripts

Another problem is that perf records are variable size and there is no
synchronization mechanism. So the only way to find the last sample reliably
would be to walk all samples. But we want to avoid that in perf report/...
because it is already quite expensive. That is why storing the first sample
time and last sample time in perf record is better.

This patch creates a new header feature type HEADER_SAMPLE_TIME and related
ops. Save the first sample time and the last sample time to the feature
section in perf file header.

Change log:
---
v4: Use perf script time style for timestamp printing. Also add with
the printing of sample duration.

v3: Remove the definitions of first_sample_time/last_sample_time from
perf_session. Just define them in perf_evlist

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf.data-file-format.txt |  4 ++
 tools/perf/util/evlist.h   |  2 +
 tools/perf/util/header.c   | 60 ++
 tools/perf/util/header.h   |  1 +
 4 files changed, 67 insertions(+)

diff --git a/tools/perf/Documentation/perf.data-file-format.txt 
b/tools/perf/Documentation/perf.data-file-format.txt
index e90c59c..d5e3043 100644
--- a/tools/perf/Documentation/perf.data-file-format.txt
+++ b/tools/perf/Documentation/perf.data-file-format.txt
@@ -238,6 +238,10 @@ struct auxtrace_index {
struct auxtrace_index_entry entries[PERF_AUXTRACE_INDEX_ENTRY_COUNT];
 };
 
+   HEADER_SAMPLE_TIME = 21,
+
+Two uint64_t for the time of first sample and the time of last sample.
+
other bits are reserved and should ignored for now
HEADER_FEAT_BITS= 256,
 
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 8c433e9..a08f407 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -50,6 +50,8 @@ struct perf_evlist {
struct perf_evsel *selected;
struct events_stats stats;
struct perf_env *env;
+   u64 first_sample_time;
+   u64 last_sample_time;
 };
 
 struct perf_evsel_str_handler {
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 6e59dcc..fd81c5b1 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -17,6 +17,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "evlist.h"
 #include "evsel.h"
@@ -36,6 +37,7 @@
 #include 
 #include "asm/bug.h"
 #include "tool.h"
+#include "time-utils.h"
 
 #include "sane_ctype.h"
 
@@ -1181,6 +1183,20 @@ static int write_stat(struct feat_fd *ff __maybe_unused,
return 0;
 }
 
+static int write_sample_time(struct feat_fd *ff,
+struct perf_evlist *evlist)
+{
+   int ret;
+
+   ret = do_write(ff, &evlist->first_sample_time,
+  sizeof(evlist->first_sample_time));
+   if (ret < 0)
+   return ret;
+
+   return do_write(ff, &evlist->last_sample_time,
+   sizeof(evlist->last_sample_time));
+}
+
 static void print_hostname(struct feat_fd *ff, FILE *fp)
 {
fprintf(fp, "# hostname : %s\n", ff->ph->env.hostname);
@@ -1506,6 +1522,28 @@ static void print_group_desc(struct feat_fd *ff, FILE 
*fp)
}
 }
 
+static void print_sample_time(struct feat_fd *ff, FILE *fp)
+{
+   struct perf_session *session;
+   char time_buf[32];
+   double d;
+
+   session = container_of(ff->ph, struct perf_session, header);
+
+   timestamp__scnprintf_usec(session->evlist->first_sample_time,
+ time_buf, sizeof(time_buf));
+   fprintf(fp, "# time of first sample : %s\n", time_buf);
+
+   timestamp__scnprintf_usec(session->evlist->last_sample_time,
+ time_buf, sizeof(time_buf));
+   fprintf(fp, "# time of last sample : %s\n", time_buf);
+
+   d = (double)(session->evlist->last_sample_time -
+   session->evlist->first_sample_time) / NSEC_PER_MSEC;
+
+   fprintf(fp, "# sample duration : %10.3f ms\n", d);
+}
+
 static int __event_process_build_id(struct build_id_event *bev,
char *filename,
struct perf_session *session)
@@ -2147,6 +2185,27 @@ static int process_cache(struct feat_fd *ff, void *data 
__maybe_unused)
return -1;
 }
 
+static int process_sam

[PATCH v6 6/6] perf script: support time percent and multiple time ranges

2017-11-05 Thread Jin Yao
perf script has a --time option to limit the time range of output.
It only supports absolute time.

Now this option is extended to support multiple time ranges and
support the percent of time.

For example:

1. Select the first and second 10% time slices
   perf script --time 10%/1,10%/2

2. Select from 0% to 10% and 30% to 40% slices
   perf script --time 0%-10%,30%-40%

Change log:
---
v6: Fix the merge issue with latest perf/core branch.
No functional changes.

v5: Add checking of first/last sample time to detect if it's recorded
in perf.data. If it's not recorded, returns error message to user.

v4: Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample

v3: Since the definitions of first_sample_time/last_sample_time
are moved from perf_session to perf_evlist so change the
related code.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-script.txt | 16 +++
 tools/perf/builtin-script.c  | 34 ++--
 2 files changed, 44 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt 
b/tools/perf/Documentation/perf-script.txt
index 2811fcf..5313fd1 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -321,6 +321,22 @@ include::itrace.txt[]
stop time is not given (i.e, time string is 'x.y,') then analysis goes
to end of file.
 
+   Also support time percent with multipe time range. Time string is
+   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 
10.
+
+   For example:
+   Select the second 10% time slice
+   perf script --time 10%/2
+
+   Select from 0% to 10% time slice
+   perf script --time 0%-10%
+
+   Select the first and second 10% time slices
+   perf script --time 10%/1,10%/2
+
+   Select from 0% to 10% and 30% to 40% slices
+   perf script --time 0%-10%,30%-40%
+
 --max-blocks::
Set the maximum number of program blocks to print with brstackasm for
each sample.
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 89975e3..dad548f 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1429,6 +1429,8 @@ static int perf_sample__fprintf_synth(struct perf_sample 
*sample,
return 0;
 }
 
+#define PTIME_RANGE_MAX10
+
 struct perf_script {
struct perf_tooltool;
struct perf_session *session;
@@ -1442,7 +1444,8 @@ struct perf_script {
struct thread_map   *threads;
int name_width;
const char  *time_str;
-   struct perf_time_interval ptime;
+   struct perf_time_interval ptime_range[PTIME_RANGE_MAX];
+   int range_num;
 };
 
 static int perf_evlist__max_name_len(struct perf_evlist *evlist)
@@ -1642,8 +1645,10 @@ static int process_sample_event(struct perf_tool *tool,
struct perf_script *scr = container_of(tool, struct perf_script, tool);
struct addr_location al;
 
-   if (perf_time__skip_sample(&scr->ptime, sample->time))
+   if (perf_time__ranges_skip_sample(scr->ptime_range, scr->range_num,
+ sample->time)) {
return 0;
+   }
 
if (debug_mode) {
if (sample->time < last_timestamp) {
@@ -3252,10 +3257,27 @@ int cmd_script(int argc, const char **argv)
goto out_delete;
 
/* needs to be parsed after looking up reference time */
-   if (perf_time__parse_str(&script.ptime, script.time_str) != 0) {
-   pr_err("Invalid time string\n");
-   err = -EINVAL;
-   goto out_delete;
+   if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) {
+   if (session->evlist->first_sample_time == 0 &&
+   session->evlist->last_sample_time == 0) {
+   pr_err("No first/last sample time in perf data\n");
+   err = -EINVAL;
+   goto out_delete;
+   }
+
+   script.range_num = perf_time__percent_parse_str(
+   script.ptime_range, PTIME_RANGE_MAX,
+   script.time_str,
+   session->evlist->first_sample_time,
+   session->evlist->last_sample_time);
+
+   if (script.range_num < 0) {
+   pr_err("Invalid time string\n");
+   err = -EINVAL;
+   goto out_delete;
+   }
+   } else {
+   script.range_num = 1;
}
 
err = __cmd_script(&script);
-- 
2.7.4



[PATCH v6 0/6] perf report/script: Support percent and multiple range in --time option

2017-11-05 Thread Jin Yao
v6:
---
1. Create a new option "--timestamp-boundary" in perf record.
   
   Currently '--buildid-all' is not enabled by default. So the walking
   on all samples is the default operation. There is no big overhead
   to calculate the timestamp boundary in process_sample_event handler
   once we already go through all samples. So the timestamp boundary
   calculation is enabled by default when '--buildid-all' is not enabled.

   While if '--buildid-all' is enabled, we creates a new option
   "--timestamp-boundary" for user to decide if it enables the
   timestamp boundary calculation.

   Impacted patch:
   ---
   perf record: Get the first sample time and last sample time

2. Fix the merge issue with the latest perf/core branch.
   No functional changes.

   Impacted patch:
   ---
   perf report: support time percent and multiple time ranges
   perf script: support time percent and multiple time ranges

v5:
---
1. There is an issue that the sample walking can only work when
   '--buildid-all' is not enabled. So we need to let the walking
   be able to work even if '--buildid-all' is enabled and let the
   processing skips the dso hit marking for this case.

2. Check if first/last sample time is recorded in perf data file.
   If it's not recorded, return error message to user.

Patched modified in v5:
   perf record: Get the first sample time and last sample time
   perf report: support time percent and multiple time ranges
   perf script: support time percent and multiple time ranges

v4:
---
1. Use perf script time style for timestamp printing. Also add with
   the printing of sample duration. For example:

   perf report --header

   time of first sample : 5276531.323099
   time of last sample : 5276555.345625
   sample duration :  24022.526 ms

2. Fix an invalid time string issue. For example,

   perf script --time 10%/10x12321xsdfdasfdsafdsafdsa

   Now in code, it uses strtol to replace atoi.

3. Remove perf_time__skip_sample, only uses perf_time__ranges_skip_sample
   in perf report/perf script.

v3:
---
1. Move the definitions of first_sample_time/last_sample_time from
   perf_session and struct record to perf_evlist and update the
   related code.

v2:
---
1. This patch creates a new header feature type HEADER_SAMPLE_TIME and related
   ops. Save the first sample time and the last sample time to the feature
   section in perf file header.

2. Add checking for last element in time range.

   For example, select the first and second 10% time slices.
   perf report --time 10%/1,10%/2

   Note that now it includes the last element in [10%, 20%] but it
   doesn't include the last element in [0, 10%). It's to avoid
   the overlap.

Following patches are changed:

   perf header: Record first sample time and last sample time in perf file 
header
   perf record: Get the first sample time and last sample time
   perf util: Create function to perform multiple time range checking

v1: initial post


Current perf report/script/... have a --time option to limit the time
range of output. But it only supports the absolute time.

The patch series extend this option to let it support percent of time
and support the multiple time ranges.

For example:

1. Select the second 10% time slice
   perf report --time 10%/2

2. Select from 0% to 10% time slice
   perf report --time 0%-10%

It also support the multiple time ranges.

3. Select the first and second 10% time slices
   perf report --time 10%/1,10%/2

4. Select from 0% to 10% and 30% to 40% slices
   perf report --time 0%-10%,30%-40%

Jin Yao (6):
  perf header: Record first sample time and last sample time in perf
file header
  perf record: Get the first sample time and last sample time
  perf util: Create function to parse time percent
  perf util: Create function to perform multiple time range checking
  perf report: support time percent and multiple time ranges
  perf script: support time percent and multiple time ranges

 tools/perf/Documentation/perf-record.txt   |   3 +
 tools/perf/Documentation/perf-report.txt   |  16 ++
 tools/perf/Documentation/perf-script.txt   |  16 ++
 tools/perf/Documentation/perf.data-file-format.txt |   4 +
 tools/perf/builtin-record.c|  18 +-
 tools/perf/builtin-report.c|  31 ++-
 tools/perf/builtin-script.c|  34 ++-
 tools/perf/util/evlist.h   |   2 +
 tools/perf/util/header.c   |  60 ++
 tools/perf/util/header.h   |   1 +
 tools/perf/util/time-utils.c   | 233 +++--
 tools/perf/util/time-utils.h   |   6 +
 12 files changed, 398 insertions(+), 26 deletions(-)

-- 
2.7.4



[PATCH v6 2/6] perf record: Get the first sample time and last sample time

2017-11-05 Thread Jin Yao
In perf record, it's walked on all samples yet. So it's very easy to get
the first/last samples and save the time to perf file header via the
function write_sample_time().

In later, perf report/script will fetch the time from perf file header.

Change log:
---
v6: Currently '--buildid-all' is not enabled at default. So the walking
on all samples is the default operation. There is no big overhead
to calculate the timestamp boundary in process_sample_event handler
once we already go through all samples. So the timestamp boundary
calculation is enabled by default when '--buildid-all' is not enabled.

While if '--buildid-all' is enabled, we creates a new option
"--timestamp-boundary" for user to decide if it enables the
timestamp boundary calculation.

v5: There is an issue that the sample walking can only work when
'--buildid-all' is not enabled. So we need to let the walking
be able to work even if '--buildid-all' is enabled and let the
processing skips the dso hit marking for this case.

At first, I want to provide a new option "--record-time-boundaries".
While after consideration, I think a new option is not very
necessary.

v3: Remove the definitions of first_sample_time and last_sample_time
    from struct record and directly save them in perf_evlist.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-record.txt |  3 +++
 tools/perf/builtin-record.c  | 18 +++---
 2 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt 
b/tools/perf/Documentation/perf-record.txt
index 5a626ef..3eea6de 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -430,6 +430,9 @@ Configure all used events to run in user space.
 --timestamp-filename
 Append timestamp to output file name.
 
+--timestamp-boundary::
+Record timestamp boundary (time of first/last samples).
+
 --switch-output[=mode]::
 Generate multiple perf.data files, timestamp prefixed, switching to a new one
 based on 'mode' value:
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index f4d9fc5..082a0cb 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -78,6 +78,7 @@ struct record {
boolno_buildid_cache_set;
boolbuildid_all;
booltimestamp_filename;
+   booltimestamp_boundary;
struct switch_outputswitch_output;
unsigned long long  samples;
 };
@@ -391,8 +392,15 @@ static int process_sample_event(struct perf_tool *tool,
 {
struct record *rec = container_of(tool, struct record, tool);
 
-   rec->samples++;
+   if (rec->evlist->first_sample_time == 0)
+   rec->evlist->first_sample_time = sample->time;
+
+   rec->evlist->last_sample_time = sample->time;
 
+   if (rec->buildid_all)
+   return 0;
+
+   rec->samples++;
return build_id__mark_dso_hit(tool, event, sample, evsel, machine);
 }
 
@@ -417,9 +425,11 @@ static int process_buildids(struct record *rec)
 
/*
 * If --buildid-all is given, it marks all DSO regardless of hits,
-* so no need to process samples.
+* so no need to process samples. But if timestamp_boundary is enabled,
+* it still needs to walk on all samples to get the timestamps of
+* first/last samples.
 */
-   if (rec->buildid_all)
+   if (rec->buildid_all && !rec->timestamp_boundary)
rec->tool.sample = NULL;
 
return perf_session__process_events(session);
@@ -1579,6 +1589,8 @@ static struct option __record_options[] = {
"Record build-id of all DSOs regardless of hits"),
OPT_BOOLEAN(0, "timestamp-filename", &record.timestamp_filename,
"append timestamp to output filename"),
+   OPT_BOOLEAN(0, "timestamp-boundary", &record.timestamp_boundary,
+   "Record timestamp boundary (time of first/last samples)"),
OPT_STRING_OPTARG_SET(0, "switch-output", &record.switch_output.str,
  &record.switch_output.set, "signal,size,time",
  "Switch output when receive SIGUSR2 or cross 
size,time threshold",
-- 
2.7.4



[PATCH v6 4/6] perf util: Create function to perform multiple time range checking

2017-11-05 Thread Jin Yao
Previous patch supports the multiple time range.

For example, select the first and second 10% time slices.
perf report --time 10%/1,10%/2

We need a function to check if a timestamp is in the ranges of
[0, 10%) and [10%, 20%].

Note that it includes the last element in [10%, 20%] but it
doesn't include the last element in [0, 10%). It's to avoid
the overlap.

This patch implments a new function perf_time__ranges_skip_sample
for this checking.

Change log:
---
v4: Let perf_time__ranges_skip_sample be compatible with
perf_time__skip_sample when only one time range.

Signed-off-by: Jin Yao 
---
 tools/perf/util/time-utils.c | 28 
 tools/perf/util/time-utils.h |  3 +++
 2 files changed, 31 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 79e4281..b380356 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -299,6 +299,34 @@ bool perf_time__skip_sample(struct perf_time_interval 
*ptime, u64 timestamp)
return false;
 }
 
+bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf,
+  int num, u64 timestamp)
+{
+   struct perf_time_interval *ptime;
+   int i;
+
+   if ((timestamp == 0) || (num == 0))
+   return false;
+
+   if (num == 1)
+   return perf_time__skip_sample(&ptime_buf[0], timestamp);
+
+   /*
+* start/end of multiple time ranges must be valid.
+*/
+   for (i = 0; i < num; i++) {
+   ptime = &ptime_buf[i];
+
+   if (timestamp >= ptime->start &&
+   ((timestamp < ptime->end && i < num - 1) ||
+(timestamp <= ptime->end && i == num - 1))) {
+   break;
+   }
+   }
+
+   return (i == num) ? true : false;
+}
+
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz)
 {
u64  sec = timestamp / NSEC_PER_SEC;
diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h
index fd018e2..de279ea 100644
--- a/tools/perf/util/time-utils.h
+++ b/tools/perf/util/time-utils.h
@@ -17,6 +17,9 @@ int perf_time__percent_parse_str(struct perf_time_interval 
*ptime_buf, int num,
 
 bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp);
 
+bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf,
+  int num, u64 timestamp);
+
 int timestamp__scnprintf_usec(u64 timestamp, char *buf, size_t sz);
 
 int fetch_current_timestamp(char *buf, size_t sz);
-- 
2.7.4



[PATCH] perf util: Fix wrong processing when closing evsel fd

2017-10-18 Thread Jin Yao
In current xyarray code, xyarray__max_x() returns max_y, and
xyarray__max_y() returns max_x.

It's confusing and for code logic it looks not correct.
Error happens when closing evsel fd. Let's see this scenario:

1. Allocate an fd (pseudo-code)
---

perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
{
evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int));
}

xyarray__new(int xlen, int ylen, size_t entry_size)
{
size_t row_size = ylen * entry_size;
struct xyarray *xy = zalloc(sizeof(*xy) + xlen * row_size);

xy->entry_size = entry_size;
xy->row_size   = row_size;
xy->entries= xlen * ylen;
xy->max_x  = xlen;
xy->max_y  = ylen;
..
}

So max_x is ncpus, max_y is nthreads and row_size = nthreads * 4.

2. Use perf syscall and get the fd
--

int perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 struct thread_map *threads)
{
for (cpu = 0; cpu < cpus->nr; cpu++) {

for (thread = 0; thread < nthreads; thread++) {
int fd, group_fd;

fd = sys_perf_event_open(&evsel->attr, pid, 
cpus->map[cpu],
 group_fd, flags);

FD(evsel, cpu, thread) = fd;
}
}

static inline void *xyarray__entry(struct xyarray *xy, int x, int y)
{
return &xy->contents[x * xy->row_size + y * xy->entry_size];
}

These codes don't have issues. The issue happens in the closing of fd.

3. Close fd.
---

void perf_evsel__close_fd(struct perf_evsel *evsel)
{
int cpu, thread;

for (cpu = 0; cpu < xyarray__max_x(evsel->fd); cpu++)
for (thread = 0; thread < xyarray__max_y(evsel->fd); ++thread) {
close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}
}

Since xyarray__max_x() returns max_y (nthreads) and xyarry__max_y()
returns max_x (ncpus), so above code is actually to be:

for (cpu = 0; cpu < nthreads; cpu++)
for (thread = 0; thread < ncpus; ++thread) {
close(FD(evsel, cpu, thread));
FD(evsel, cpu, thread) = -1;
}

It's not correct!

This change is introduced by "475fb533fb7d"
("perf evsel: Fix buffer overflow while freeing events")

This fix is to let xyarray__max_x() return max_x (ncpus) and
let xyarry__max_y() return max_y (nthreads)

Signed-off-by: Jin Yao 
---
 tools/perf/util/xyarray.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/xyarray.h b/tools/perf/util/xyarray.h
index 4ba726c..54af6046 100644
--- a/tools/perf/util/xyarray.h
+++ b/tools/perf/util/xyarray.h
@@ -23,12 +23,12 @@ static inline void *xyarray__entry(struct xyarray *xy, int 
x, int y)
 
 static inline int xyarray__max_y(struct xyarray *xy)
 {
-   return xy->max_x;
+   return xy->max_y;
 }
 
 static inline int xyarray__max_x(struct xyarray *xy)
 {
-   return xy->max_y;
+   return xy->max_x;
 }
 
 #endif /* _PERF_XYARRAY_H_ */
-- 
2.7.4



Re: [PATCH v4 0/6] perf report/script: Support percent and multiple range in --time option

2017-10-18 Thread Jin, Yao



On 10/5/2017 4:50 PM, Jiri Olsa wrote:

On Tue, Oct 03, 2017 at 10:22:32PM +0800, Jin Yao wrote:

v4:
---
1. Use perf script time style for timestamp printing. Also add with
the printing of sample duration. For example:

perf report --header

time of first sample : 5276531.323099
time of last sample : 5276555.345625
sample duration :  24022.526 ms

2. Fix an invalid time string issue. For example,

perf script --time 10%/10x12321xsdfdasfdsafdsafdsa

Now in code, it uses strtol to replace atoi.


for the patchset:

Acked-by: Jiri Olsa 

thanks,
jirka



Thanks Jiri!

Hi Arnaldo,

Could this patchset be merged?

Thanks
Jin Yao


Re: [PATCH v4 2/6] perf record: Get the first sample time and last sample time

2017-10-19 Thread Jin, Yao



On 10/20/2017 4:25 AM, Arnaldo Carvalho de Melo wrote:

Em Thu, Oct 19, 2017 at 05:21:27PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Tue, Oct 03, 2017 at 10:22:34PM +0800, Jin Yao escreveu:

In perf record, it's walked on all samples yet. So it's very easy to get


You're saying that perf record walks all samples always? That only
happens when we generate the build-id table, right? And people disable
that to speed up the process, knowing that some limitations will come
from that, for doing analysis right after running it is mostly OK to
disable the build-id processing.


So either you add a new option that processes all events without doing
build-id processing (and all the locking, struct thread, map, etc
processing it entails) and just looks at the sample->time, and when
build id processing is enabled, just do as you're doing in this patch,
then, at perf report --time you should look to see if those start/end
times were filled in and if not tell that to the user, i.e. that
either --record-time-boundaries (or a better name :-) ) has to be used,
or, that build-id process, with a short explanation that
--record-time-boundaries is a bit cheaper.

- Arnaldo
  


Hi Arnaldo,

Thanks so much for reminding me that the walking only happens when 
build-id processing is enabled. Yes, the step is same as what you said 
so there will be an issue when the build-id processing is disabled.


According to your suggestion, I will provide a new option 
"--record-time-boundaries" in perf record. If user disables the build-id 
processing, the perf record will ask user to set the 
"--record-time-boundaries". If user enables the build-id processing, the 
perf record will not ask user to set the "--record-time-boundaries".


Also in perf report, if it doesn't see start/end time filled in the perf 
file header, it will show some information to let user know he should 
set "--record-time-boundaries" or enable the build-id processing in perf 
record.


I will provide v5 patch series, maybe some days later.

Thanks
Jin Yao







[PATCH] perf/x86/intel/uncore: Provide alias for IIO free-running boxes on SKX

2018-08-26 Thread Jin Yao
For Skylake Server, Linux has supported a number of free running
counters that collect counts of IO clocks/Bandwidth/Utilization.

For example, to collect the inbound bandwidth,

root@skx /sys/devices# ls | grep uncore_iio
uncore_iio_0
uncore_iio_1
uncore_iio_2
uncore_iio_3
uncore_iio_4
uncore_iio_5
uncore_iio_free_running_0
uncore_iio_free_running_1
uncore_iio_free_running_2
uncore_iio_free_running_3
uncore_iio_free_running_4
uncore_iio_free_running_5

root@skx /sys/devices# perf stat -a -e uncore_iio_free_running_2/bw_in_port0/
^C
 Performance counter stats for 'system wide':

153.19 MiB  uncore_iio_free_running_2/bw_in_port0/

   8.037701069 seconds time elapsed

While it's hard for user to understanding what the box the
uncore_iio_free_running_N means.

This patch provides aliases for the boxes. With this patch,
for example,

root@skx /sys/devices# ls | grep uncore_iio
uncore_iio_0
uncore_iio_1
uncore_iio_2
uncore_iio_3
uncore_iio_4
uncore_iio_5
uncore_iio_cbdma
uncore_iio_mcp0
uncore_iio_mcp1
uncore_iio_pcie0
uncore_iio_pcie1
uncore_iio_pcie2

root@skx ~# perf stat -a -e uncore_iio_pcie1/bw_in_port0/
^C
 Performance counter stats for 'system wide':

153.12 MiB  uncore_iio_pcie1/bw_in_port0/

   8.469790720 seconds time elapsed

Signed-off-by: Jin Yao 
---
 arch/x86/events/intel/uncore.c   |  9 +++--
 arch/x86/events/intel/uncore.h   |  1 +
 arch/x86/events/intel/uncore_snbep.c | 10 ++
 3 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 27a4614..6c6615f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -812,8 +812,13 @@ static int uncore_pmu_register(struct intel_uncore_pmu 
*pmu)
else
sprintf(pmu->name, "uncore");
} else {
-   sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
-   pmu->pmu_idx);
+   if (pmu->type->alias && pmu->pmu_idx < pmu->type->num_boxes) {
+   sprintf(pmu->name, "uncore_%s",
+   pmu->type->alias[pmu->pmu_idx]);
+   } else {
+   sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
+   pmu->pmu_idx);
+   }
}
 
ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index e17ab88..db9fd9a 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -44,6 +44,7 @@ struct freerunning_counters;
 
 struct intel_uncore_type {
const char *name;
+   const char **alias;
int num_counters;
int num_boxes;
int perf_ctr_bits;
diff --git a/arch/x86/events/intel/uncore_snbep.c 
b/arch/x86/events/intel/uncore_snbep.c
index 51d7c11..9cd7e3d 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3596,8 +3596,18 @@ static const struct attribute_group 
skx_uncore_iio_freerunning_format_group = {
.attrs = skx_uncore_iio_freerunning_formats_attr,
 };
 
+static const char * const skx_uncore_iio_free_running_aliases[] = {
+   { "iio_cbdma" },
+   { "iio_pcie0" },
+   { "iio_pcie1" },
+   { "iio_pcie2" },
+   { "iio_mcp0" },
+   { "iio_mcp1" },
+};
+
 static struct intel_uncore_type skx_uncore_iio_free_running = {
.name   = "iio_free_running",
+   .alias  = skx_uncore_iio_free_running_aliases,
.num_counters   = 17,
.num_boxes  = 6,
.num_freerunning_types  = SKX_IIO_FREERUNNING_TYPE_MAX,
-- 
2.7.4



Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags

2018-01-23 Thread Jin, Yao



On 1/23/2018 10:40 PM, Jiri Olsa wrote:

On Tue, Jan 23, 2018 at 07:02:44AM +0800, Jin, Yao wrote:

SNIP


  threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ target->per_thread && 
target->system_wide);


At first glance I thought your solution would do the trick but perf
record does use target->system_wide when the '-a' switch is used.
Moreover specifying the '-a' switch doesn't prevent the '--per-thread'
option from being used as well, making both target->perf_thread and
target_system_wide equal to true (and that is not good).

Although not a fan of adding more to struct target, the advantage of
having target->all_threads is that we are guaranteed that it isn't
used anywhere else.

Let me know what you think,
Mathieu



If we specify both '-a' and '--per-thread' to perf record, perf record will
override'--per-thread'. So now target->per_thread = false, and
target->system_wide = true.

If we specify '--per-thread' only to perf record, target->per_thread = true,
and target->system_wide = false.

So whatever for any case, target->per_thread && target->system_wide is
false.

Since the parameter is false, in thread_map__new_str(), it will not execute
the thread_map__new_all_cpus(). So that will not change perf record previous
behavior.

In perf stat, it allows the case that target->per_thread and
target->system_wide are all true. That means we want to collect system-wide
per-thread metrics.


could you please put this description into comment
before the thread_map__new_str is called?

thanks,
jirka



OK, I will put this comment in v2. I will send v2 soon.

Thanks
Jin Yao


[PATCH v2] perf util: Use target->per_thread and target->system_wide flags

2018-01-23 Thread Jin Yao
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on
'perf record --per-thread'. It has the effect of creating a kernel event
for each thread in the system for 'perf record --per-thread'.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.

v2:
---
According to Jiri's comment, add description to explain why the patch
uses 'target->per_thread && target->system_wide'.

v2 doesn't have functional change.

Signed-off-by: Jin Yao 
---
 tools/perf/util/evlist.c | 20 +++-
 tools/perf/util/thread_map.c |  4 ++--
 tools/perf/util/thread_map.h |  2 +-
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ac35cd2..3eea0f8 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1106,8 +1106,26 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
struct cpu_map *cpus;
struct thread_map *threads;
 
+   /*
+* If specify '-a' and '--per-thread' to perf record, perf record
+* will override '--per-thread'. target->per_thread = false and
+* target->system_wide = true.
+*
+* If specify '--per-thread' only to perf record,
+* target->per_thread = true and target->system_wide = false.
+*
+* So target->per_thread && target->system_wide is false.
+* For perf record, thread_map__new_str doesn't call
+* thread_map__new_all_cpus. That will keep perf record's
+* current behavior.
+*
+* For perf stat, it allows the case that target->per_thread and
+* target->system_wide are all true. It means to collect system-wide
+* per-thread data. thread_map__new_str will call
+* thread_map__new_all_cpus to enumerate all threads.
+*/
threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ target->per_thread && 
target->system_wide);
 
if (!threads)
return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f..729dad8 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char 
*tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-  uid_t uid, bool per_thread)
+  uid_t uid, bool all_threads)
 {
if (pid)
return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, 
const char *tid,
if (!tid && uid != UINT_MAX)
return thread_map__new_by_uid(uid);
 
-   if (per_thread)
+   if (all_threads)
return thread_map__new_all_cpus();
 
return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b9..5ec91cf 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-   const char *tid, uid_t uid, bool per_thread);
+   const char *tid, uid_t uid, bool all_threads);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.7.4



[PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-01-10 Thread Jin Yao
We keep having bug reports that when users build perf on their own,
but they don't install some needed libraries like libelf, libbfd/libibery.

The perf can build, but it is missing important functionality.

For example, perf report doesn't display any symbols and perf annotate
doesn't work.

This patch displays warnings that these libraries are missing in build
when perf report / perf annotate are used.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-annotate.c |  2 ++
 tools/perf/builtin-report.c   |  2 ++
 tools/perf/util/symbol.c  | 21 +
 tools/perf/util/symbol.h  |  2 ++
 4 files changed, 27 insertions(+)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index f15731a..609eb7a 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -511,6 +511,8 @@ int cmd_annotate(int argc, const char **argv)
 
setup_browser(true);
 
+   build_lib_warning();
+
ret = __cmd_annotate(&annotate);
 
 out_delete:
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dd4df9a..550adb7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1293,6 +1293,8 @@ int cmd_report(int argc, const char **argv)
}
}
 
+   build_lib_warning();
+
if (symbol__init(&session->header.env) < 0)
goto error;
 
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index cc065d4..4205ba8 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -2224,3 +2224,24 @@ int symbol__config_symfs(const struct option *opt 
__maybe_unused,
free(bf);
return 0;
 }
+
+void build_lib_warning(void)
+{
+#ifndef HAVE_LIBELF_SUPPORT
+   pr_warning("Symbols are disabled!\n"
+  "Please install libelf-dev, libelf-devel "
+  "or elfutils-libelf-devel before building perf.\n");
+#endif
+
+#ifndef HAVE_DWARF_SUPPORT
+   pr_warning("Unwind support is disabled!\n"
+  "Please install elfutils-devel/libdw-dev "
+  "before building perf.\n");
+#endif
+
+#ifndef HAVE_LIBBFD_SUPPORT
+   pr_warning("C++ demangling and line numbers are disabled!\n"
+  "Please install binutils-dev[el]/"
+  "libiberty-dev before building perf.\n");
+#endif
+}
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index 0563f33..d88e046 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -383,6 +383,8 @@ int get_sdt_note_list(struct list_head *head, const char 
*target);
 int cleanup_sdt_note_list(struct list_head *sdt_notes);
 int sdt_notes__get_count(struct list_head *start);
 
+void build_lib_warning(void);
+
 #define SDT_BASE_SCN ".stapsdt.base"
 #define SDT_NOTE_SCN  ".note.stapsdt"
 #define SDT_NOTE_TYPE 3
-- 
2.7.4



Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-01-11 Thread Jin, Yao



On 1/11/2018 11:30 PM, Jiri Olsa wrote:

On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote:

We keep having bug reports that when users build perf on their own,


we already have same warnings during the build



Yes, there will be warnings displayed during the build if some libraries 
are missing.


While I do find the users (especially the new perf user) don't notice 
the warning. They only care about the success of build.


Once they use this perf version and find something working strangely, 
they will say that perf may have bug. :(



but they don't install some needed libraries like libelf, libbfd/libibery.


how about saying that in the symbol column,
instead of poluting report's output, like:

$ perf report --stdio

# Overhead  Command  Shared Object Symbol (disabled)
#   ...    ..



I just think it'd better provide some hints to user. For example, 
"symbol is disabled and you need to install libelf/xxx", say something 
like that.


But it looks the column can't contain too much information (i.e. no more 
space to contain the entire hints).


Any idea? Or just add this warning in verbose mode?


also your change does not affect tui mode

annotation for some reason does not start at all.. could be
little more verbose ;-)

jirka



Yes, it doesn't affect tui mode.

Or we just add this warning in verbose mode?

e.g. perf report -v?

Thanks
Jin Yao


Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread

2018-01-21 Thread Jin, Yao



On 1/16/2018 9:17 PM, Jiri Olsa wrote:

On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote:

Just tested. But looks it's not OK for '--per-thread' case.


yea, I haven't tested much.. might need soem tweaking,
but my point was that it could be doable on one place
instead of introducing another if possible

jirka



Hi Jiri,

I ever considered to move the operation of removing error thread to 
perf_evsel__fallback(). The perf_evsel__fallback() is common code and 
it's shared by perf report, perf stat and perf top.


While finally I think it'd better let the caller decide to remove error 
thread and try again, or just return the warning message. 
perf_evsel__fallback() probably doesn't know what the caller want to do.


That's my current thinking. Maybe there will be a better fix...

Thanks
Jin Yao


Re: [PATCH] perf util: Fix evlist->threads when working with 'perf stat --per-thread'

2018-01-22 Thread Jin, Yao



On 1/13/2018 3:53 AM, Arnaldo Carvalho de Melo wrote:

Em Fri, Jan 12, 2018 at 09:44:53AM -0700, Mathieu Poirier escreveu:

After commit ("73c0ca1eee3d perf thread_map: Enumerate all threads
from /proc") any condition where target->per_thread is set will see all the
threads in a system added to evlist->threads.

Since the 'record' utility also uses function thread_map__new_str(), any
attempt to use the --perf-thread option will also end up accounting for all


--perf-thread? Do you mean --per-thread, ok, so 'record' has this, uses
the opts->per_thread that then Jin reused for his recent patches... Jin,
can you try to untangle this, probably you can't reuse that
opts->per-thread thing...

Without looking further, I think that could be a better avenue to fix
this issue, without having to add a 'stat' specific method to the core
classes.

- Arnaldo



Hi,

Very sorry, I just see this mail today.

Please let me look at this issue first.

Thanks
Jin Yao


threads in a system, resulting in new kernel events being created for all
threads rather than just the thread of interest.

This patch keeps the newly introduced functionality but make sure it
doesn't affect other utilities outside of 'stat'.

Signed-off-by: Mathieu Poirier 
---
  tools/perf/builtin-stat.c |  2 +-
  tools/perf/util/evlist.c  | 29 -
  tools/perf/util/evlist.h  |  2 ++
  3 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d32f222..82d16426b984 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2833,7 +2833,7 @@ int cmd_stat(int argc, const char **argv)
if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))
target.per_thread = true;
  
-	if (perf_evlist__create_maps(evsel_list, &target) < 0) {

+   if (perf_evlist__create_stat_maps(evsel_list, &target) < 0) {
if (target__has_task(&target)) {
pr_err("Problems finding threads of monitor\n");
parse_options_usage(stat_usage, stat_options, "p", 1);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f0a5e09c4071..a4ae684f28de 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1100,13 +1100,11 @@ int perf_evlist__mmap(struct perf_evlist *evlist, 
unsigned int pages)
return perf_evlist__mmap_ex(evlist, pages, 0, false);
  }
  
-int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)

+static int _perf_evlist__create_maps(struct perf_evlist *evlist,
+struct target *target,
+struct thread_map *threads)
  {
struct cpu_map *cpus;
-   struct thread_map *threads;
-
-   threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
  
  	if (!threads)

return -1;
@@ -1130,6 +1128,27 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
return -1;
  }
  
+int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)

+{
+   struct thread_map *threads;
+
+   threads = thread_map__new_str(target->pid, target->tid, target->uid,
+ false);
+
+   return _perf_evlist__create_maps(evlist, target, threads);
+}
+
+int perf_evlist__create_stat_maps(struct perf_evlist *evlist,
+ struct target *target)
+{
+   struct thread_map *threads;
+
+   threads = thread_map__new_str(target->pid, target->tid, target->uid,
+ target->per_thread);
+
+   return _perf_evlist__create_maps(evlist, target, threads);
+}
+
  void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus,
   struct thread_map *threads)
  {
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 75160666d305..80c654990e19 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -188,6 +188,8 @@ void perf_evlist__set_selected(struct perf_evlist *evlist,
  void perf_evlist__set_maps(struct perf_evlist *evlist, struct cpu_map *cpus,
   struct thread_map *threads);
  int perf_evlist__create_maps(struct perf_evlist *evlist, struct target 
*target);
+int perf_evlist__create_stat_maps(struct perf_evlist *evlist,
+ struct target *target);
  int perf_evlist__apply_filters(struct perf_evlist *evlist, struct perf_evsel 
**err_evsel);
  
  void __perf_evlist__set_leader(struct list_head *list);

--
2.7.4


Re: [PATCH 1/1] perf util: Do not reuse target->per_thread flag

2018-01-22 Thread Jin, Yao

Hi Mathieu,

Sorry, I just see this patch today.

I have tested this patch and it works.

Another idea from me is it doesn't need to add a new target->all_threads 
flag.


We just use target->per-thread && target->system_wide as a condition to 
check for all per-thread case.


I just think for your perf record case, the target->system_wide will not 
be set. Instead, if target->per-thread and target->system_wide are both 
set, that means we needs to trace on all threads, right?


Thanks
Jin Yao

On 1/13/2018 7:12 AM, Mathieu Poirier wrote:

Commit ("73c0ca1eee3d perf thread_map: Enumerate all threads from /proc")
is using the target->per_thread flag to specify that all threads in a
system should be taken into account.  That is then used in function
thread_map__new_str() where all threads are added to evlist->threads.

Variable target->per_thread is also used by 'perf record' when handling
trace sessions using the --per-thread command line option.  Since the
target->per_thread flag is set all threads will be added to
evlist->threads, which has the effect of creating a kernel event for each
thread in the system.

This patch address the issue by creating a new target->all_threads flag
that gets set from the 'stat' utility, avoiding any conflict with other
utilities using target->per_thread.

Signed-off-by: Mathieu Poirier 
---
  tools/perf/builtin-stat.c| 2 +-
  tools/perf/util/evlist.c | 2 +-
  tools/perf/util/target.h | 1 +
  tools/perf/util/thread_map.c | 4 ++--
  tools/perf/util/thread_map.h | 2 +-
  5 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d32f222..87e156bfb45e 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -2831,7 +2831,7 @@ int cmd_stat(int argc, const char **argv)
target__validate(&target);
  
  	if ((stat_config.aggr_mode == AGGR_THREAD) && (target.system_wide))

-   target.per_thread = true;
+   target.per_thread = target.all_threads = true;
  
  	if (perf_evlist__create_maps(evsel_list, &target) < 0) {

if (target__has_task(&target)) {
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f0a5e09c4071..c239eb895612 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
struct thread_map *threads;
  
  	threads = thread_map__new_str(target->pid, target->tid, target->uid,

- target->per_thread);
+ target->all_threads);
  
  	if (!threads)

return -1;
diff --git a/tools/perf/util/target.h b/tools/perf/util/target.h
index 6ef01a83b24e..0da8ea2b6801 100644
--- a/tools/perf/util/target.h
+++ b/tools/perf/util/target.h
@@ -15,6 +15,7 @@ struct target {
bool uses_mmap;
bool default_per_cpu;
bool per_thread;
+   bool all_threads;
  };
  
  enum target_errno {

diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f6491c..729dad8f412d 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char 
*tid_str)
  }
  
  struct thread_map *thread_map__new_str(const char *pid, const char *tid,

-  uid_t uid, bool per_thread)
+  uid_t uid, bool all_threads)
  {
if (pid)
return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, 
const char *tid,
if (!tid && uid != UINT_MAX)
return thread_map__new_by_uid(uid);
  
-	if (per_thread)

+   if (all_threads)
return thread_map__new_all_cpus();
  
  	return thread_map__new_by_tid_str(tid);

diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b99e73c..5ec91cfd1869 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
  void thread_map__put(struct thread_map *map);
  
  struct thread_map *thread_map__new_str(const char *pid,

-   const char *tid, uid_t uid, bool per_thread);
+   const char *tid, uid_t uid, bool all_threads);
  
  struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
  



[PATCH] perf util: Use target->per_thread and target->system_wide flags

2018-01-22 Thread Jin Yao
Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on
'perf record --per-thread'. It has the effect of creating a kernel event
for each thread in the system for 'perf record --per-thread'.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.

Signed-off-by: Jin Yao 
---
 tools/perf/util/evlist.c | 2 +-
 tools/perf/util/thread_map.c | 4 ++--
 tools/perf/util/thread_map.h | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 120efd8..9dff74a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
struct thread_map *threads;
 
threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ target->per_thread && 
target->system_wide);
 
if (!threads)
return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f..729dad8 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char 
*tid_str)
 }
 
 struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-  uid_t uid, bool per_thread)
+  uid_t uid, bool all_threads)
 {
if (pid)
return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, 
const char *tid,
if (!tid && uid != UINT_MAX)
return thread_map__new_by_uid(uid);
 
-   if (per_thread)
+   if (all_threads)
return thread_map__new_all_cpus();
 
return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b9..5ec91cf 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
 void thread_map__put(struct thread_map *map);
 
 struct thread_map *thread_map__new_str(const char *pid,
-   const char *tid, uid_t uid, bool per_thread);
+   const char *tid, uid_t uid, bool all_threads);
 
 struct thread_map *thread_map__new_by_tid_str(const char *tid_str);
 
-- 
2.7.4



Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags

2018-01-22 Thread Jin, Yao



On 1/23/2018 5:10 AM, Mathieu Poirier wrote:

On 22 January 2018 at 15:15, Jin Yao  wrote:

Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on
'perf record --per-thread'. It has the effect of creating a kernel event
for each thread in the system for 'perf record --per-thread'.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.

Signed-off-by: Jin Yao 
---
  tools/perf/util/evlist.c | 2 +-
  tools/perf/util/thread_map.c | 4 ++--
  tools/perf/util/thread_map.h | 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 120efd8..9dff74a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
 struct thread_map *threads;

 threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ target->per_thread && 
target->system_wide);


At first glance I thought your solution would do the trick but perf
record does use target->system_wide when the '-a' switch is used.
Moreover specifying the '-a' switch doesn't prevent the '--per-thread'
option from being used as well, making both target->perf_thread and
target_system_wide equal to true (and that is not good).

Although not a fan of adding more to struct target, the advantage of
having target->all_threads is that we are guaranteed that it isn't
used anywhere else.

Let me know what you think,
Mathieu



If we specify both '-a' and '--per-thread' to perf record, perf record 
will override'--per-thread'. So now target->per_thread = false, and 
target->system_wide = true.


If we specify '--per-thread' only to perf record, target->per_thread = 
true, and target->system_wide = false.


So whatever for any case, target->per_thread && target->system_wide is 
false.


Since the parameter is false, in thread_map__new_str(), it will not 
execute the thread_map__new_all_cpus(). So that will not change perf 
record previous behavior.


In perf stat, it allows the case that target->per_thread and 
target->system_wide are all true. That means we want to collect 
system-wide per-thread metrics.


That's my current thinking.

Thanks
Jin Yao


Re: [PATCH] perf util: Use target->per_thread and target->system_wide flags

2018-01-22 Thread Jin, Yao



On 1/23/2018 7:56 AM, Mathieu Poirier wrote:

On 22 January 2018 at 15:15, Jin Yao  wrote:

Mathieu Poirier reports issue in commit ("73c0ca1eee3d perf thread_map:
Enumerate all threads from /proc") that it has negative impact on
'perf record --per-thread'. It has the effect of creating a kernel event
for each thread in the system for 'perf record --per-thread'.

Mathieu Poirier's patch ("perf util: Do not reuse target->per_thread flag")
can fix this issue by creating a new target->all_threads flag.

This patch is based on Mathieu Poirier's patch but it doesn't use a new
target->all_threads flag. This patch just uses 'target->per_thread &&
target->system_wide' as a condition to check for all threads case.

Signed-off-by: Jin Yao 
---
  tools/perf/util/evlist.c | 2 +-
  tools/perf/util/thread_map.c | 4 ++--
  tools/perf/util/thread_map.h | 2 +-
  3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 120efd8..9dff74a 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1106,7 +1106,7 @@ int perf_evlist__create_maps(struct perf_evlist *evlist, 
struct target *target)
 struct thread_map *threads;

 threads = thread_map__new_str(target->pid, target->tid, target->uid,
- target->per_thread);
+ target->per_thread && 
target->system_wide);

 if (!threads)
 return -1;
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f..729dad8 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -323,7 +323,7 @@ struct thread_map *thread_map__new_by_tid_str(const char 
*tid_str)
  }

  struct thread_map *thread_map__new_str(const char *pid, const char *tid,
-  uid_t uid, bool per_thread)
+  uid_t uid, bool all_threads)
  {
 if (pid)
 return thread_map__new_by_pid_str(pid);
@@ -331,7 +331,7 @@ struct thread_map *thread_map__new_str(const char *pid, 
const char *tid,
 if (!tid && uid != UINT_MAX)
 return thread_map__new_by_uid(uid);

-   if (per_thread)
+   if (all_threads)
 return thread_map__new_all_cpus();

 return thread_map__new_by_tid_str(tid);
diff --git a/tools/perf/util/thread_map.h b/tools/perf/util/thread_map.h
index 0a806b9..5ec91cf 100644
--- a/tools/perf/util/thread_map.h
+++ b/tools/perf/util/thread_map.h
@@ -31,7 +31,7 @@ struct thread_map *thread_map__get(struct thread_map *map);
  void thread_map__put(struct thread_map *map);

  struct thread_map *thread_map__new_str(const char *pid,
-   const char *tid, uid_t uid, bool per_thread);
+   const char *tid, uid_t uid, bool all_threads);

  struct thread_map *thread_map__new_by_tid_str(const char *tid_str);


Reviewed-and-tested-by: Mathieu Poirier "mathieu.poir...@linaro.org"



Thanks a lot for reviewing the patch.

Thanks
Jin Yao




Re: [PATCH v7 2/6] perf record: Get the first sample time and last sample time

2018-01-04 Thread Jin, Yao



On 1/5/2018 3:09 AM, Arnaldo Carvalho de Melo wrote:

Em Fri, Dec 08, 2017 at 09:13:42PM +0800, Jin Yao escreveu:

In the default 'perf record' configuration, all samples are processed,
to create the HEADER_BUILD_ID table. So it's very easy to get the
first/last samples and save the time to perf file header via the
function write_sample_time().

Later, at post processing time, perf report/script will fetch
the time from perf file header.


So, at this point I was expecting that that record would be present on
the perf.data file:

[acme@jouet perf]$ perf record --timestamp-boundary sleep 1
Cannot read kernel map
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (7 samples) ]
[acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE | wc -l
7
[acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE_TIME
[acme@jouet perf]$

What am I doing wrong?

To clarify, this is with just the first two patches in this series
applied.

- Arnaldo
  


Hi Arnaldo,

The timestamp boundary information is saved in perf file header.

So if we want to look at them, we need to add '--header' in perf report.

For example,

root@skl:/tmp# perf report -D --header | grep 'time of'
# time of first sample : 248333.706656
# time of last sample : 248357.215328

Thanks
Jin Yao


Re: [PATCH v7 2/6] perf record: Get the first sample time and last sample time

2018-01-05 Thread Jin, Yao



On 1/5/2018 8:53 PM, Arnaldo Carvalho de Melo wrote:

Em Fri, Jan 05, 2018 at 09:15:03AM +0800, Jin, Yao escreveu:



On 1/5/2018 3:09 AM, Arnaldo Carvalho de Melo wrote:

Em Fri, Dec 08, 2017 at 09:13:42PM +0800, Jin Yao escreveu:

In the default 'perf record' configuration, all samples are processed,
to create the HEADER_BUILD_ID table. So it's very easy to get the
first/last samples and save the time to perf file header via the
function write_sample_time().

Later, at post processing time, perf report/script will fetch
the time from perf file header.


So, at this point I was expecting that that record would be present on
the perf.data file:

[acme@jouet perf]$ perf record --timestamp-boundary sleep 1
Cannot read kernel map
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.001 MB perf.data (7 samples) ]
[acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE | wc -l
7
[acme@jouet perf]$ perf report -D | grep PERF_RECORD_SAMPLE_TIME
[acme@jouet perf]$

What am I doing wrong?

To clarify, this is with just the first two patches in this series
applied.

- Arnaldo


Hi Arnaldo,

The timestamp boundary information is saved in perf file header.


Right, my bad, I somehow thought it would be as PERF_RECORD_SAMPLE_TIME,
duh.

Will continue reviewing, sorry about that.

- Arnaldo
  


Thanks for reviewing the patch.

If you see anything I should improve, please let me know.

Thanks
Jin Yao


So if we want to look at them, we need to add '--header' in perf report.

For example,

root@skl:/tmp# perf report -D --header | grep 'time of'
# time of first sample : 248333.706656
# time of last sample : 248357.215328

Thanks
Jin Yao


[PATCH] perf report: Fix wrong jump arrow

2018-01-28 Thread Jin Yao
When we use perf report interactive annotate view, we can see
the position of jump arrow is not correct. For example,

1. perf record -b ...
2. perf report
3. In interactive mode, select Annotate 'function'

Percent│ IPC Cycle
   │if (flag)
  1.37 │0.4┌──   1  ↓ je 82
   │   │x += x / y + y / x;
  0.00 │0.4│  1310movsd  (%rsp),%xmm0
  0.00 │0.4│   565movsd  0x8(%rsp),%xmm4
   │0.4│  movsd  0x8(%rsp),%xmm1
   │0.4│  movsd  (%rsp),%xmm3
   │0.4│  divsd  %xmm4,%xmm0
  0.00 │0.4│   579divsd  %xmm3,%xmm1
   │0.4│  movsd  (%rsp),%xmm2
   │0.4│  addsd  %xmm1,%xmm0
   │0.4│  addsd  %xmm2,%xmm0
  0.00 │0.4│  movsd  %xmm0,(%rsp)
   │   │volatile double x = 1212121212, y = 121212;
   │   │
   │   │s_randseed = time(0);
   │   │srand(s_randseed);
   │   │
   │   │for (i = 0; i < 20; i++) {
  1.37 │0.4└─→  82:   sub$0x1,%ebx
 28.21 │0.4817  ↑ jne38

The jump arrow in above example is not correct. It should add the
width of IPC and Cycle.

With this patch, the result is:

Percent│ IPC Cycle
   │if (flag)
  1.37 │0.48 1 ┌──je 82
   │   │x += x / y + y / x;
  0.00 │0.48  1310 │  movsd  (%rsp),%xmm0
  0.00 │0.48   565 │  movsd  0x8(%rsp),%xmm4
   │0.48   │  movsd  0x8(%rsp),%xmm1
   │0.48   │  movsd  (%rsp),%xmm3
   │0.48   │  divsd  %xmm4,%xmm0
  0.00 │0.48   579 │  divsd  %xmm3,%xmm1
   │0.48   │  movsd  (%rsp),%xmm2
   │0.48   │  addsd  %xmm1,%xmm0
   │0.48   │  addsd  %xmm2,%xmm0
  0.00 │0.48   │  movsd  %xmm0,(%rsp)
   │   │volatile double x = 1212121212, y = 121212;
   │   │
   │   │s_randseed = time(0);
   │   │srand(s_randseed);
   │   │
   │   │for (i = 0; i < 20; i++) {
  1.37 │0.4882:└─→sub$0x1,%ebx
 28.21 │0.4817  ↑ jne38

Signed-off-by: Jin Yao 
---
 tools/perf/ui/browsers/annotate.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 2864279..e2f6663 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -319,6 +319,7 @@ static void annotate_browser__draw_current_jump(struct 
ui_browser *browser)
struct map_symbol *ms = ab->b.priv;
struct symbol *sym = ms->sym;
u8 pcnt_width = annotate_browser__pcnt_width(ab);
+   int width = 0;
 
/* PLT symbols contain external offsets */
if (strstr(sym->name, "@plt"))
@@ -340,13 +341,17 @@ static void annotate_browser__draw_current_jump(struct 
ui_browser *browser)
to = (u64)btarget->idx;
}
 
+   if (ab->have_cycles)
+   width = IPC_WIDTH + CYCLES_WIDTH;
+
ui_browser__set_color(browser, HE_COLORSET_JUMP_ARROWS);
-   __ui_browser__line_arrow(browser, pcnt_width + 2 + ab->addr_width,
+   __ui_browser__line_arrow(browser,
+pcnt_width + 2 + ab->addr_width + width,
 from, to);
 
if (is_fused(ab, cursor)) {
ui_browser__mark_fused(browser,
-  pcnt_width + 3 + ab->addr_width,
+  pcnt_width + 3 + ab->addr_width + width,
   from - 1,
   to > from ? true : false);
}
-- 
2.7.4



Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice

2018-01-16 Thread Jin, Yao



On 1/16/2018 10:48 PM, Arnaldo Carvalho de Melo wrote:

Em Tue, Jan 16, 2018 at 12:55:19PM +0100, Jiri Olsa escreveu:

On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote:

It's follow-up patches to improve the perf time slice feature
(perf report/script --time xxx)

1. Improve the error message
perf report: Improve error msg when no first/last sample time found
perf script: Improve error msg when no first/last sample time found

2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%)
perf util: Improve error checking for time percent input

3. Omit the slice index if possible. For example,
perf report --stdio --time 10%/1 is equivalent to
perf report --stdio --time 10%

perf util: Support no index time percent slice

4. Add indication of time slices in perf report header.
perf report: Add an indication of what time slices are used
>>> 5. Remove the time slices number limitation in perf report/script
perf util: Allocate time slices buffer according to number of comma
perf report: Remove the time slices number limitation
perf script: Remove the time slices number limitation

Jin Yao (8):
   perf report: Improve error msg when no first/last sample time found
   perf script: Improve error msg when no first/last sample time found
   perf util: Improve error checking for time percent input
   perf util: Support no index time percent slice
   perf report: Add an indication of what time slices are used
   perf util: Allocate time slices buffer according to number of comma
   perf report: Remove the time slices number limitation
   perf script: Remove the time slices number limitation


from quick look it looks ok

Reviewed-by: Jiri Olsa 


Thanks, applied, now test building with 'make -C tools/perf build-test',
containers, etc.

- Arnaldo



Thanks Arnaldo!

Thanks
Jin Yao


Re: [PATCH v7 3/6] perf util: Create function to parse time percent

2018-01-08 Thread Jin, Yao



On 1/8/2018 10:38 PM, Arnaldo Carvalho de Melo wrote:

Em Mon, Jan 08, 2018 at 11:31:49AM -0300, Arnaldo Carvalho de Melo escreveu:

Em Fri, Dec 08, 2017 at 09:13:43PM +0800, Jin Yao escreveu:

Current perf report/script/... have a --time option to limit the time
range of output. But right now it only supports absolute time.

For easy using, now it can support a percent of time usage.

For example:

1. Select the second 10% time slice
perf report --time 10%/2


After applying this patch I'm not being able to get any of these
examples to work:

[root@jouet home]# perf report --header | grep "time of"
# time of first sample : 22947.909226
# time of last sample : 22948.910704
[root@jouet home]#

Then, when I try the first example:

[root@jouet home]# perf report --stdio --time 1%-20%
Invalid time string
# To display the perf.data header info, please use --header/--header-only 
options.
#
[root@jouet home]#

What am I doing wrong?


Oh well, the way you worded this cset it looked like after applying it I
would be able to use percents, etc, which is not the case, I'll probably
need to apply the next patches to _then_ this work as advertised in this
cset comment.

Please try to make it clear in the commit messages these details, to
ease reviewing.

- Arnaldo



Hi Arnaldo,

Sorry about that.

I will take care next time to try my best to make the patch description 
more clear.


Thanks
Jin Yao


Re: [PATCH v7 5/6] perf report: support time percent and multiple time ranges

2018-01-08 Thread Jin, Yao



On 1/8/2018 11:04 PM, Arnaldo Carvalho de Melo wrote:

One more thing to consider:

When you use:

perf report --time 10%/1

it will do what is asked but there is no indication of what percentage
ranges are in place, it would be nice to have this in the first line in
the TUI, right after this:

Samples: 128  of event 'cycles:ppp', Event count (approx.): 21386169 (10%/1)

- Arnaldo



Hi Arnaldo,

Thanks for merging this feature and thanks so much for providing 
comments for how to improve this feature in next step.


Please let me summarize what the follow-up patch needs to do:

1. Improve the error message, something like:
"HINT: use 'perf record --foobar' to record the first/last sample 
timestamps in the perf.data file header or enable build-id processing."


2. Currently it uses magic number 10 to limit the number of time slices. 
But it's not good. The best thing would be to have no such limitations, 
i.e. to lift this limitation from the code and docs.


3. 'perf report --time 10%' should be equivalent to 'perf report --time 
10%/1'. Or at least say something like:


   "percent slices need an index to specify which one is wanted" when 
you notice a % in the --time string, etc.


4. Add an indication of what percentage ranges are being used. For example,

Samples: 128  of event 'cycles:ppp', Event count (approx.): 21386169 (10%/1)

I will develop a follow-up patchset to improve this feature.

Thanks
Jin Yao


[PATCH v1 0/8] perf: Follow-up patches to improve time slice

2018-01-09 Thread Jin Yao
It's follow-up patches to improve the perf time slice feature
(perf report/script --time xxx)

1. Improve the error message
   perf report: Improve error msg when no first/last sample time found
   perf script: Improve error msg when no first/last sample time found

2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%)
   perf util: Improve error checking for time percent input

3. Omit the slice index if possible. For example,
   perf report --stdio --time 10%/1 is equivalent to
   perf report --stdio --time 10%

   perf util: Support no index time percent slice 

4. Add indication of time slices in perf report header.
   perf report: Add an indication of what time slices are used

5. Remove the time slices number limitation in perf report/script
   perf util: Allocate time slices buffer according to number of comma
   perf report: Remove the time slices number limitation
   perf script: Remove the time slices number limitation

Jin Yao (8):
  perf report: Improve error msg when no first/last sample time found
  perf script: Improve error msg when no first/last sample time found
  perf util: Improve error checking for time percent input
  perf util: Support no index time percent slice
  perf report: Add an indication of what time slices are used
  perf util: Allocate time slices buffer according to number of comma
  perf report: Remove the time slices number limitation
  perf script: Remove the time slices number limitation

 tools/perf/Documentation/perf-report.txt |  2 +-
 tools/perf/Documentation/perf-script.txt | 10 ++---
 tools/perf/builtin-report.c  | 30 +
 tools/perf/builtin-script.c  | 21 +++---
 tools/perf/util/time-utils.c | 72 +++-
 tools/perf/util/time-utils.h |  2 +
 6 files changed, 117 insertions(+), 20 deletions(-)

-- 
2.7.4



[PATCH v1 1/8] perf report: Improve error msg when no first/last sample time found

2018-01-09 Thread Jin Yao
The following message will be returned to user when executing
'perf report --time' if perf data file doesn't contain the
first/last sample time.

"HINT: no first/last sample time found in perf data.
 Please use latest perf binary to execute 'perf record'
 (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-report.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index dd4df9a..a6c5cf2 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1299,7 +1299,9 @@ int cmd_report(int argc, const char **argv)
if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) {
if (session->evlist->first_sample_time == 0 &&
session->evlist->last_sample_time == 0) {
-   pr_err("No first/last sample time in perf data\n");
+   pr_err("HINT: no first/last sample time found in perf 
data.\n"
+  "Please use latest perf binary to execute 'perf 
record'\n"
+  "(if '--buildid-all' is enabled, please set 
'--timestamp-boundary').\n");
return -EINVAL;
}
 
-- 
2.7.4



[PATCH v1 5/8] perf report: Add an indication of what time slices are used

2018-01-09 Thread Jin Yao
Add a time slices indication to the perf report header.

For example,

  # perf report --stdio --time 10%

  # Total Lost Samples: 0
  #
  # Samples: 9K of event 'cycles:ppp' (time slices: 10%)
  # Event count (approx.): 8951288803

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-report.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index a6c5cf2..77c954c 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -403,6 +403,9 @@ static size_t hists__fprintf_nr_sample_events(struct hists 
*hists, struct report
if (evname != NULL)
ret += fprintf(fp, " of event '%s'", evname);
 
+   if (rep->time_str)
+   ret += fprintf(fp, " (time slices: %s)", rep->time_str);
+
if (symbol_conf.show_ref_callgraph &&
strstr(evname, "call-graph=no")) {
ret += fprintf(fp, ", show reference callgraph");
-- 
2.7.4



[PATCH v1 7/8] perf report: Remove the time slices number limitation

2018-01-09 Thread Jin Yao
Previously it was only allowed to use at most 10 time slices
in 'perf report --time'.

This patch removes this limitation.
For example, following command line is OK (12 time slices)

perf report --stdio --time 
1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt |  2 +-
 tools/perf/builtin-report.c  | 23 +--
 2 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 5522ce0..1940c4f 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -403,7 +403,7 @@ OPTIONS
to end of file.
 
Also support time percent with multiple time range. Time string is
-   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 
10.
+   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'.
 
For example:
Select the second 10% time slice:
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 77c954c..fe89021 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -54,8 +54,6 @@
 #include 
 #include 
 
-#define PTIME_RANGE_MAX10
-
 struct report {
struct perf_tooltool;
struct perf_session *session;
@@ -76,7 +74,8 @@ struct report {
const char  *cpu_list;
const char  *symbol_filter_str;
const char  *time_str;
-   struct perf_time_interval ptime_range[PTIME_RANGE_MAX];
+   struct perf_time_interval *ptime_range;
+   int range_size;
int range_num;
float   min_percent;
u64 nr_entries;
@@ -1299,24 +1298,33 @@ int cmd_report(int argc, const char **argv)
if (symbol__init(&session->header.env) < 0)
goto error;
 
+   report.ptime_range = perf_time__range_alloc(report.time_str,
+   &report.range_size);
+   if (!report.ptime_range) {
+   ret = -ENOMEM;
+   goto error;
+   }
+
if (perf_time__parse_str(report.ptime_range, report.time_str) != 0) {
if (session->evlist->first_sample_time == 0 &&
session->evlist->last_sample_time == 0) {
pr_err("HINT: no first/last sample time found in perf 
data.\n"
   "Please use latest perf binary to execute 'perf 
record'\n"
   "(if '--buildid-all' is enabled, please set 
'--timestamp-boundary').\n");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto error;
}
 
report.range_num = perf_time__percent_parse_str(
-   report.ptime_range, PTIME_RANGE_MAX,
+   report.ptime_range, report.range_size,
report.time_str,
session->evlist->first_sample_time,
session->evlist->last_sample_time);
 
if (report.range_num < 0) {
pr_err("Invalid time string\n");
-   return -EINVAL;
+   ret = -EINVAL;
+   goto error;
}
} else {
report.range_num = 1;
@@ -1332,6 +1340,9 @@ int cmd_report(int argc, const char **argv)
ret = 0;
 
 error:
+   if (report.ptime_range)
+   free(report.ptime_range);
+
perf_session__delete(session);
return ret;
 }
-- 
2.7.4



[PATCH v1 3/8] perf util: Improve error checking for time percent input

2018-01-09 Thread Jin Yao
The command line like 'perf report --stdio --time 1abc%/1' could be
accepted by perf. It looks not very good.

This patch uses strtod() to replace original atof() and check the
entire string. Now for the same command line, it would return error
message "Invalid time string".

root@skl:/tmp# perf report --stdio --time 1abc%/1
Invalid time string

Signed-off-by: Jin Yao 
---
 tools/perf/util/time-utils.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 3f7f18f..88510ab 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -116,7 +116,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, 
const char *ostr)
 
 static int parse_percent(double *pcnt, char *str)
 {
-   char *c;
+   char *c, *endptr;
+   double d;
 
c = strchr(str, '%');
if (c)
@@ -124,8 +125,11 @@ static int parse_percent(double *pcnt, char *str)
else
return -1;
 
-   *pcnt = atof(str) / 100.0;
+   d = strtod(str, &endptr);
+   if (endptr != str + strlen(str))
+   return -1;
 
+   *pcnt = d / 100.0;
return 0;
 }
 
-- 
2.7.4



[PATCH v1 8/8] perf script: Remove the time slices number limitation

2018-01-09 Thread Jin Yao
Previously it was only allowed to use at most 10 time slices
in 'perf script --time'.

This patch removes this limitation.
For example, following command line is OK (12 time slices)

perf script --time 
1%/1,1%/2,1%/3,1%/4,1%/5,1%/6,1%/7,1%/8,1%/9,1%/10,1%/11,1%/12

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-script.txt | 10 +-
 tools/perf/builtin-script.c  | 17 +
 2 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/tools/perf/Documentation/perf-script.txt 
b/tools/perf/Documentation/perf-script.txt
index 806ec63..7730c1d 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -351,19 +351,19 @@ include::itrace.txt[]
to end of file.
 
Also support time percent with multipe time range. Time string is
-   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'. The maximum number of slices is 
10.
+   'a%/n,b%/m,...' or 'a%-b%,c%-%d,...'.
 
For example:
-   Select the second 10% time slice
+   Select the second 10% time slice:
perf script --time 10%/2
 
-   Select from 0% to 10% time slice
+   Select from 0% to 10% time slice:
perf script --time 0%-10%
 
-   Select the first and second 10% time slices
+   Select the first and second 10% time slices:
perf script --time 10%/1,10%/2
 
-   Select from 0% to 10% and 30% to 40% slices
+   Select from 0% to 10% and 30% to 40% slices:
perf script --time 0%-10%,30%-40%
 
 --max-blocks::
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4f691af..f116a31 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -1480,8 +1480,6 @@ static int perf_sample__fprintf_synth(struct perf_sample 
*sample,
return 0;
 }
 
-#define PTIME_RANGE_MAX10
-
 struct perf_script {
struct perf_tooltool;
struct perf_session *session;
@@ -1496,7 +1494,8 @@ struct perf_script {
struct thread_map   *threads;
int name_width;
const char  *time_str;
-   struct perf_time_interval ptime_range[PTIME_RANGE_MAX];
+   struct perf_time_interval *ptime_range;
+   int range_size;
int range_num;
 };
 
@@ -3444,6 +3443,13 @@ int cmd_script(int argc, const char **argv)
if (err < 0)
goto out_delete;
 
+   script.ptime_range = perf_time__range_alloc(script.time_str,
+   &script.range_size);
+   if (!script.ptime_range) {
+   err = -ENOMEM;
+   goto out_delete;
+   }
+
/* needs to be parsed after looking up reference time */
if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) {
if (session->evlist->first_sample_time == 0 &&
@@ -3456,7 +3462,7 @@ int cmd_script(int argc, const char **argv)
}
 
script.range_num = perf_time__percent_parse_str(
-   script.ptime_range, PTIME_RANGE_MAX,
+   script.ptime_range, script.range_size,
script.time_str,
session->evlist->first_sample_time,
session->evlist->last_sample_time);
@@ -3475,6 +3481,9 @@ int cmd_script(int argc, const char **argv)
flush_scripting();
 
 out_delete:
+   if (script.ptime_range)
+   free(script.ptime_range);
+
perf_evlist__free_stats(session->evlist);
perf_session__delete(session);
 
-- 
2.7.4



[PATCH v1 4/8] perf util: Support no index time percent slice

2018-01-09 Thread Jin Yao
Previously, the time percent slice needs an index to specify
which one the user wants.

While it may be easy for using if the index can be omitted.
So with this patch, for example,

perf report --stdio --time 10%/1 should be equivalent to
perf report --stdio --time 10%

Signed-off-by: Jin Yao 
---
 tools/perf/util/time-utils.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 88510ab..5769f97 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -261,6 +261,37 @@ static int percent_comma_split(struct perf_time_interval 
*ptime_buf, int num,
return i;
 }
 
+static int one_percent_convert(struct perf_time_interval *ptime_buf,
+  const char *ostr, u64 start, u64 end, char *c)
+{
+   char *str;
+   int len = strlen(ostr), ret;
+
+   /*
+* c points to '%'.
+* '%' should be the last character
+*/
+   if (ostr + len - 1 != c)
+   return -1;
+
+   /*
+* Construct a string like "xx%/1"
+*/
+   str = malloc(len + 3);
+   if (str == NULL)
+   return -ENOMEM;
+
+   memcpy(str, ostr, len);
+   strcpy(str + len, "/1");
+
+   ret = percent_slash_split(str, ptime_buf, start, end);
+   if (ret == 0)
+   ret = 1;
+
+   free(str);
+   return ret;
+}
+
 int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num,
 const char *ostr, u64 start, u64 end)
 {
@@ -270,6 +301,7 @@ int perf_time__percent_parse_str(struct perf_time_interval 
*ptime_buf, int num,
 * ostr example:
 * 10%/2,10%/3: select the second 10% slice and the third 10% slice
 * 0%-10%,30%-40%: multiple time range
+* 50%: just one percent
 */
 
memset(ptime_buf, 0, sizeof(*ptime_buf) * num);
@@ -286,6 +318,10 @@ int perf_time__percent_parse_str(struct perf_time_interval 
*ptime_buf, int num,
   end, percent_dash_split);
}
 
+   c = strchr(ostr, '%');
+   if (c)
+   return one_percent_convert(ptime_buf, ostr, start, end, c);
+
return -1;
 }
 
-- 
2.7.4



[PATCH v1 2/8] perf script: Improve error msg when no first/last sample time found

2018-01-09 Thread Jin Yao
The following message will be returned to user when executing
'perf script --time' if perf data file doesn't contain the
first/last sample time.

"HINT: no first/last sample time found in perf data.
 Please use latest perf binary to execute 'perf record'
 (if '--buildid-all' is enabled, needs to set '--timestamp-boundary')."

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-script.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index c1cce47..4f691af 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -3448,7 +3448,9 @@ int cmd_script(int argc, const char **argv)
if (perf_time__parse_str(script.ptime_range, script.time_str) != 0) {
if (session->evlist->first_sample_time == 0 &&
session->evlist->last_sample_time == 0) {
-   pr_err("No first/last sample time in perf data\n");
+   pr_err("HINT: no first/last sample time found in perf 
data.\n"
+  "Please use latest perf binary to execute 'perf 
record'\n"
+  "(if '--buildid-all' is enabled, please set 
'--timestamp-boundary').\n");
err = -EINVAL;
goto out_delete;
}
-- 
2.7.4



[PATCH v1 6/8] perf util: Allocate time slices buffer according to number of comma

2018-01-09 Thread Jin Yao
Previously we use a magic number 10 to limit the number of
time slices. It's not very good.

This patch creates a new function perf_time__range_alloc()
to allocate time slices buffer. The number of buffer entries is
determined by the number of comma in string but at least it will
allocate one entry even if no comma is found.

Signed-off-by: Jin Yao 
---
 tools/perf/util/time-utils.c | 28 
 tools/perf/util/time-utils.h |  2 ++
 2 files changed, 30 insertions(+)

diff --git a/tools/perf/util/time-utils.c b/tools/perf/util/time-utils.c
index 5769f97..6193b46 100644
--- a/tools/perf/util/time-utils.c
+++ b/tools/perf/util/time-utils.c
@@ -325,6 +325,34 @@ int perf_time__percent_parse_str(struct perf_time_interval 
*ptime_buf, int num,
return -1;
 }
 
+struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size)
+{
+   const char *p1, *p2;
+   int i = 1;
+   struct perf_time_interval *ptime;
+
+   /*
+* At least allocate one time range.
+*/
+   if (!ostr)
+   goto alloc;
+
+   p1 = ostr;
+   while (p1 < ostr + strlen(ostr)) {
+   p2 = strchr(p1, ',');
+   if (!p2)
+   break;
+
+   p1 = p2 + 1;
+   i++;
+   }
+
+alloc:
+   *size = i;
+   ptime = calloc(i, sizeof(*ptime));
+   return ptime;
+}
+
 bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp)
 {
/* if time is not set don't drop sample */
diff --git a/tools/perf/util/time-utils.h b/tools/perf/util/time-utils.h
index 34d5eba..70b177d 100644
--- a/tools/perf/util/time-utils.h
+++ b/tools/perf/util/time-utils.h
@@ -16,6 +16,8 @@ int perf_time__parse_str(struct perf_time_interval *ptime, 
const char *ostr);
 int perf_time__percent_parse_str(struct perf_time_interval *ptime_buf, int num,
 const char *ostr, u64 start, u64 end);
 
+struct perf_time_interval *perf_time__range_alloc(const char *ostr, int *size);
+
 bool perf_time__skip_sample(struct perf_time_interval *ptime, u64 timestamp);
 
 bool perf_time__ranges_skip_sample(struct perf_time_interval *ptime_buf,
-- 
2.7.4



[PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread

2018-01-15 Thread Jin Yao
If we execute 'perf stat --per-thread' with non-root account
(even set kernel.perf_event_paranoid = -1 yet), it reports the error:

jinyao@skl:~$ perf stat --per-thread
Error:
You may not have permission to collect system-wide stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 2:

  -1: Allow use of (almost) all events by all users
  Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
  Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

kernel.perf_event_paranoid = -1

Perhaps the ptrace rule doesn't allow to trace some processes. But anyway
the global --per-thread mode had better ignore such errors and continue
working on other threads.

This patch will record the index of error thread in perf_evsel__open()
and remove this thread before retrying.

For example (run with non-root, kernel.perf_event_paranoid isn't set):

jinyao@skl:~$ perf stat --per-thread
^C
 Performance counter stats for 'system wide':

  vmstat-3458   6.171984  cpu-clock:u (msec)#   
 0.000 CPUs utilized
perf-3670   0.515599  cpu-clock:u (msec)#   
 0.000 CPUs utilized
  vmstat-3458  1,163,643  cycles:u  #   
 0.189 GHz
perf-3670 40,881  cycles:u  #   
 0.079 GHz
  vmstat-3458  1,410,238  instructions:u#   
 1.21  insn per cycle
perf-3670  3,536  instructions:u#   
 0.09  insn per cycle
  vmstat-3458288,937  branches:u#   
46.814 M/sec
perf-3670936  branches:u#   
 1.815 M/sec
  vmstat-3458 15,195  branch-misses:u   #   
 5.26% of all branches
perf-3670 76  branch-misses:u   #   
 8.12% of all branches

  12.651675247 seconds time elapsed

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-stat.c| 14 +-
 tools/perf/util/evsel.c  |  3 +++
 tools/perf/util/thread_map.c |  1 +
 tools/perf/util/thread_map.h |  1 +
 4 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d3..bcdb47c 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -632,7 +632,19 @@ static int __run_perf_stat(int argc, const char **argv)
 if (verbose > 0)
 ui__warning("%s\n", msg);
 goto try_again;
-}
+   } else if (target__has_per_thread(&target) &&
+  evsel_list->threads &&
+  evsel_list->threads->err_thread != -1) {
+   /*
+* For global --per-thread case, skip current
+* error thread.
+*/
+   if (!thread_map__remove(evsel_list->threads,
+   
evsel_list->threads->err_thread)) {
+   evsel_list->threads->err_thread = -1;
+   goto try_again;
+   }
+   }
 
perf_evsel__open_strerror(counter, &target,
  errno, msg, sizeof(msg));
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index a4d256e..12f8733 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1899,6 +1899,9 @@ int perf_evsel__open(struct perf_evsel *evsel, struct 
cpu_map *cpus,
goto fallback_missing_features;
}
 out_close:
+   if (err)
+   threads->err_thread = thread;
+
do {
while (--thread >= 0) {
close(FD(evsel, cpu, thread));
diff --git a/tools/perf/util/thread_map.c b/tools/perf/util/thread_map.c
index 3e1038f..870cb0c 100644
--- a/tools/perf/util/thread_map.c
+++ b/tools/perf/util/thread_map.c
@@ -32,6 +32,7 @@ static void thread_map__reset(struct thread_map *map, int 
start, int nr)
size_t size = (nr - start) * sizeof(map->map[0]);
 
memset(&map->map[start], 0, size);
+   map->err_thread = -1;
 }
 
 static s

Re: [PATCH v1 0/8] perf: Follow-up patches to improve time slice

2018-01-16 Thread Jin, Yao



On 1/16/2018 7:55 PM, Jiri Olsa wrote:

On Wed, Jan 10, 2018 at 11:00:25PM +0800, Jin Yao wrote:

It's follow-up patches to improve the perf time slice feature
(perf report/script --time xxx)

1. Improve the error message
perf report: Improve error msg when no first/last sample time found
perf script: Improve error msg when no first/last sample time found

2. Fix an issue that illegal percent was accepted previously (e.g. 1abc%)
perf util: Improve error checking for time percent input

3. Omit the slice index if possible. For example,
perf report --stdio --time 10%/1 is equivalent to
perf report --stdio --time 10%

perf util: Support no index time percent slice

4. Add indication of time slices in perf report header.
perf report: Add an indication of what time slices are used

5. Remove the time slices number limitation in perf report/script
perf util: Allocate time slices buffer according to number of comma
perf report: Remove the time slices number limitation
perf script: Remove the time slices number limitation

Jin Yao (8):
   perf report: Improve error msg when no first/last sample time found
   perf script: Improve error msg when no first/last sample time found
   perf util: Improve error checking for time percent input
   perf util: Support no index time percent slice
   perf report: Add an indication of what time slices are used
   perf util: Allocate time slices buffer according to number of comma
   perf report: Remove the time slices number limitation
   perf script: Remove the time slices number limitation


from quick look it looks ok

Reviewed-by: Jiri Olsa 

thanks,
jirka



Hi Jiri,

Thanks so much for reviewing the patch.

Thanks
Jin Yao


Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread

2018-01-16 Thread Jin, Yao

Just tested. But looks it's not OK for '--per-thread' case.

jinyao@skl:~/$ ./perf stat --per-thread
WARNING: Ignored open failure for pid 1
WARNING: Ignored open failure for pid 2
WARNING: Ignored open failure for pid 4
WARNING: Ignored open failure for pid 6
WARNING: Ignored open failure for pid 7
WARNING: Ignored open failure for pid 8
WARNING: Ignored open failure for pid 9
WARNING: Ignored open failure for pid 10
..
WARNING: Ignored open failure for pid 11783
WARNING: Ignored open failure for pid 12275
WARNING: Ignored open failure for pid 31733
WARNING: Ignored open failure for pid 31737
Error:
You may not have permission to collect system-wide stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 2:

  -1: Allow use of (almost) all events by all users
  Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN
  Disallow raw tracepoint access by users without CAP_SYS_ADMIN
>= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
>= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN

To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

kernel.perf_event_paranoid = -1

Thanks
Jin Yao

On 1/16/2018 8:51 PM, Jiri Olsa wrote:

On Tue, Jan 16, 2018 at 11:43:08PM +0800, Jin Yao wrote:

If we execute 'perf stat --per-thread' with non-root account
(even set kernel.perf_event_paranoid = -1 yet), it reports the error:

jinyao@skl:~$ perf stat --per-thread
Error:
You may not have permission to collect system-wide stats.

Consider tweaking /proc/sys/kernel/perf_event_paranoid,
which controls use of the performance events system by
unprivileged users (without CAP_SYS_ADMIN).

The current value is 2:

   -1: Allow use of (almost) all events by all users
   Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK

= 0: Disallow ftrace function tracepoint by users without CAP_SYS_ADMIN

   Disallow raw tracepoint access by users without CAP_SYS_ADMIN

= 1: Disallow CPU event access by users without CAP_SYS_ADMIN
= 2: Disallow kernel profiling by users without CAP_SYS_ADMIN


To make this setting permanent, edit /etc/sysctl.conf too, e.g.:

 kernel.perf_event_paranoid = -1

Perhaps the ptrace rule doesn't allow to trace some processes. But anyway
the global --per-thread mode had better ignore such errors and continue
working on other threads.

This patch will record the index of error thread in perf_evsel__open()
and remove this thread before retrying.

For example (run with non-root, kernel.perf_event_paranoid isn't set):

jinyao@skl:~$ perf stat --per-thread
^C
  Performance counter stats for 'system wide':

   vmstat-3458   6.171984  cpu-clock:u (msec)#  
  0.000 CPUs utilized
 perf-3670   0.515599  cpu-clock:u (msec)#  
  0.000 CPUs utilized
   vmstat-3458  1,163,643  cycles:u  #  
  0.189 GHz
 perf-3670 40,881  cycles:u  #  
  0.079 GHz
   vmstat-3458  1,410,238  instructions:u#  
  1.21  insn per cycle
 perf-3670  3,536  instructions:u#  
  0.09  insn per cycle
   vmstat-3458288,937  branches:u#  
 46.814 M/sec
 perf-3670936  branches:u#  
  1.815 M/sec
   vmstat-3458 15,195  branch-misses:u   #  
  5.26% of all branches
 perf-3670 76  branch-misses:u   #  
  8.12% of all branches

   12.651675247 seconds time elapsed


could we use the existing code like in attached patch?

we might also swap following warning for some generic one:
   WARNING: Ignored open failure for pid 28392

jirka


---
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 98bf9d32f222..f9d4d3ad6fc5 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -276,6 +276,9 @@ static int create_perf_stat_counter(struct perf_evsel 
*evsel)
attr->enable_on_exec = 1;
}
  
+	if (stat_config.aggr_mode == AGGR_THREAD)

+   evsel->ignore_missing_thread = true;
+
if (target__has_cpu(&target) && !target__has_per_thread(&target))
return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
  
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c

index 8f971a2301d1..509ee175bc97 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1656,7 +1656,7 @@ static bool ignore_missing_thread(struct perf_evsel 
*evsel,
return false;
  
  	/* The

Re: [PATCH] perf stat: Ignore error thread when enabling system-wide --per-thread

2018-01-16 Thread Jin, Yao



On 1/16/2018 9:17 PM, Jiri Olsa wrote:

On Tue, Jan 16, 2018 at 09:06:09PM +0800, Jin, Yao wrote:

Just tested. But looks it's not OK for '--per-thread' case.


yea, I haven't tested much.. might need soem tweaking,
but my point was that it could be doable on one place
instead of introducing another if possible

jirka



Yes, I understand. It'd better we can put the code to more common places.

Actually I used the similar patch as yours at first. While I found it 
didn't work for the case of system-wide '--per-thread' then I developed 
current one.


Thanks
Jin Yao




[PATCH v1 3/4] perf annotate: Process the new switch flag tui_dump

2018-03-12 Thread Jin Yao
The tui_dump is created as a parameter and it will be finally passed to
symbol__tui_annotate() and be saved to browser.dump.

It's a switch for TUI routines to indicate if TUI output needs to be
dumped to stdio.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-annotate.c |  2 +-
 tools/perf/ui/browsers/annotate.c | 41 +++
 tools/perf/ui/browsers/hists.c|  2 +-
 tools/perf/util/annotate.h|  6 --
 tools/perf/util/hist.h| 11 +++
 5 files changed, 42 insertions(+), 20 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 2db5b50..8ba8e2c 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -334,7 +334,7 @@ static void hists__find_annotations(struct hists *hists,
/* skip missing symbols */
nd = rb_next(nd);
} else if (use_browser == 1) {
-   key = hist_entry__tui_annotate(he, evsel, NULL);
+   key = hist_entry__tui_annotate(he, evsel, NULL, false);
 
switch (key) {
case -1:
diff --git a/tools/perf/ui/browsers/annotate.c 
b/tools/perf/ui/browsers/annotate.c
index 618edf9..bb7229d 100644
--- a/tools/perf/ui/browsers/annotate.c
+++ b/tools/perf/ui/browsers/annotate.c
@@ -122,7 +122,9 @@ static void disasm_line__write(struct disasm_line *dl, 
struct ui_browser *browse
   char *bf, size_t size)
 {
if (dl->ins.ops && dl->ins.ops->scnprintf) {
-   if (ins__is_jump(&dl->ins)) {
+   if (browser->dump) {
+   ui_browser__write_nstring(browser, " ", 2);
+   } else if (ins__is_jump(&dl->ins)) {
bool fwd = dl->ops.target.offset > dl->al.offset;
 
ui_browser__write_graph(browser, fwd ? 
SLSMG_DARROW_CHAR :
@@ -217,7 +219,10 @@ static void annotate_browser__write(struct ui_browser 
*browser, void *entry, int
ui_browser__printf(browser, "%*s ", CYCLES_WIDTH - 1, 
"Cycle");
}
 
-   SLsmg_write_char(' ');
+   if (browser->dump)
+   putchar(' ');
+   else
+   SLsmg_write_char(' ');
 
/* The scroll bar isn't being used */
if (!browser->navkeypressed)
@@ -388,6 +393,9 @@ static unsigned int annotate_browser__refresh(struct 
ui_browser *browser)
int ret = ui_browser__list_head_refresh(browser);
int pcnt_width = annotate_browser__pcnt_width(ab);
 
+   if (browser->dump)
+   return ret;
+
if (annotate_browser__opts.jump_arrows)
annotate_browser__draw_current_jump(browser);
 
@@ -589,7 +597,7 @@ static bool annotate_browser__callq(struct annotate_browser 
*browser,
}
 
pthread_mutex_unlock(¬es->lock);
-   symbol__tui_annotate(dl->ops.target.sym, ms->map, evsel, hbt);
+   symbol__tui_annotate(dl->ops.target.sym, ms->map, evsel, hbt, false);
sym_title(ms->sym, ms->map, title, sizeof(title));
ui_browser__show_title(&browser->b, title);
return true;
@@ -961,7 +969,7 @@ static int annotate_browser__run(struct annotate_browser 
*browser,
 }
 
 int map_symbol__tui_annotate(struct map_symbol *ms, struct perf_evsel *evsel,
-struct hist_browser_timer *hbt)
+struct hist_browser_timer *hbt, bool tui_dump)
 {
/* Set default value for show_total_period and show_nr_samples  */
annotate_browser__opts.show_total_period =
@@ -969,17 +977,19 @@ int map_symbol__tui_annotate(struct map_symbol *ms, 
struct perf_evsel *evsel,
annotate_browser__opts.show_nr_samples =
symbol_conf.show_nr_samples;
 
-   return symbol__tui_annotate(ms->sym, ms->map, evsel, hbt);
+   return symbol__tui_annotate(ms->sym, ms->map, evsel, hbt, tui_dump);
 }
 
 int hist_entry__tui_annotate(struct hist_entry *he, struct perf_evsel *evsel,
-struct hist_browser_timer *hbt)
+struct hist_browser_timer *hbt, bool tui_dump)
 {
/* reset abort key so that it can get Ctrl-C as a key */
-   SLang_reset_tty();
-   SLang_init_tty(0, 0, 0);
+   if (!tui_dump) {
+   SLang_reset_tty();
+   SLang_init_tty(0, 0, 0);
+   }
 
-   return map_symbol__tui_annotate(&he->ms, evsel, hbt);
+   return map_symbol__tui_annotate(&he->ms, evsel, hbt, tui_dump);
 }
 
 
@@ -1100,7 +1110,8 @@ static inline int width_jumps(int n)
 
 int symbol__tui_annotate(struct symbol *sym, struct map *map,
 struct perf_evsel *evsel,
-struct hist_browser_timer *hbt)
+  

[PATCH v1 4/4] perf annotate: Enable the '--tui-dump' mode

2018-03-12 Thread Jin Yao
It creates a new option '--tui-dump' in perf annotate command line.

With this option, for example, the perf annotate output:

$ perf annotate compute_flag --tui-dump

Percent  IPC Cycle

Disassembly of section .text:

00400640 :
compute_flag():
volatile int count;
static unsigned int s_randseed;

__attribute__((noinline))
int compute_flag()
{
 23.00  1.16  sub$0x8,%rsp
int i;

i = rand() % 2;
 23.06  1.16 1callq  rand@plt

return i;
 27.01  3.38  mov%eax,%edx
}
3.38  add$0x8,%rsp
{
int i;

i = rand() % 2;

return i;
3.38  shr$0x1f,%edx
3.38  add%edx,%eax
3.38  and$0x1,%eax
3.38  sub%edx,%eax
}
 26.93  3.38 2retq

Compared to TUI output,

$ perf annotate compute_flag

   │Disassembly of section .text:
   │
   │00400640 
   │compute_flag():
   │volatile int count;
   │static unsigned int s_randseed;
   │
   │__attribute__((noinline))
   │int compute_flag()
   │{
 23.00 │1.16  sub$0x8,%rsp
   │int i;
   │
   │i = rand() % 2;
 23.06 │1.16 1  → callq  rand@plt
   │
   │return i;
 27.01 │3.38  mov%eax,%edx
   │}
   │3.38  add$0x8,%rsp
   │{
   │int i;
   │
   │i = rand() % 2;
   │
   │return i;
   │3.38  shr$0x1f,%edx
   │3.38  add%edx,%eax
   │3.38  and$0x1,%eax
   │3.38  sub%edx,%eax
   │}
 26.93 │3.38 2  ← retq

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-annotate.txt |  3 +++
 tools/perf/builtin-annotate.c  | 12 ++--
 2 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-annotate.txt 
b/tools/perf/Documentation/perf-annotate.txt
index 292809c3..d52ae47 100644
--- a/tools/perf/Documentation/perf-annotate.txt
+++ b/tools/perf/Documentation/perf-annotate.txt
@@ -83,6 +83,9 @@ OPTIONS
 
 --gtk:: Use the GTK interface.
 
+--tui-dump: Use the TUI interface. The TUI output is dumped to stdio
+   interface.
+
 -C::
 --cpu=:: Only report samples for the list of CPUs provided. Multiple CPUs 
can
be provided as a comma-separated list with no space: 0,1. Ranges of
diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index 8ba8e2c..5110160 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -41,6 +41,7 @@ struct perf_annotate {
struct perf_tool tool;
struct perf_session *session;
bool   use_tui, use_stdio, use_gtk;
+   bool   tui_dump;
bool   full_paths;
bool   print_line;
bool   skip_missing;
@@ -334,7 +335,10 @@ static void hists__find_annotations(struct hists *hists,
/* skip missing symbols */
nd = rb_next(nd);
} else if (use_browser == 1) {
-   key = hist_entry__tui_annotate(he, evsel, NULL, false);
+   key = hist_entry__tui_annotate(he, evsel, NULL,
+  ann->tui_dump);
+   if (ann->tui_dump)
+   return;
 
switch (key) {
case -1:
@@ -486,6 +490,7 @@ int cmd_annotate(int argc, const char **argv)
"dump raw trace in ASCII"),
OPT_BOOLEAN(0, "gtk", &annotate.use_gtk, "Use the GTK interface"),
OPT_BOOLEAN(0, "tui", &annotate.use_tui, "Use the TUI interface"),
+   OPT_BOOLEAN(0, "tui-dump", &annotate.tui_dump, "Dump TUI content"),
OPT_BOOLEAN(0, "stdio", &annotate.use_stdio, "Use the stdio interface"),
OPT_STRING('k', "vmlinux", &symbol_conf.vmlinux_name,
   "file", "vmlinux pathname"),
@@ -576,7 +581,10 @@ int cmd_annotat

[PATCH v1 0/4] perf annotate: Create a new '--tui-dump' option

2018-03-12 Thread Jin Yao
There is a requirement to let perf annotate support displaying the IPC/Cycle.
In previous patch, this is supported in TUI mode. While it's not convenient
for users since they have to take screen shots and copy/paste data.

This patch series introduces a new option '--tui-dump' in perf annotate to
dump the TUI output to stdio.

User can easily use the command line like:
'perf annotate --tui-dump > /tmp/log.txt' 
 
For example:
$ perf annotate compute_flag --tui-dump

Percent  IPC Cycle

Disassembly of section .text:

00400640 :
compute_flag():
volatile int count;
static unsigned int s_randseed;

__attribute__((noinline))
int compute_flag()
{
 23.00  1.16  sub$0x8,%rsp
int i;

i = rand() % 2;
 23.06  1.16 1callq  rand@plt

return i;
 27.01  3.38  mov%eax,%edx
}
3.38  add$0x8,%rsp
{
int i;

i = rand() % 2;

return i;
3.38  shr$0x1f,%edx
3.38  add%edx,%eax
3.38  and$0x1,%eax
3.38  sub%edx,%eax
}
 26.93  3.38 2retq

The '--stdio' option is still kept now. Maybe in future, we can only
maintain the TUI routines and drop the lagacy stdio code. But right
now we'd better keep it until the '--tui-dump' option is good enough.

Jin Yao (4):
  perf browser: Add a new 'dump' flag
  perf browser: bypass ui_init if in tui dump mode
  perf annotate: Process the new switch flag tui_dump
  perf annotate: Enable the '--tui-dump' mode

 tools/perf/Documentation/perf-annotate.txt |  3 +++
 tools/perf/builtin-annotate.c  | 12 +++--
 tools/perf/builtin-c2c.c   |  2 +-
 tools/perf/builtin-report.c|  2 +-
 tools/perf/builtin-top.c   |  2 +-
 tools/perf/ui/browser.c| 38 +++
 tools/perf/ui/browser.h|  1 +
 tools/perf/ui/browsers/annotate.c  | 41 +-
 tools/perf/ui/browsers/hists.c |  2 +-
 tools/perf/ui/setup.c  |  9 +--
 tools/perf/ui/ui.h |  2 +-
 tools/perf/util/annotate.h |  6 +++--
 tools/perf/util/hist.h | 11 +---
 13 files changed, 99 insertions(+), 32 deletions(-)

-- 
2.7.4



[PATCH v1 1/4] perf browser: Add a new 'dump' flag

2018-03-12 Thread Jin Yao
We have a new requirement to provide the TUI output option for
non-interactive mode. For example, write the TUI output to stdio directly.

This patch creates a new flag 'dump' in struct ui_browser. Once it's on,
for the formatted buffer, we just print it on stdio.

Signed-off-by: Jin Yao 
---
 tools/perf/ui/browser.c | 38 +-
 tools/perf/ui/browser.h |  1 +
 2 files changed, 34 insertions(+), 5 deletions(-)

diff --git a/tools/perf/ui/browser.c b/tools/perf/ui/browser.c
index 63399af..3534c6c 100644
--- a/tools/perf/ui/browser.c
+++ b/tools/perf/ui/browser.c
@@ -41,8 +41,13 @@ int ui_browser__set_color(struct ui_browser *browser, int 
color)
 void ui_browser__set_percent_color(struct ui_browser *browser,
   double percent, bool current)
 {
-int color = ui_browser__percent_color(browser, percent, current);
-ui_browser__set_color(browser, color);
+   int color;
+
+   if (browser->dump)
+   return;
+
+   color = ui_browser__percent_color(browser, percent, current);
+   ui_browser__set_color(browser, color);
 }
 
 void ui_browser__gotorc(struct ui_browser *browser, int y, int x)
@@ -50,10 +55,27 @@ void ui_browser__gotorc(struct ui_browser *browser, int y, 
int x)
SLsmg_gotorc(browser->y + y, browser->x + x);
 }
 
+static void write_nstring(const char *msg, unsigned int width)
+{
+   unsigned int len = strlen(msg);
+   unsigned int i = 0;
+
+   while (i < len && i < width) {
+   putchar(msg[i]);
+   i++;
+   }
+
+   while (i++ < width)
+   putchar(' ');
+}
+
 void ui_browser__write_nstring(struct ui_browser *browser __maybe_unused, 
const char *msg,
   unsigned int width)
 {
-   slsmg_write_nstring(msg, width);
+   if (browser->dump)
+   write_nstring(msg, width);
+   else
+   slsmg_write_nstring(msg, width);
 }
 
 void ui_browser__printf(struct ui_browser *browser __maybe_unused, const char 
*fmt, ...)
@@ -61,7 +83,10 @@ void ui_browser__printf(struct ui_browser *browser 
__maybe_unused, const char *f
va_list args;
 
va_start(args, fmt);
-   slsmg_vprintf(fmt, args);
+   if (browser->dump)
+   vprintf(fmt, args);
+   else
+   slsmg_vprintf(fmt, args);
va_end(args);
 }
 
@@ -496,7 +521,10 @@ unsigned int ui_browser__list_head_refresh(struct 
ui_browser *browser)
 
list_for_each_from(pos, head) {
if (!browser->filter || !browser->filter(browser, pos)) {
-   ui_browser__gotorc(browser, row, 0);
+   if (!browser->dump)
+   ui_browser__gotorc(browser, row, 0);
+   else if (row > 0)
+   printf("\n");
browser->write(browser, pos, row);
if (++row == browser->rows)
break;
diff --git a/tools/perf/ui/browser.h b/tools/perf/ui/browser.h
index 03e1734..66d9405 100644
--- a/tools/perf/ui/browser.h
+++ b/tools/perf/ui/browser.h
@@ -28,6 +28,7 @@ struct ui_browser {
u32   nr_entries;
bool  navkeypressed;
bool  use_navkeypressed;
+   bool  dump;
 };
 
 int  ui_browser__set_color(struct ui_browser *browser, int color);
-- 
2.7.4



[PATCH v1 2/4] perf browser: bypass ui_init if in tui dump mode

2018-03-12 Thread Jin Yao
We create a tui dump mode in previous patch. In tui dump mode, the output
is written to stdio. We need to bypass ui_init() in setup_browser().

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-annotate.c | 2 +-
 tools/perf/builtin-c2c.c  | 2 +-
 tools/perf/builtin-report.c   | 2 +-
 tools/perf/builtin-top.c  | 2 +-
 tools/perf/ui/setup.c | 9 +++--
 tools/perf/ui/ui.h| 2 +-
 6 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index ead6ae4..2db5b50 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -576,7 +576,7 @@ int cmd_annotate(int argc, const char **argv)
else if (annotate.use_gtk)
use_browser = 2;
 
-   setup_browser(true);
+   setup_browser(true, false);
 
if (use_browser == 1 && annotate.has_br_stack) {
sort__mode = SORT_MODE__BRANCH;
diff --git a/tools/perf/builtin-c2c.c b/tools/perf/builtin-c2c.c
index 98d243f..6abc13b 100644
--- a/tools/perf/builtin-c2c.c
+++ b/tools/perf/builtin-c2c.c
@@ -2618,7 +2618,7 @@ static int perf_c2c__report(int argc, const char **argv)
else
use_browser = 1;
 
-   setup_browser(false);
+   setup_browser(false, false);
 
err = perf_session__process_events(session);
if (err) {
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 971ccba..99277b7 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -1261,7 +1261,7 @@ int cmd_report(int argc, const char **argv)
}
 
if (strcmp(input_name, "-") != 0)
-   setup_browser(true);
+   setup_browser(true, false);
else
use_browser = 0;
 
diff --git a/tools/perf/builtin-top.c b/tools/perf/builtin-top.c
index 0a26b56..b7db01a 100644
--- a/tools/perf/builtin-top.c
+++ b/tools/perf/builtin-top.c
@@ -1430,7 +1430,7 @@ int cmd_top(int argc, const char **argv)
else if (top.use_tui)
use_browser = 1;
 
-   setup_browser(false);
+   setup_browser(false, false);
 
if (setup_sorting(top.evlist) < 0) {
if (sort_order)
diff --git a/tools/perf/ui/setup.c b/tools/perf/ui/setup.c
index 44fe824..8219b2e 100644
--- a/tools/perf/ui/setup.c
+++ b/tools/perf/ui/setup.c
@@ -73,9 +73,9 @@ int stdio__config_color(const struct option *opt 
__maybe_unused,
return 0;
 }
 
-void setup_browser(bool fallback_to_pager)
+void setup_browser(bool fallback_to_pager, bool tui_dump)
 {
-   if (use_browser < 2 && (!isatty(1) || dump_trace))
+   if (use_browser < 2 && (!isatty(1) || dump_trace) && !tui_dump)
use_browser = 0;
 
/* default to TUI */
@@ -92,6 +92,11 @@ void setup_browser(bool fallback_to_pager)
/* fall through */
case 1:
use_browser = 1;
+   if (tui_dump) {
+   setup_pager();
+   break;
+   }
+
if (ui__init() == 0)
break;
/* fall through */
diff --git a/tools/perf/ui/ui.h b/tools/perf/ui/ui.h
index 9b6fdf0..020a85b 100644
--- a/tools/perf/ui/ui.h
+++ b/tools/perf/ui/ui.h
@@ -11,7 +11,7 @@ extern void *perf_gtk_handle;
 
 extern int use_browser;
 
-void setup_browser(bool fallback_to_pager);
+void setup_browser(bool fallback_to_pager, bool tui_dump);
 void exit_browser(bool wait_for_ok);
 
 #ifdef HAVE_SLANG_SUPPORT
-- 
2.7.4



[PATCH v1 0/3] Support perf -vv

2018-03-26 Thread Jin Yao
We keep having bug reports that when users build perf on their own,
but they don't install some needed libraries such as libelf,
libbfd/libibery.

The perf can build, but it is missing important functionality. And
users may complain that perf has issue or bug. 

This patch-set support 'perf -vv' which will print the compiled-in
status of libraries. Once users think perf missing some functionality,
it should be very easy for them to check the libraries status.

For example:

$ ./perf -vv
perf version 4.13.rc5.g9b7a81b
 dwarf: [  on ]
dwarf_getlocations: [  on ]
 glibc: [  on ]
  gtk2: [  on ]
  libaudit: [ off ]
libbfd: [  on ]
libelf: [  on ]
   libnuma: [  on ]
numa_num_possible_cpus: [  on ]
   libperl: [  on ]
 libpython: [  on ]
  libslang: [  on ]
 libcrypto: [  on ]
 libunwind: [  on ]
libdw-dwarf-unwind: [  on ]
  zlib: [  on ]
  lzma: [  on ]
 get_cpuid: [  on ]
   bpf: [  on ]

Jin Yao (3):
  perf config: Add -DNO_GLIBC to CFLAGS
  perf version: Print the status of compiled-in libraries
  perf: Support perf -vv

 tools/perf/Makefile.config   |   2 +
 tools/perf/builtin-version.c | 125 +++
 tools/perf/builtin.h |   1 +
 tools/perf/perf.c|   6 +++
 4 files changed, 134 insertions(+)

-- 
2.7.4



[PATCH v1 1/3] perf config: Add -DNO_GLIBC to CFLAGS

2018-03-26 Thread Jin Yao
For most of libraries, in perf.config, they can be recorded with
-DHAVE_XXX or -DNO_XXX in CFLAGS according to if they are compiled-in.
Then C code could know if the library is compiled-in or not.

While for glibc, no existing -DHAVE_XXX or -DNO_XXX.

This patch adds -DNO_GLIBC to CFLAGS.

Signed-off-by: Jin Yao 
---
 tools/perf/Makefile.config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index 98ff736..5883dd6 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -324,6 +324,8 @@ else
   NO_LIBBPF := 1
   NO_JVMTI := 1
 else
+  CFLAGS += -DNO_GLIBC
+
   ifneq ($(filter s% -static%,$(LDFLAGS),),)
 msg := $(error No static glibc found, please install glibc-static);
   else
-- 
2.7.4



[PATCH v1 3/3] perf: Support perf -vv

2018-03-26 Thread Jin Yao
We keep having bug reports that when users build perf on their own,
but they don't install some needed libraries such as libelf,
libbfd/libibery.

The perf can build, but it is missing important functionality.

This patch provides a new option '-vv' which will print the
compiled-in status of libraries.

For example:

$ ./perf -vv
perf version 4.13.rc5.g9b7a81b
 dwarf: [  on ]
dwarf_getlocations: [  on ]
 glibc: [  on ]
  gtk2: [  on ]
  libaudit: [ off ]
libbfd: [  on ]
libelf: [  on ]
   libnuma: [  on ]
numa_num_possible_cpus: [  on ]
   libperl: [  on ]
 libpython: [  on ]
  libslang: [  on ]
 libcrypto: [  on ]
 libunwind: [  on ]
libdw-dwarf-unwind: [  on ]
  zlib: [  on ]
  lzma: [  on ]
 get_cpuid: [  on ]
   bpf: [  on ]

Signed-off-by: Jin Yao 
---
 tools/perf/perf.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 1b3fc8e..300c83d 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -64,6 +64,7 @@ static struct cmd_struct commands[] = {
{ "top",cmd_top,0 },
{ "annotate",   cmd_annotate,   0 },
{ "version",cmd_version,0 },
+   { "version2",   cmd_version2,   0 },
{ "script", cmd_script, 0 },
{ "sched",  cmd_sched,  0 },
 #ifdef HAVE_LIBELF_SUPPORT
@@ -190,6 +191,11 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
break;
}
 
+   if (!strcmp(cmd, "-vv")) {
+   (*argv)[0] = "--version2";
+   break;
+   }
+
/*
 * Check remaining flags.
 */
-- 
2.7.4



[PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-26 Thread Jin Yao
This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-version.c | 125 +++
 tools/perf/builtin.h |   1 +
 2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv 
__maybe_unused)
printf("perf version %s\n", perf_version_string);
return 0;
 }
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("dwarf", "on");
+#else
+   status_print("dwarf", "off");
+#endif
+
+#ifdef HAVE_DWARF_GETLOCATIONS
+   status_print("dwarf_getlocations", "on");
+#else
+   status_print("dwarf_getlocations", "off");
+#endif
+
+#ifdef NO_GLIBC
+   status_print("glibc", "off");
+#else
+   status_print("glibc", "on");
+#endif
+
+#ifdef HAVE_GTK2_SUPPORT
+   status_print("gtk2", "on");
+#else
+   status_print("gtk2", "off");
+#endif
+
+#ifdef HAVE_LIBAUDIT_SUPPORT
+   status_print("libaudit", "on");
+#else
+   status_print("libaudit", "off");
+#endif
+
+#ifdef HAVE_LIBBFD_SUPPORT
+   status_print("libbfd", "on");
+#else
+   status_print("libbfd", "off");
+#endif
+
+#ifdef HAVE_LIBELF_SUPPORT
+   status_print("libelf", "on");
+#else
+   status_print("libelf", "off");
+#endif
+
+#ifdef HAVE_LIBNUMA_SUPPORT
+   status_print("libnuma", "on");
+   status_print("numa_num_possible_cpus", "on");
+#else
+   status_print("libnuma", "off");
+   status_print("numa_num_possible_cpus", "off");
+#endif
+
+#ifdef NO_LIBPERL
+   status_print("libperl", "off");
+#else
+   status_print("libperl", "on");
+#endif
+
+#ifdef NO_LIBPYTHON
+   status_print("libpython", "off");
+#else
+   status_print("libpython", "on");
+#endif
+
+#ifdef HAVE_SLANG_SUPPORT
+   status_print("libslang", "on");
+#else
+   status_print("libslang", "off");
+#endif
+
+#ifdef HAVE_LIBCRYPTO_SUPPORT
+   status_print("libcrypto", "on");
+#else
+   status_print("libcrypto", "off");
+#endif
+
+#ifdef HAVE_LIBUNWIND_SUPPORT
+   status_print("libunwind", "on");
+#else
+   status_print("libunwind", "off");
+#endif
+
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("libdw-dwarf-unwind", "on");
+#else
+   status_print("libdw-dwarf-unwind", "off");
+#endif
+
+#ifdef HAVE_ZLIB_SUPPORT
+   status_print("zlib", "on");
+#else
+   status_print("zlib", "off");
+#endif
+
+#ifdef HAVE_LZMA_SUPPORT
+   status_print("lzma", "on");
+#else
+   status_print("lzma", "off");
+#endif
+
+#ifdef HAVE_AUXTRACE_SUPPORT
+   status_print("get_cpuid", "on");
+#else
+   status_print("get_cpuid", "off");
+#endif
+
+#ifdef HAVE_LIBBPF_SUPPORT
+   status_print("bpf", "on");
+#else
+   status_print("bpf", "off");
+#endif
+}
+
+int cmd_version2(int argc, const char **argv)
+{
+   cmd_version(argc, argv);
+   library_status();
+   return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3..c7508ee 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv);
 int cmd_top(int argc, const char **argv);
 int cmd_script(int argc, const char **argv);
 int cmd_version(int argc, const char **argv);
+int cmd_version2(int argc, const char **argv);
 int cmd_probe(int argc, const char **argv);
 int cmd_kmem(int argc, const char **argv);
 int cmd_lock(int argc, const char **argv);
-- 
2.7.4



Re: [PATCH v1 0/3] Support perf -vv

2018-03-26 Thread Jin, Yao



On 3/26/2018 5:07 PM, Jiri Olsa wrote:

On Mon, Mar 26, 2018 at 02:00:31AM -0700, Andi Kleen wrote:

On Tue, Mar 27, 2018 at 12:07:01AM +0800, Jin Yao wrote:

We keep having bug reports that when users build perf on their own,
but they don't install some needed libraries such as libelf,
libbfd/libibery.

The perf can build, but it is missing important functionality. And
users may complain that perf has issue or bug.

This patch-set support 'perf -vv' which will print the compiled-in
status of libraries. Once users think perf missing some functionality,
it should be very easy for them to check the libraries status.


I don't think this solves the problem. How should the user know
that they need to run perf -vv. Also normal users don't know
that libelf is needed for symbols for example.

We need a warning that is visible together with the symbols
and that clearly describes the problem.


IIRC we decided to go *with* the message in the perf report
or other affected command that would suggest to run 'perf -vv'
for more details

jirka



Hi Andi,

For how the users know they should run 'perf -vv', as the discussion 
with Jiri and Arnaldo, the message will be displayed if necessary when 
users perform perf report or other perf commands. That could be 
implemented in a follow-up patch (not in this patch-set).


For let user know what the library they need, maybe we can add more 
information in 'perf -vv', for example, add something like,


"Please install libelf-dev, libelf-devel or elfutils-libelf-devel before 
building perf"


Thanks
Jin Yao



-Andi



For example:

$ ./perf -vv
perf version 4.13.rc5.g9b7a81b
  dwarf: [  on ]
 dwarf_getlocations: [  on ]
  glibc: [  on ]
   gtk2: [  on ]
   libaudit: [ off ]
 libbfd: [  on ]
 libelf: [  on ]
libnuma: [  on ]
numa_num_possible_cpus: [  on ]
libperl: [  on ]
  libpython: [  on ]
   libslang: [  on ]
  libcrypto: [  on ]
  libunwind: [  on ]
 libdw-dwarf-unwind: [  on ]
   zlib: [  on ]
   lzma: [  on ]
          get_cpuid: [  on ]
bpf: [  on ]

Jin Yao (3):
   perf config: Add -DNO_GLIBC to CFLAGS
   perf version: Print the status of compiled-in libraries
   perf: Support perf -vv

  tools/perf/Makefile.config   |   2 +
  tools/perf/builtin-version.c | 125 +++
  tools/perf/builtin.h |   1 +
  tools/perf/perf.c|   6 +++
  4 files changed, 134 insertions(+)

--
2.7.4



Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-26 Thread Jin, Yao



On 3/26/2018 5:39 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote:

This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
  tools/perf/builtin-version.c | 125 +++
  tools/perf/builtin.h |   1 +
  2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv 
__maybe_unused)
printf("perf version %s\n", perf_version_string);
return 0;
  }
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("dwarf", "on");
+#else
+   status_print("dwarf", "off");
+#endif


could this and all those below be in some generic macro?

#define STATUS(__d, __m)\
#ifdef __d  \
status_print(#__m, "on"); \
#else   \
status_print(#__m, "OFF");\
#endif

STATUS(HAVE_DWARF_SUPPORT, dwarf)


also please print the 'OFF' and use colors as in the build message



Fine, thanks. I will update the code.


+
+#ifdef HAVE_DWARF_GETLOCATIONS
+   status_print("dwarf_getlocations", "on");
+#else
+   status_print("dwarf_getlocations", "off");
+#endif
+
+#ifdef NO_GLIBC
+   status_print("glibc", "off");
+#else
+   status_print("glibc", "on");
+#endif
+
+#ifdef HAVE_GTK2_SUPPORT
+   status_print("gtk2", "on");
+#else
+   status_print("gtk2", "off");
+#endif
+


SNIP


+}
+
+int cmd_version2(int argc, const char **argv)
+{
+   cmd_version(argc, argv);
+   library_status();
+   return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3..c7508ee 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv);
  int cmd_top(int argc, const char **argv);
  int cmd_script(int argc, const char **argv);
  int cmd_version(int argc, const char **argv);
+int cmd_version2(int argc, const char **argv);


please don't add new command for this, handle this in cmd_version



Yeah, I know that, adding a new command is not a good idea.

But how can I pass a new parameter to cmd_version to indicate it's the 
-vv case? Use argv[1]? But looks not good since argc is 1.


Any idea about that?

Thanks
Jin Yao


thanks,
jirka



Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-26 Thread Jin, Yao



On 3/26/2018 5:39 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote:

This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
  tools/perf/builtin-version.c | 125 +++
  tools/perf/builtin.h |   1 +
  2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv 
__maybe_unused)
printf("perf version %s\n", perf_version_string);
return 0;
  }
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("dwarf", "on");
+#else
+   status_print("dwarf", "off");
+#endif


could this and all those below be in some generic macro?

#define STATUS(__d, __m)\
#ifdef __d  \
status_print(#__m, "on"); \
#else   \
status_print(#__m, "OFF");\
#endif

STATUS(HAVE_DWARF_SUPPORT, dwarf)




Hi Jiri,

I have tried this macro definition, but unfortunately the compilation is 
failed.


error: '#' is not followed by a macro parameter
 #define STATUS(__d, __m)  \

I just guess the '#' in #ifdef confuses the gcc. Looks we can't define 
#ifdef/#endif block in a macro.


Thanks
Jin Yao


also please print the 'OFF' and use colors as in the build message


+
+#ifdef HAVE_DWARF_GETLOCATIONS
+   status_print("dwarf_getlocations", "on");
+#else
+   status_print("dwarf_getlocations", "off");
+#endif
+
+#ifdef NO_GLIBC
+   status_print("glibc", "off");
+#else
+   status_print("glibc", "on");
+#endif
+
+#ifdef HAVE_GTK2_SUPPORT
+   status_print("gtk2", "on");
+#else
+   status_print("gtk2", "off");
+#endif
+


SNIP


+}
+
+int cmd_version2(int argc, const char **argv)
+{
+   cmd_version(argc, argv);
+   library_status();
+   return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3..c7508ee 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv);
  int cmd_top(int argc, const char **argv);
  int cmd_script(int argc, const char **argv);
  int cmd_version(int argc, const char **argv);
+int cmd_version2(int argc, const char **argv);


please don't add new command for this, handle this in cmd_version

thanks,
jirka



Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-26 Thread Jin, Yao



On 3/26/2018 9:51 PM, Jin, Yao wrote:



On 3/26/2018 5:39 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote:

This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
  tools/perf/builtin-version.c | 125 
+++

  tools/perf/builtin.h |   1 +
  2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char 
**argv __maybe_unused)

  printf("perf version %s\n", perf_version_string);
  return 0;
  }
+
+static void status_print(const char *name, const char *status)
+{
+    printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+    status_print("dwarf", "on");
+#else
+    status_print("dwarf", "off");
+#endif


could this and all those below be in some generic macro?

#define STATUS(__d, __m)    \
#ifdef __d \
status_print(#__m, "on");    \
#else    \
status_print(#__m, "OFF");    \
#endif

STATUS(HAVE_DWARF_SUPPORT, dwarf)


also please print the 'OFF' and use colors as in the build message



Fine, thanks. I will update the code.


+
+#ifdef HAVE_DWARF_GETLOCATIONS
+    status_print("dwarf_getlocations", "on");
+#else
+    status_print("dwarf_getlocations", "off");
+#endif
+
+#ifdef NO_GLIBC
+    status_print("glibc", "off");
+#else
+    status_print("glibc", "on");
+#endif
+
+#ifdef HAVE_GTK2_SUPPORT
+    status_print("gtk2", "on");
+#else
+    status_print("gtk2", "off");
+#endif
+


SNIP


+}
+
+int cmd_version2(int argc, const char **argv)
+{
+    cmd_version(argc, argv);
+    library_status();
+    return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3..c7508ee 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv);
  int cmd_top(int argc, const char **argv);
  int cmd_script(int argc, const char **argv);
  int cmd_version(int argc, const char **argv);
+int cmd_version2(int argc, const char **argv);


please don't add new command for this, handle this in cmd_version



Yeah, I know that, adding a new command is not a good idea.

But how can I pass a new parameter to cmd_version to indicate it's the 
-vv case? Use argv[1]? But looks not good since argc is 1.


Any idea about that?

Thanks
Jin Yao



Hi Jiri,

Maybe could we just add the information to 'perf -v'? I mean we don't 
need the 'perf -vv', directly add the information to 'perf -v'. Is it OK?


Thanks
Jin Yao


thanks,
jirka



Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-26 Thread Jin, Yao



On 3/27/2018 1:58 PM, Ingo Molnar wrote:


* Jin Yao  wrote:


+#ifdef HAVE_DWARF_SUPPORT
+#ifdef HAVE_DWARF_GETLOCATIONS
+#ifdef NO_GLIBC
+#ifdef HAVE_GTK2_SUPPORT
+#ifdef HAVE_LIBAUDIT_SUPPORT
+#ifdef HAVE_LIBBFD_SUPPORT
+#ifdef HAVE_LIBELF_SUPPORT
+#ifdef HAVE_LIBNUMA_SUPPORT
+#ifdef NO_LIBPERL
+#ifdef NO_LIBPYTHON
+#ifdef HAVE_SLANG_SUPPORT
+#ifdef HAVE_LIBCRYPTO_SUPPORT
+#ifdef HAVE_LIBUNWIND_SUPPORT
+#ifdef HAVE_DWARF_SUPPORT
+#ifdef HAVE_ZLIB_SUPPORT
+#ifdef HAVE_LZMA_SUPPORT
+#ifdef HAVE_AUXTRACE_SUPPORT
+#ifdef HAVE_LIBBPF_SUPPORT


BTW., it would be nice at this point to fix those 3 outliers that have a 
negation
in their library support status macro:


+#ifdef NO_GLIBC
+#ifdef NO_LIBPERL
+#ifdef NO_LIBPYTHON


... and invert them back to the HAVE_* side of the logic:


+#ifdef HAVE_GLIBC
+#ifdef HAVE_LIBPERL
+#ifdef HAVE_LIBPYTHON


That should make all related code more consistent and more readable.

Thanks,

Ingo



Yes, it's better, thanks!
I will try to convert them to the HAVE_*.

Thanks
Jin Yao


Re: [PATCH v1 3/3] perf: Support perf -vv

2018-03-26 Thread Jin, Yao



On 3/27/2018 2:03 PM, Ingo Molnar wrote:


* Jin Yao  wrote:


+++ b/tools/perf/perf.c
@@ -64,6 +64,7 @@ static struct cmd_struct commands[] = {
{ "top",  cmd_top,0 },
{ "annotate", cmd_annotate,   0 },
{ "version",  cmd_version,0 },
+   { "version2", cmd_version2,   0 },
{ "script",   cmd_script, 0 },
{ "sched",cmd_sched,  0 },
  #ifdef HAVE_LIBELF_SUPPORT
@@ -190,6 +191,11 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
break;
}
  
+		if (!strcmp(cmd, "-vv")) {

+   (*argv)[0] = "--version2";
+   break;
+   }


That is just lazy hackery, please use the regular option library to add options
and document them!

Also, '--version2' is a sloppy name, the right solution is to do what Git does, 
it
has a --build-options sub-option to 'perf version':

   triton:~/tip> git version -h
   usage: git version []

   --build-options   also print build options


   triton:~/tip> git version --build-options
   git version 2.14.1
   sizeof-long: 8

This should be done for perf as well, and _then_ maybe can we add a helper alias
that maps '-vv' to 'version --build-options'.

Thanks,

    Ingo



Something like 'perf version --build-options' and that will be mapped to 
'-vv'.


It's a good idea, thanks!

Thanks
Jin Yao


Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-27 Thread Jin, Yao



On 3/27/2018 8:56 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 09:44:23AM +0800, Jin, Yao wrote:



On 3/26/2018 5:39 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote:

This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
   tools/perf/builtin-version.c | 125 
+++
   tools/perf/builtin.h |   1 +
   2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv 
__maybe_unused)
printf("perf version %s\n", perf_version_string);
return 0;
   }
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("dwarf", "on");
+#else
+   status_print("dwarf", "off");
+#endif


could this and all those below be in some generic macro?

#define STATUS(__d, __m)\
#ifdef __d  \
status_print(#__m, "on"); \
#else   \
status_print(#__m, "OFF");\
#endif

STATUS(HAVE_DWARF_SUPPORT, dwarf)




Hi Jiri,

I have tried this macro definition, but unfortunately the compilation is
failed.

error: '#' is not followed by a macro parameter
  #define STATUS(__d, __m)  \

I just guess the '#' in #ifdef confuses the gcc. Looks we can't define
#ifdef/#endif block in a macro.



ah crap.. right ;-) how about we take the IS_BUILTIN thingie
from include/linux/kconfig.h and use it as in attached test
change.. it gives me:

[jolsa@krava perf]$ ./perf version
perf version 4.16.rc6.g1ee9a60
 dwarf: [  on ]
 krava: [ OFF ]

we could put those macros into tools/include/tools/config.h, Arnaldo?
together with the comments from kconfig.h that I cut out..

thanks,
jirka


---
diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 1fe458792cc1..b392f9e80fd0 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -6,8 +6,33 @@
  
  int version_verbose;
  
+

+#define __ARG_PLACEHOLDER_1 0,
+#define __take_second_arg(__ignored, val, ...) val
+
+#define __is_defined(x) ___is_defined(x)
+#define ___is_defined(val)  is_defined(__ARG_PLACEHOLDER_##val)
+#define is_defined(arg1_or_junk)__take_second_arg(arg1_or_junk 1, 0)
+
+#define IS_BUILTIN(option) __is_defined(option)
+
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
  int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused)
  {
printf("perf version %s\n", perf_version_string);
+
+#define STATUS(__d, __m)   \
+   if (IS_BUILTIN(__d))\
+   status_print(#__m, "on"); \
+   else\
+   status_print(#__m, "OFF");
+
+   STATUS(HAVE_DWARF_SUPPORT, dwarf)
+   STATUS(HAVE_KRAVA_SUPPORT, krava)
return 0;
  }



Hi Jiri,

It looks we need to copy the IS_BUILTIN and associated macro definitions 
to a common header file.


I guess you will post this patch if Arnaldo agree with this idea, right?

I plan to post v2 of this patch series after this 'IS_BUILTIN' patch 
gets merged.


Thanks
Jin Yao


Re: [PATCH v1 2/3] perf version: Print the status of compiled-in libraries

2018-03-27 Thread Jin, Yao



On 3/27/2018 8:38 PM, Jiri Olsa wrote:

On Mon, Mar 26, 2018 at 09:51:03PM +0800, Jin, Yao wrote:



On 3/26/2018 5:39 PM, Jiri Olsa wrote:

On Tue, Mar 27, 2018 at 12:07:03AM +0800, Jin Yao wrote:

This patch checks the values passed by CFLAGS (-DXXX) and then
print the status of libraries.

For example, if HAVE_DWARF_SUPPORT is defined, that means the
library "dwarf" is compiled-in. The patch will print the status
"on" for this library.

Signed-off-by: Jin Yao 
---
   tools/perf/builtin-version.c | 125 
+++
   tools/perf/builtin.h |   1 +
   2 files changed, 126 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5..90a0a7f 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -9,3 +9,128 @@ int cmd_version(int argc __maybe_unused, const char **argv 
__maybe_unused)
printf("perf version %s\n", perf_version_string);
return 0;
   }
+
+static void status_print(const char *name, const char *status)
+{
+   printf("%22s: [ %3s ]\n", name, status);
+}
+
+static void library_status(void)
+{
+#ifdef HAVE_DWARF_SUPPORT
+   status_print("dwarf", "on");
+#else
+   status_print("dwarf", "off");
+#endif


could this and all those below be in some generic macro?

#define STATUS(__d, __m)\
#ifdef __d  \
status_print(#__m, "on"); \
#else   \
status_print(#__m, "OFF");\
#endif

STATUS(HAVE_DWARF_SUPPORT, dwarf)


also please print the 'OFF' and use colors as in the build message



Fine, thanks. I will update the code.


+
+#ifdef HAVE_DWARF_GETLOCATIONS
+   status_print("dwarf_getlocations", "on");
+#else
+   status_print("dwarf_getlocations", "off");
+#endif
+
+#ifdef NO_GLIBC
+   status_print("glibc", "off");
+#else
+   status_print("glibc", "on");
+#endif
+
+#ifdef HAVE_GTK2_SUPPORT
+   status_print("gtk2", "on");
+#else
+   status_print("gtk2", "off");
+#endif
+


SNIP


+}
+
+int cmd_version2(int argc, const char **argv)
+{
+   cmd_version(argc, argv);
+   library_status();
+   return 0;
+}
diff --git a/tools/perf/builtin.h b/tools/perf/builtin.h
index 05745f3..c7508ee 100644
--- a/tools/perf/builtin.h
+++ b/tools/perf/builtin.h
@@ -29,6 +29,7 @@ int cmd_timechart(int argc, const char **argv);
   int cmd_top(int argc, const char **argv);
   int cmd_script(int argc, const char **argv);
   int cmd_version(int argc, const char **argv);
+int cmd_version2(int argc, const char **argv);


please don't add new command for this, handle this in cmd_version



Yeah, I know that, adding a new command is not a good idea.

But how can I pass a new parameter to cmd_version to indicate it's the -vv
case? Use argv[1]? But looks not good since argc is 1.

Any idea about that?


I guess the based would be to have 'perf version -v', where
in cmd_version you'd call standard option processing..

the 'perf -vv' shortcut could for example be done through
some global var like in attached patch... untested

jirka


---
diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 37019c5d675f..1fe458792cc1 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -4,6 +4,8 @@
  #include 
  #include 
  
+int version_verbose;

+
  int cmd_version(int argc __maybe_unused, const char **argv __maybe_unused)
  {
printf("perf version %s\n", perf_version_string);
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 1b3fc8ec0fa2..a5c95c6c0de7 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -185,7 +185,14 @@ static int handle_options(const char ***argv, int *argc, 
int *envchanged)
break;
}
  
-		if (!strcmp(cmd, "-v")) {

+   if (strstarts(cmd, "-v")) {
+   int i;
+
+   for (i = 2; cmd[i]; i++) {
+   if (cmd[i] == 'v')
+   version_verbose++;
+   }
+
(*argv)[0] = "--version";
break;
}
diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index 8fec1abd0f1f..9c45ba179635 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -13,6 +13,8 @@ void test_attr__init(void);
  void test_attr__open(struct perf_event_attr *attr, pid_t pid, int cpu,
         int fd, int group_fd, unsigned long flags);
  
+extern int version_verbose;

+
  #define HAVE_ATTR_TEST
  #include "perf-sys.h"
  



Using global variable, hmm, that's fine, looks no better way. :)

I will follow Ingo's suggestion. Use 'perf -v --build-options' and map 
it to 'perf -vv'.


Thanks
Jin Yao


[PATCH v1 3/3] perf version: Print status for syscall_table

2018-04-08 Thread Jin Yao
This patch doesn't print "libaudit" line if HAVE_SYSCALL_TABLE_SUPPORT
is available and add a line for HAVE_SYSCALL_TABLE_SUPPORT.

For example,

$ ./perf -vv
perf version 4.13.rc5.gc2f8af9
 dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
 glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
  gtk2: [ on  ]  # HAVE_GTK2_SUPPORT
 syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
   libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
   libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
 libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
  libslang: [ on  ]  # HAVE_SLANG_SUPPORT
 libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
 libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
  zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
  lzma: [ on  ]  # HAVE_LZMA_SUPPORT
 get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
   bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT

The line "syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT" is
new created.

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-version.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/tools/perf/builtin-version.c b/tools/perf/builtin-version.c
index 2abe391..50df168 100644
--- a/tools/perf/builtin-version.c
+++ b/tools/perf/builtin-version.c
@@ -60,7 +60,10 @@ static void library_status(void)
STATUS(HAVE_DWARF_GETLOCATIONS_SUPPORT, dwarf_getlocations);
STATUS(HAVE_GLIBC_SUPPORT, glibc);
STATUS(HAVE_GTK2_SUPPORT, gtk2);
+#ifndef HAVE_SYSCALL_TABLE_SUPPORT
STATUS(HAVE_LIBAUDIT_SUPPORT, libaudit);
+#endif
+   STATUS(HAVE_SYSCALL_TABLE_SUPPORT, syscall_table);
STATUS(HAVE_LIBBFD_SUPPORT, libbfd);
STATUS(HAVE_LIBELF_SUPPORT, libelf);
STATUS(HAVE_LIBNUMA_SUPPORT, libnuma);
-- 
2.7.4



[PATCH v1 0/3] perf version: some follow-up updates

2018-04-08 Thread Jin Yao
1. Remove -DNO_LIBPERL and use -DHAVE_LIBPERL_SUPPORT in C code.

2. Remove -DNO_LIBPYTHON and use -DHAVE_LIBPYTHON_SUPPORT in C code.

3. Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT.

4. Don't print "libaudit" line if HAVE_SYSCALL_TABLE_SUPPORT is available
   and add a line for HAVE_SYSCALL_TABLE_SUPPORT.

Jin Yao (3):
  perf script: Use HAVE_LIBXXX_SUPPORT to replace NO_LIBXXX
  perf: Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT
  perf version: Print status for syscall_table

 tools/perf/Makefile.config  | 2 +-
 tools/perf/builtin-help.c   | 2 +-
 tools/perf/builtin-script.c | 4 ++--
 tools/perf/builtin-version.c| 3 +++
 tools/perf/perf.c   | 4 ++--
 tools/perf/util/generate-cmdlist.sh | 2 +-
 tools/perf/util/syscalltbl.c| 6 +++---
 tools/perf/util/trace-event-scripting.c | 4 ++--
 8 files changed, 15 insertions(+), 12 deletions(-)

-- 
2.7.4



[PATCH v1 2/3] perf: Rename HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT

2018-04-08 Thread Jin Yao
To make consistent with other HAVE_XXX_SUPPORT in Makefile.config,
this patch renames HAVE_SYSCALL_TABLE to HAVE_SYSCALL_TABLE_SUPPORT
and update the C code accordingly.

Signed-off-by: Jin Yao 
---
 tools/perf/Makefile.config  | 2 +-
 tools/perf/builtin-help.c   | 2 +-
 tools/perf/perf.c   | 4 ++--
 tools/perf/util/generate-cmdlist.sh | 2 +-
 tools/perf/util/syscalltbl.c| 6 +++---
 5 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Makefile.config b/tools/perf/Makefile.config
index c7abd83..9ea9b06 100644
--- a/tools/perf/Makefile.config
+++ b/tools/perf/Makefile.config
@@ -68,7 +68,7 @@ ifeq ($(NO_PERF_REGS),0)
 endif
 
 ifneq ($(NO_SYSCALL_TABLE),1)
-  CFLAGS += -DHAVE_SYSCALL_TABLE
+  CFLAGS += -DHAVE_SYSCALL_TABLE_SUPPORT
 endif
 
 # So far there's only x86 and arm libdw unwind support merged in perf.
diff --git a/tools/perf/builtin-help.c b/tools/perf/builtin-help.c
index 4aca13f..1c41b4e 100644
--- a/tools/perf/builtin-help.c
+++ b/tools/perf/builtin-help.c
@@ -439,7 +439,7 @@ int cmd_help(int argc, const char **argv)
 #ifdef HAVE_LIBELF_SUPPORT
"probe",
 #endif
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
"trace",
 #endif
NULL };
diff --git a/tools/perf/perf.c b/tools/perf/perf.c
index 1659029..20a08cb 100644
--- a/tools/perf/perf.c
+++ b/tools/perf/perf.c
@@ -73,7 +73,7 @@ static struct cmd_struct commands[] = {
{ "lock",   cmd_lock,   0 },
{ "kvm",cmd_kvm,0 },
{ "test",   cmd_test,   0 },
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
{ "trace",  cmd_trace,  0 },
 #endif
{ "inject", cmd_inject, 0 },
@@ -491,7 +491,7 @@ int main(int argc, const char **argv)
argv[0] = cmd;
}
if (strstarts(cmd, "trace")) {
-#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)
+#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE_SUPPORT)
setup_path();
argv[0] = "trace";
return cmd_trace(argc, argv);
diff --git a/tools/perf/util/generate-cmdlist.sh 
b/tools/perf/util/generate-cmdlist.sh
index ff17920..c3cef36 100755
--- a/tools/perf/util/generate-cmdlist.sh
+++ b/tools/perf/util/generate-cmdlist.sh
@@ -38,7 +38,7 @@ do
 done
 echo "#endif /* HAVE_LIBELF_SUPPORT */"
 
-echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || defined(HAVE_SYSCALL_TABLE)"
+echo "#if defined(HAVE_LIBAUDIT_SUPPORT) || 
defined(HAVE_SYSCALL_TABLE_SUPPORT)"
 sed -n -e 's/^perf-\([^]*\)[   ].* audit*/\1/p' command-list.txt |
 sort |
 while read cmd
diff --git a/tools/perf/util/syscalltbl.c b/tools/perf/util/syscalltbl.c
index 895122d..0ee7f56 100644
--- a/tools/perf/util/syscalltbl.c
+++ b/tools/perf/util/syscalltbl.c
@@ -17,7 +17,7 @@
 #include 
 #include 
 
-#ifdef HAVE_SYSCALL_TABLE
+#ifdef HAVE_SYSCALL_TABLE_SUPPORT
 #include 
 #include "string2.h"
 #include "util.h"
@@ -139,7 +139,7 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, 
const char *syscall_g
return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx);
 }
 
-#else /* HAVE_SYSCALL_TABLE */
+#else /* HAVE_SYSCALL_TABLE_SUPPORT */
 
 #include 
 
@@ -176,4 +176,4 @@ int syscalltbl__strglobmatch_first(struct syscalltbl *tbl, 
const char *syscall_g
 {
return syscalltbl__strglobmatch_next(tbl, syscall_glob, idx);
 }
-#endif /* HAVE_SYSCALL_TABLE */
+#endif /* HAVE_SYSCALL_TABLE_SUPPORT */
-- 
2.7.4



Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-03-20 Thread Jin, Yao

Hi Jiri,

I'm still thinking it's worth displaying the warning when perf missing 
some libraries.


Somebody just told me that perf didn't work well. While after some 
investigations, I found it's just missing some libraries when building 
the perf.


But I have spent some time on getting the root cause. If with this 
patch, it should be very easily to know that.


Thanks
Jin Yao

On 1/12/2018 10:22 AM, Jin, Yao wrote:



On 1/11/2018 11:30 PM, Jiri Olsa wrote:

On Thu, Jan 11, 2018 at 07:03:06PM +0800, Jin Yao wrote:

We keep having bug reports that when users build perf on their own,


we already have same warnings during the build



Yes, there will be warnings displayed during the build if some libraries 
are missing.


While I do find the users (especially the new perf user) don't notice 
the warning. They only care about the success of build.


Once they use this perf version and find something working strangely, 
they will say that perf may have bug. :(


but they don't install some needed libraries like libelf, 
libbfd/libibery.


how about saying that in the symbol column,
instead of poluting report's output, like:

$ perf report --stdio

# Overhead  Command  Shared Object Symbol (disabled)
#   ...    ..



I just think it'd better provide some hints to user. For example, 
"symbol is disabled and you need to install libelf/xxx", say something 
like that.


But it looks the column can't contain too much information (i.e. no more 
space to contain the entire hints).


Any idea? Or just add this warning in verbose mode?


also your change does not affect tui mode

annotation for some reason does not start at all.. could be
little more verbose ;-)

jirka



Yes, it doesn't affect tui mode.

Or we just add this warning in verbose mode?

e.g. perf report -v?

Thanks
Jin Yao


Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-03-21 Thread Jin, Yao



On 3/21/2018 11:38 PM, Jiri Olsa wrote:

On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote:

Hi Jiri,

I'm still thinking it's worth displaying the warning when perf missing some
libraries.

Somebody just told me that perf didn't work well. While after some
investigations, I found it's just missing some libraries when building the
perf.

But I have spent some time on getting the root cause. If with this patch, it
should be very easily to know that.


true.. Arnaldo, any feedback on this one?


I just think it'd better provide some hints to user. For example,
"symbol is disabled and you need to install libelf/xxx", say something
like that.

But it looks the column can't contain too much information (i.e. no more
space to contain the entire hints).

Any idea? Or just add this warning in verbose mode?


also your change does not affect tui mode

annotation for some reason does not start at all.. could be
little more verbose ;-)

jirka



Yes, it doesn't affect tui mode.

Or we just add this warning in verbose mode?

e.g. perf report -v?


how about displaying libraries separately with -vv output,
that would mimic the build message, like:

   $ ./perf -vv
   perf version 4.16.rc6.g18fd48

dwarf: [ on  ]
   dwarf_getlocations: [ on  ]
glibc: [ on  ]
 gtk2: [ on  ]
 libaudit: [ on  ]
   libbfd: [ on  ]
   libelf: [ on  ]
  libnuma: [ on  ]
   numa_num_possible_cpus: [ on  ]
  libperl: [ on  ]
libpython: [ on  ]
 libslang: [ on  ]
libcrypto: [ on  ]
libunwind: [ on  ]
   libdw-dwarf-unwind: [ on  ]
 zlib: [ on  ]
 lzma: [ on  ]
get_cpuid: [ on  ]
  bpf: [ on  ]

and perf -vvv could display the 'make VF=1' info

jirka



I'm just afraid that the newbie will not check the -vv on his own when 
he gets trouble in using perf.


In other words, if a user is experienced and he knows -vv yet, I may 
assume that he should know installing all libraries before building the 
perf.


This patch is specific for the perf newbie. It will directly shows the 
error/warning when the user launches the perf binary. It will have a 
little bit helps, I guess. :)


Thanks
Jin Yao



Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-03-21 Thread Jin, Yao



On 3/22/2018 2:52 AM, Arnaldo Carvalho de Melo wrote:

Em Wed, Mar 21, 2018 at 05:04:46PM +0100, Jiri Olsa escreveu:

On Wed, Mar 21, 2018 at 12:43:15PM -0300, Arnaldo Carvalho de Melo wrote:

Em Wed, Mar 21, 2018 at 12:40:35PM -0300, Arnaldo Carvalho de Melo escreveu:

Em Wed, Mar 21, 2018 at 04:38:07PM +0100, Jiri Olsa escreveu:

On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote:

Hi Jiri,

I'm still thinking it's worth displaying the warning when perf missing some
libraries.

Somebody just told me that perf didn't work well. While after some
investigations, I found it's just missing some libraries when building the
perf.

But I have spent some time on getting the root cause. If with this patch, it
should be very easily to know that.



true.. Arnaldo, any feedback on this one?



Lemme re-read the thread...



Well, how about we make it harder to build without key libraries? I.e.
if we detect that what we consider a core set of libraries isn't found
in the system, then we stop the build, warn about it and ask the user to
confirm that the build should proceed by passing some explicit
-DI_KNOW_WHAT_I_AM_DOING___PROCEED=doit



hum, not sure we want to complicate the build even more than it
is now :-\ and IMO it still won't help much in Jin's problem,
if user forces the build anyway


Well, if a user _forces_ a build, not taking into consideration a
warning that _is_ emitted and _stops_ the build, about the functionality
it will lose by doing forcing the build, then comes back and complains
that that functionality is not present, then it becomes difficult to
help this user...  :-)

On the other hand, if the user forgets to install an important library,
the warning is emitted but the build proceeds, no explicit action was
performed, just a warning wasn't noticed, and the user complains, then
I'd say: "hey, are you sure library foo devel files were present when
you build it?", i.e. the support back and forth Jin is trying to avoid.

And for users that _saw_ the warning, _knew_ they _didn't_ want that
functionality, to be reminded while running, say 'perf report' that
something they _decided not to have_ isn't present, then that could be
annoying, no?

Lemme try another idea: what if we do something like gcc does and print
the features present when showing the version?

I.e.:

[acme@jouet perf]$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/7/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-redhat-linux
Configured with: ../configure --enable-bootstrap 
--enable-languages=c,c++,objc,obj-c++,fortran,ada,go,lto --prefix=/usr 
--mandir=/usr/share/man --infodir=/usr/share/info 
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-shared 
--enable-threads=posix --enable-checking=release --enable-multilib 
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions 
--enable-gnu-unique-object --enable-linker-build-id 
--with-gcc-major-version-only --with-linker-hash-style=gnu --enable-plugin 
--enable-initfini-array --with-isl --enable-libmpx 
--enable-offload-targets=nvptx-none --without-cuda-driver 
--enable-gnu-indirect-function --with-tune=generic --with-arch_32=i686 
--build=x86_64-redhat-linux
Thread model: posix
gcc version 7.3.1 20180303 (Red Hat 7.3.1-5) (GCC)
[acme@jouet perf]$

- Arnaldo



Is this too complicated for perf newbie to understand?

For my problem, the mistake only occurs on perf newbie. I just think, 
it'd better return a direct and clear message to them. Maybe they don't 
know or don't use -v or -vv to do more investigation by themselves.


Thanks
Jin Yao


Re: [PATCH] perf util: Display warning when perf report/annotate is missing some libs

2018-03-22 Thread Jin, Yao



On 3/22/2018 4:51 PM, Jiri Olsa wrote:

On Thu, Mar 22, 2018 at 09:04:10AM +0800, Jin, Yao wrote:



On 3/21/2018 11:38 PM, Jiri Olsa wrote:

On Wed, Mar 21, 2018 at 10:11:10AM +0800, Jin, Yao wrote:

Hi Jiri,

I'm still thinking it's worth displaying the warning when perf missing some
libraries.

Somebody just told me that perf didn't work well. While after some
investigations, I found it's just missing some libraries when building the
perf.

But I have spent some time on getting the root cause. If with this patch, it
should be very easily to know that.


true.. Arnaldo, any feedback on this one?


I just think it'd better provide some hints to user. For example,
"symbol is disabled and you need to install libelf/xxx", say something
like that.

But it looks the column can't contain too much information (i.e. no more
space to contain the entire hints).

Any idea? Or just add this warning in verbose mode?


also your change does not affect tui mode

annotation for some reason does not start at all.. could be
little more verbose ;-)

jirka



Yes, it doesn't affect tui mode.

Or we just add this warning in verbose mode?

e.g. perf report -v?


how about displaying libraries separately with -vv output,
that would mimic the build message, like:

$ ./perf -vv
perf version 4.16.rc6.g18fd48

 dwarf: [ on  ]
dwarf_getlocations: [ on  ]
 glibc: [ on  ]
  gtk2: [ on  ]
  libaudit: [ on  ]
libbfd: [ on  ]
libelf: [ on  ]
   libnuma: [ on  ]
numa_num_possible_cpus: [ on  ]
   libperl: [ on  ]
 libpython: [ on  ]
  libslang: [ on  ]
 libcrypto: [ on  ]
 libunwind: [ on  ]
libdw-dwarf-unwind: [ on  ]
  zlib: [ on  ]
  lzma: [ on  ]
 get_cpuid: [ on  ]
   bpf: [ on  ]

and perf -vvv could display the 'make VF=1' info

jirka



I'm just afraid that the newbie will not check the -vv on his own when he
gets trouble in using perf.

In other words, if a user is experienced and he knows -vv yet, I may assume
that he should know installing all libraries before building the perf.

This patch is specific for the perf newbie. It will directly shows the
error/warning when the user launches the perf binary. It will have a little
bit helps, I guess. :)


I just don't like the idea that when you run perf report,
or annotate it spits out lines for every missing feature

maybe we could detect missing features for given command
and display line about missing features and say something
like:

'Warning: symbol,dwarf support not compiled in (for more details run perf -vv)'

or somwthing like that.. ;-)

jirka



Hi Jiri,

I think your idea is very good!

I guess following it's just an example copied from perf building 
process, right?


$ ./perf -vv
perf version 4.16.rc6.g18fd48

 dwarf: [ on  ]
dwarf_getlocations: [ on  ]
 glibc: [ on  ]
  gtk2: [ on  ]
  libaudit: [ on  ]
libbfd: [ on  ]
libelf: [ on  ]
   libnuma: [ on  ]
numa_num_possible_cpus: [ on  ]
   libperl: [ on  ]
 libpython: [ on  ]
  libslang: [ on  ]
 libcrypto: [ on  ]
 libunwind: [ on  ]
libdw-dwarf-unwind: [ on  ]
  zlib: [ on  ]
  lzma: [ on  ]
 get_cpuid: [ on  ]
   bpf: [ on  ]

We can check some CFLAGS like "#ifdef HAVE_XXX" in perf code to 
determine if some libraries are compiled in.


For example,

#ifdef HAVE_LIBNUMA_SUPPORT
printf("libnuma: [ on  ]");
#endif

For some features, such as "numa_num_possible_cpus", which doesn't have 
CFLAGS variables. Maybe we can ignore them in report?


I'd like to upgrade my patch to support perf -vv.

Thanks
Jin Yao



Re: [PATCH v2] perf annotate: Support to display the IPC/Cycle in tui mode

2018-03-07 Thread Jin, Yao

Hi,

Could this patch be accepted?

I'm working on the supporting for stdio mode. I just wish to post the 
patch series after this patch being merged.


Thanks
Jin Yao

On 2/27/2018 5:38 PM, Jin Yao wrote:

Unlike the perf report interactive annotate mode, the perf annotate
doesn't display the IPC/Cycle even if branch info is recorded in perf
data file.

perf record -b ...
perf annotate function

It should show IPC/cycle, but it doesn't.

This patch lets perf annotate support the displaying of IPC/Cycle if
branch info is in perf data.

For example,

perf annotate compute_flag

Percent│ IPC Cycle
│
│
│Disassembly of section .text:
│
│00400640 :
│compute_flag():
│volatile int count;
│static unsigned int s_randseed;
│
│__attribute__((noinline))
│int compute_flag()
│{
  22.96 │1.18   584sub$0x8,%rsp
│int i;
│
│i = rand() % 2;
  23.02 │1.18 1  → callq  rand@plt
│
│return i;
  27.05 │3.37  mov%eax,%edx
│}
│3.37  add$0x8,%rsp
│{
│int i;
│
│i = rand() % 2;
│
│return i;
│3.37  shr$0x1f,%edx
│3.37  add%edx,%eax
│3.37  and$0x1,%eax
│3.37  sub%edx,%eax
│}
  26.97 │3.37 2  ← retq

Note that, this patch only supports tui mode. For stdio,
now it just keeps original behavior. Will support it in a
follow-up patch.

perf annotate compute_flag --stdio

  Percent |  Source code & Disassembly of div for cycles:ppp (7993 samples)
--
  :
  :
  :
  :Disassembly of section .text:
  :
  :00400640 :
  :compute_flag():
  :volatile int count;
  :static unsigned int s_randseed;
  :
  :__attribute__((noinline))
  :int compute_flag()
  :{
 0.29 :   400640:   sub$0x8,%rsp # +100.00%
  :int i;
  :
  :i = rand() % 2;
42.93 :   400644:   callq  400490  # -100.00% (p:100.00%)
  :
  :return i;
 0.10 :   400649:   mov%eax,%edx # +100.00%
  :}
 0.94 :   40064b:   add$0x8,%rsp
  :{
  :int i;
  :
  :i = rand() % 2;
  :
  :return i;
27.02 :   40064f:   shr$0x1f,%edx
 0.15 :   400652:   add%edx,%eax
 1.24 :   400654:   and$0x1,%eax
 2.08 :   400657:   sub%edx,%eax
  :}
25.26 :   400659:   retq # -100.00% (p:100.00%)

Change-log:
---
v2:
---
No code change. Just change the patch description to make it more clear.

Signed-off-by: Jin Yao 
---
  tools/perf/builtin-annotate.c | 88 ---
  1 file changed, 82 insertions(+), 6 deletions(-)

diff --git a/tools/perf/builtin-annotate.c b/tools/perf/builtin-annotate.c
index f15731a..ead6ae4 100644
--- a/tools/perf/builtin-annotate.c
+++ b/tools/perf/builtin-annotate.c
@@ -44,6 +44,7 @@ struct perf_annotate {
bool   full_paths;
bool   print_line;
bool   skip_missing;
+   bool   has_br_stack;
const char *sym_hist_filter;
const char *cpu_list;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
@@ -146,16 +147,73 @@ static void process_branch_stack(struct branch_stack *bs, 
struct addr_location *
free(bi);
  }
  
+static int hist_iter__branch_callback(struct hist_entry_iter *iter,

+ struct addr_location *al __maybe_unused,
+ bool single __maybe_unused,
+ void *arg __maybe_unused)
+{
+   struct hist_entry *he = iter->he;
+   struct branch_info *bi;
+   struct perf_sample *sample = iter->sample;
+   struct perf_evsel *evsel = iter->evsel;
+   int err;
+
+   hist__account_cycles(sample->branch_stack, al, sample, false);
+
+   bi = he->branch_info;
+   err = addr_map_symbol__inc_samples(&bi->from, sample, evsel->idx);
+
+   if (err)
+   goto out;
+
+   err = addr_map_symbol__inc_

Re: [PATCH v1 0/4] perf annotate: Create a new '--tui-dump' option

2018-03-13 Thread Jin, Yao



On 3/13/2018 11:20 PM, Arnaldo Carvalho de Melo wrote:

Em Tue, Mar 13, 2018 at 10:16:50PM +0800, Jin Yao escreveu:

There is a requirement to let perf annotate support displaying the IPC/Cycle.
In previous patch, this is supported in TUI mode. While it's not convenient
for users since they have to take screen shots and copy/paste data.

This patch series introduces a new option '--tui-dump' in perf annotate to
dump the TUI output to stdio.

User can easily use the command line like:
'perf annotate --tui-dump > /tmp/log.txt'


My first impression is that this pollutes the code with way too many
ifs, I was thinking more of a:

while (read parsed objdump line) {
ins__fprintf();
}



Yes, the issue in my patch is that it uses many 'if' to check if it's 
tui_dump(or called 'stdio2') or tui mode.



Going from the refresh routine, I started doing the conversion, but haven't
completed it, there are opportunities for more __scnprintf like routines, also
one to find the percent_max, etc then those would be used both in these two for
--stdio2, that eventually would become --stdio with the old one becoming
--stdio1, till we're satisfied with the new default.



I have some questions for the following code. Please correct me if I 
misunderstand anything.



static void annotation_line__fprintf(struct annotate_line *al, FILE *fp)
{
struct browser_line *bl = browser_line(al);
int i;
double percent_max = 0.0;
char bf[256];

for (i = 0; i < browser->nr_events; i++) {
if (al->samples[i].percent > percent_max)
percent_max = al->samples[i].percent;
}

/* the following if/else block should be transformed into a __scnprintf 
routine
  that formats a buffer and then the TUI and --stdio2 use it */

if (al->offset != -1 && percent_max != 0.0) {
for (i = 0; i < ab->nr_events; i++) {
if (annotate_browser__opts.show_total_period) {
fprintf(fp, browser, "%11" PRIu64 " ", 
al->samples[i].he.period);
} else if (annotate_browser__opts.show_nr_samples) {
fprintf(fp, browser, "%6" PRIu64 " ", 
al->samples[i].he.nr_samples);
} else {
fprintf(fp, "%6.2f ", al->samples[i].percent);
}
}
} else {
ui_browser__printf(browser, "%*s", pcnt_width,
   annotate_browser__opts.show_total_period ? 
"Period" :
   annotate_browser__opts.show_nr_samples ? "Samples" : 
"Percent");

}



I guess the above code has not been completed yet. My understanding for 
Arnaldo's idea is that the output should be written to a buffer via 
scnprintf and the buffer will be passed to TUI or stdio2 and printed out 
later.


One potential issue is how to process the color for TUI? For example, 
the call to ui_browser__set_percent_color. If we need to set color for 
TUI, it looks we still have to check if it's a TUI mode.


For example,

Suppose the bf[] has been written with the Percent string yet, next we need,

if (--tui) {
ui_browser__set_percent_color(...);
ui_browser__printf(bf, ...);
} else {
printf(..., bf);
}

Is my understanding correct?


 /* The ab->have_cycles should go to a separate struct, outside
   * annotation_browser, and the rest should go to something that just 
does scnprintf on a buffer
  * that then is printed on the TUI or with fprintf */

if (ab->have_cycles) {
if (al->ipc)
fprintf(fp, "%*.2f ", IPC_WIDTH - 1, al->ipc) > 
else if (show_title)
ui_browser__printf(browser, "%*s ", IPC_WIDTH - 1, 
"IPC");

if (al->cycles)
ui_browser__printf(browser, "%*" PRIu64 " ",
   CYCLES_WIDTH - 1, al->cycles);
else if (!show_title)
ui_browser__write_nstring(browser, " ", CYCLES_WIDTH);
else
ui_browser__printf(browser, "%*s ", CYCLES_WIDTH - 1, 
"Cycle");
}

SLsmg_write_char(' ');

/* The scroll bar isn't being used */
if (!browser->navkeypressed)
width += 1;

if (!*al->line)
fprintf(fp, "\n");
else if (al->offset == -1) {
if (al->line_nr && annotat

[PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol

2018-11-25 Thread Jin Yao
Support displaying the average IPC and IPC coverage for symbol
in perf report TUI browser. We create a new sort-key 'ipc' for
that.

For example,

$ perf record -g -b ...
$ perf report -s symbol,ipc or
  perf report -s ipc

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]

Note that, stdio mode doesn't support this feature.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/builtin-report.c  | 11 
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 43 
 tools/perf/util/sort.h   |  1 +
 tools/perf/util/symbol.h |  1 +
 6 files changed, 58 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 474a494..e7b8974 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -122,6 +122,7 @@ OPTIONS
- in_tx: branch in TSX transaction
- abort: TSX transaction abort.
- cycles: Cycles in basic block
+   - ipc: average ipc (instruction per cycle) of function
 
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..9b75e11 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -954,6 +954,7 @@ int cmd_report(int argc, const char **argv)
bool has_br_stack = false;
int branch_mode = -1;
bool branch_call_mode = false;
+   bool symbol_ipc = false;
 #define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function,percent"
const char report_callchain_help[] = "Display call graph (stack 
chain/backtrace):\n\n"
 CALLCHAIN_REPORT_HELP
@@ -1284,6 +1285,13 @@ int cmd_report(int argc, const char **argv)
else
use_browser = 0;
 
+   if ((sort__mode == SORT_MODE__BRANCH) && sort_order &&
+   strstr(sort_order, "ipc")) {
+   if (!strstr(sort_order, "symbol"))
+   sort_order = "symbol,ipc";
+   symbol_ipc = true;
+   }
+
if (setup_sorting(session->evlist) < 0) {
if (sort_order)
parse_options_usage(report_usage, options, "s", 1);
@@ -1331,6 +1339,9 @@ int cmd_report(int argc, const char **argv)
symbol_conf.sort_by_name = true;
}
annotation_config__init();
+   } else if (symbol_ipc) {
+   pr_err("Only TUI mode supports sort-key ipc\n");
+   goto error;
}
 
if (symbol__init(&session->header.env) < 0)
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3badd7f..664b5ed 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -62,6 +62,7 @@ enum hist_column {
HISTC_TRACE,
HISTC_SYM_SIZE,
HISTC_DSO_SIZE,
+   HISTC_SYMBOL_IPC,
HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..94f62c8 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -13,6 +13,7 @@
 #include "strlist.h"
 #include 
 #include "mem-events.h"
+#include "annotate.h"
 #include 
 
 regex_tparent_regex;
@@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = {
.se_width_idx   = HISTC_SRCLINE_TO,
 };
 
+static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf,
+   size_t size, unsigned int width)
+{
+
+   struct symbol *sym = he->ms.sym;
+   struct map *map = he->ms.map;
+   struct perf_evsel *evsel = hists_to_evsel(he->hists);
+   struct annotation *notes;
+   double ipc = 0.0, coverage = 0.0;
+   char tmp[64];
+
+   if (!sym)
+   return repsep_snprintf(bf, size, "%-*s", width, "-");
+
+   if (!sym->annotated &&
+   symbol__annotate2(sym, map, evsel, &annotation__default_options,
+ NULL) < 0) {
+   return 0;
+   }
+
+   sym->annotated = true;
+   notes = symbol__annotation(sym);
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn)
+   

[PATCH 0/2] perf report/annotate: Support average IPC and IPC coverage for function

2018-11-25 Thread Jin Yao
Add supporting of displaying the average IPC and IPC coverage
percentage per function.

For example,

$ perf record -g -b ...
$ perf report -s symbol,ipc

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]
  ...

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 ...

Jin Yao (2):
  perf annotate: Compute average IPC and IPC coverage per symbol
  perf report: Display average IPC and IPC coverage per symbol

 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/builtin-report.c  | 11 
 tools/perf/util/annotate.c   | 41 +++---
 tools/perf/util/annotate.h   |  5 
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 43 
 tools/perf/util/sort.h   |  1 +
 tools/perf/util/symbol.h |  1 +
 8 files changed, 101 insertions(+), 3 deletions(-)

-- 
2.7.4



[PATCH 1/2] perf annotate: Compute average IPC and IPC coverage per symbol

2018-11-25 Thread Jin Yao
Add support to perf report annotate view or perf annotate --stdio2 to
aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example,

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
  0.00  1.10 1  ↓ je 43
29:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_lock_wait_private
  add$0x80,%rsp
  0.00  3.0043:   lea__ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00  lea0xc(%rsp),%rsi
  8.49  3.00 1  → callq  __random_r
  7.91  1.94  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  0.00  1.94 1  ↓ je 68
  lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne70
↓ jmp8a
  0.00  2.0068:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
 21.56  2.00 1  ↓ je 8a
70:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_unlock_wake_private
  add$0x80,%rsp
 21.56  2.908a:   movslq 0xc(%rsp),%rax
2.90  add$0x18,%rsp
  9.03  2.90 1  ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage
is 54.8%.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 41 ++---
 tools/perf/util/annotate.h |  5 +
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6936daf..4b2b1b0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation 
*notes, u64 start, u64
 static void annotation__count_and_fill(struct annotation *notes, u64 start, 
u64 end, struct cyc_hist *ch)
 {
unsigned n_insn;
+   unsigned int cover_insn = 0;
u64 offset;
 
n_insn = annotation__count_insn(notes, start, end);
@@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct 
annotation *notes, u64 start, u64
for (offset = start; offset <= end; offset++) {
struct annotation_line *al = notes->offsets[offset];
 
-   if (al)
+   if (al && al->ipc == 0.0) {
al->ipc = ipc;
+   cover_insn++;
+   }
+   }
+
+   if (cover_insn) {
+   notes->hit_cycles += ch->cycles;
+   notes->hit_insn += n_insn * ch->num;
+   notes->cover_insn += cover_insn;
}
}
 }
 
 void annotation__compute_ipc(struct annotation *notes, size_t size)
 {
-   u64 offset;
+   s64 offset;
 
if (!notes->src || !notes->src->cycles_hist)
return;
 
+   notes->total_insn = annotation__count_insn(notes, 0, size - 1);
+   notes->hit_cycles = 0;
+   notes->hit_insn = 0;
+   notes->cover_insn = 0;
+
pthread_mutex_lock(¬es->lock);
-   for (offset = 0; offset < size; ++offset) {
+   for (offset = size - 1; offset >= 0; --offset) {
struct cyc_hist *ch;
 
ch = ¬es->src->cycles_hist[offset];
@@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, 
struct annotation *notes,
disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset);
 }
 
+static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
+{
+   double ipc = 0.0, coverage = 0.0;
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn) {
+   coverage = notes->cover_insn * 100.0 /
+   ((double)notes->total_insn);
+   }
+
+   scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)",
+ ipc, coverage);
+}
+
 static void __annotation_line__write(struct annotation_line *al, struct 
annotation *notes

Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol

2018-11-26 Thread Jin, Yao




On 11/26/2018 5:55 PM, Jiri Olsa wrote:

On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote:

SNIP


diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..94f62c8 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -13,6 +13,7 @@
  #include "strlist.h"
  #include 
  #include "mem-events.h"
+#include "annotate.h"
  #include 
  
  regex_t		parent_regex;

@@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = {
.se_width_idx   = HISTC_SRCLINE_TO,
  };
  
+static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf,

+   size_t size, unsigned int width)
+{
+
+   struct symbol *sym = he->ms.sym;
+   struct map *map = he->ms.map;
+   struct perf_evsel *evsel = hists_to_evsel(he->hists);
+   struct annotation *notes;
+   double ipc = 0.0, coverage = 0.0;
+   char tmp[64];
+
+   if (!sym)
+   return repsep_snprintf(bf, size, "%-*s", width, "-");
+
+   if (!sym->annotated &&
+   symbol__annotate2(sym, map, evsel, &annotation__default_options,
+ NULL) < 0) {
+   return 0;
+   }


this seems to make the sorting extra long, would you
please consider progress bar update for this?

should be added somewhere around hists sorting code



Hi Jiri,

Sorting doesn't take long time in my test but the session event 
processing takes some time.


I just think maybe we need a new ops for stdio progress bar like what 
the tui_progress__ops does now. That should be benefit for all stdio usages.


That may be another separate patch-set.

Thanks
Jin Yao


thanks,
jirka



Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol

2018-11-26 Thread Jin, Yao




On 11/26/2018 5:52 PM, Jiri Olsa wrote:

On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote:

Support displaying the average IPC and IPC coverage for symbol
in perf report TUI browser. We create a new sort-key 'ipc' for
that.

For example,

$ perf record -g -b ...
$ perf report -s symbol,ipc or
   perf report -s ipc

Overhead  Symbol   IPC   [IPC Coverage]
   39.60%  [.] __random 2.30  [ 54.8%]
   18.02%  [.] main 0.43  [ 54.3%]
   14.21%  [.] compute_flag 2.29  [100.0%]
   14.16%  [.] rand 0.36  [100.0%]
7.06%  [.] __random_r   2.57  [ 70.5%]
6.85%  [.] rand@plt 0.00  [  0.0%]

Note that, stdio mode doesn't support this feature.


the patch below allowed this for stdio
please merge it in, and feel free to change it as you see fit

jirka




Thanks so much! I have merged following code in v2.

Thanks
Jin Yao


---
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 9b75e118f609..a6756dc13285 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -85,6 +85,7 @@ struct report {
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
struct branch_type_stat brtype_stat;
+   boolsymbol_ipc;
  };
  
  static int report__config(const char *var, const char *value, void *cb)

@@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct 
hist_entry_iter *iter,
struct mem_info *mi;
struct branch_info *bi;
  
-	if (!ui__has_annotation())

+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
  
  	hist__account_cycles(sample->branch_stack, al, sample,

@@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct 
hist_entry_iter *iter,
struct perf_evsel *evsel = iter->evsel;
int err;
  
-	if (!ui__has_annotation())

+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
  
  	hist__account_cycles(sample->branch_stack, al, sample,

@@ -954,7 +955,6 @@ int cmd_report(int argc, const char **argv)
bool has_br_stack = false;
int branch_mode = -1;
bool branch_call_mode = false;
-   bool symbol_ipc = false;
  #define CALLCHAIN_DEFAULT_OPT  "graph,0.5,caller,function,percent"
const char report_callchain_help[] = "Display call graph (stack 
chain/backtrace):\n\n"
 CALLCHAIN_REPORT_HELP
@@ -1289,7 +1289,7 @@ int cmd_report(int argc, const char **argv)
strstr(sort_order, "ipc")) {
if (!strstr(sort_order, "symbol"))
sort_order = "symbol,ipc";
-   symbol_ipc = true;
+   report.symbol_ipc = true;
}
  
  	if (setup_sorting(session->evlist) < 0) {

@@ -1319,7 +1319,7 @@ int cmd_report(int argc, const char **argv)
 * so don't allocate extra space that won't be used in the stdio
 * implementation.
 */
-   if (ui__has_annotation()) {
+   if (ui__has_annotation() || report.symbol_ipc) {
ret = symbol__annotation_init();
if (ret < 0)
goto error;
@@ -1339,9 +1339,6 @@ int cmd_report(int argc, const char **argv)
symbol_conf.sort_by_name = true;
}
annotation_config__init();
-   } else if (symbol_ipc) {
-   pr_err("Only TUI mode supports sort-key ipc\n");
-   goto error;
}
  
  	if (symbol__init(&session->header.env) < 0)




Re: [PATCH 2/2] perf report: Display average IPC and IPC coverage per symbol

2018-11-26 Thread Jin, Yao




On 11/26/2018 5:53 PM, Jiri Olsa wrote:

On Mon, Nov 26, 2018 at 05:40:54PM +0800, Jin Yao wrote:

SNIP


diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..94f62c8 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -13,6 +13,7 @@
  #include "strlist.h"
  #include 
  #include "mem-events.h"
+#include "annotate.h"
  #include 
  
  regex_t		parent_regex;

@@ -422,6 +423,47 @@ struct sort_entry sort_srcline_to = {
.se_width_idx   = HISTC_SRCLINE_TO,
  };
  
+static int hist_entry__sym_ipc_snprintf(struct hist_entry *he, char *bf,

+   size_t size, unsigned int width)
+{
+
+   struct symbol *sym = he->ms.sym;
+   struct map *map = he->ms.map;
+   struct perf_evsel *evsel = hists_to_evsel(he->hists);
+   struct annotation *notes;
+   double ipc = 0.0, coverage = 0.0;
+   char tmp[64];
+
+   if (!sym)
+   return repsep_snprintf(bf, size, "%-*s", width, "-");
+
+   if (!sym->annotated &&
+   symbol__annotate2(sym, map, evsel, &annotation__default_options,
+ NULL) < 0) {
+   return 0;
+   }
+
+   sym->annotated = true;


I don't like this being set in here.. please move it to
symbol__annotate2 or symbol__annotate, not sure which
one of these is the best fit



I have set this flag in symbol__annotate2 in v2.


+   notes = symbol__annotation(sym);
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn)
+   coverage = notes->cover_insn * 100.0 /
+   ((double)notes->total_insn);


missing { } for multiline code in 'if' condition



Fixed, thanks!

Thanks
Jin Yao


thanks,
jirka



[PATCH v2 1/3] perf annotate: Compute average IPC and IPC coverage per symbol

2018-11-26 Thread Jin Yao
Add support to perf report annotate view or perf annotate --stdio2 to
aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example,

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
  0.00  1.10 1  ↓ je 43
29:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_lock_wait_private
  add$0x80,%rsp
  0.00  3.0043:   lea__ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00  lea0xc(%rsp),%rsi
  8.49  3.00 1  → callq  __random_r
  7.91  1.94  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  0.00  1.94 1  ↓ je 68
  lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne70
↓ jmp8a
  0.00  2.0068:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
 21.56  2.00 1  ↓ je 8a
70:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_unlock_wake_private
  add$0x80,%rsp
 21.56  2.908a:   movslq 0xc(%rsp),%rax
2.90  add$0x18,%rsp
  9.03  2.90 1  ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage
is 54.8%.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 41 ++---
 tools/perf/util/annotate.h |  5 +
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6936daf..4b2b1b0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation 
*notes, u64 start, u64
 static void annotation__count_and_fill(struct annotation *notes, u64 start, 
u64 end, struct cyc_hist *ch)
 {
unsigned n_insn;
+   unsigned int cover_insn = 0;
u64 offset;
 
n_insn = annotation__count_insn(notes, start, end);
@@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct 
annotation *notes, u64 start, u64
for (offset = start; offset <= end; offset++) {
struct annotation_line *al = notes->offsets[offset];
 
-   if (al)
+   if (al && al->ipc == 0.0) {
al->ipc = ipc;
+   cover_insn++;
+   }
+   }
+
+   if (cover_insn) {
+   notes->hit_cycles += ch->cycles;
+   notes->hit_insn += n_insn * ch->num;
+   notes->cover_insn += cover_insn;
}
}
 }
 
 void annotation__compute_ipc(struct annotation *notes, size_t size)
 {
-   u64 offset;
+   s64 offset;
 
if (!notes->src || !notes->src->cycles_hist)
return;
 
+   notes->total_insn = annotation__count_insn(notes, 0, size - 1);
+   notes->hit_cycles = 0;
+   notes->hit_insn = 0;
+   notes->cover_insn = 0;
+
pthread_mutex_lock(¬es->lock);
-   for (offset = 0; offset < size; ++offset) {
+   for (offset = size - 1; offset >= 0; --offset) {
struct cyc_hist *ch;
 
ch = ¬es->src->cycles_hist[offset];
@@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, 
struct annotation *notes,
disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset);
 }
 
+static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
+{
+   double ipc = 0.0, coverage = 0.0;
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn) {
+   coverage = notes->cover_insn * 100.0 /
+   ((double)notes->total_insn);
+   }
+
+   scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)",
+ ipc, coverage);
+}
+
 static void __annotation_line__write(struct annotation_line *al, struct 
annotation *notes

[PATCH v2 3/3] perf report: Display average IPC and IPC coverage per symbol

2018-11-26 Thread Jin Yao
Support displaying the average IPC and IPC coverage for symbol
in perf report TUI browser. We create a new sort-key 'ipc' for
that.

For example,

$ perf record -g -b ...
$ perf report -s symbol,ipc or
  perf report -s ipc

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]

Jiri Olsa  provided the patch to support the
stdio mode. I merged Jiri's code in this patch.

$ perf report -s ipc --stdio

  # Overhead  Symbol   IPC   [IPC Coverage]
  #   ...  
  #
39.60%  [.] __random   2.30  [ 54.8%]
18.02%  [.] main   0.43  [ 54.3%]
14.21%  [.] compute_flag   2.29  [100.0%]
14.16%  [.] rand   0.36  [100.0%]
 7.06%  [.] __random_r 2.57  [ 70.5%]
 6.85%  [.] rand@plt   0.00  [  0.0%]
 0.02%  [k] run_timer_softirq  1.60  [ 57.2%]

v2:
---
Merge in Jiri's patch to support stdio mode

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/builtin-report.c  | 14 ---
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 42 
 tools/perf/util/sort.h   |  1 +
 5 files changed, 56 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 474a494..e7b8974 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -122,6 +122,7 @@ OPTIONS
- in_tx: branch in TSX transaction
- abort: TSX transaction abort.
- cycles: Cycles in basic block
+   - ipc: average ipc (instruction per cycle) of function
 
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..a6756dc 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -85,6 +85,7 @@ struct report {
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
struct branch_type_stat brtype_stat;
+   boolsymbol_ipc;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct 
hist_entry_iter *iter,
struct mem_info *mi;
struct branch_info *bi;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct 
hist_entry_iter *iter,
struct perf_evsel *evsel = iter->evsel;
int err;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -1284,6 +1285,13 @@ int cmd_report(int argc, const char **argv)
else
use_browser = 0;
 
+   if ((sort__mode == SORT_MODE__BRANCH) && sort_order &&
+   strstr(sort_order, "ipc")) {
+   if (!strstr(sort_order, "symbol"))
+   sort_order = "symbol,ipc";
+   report.symbol_ipc = true;
+   }
+
if (setup_sorting(session->evlist) < 0) {
if (sort_order)
parse_options_usage(report_usage, options, "s", 1);
@@ -1311,7 +1319,7 @@ int cmd_report(int argc, const char **argv)
 * so don't allocate extra space that won't be used in the stdio
 * implementation.
 */
-   if (ui__has_annotation()) {
+   if (ui__has_annotation() || report.symbol_ipc) {
ret = symbol__annotation_init();
if (ret < 0)
goto error;
diff --git a/tools/perf/util/hist.h b/tools/perf/util/hist.h
index 3badd7f..664b5ed 100644
--- a/tools/perf/util/hist.h
+++ b/tools/perf/util/hist.h
@@ -62,6 +62,7 @@ enum hist_column {
HISTC_TRACE,
HISTC_SYM_SIZE,
HISTC_DSO_SIZE,
+   HISTC_SYMBOL_IPC,
HISTC_NR_COLS, /* Last entry */
 };
 
diff --git a/tools/perf/util/sort.c b/tools/perf/util/sort.c
index f96c005..cc8ada3 100644
--- a/tools/perf/util/sort.c
+++ b/tools/perf/util/sort.c
@@ -13,6 +13,7 @@
 #include "strlist

[PATCH v2 2/3] perf annotate: Create a annotate2 flag in struct symbol

2018-11-26 Thread Jin Yao
We often use the symbol__annotate2() to annotate a specified symbol.
While annotating may take some time, so in order to avoid annotating
the same symbol repeatedly, the patch creates a new flag to indicate
the symbol has been annotated.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 1 +
 tools/perf/util/symbol.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4b2b1b0..f69d8e1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2798,6 +2798,7 @@ int symbol__annotate2(struct symbol *sym, struct map 
*map, struct perf_evsel *ev
notes->nr_events = nr_pcnt;
 
annotation__update_column_widths(notes);
+   sym->annotate2 = true;
 
return 0;
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d026d21..14d9d43 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -63,6 +63,7 @@ struct symbol {
u8  ignore:1;
u8  inlined:1;
u8  arch_sym;
+   boolannotate2;
charname[0];
 };
 
-- 
2.7.4



[PATCH v2 0/3] perf report/annotate: Support average IPC and IPC coverage for function

2018-11-26 Thread Jin Yao
Add supporting of displaying the average IPC and IPC coverage
percentage per function.

For example,

$ perf record -g -b ...
$ perf report -s symbol,ipc or
  perf report -s symbol,ipc --stdio

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]
  ...

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 ...

v2:
---
  1. Merge in Jiri's patch to support stdio mode

  2. Add a new patch "perf annotate: Create a annotate2 flag
 in struct symbol" which records if the symbol has been
 annotated yet.

  3. Minor update such as adding { } for multiline code in 'if'
 condition.

Jin Yao (3):
  perf annotate: Compute average IPC and IPC coverage per symbol
  perf annotate: Create a annotate2 flag in struct symbol
  perf report: Display average IPC and IPC coverage per symbol

 tools/perf/Documentation/perf-report.txt |  1 +
 tools/perf/builtin-report.c  | 14 ---
 tools/perf/util/annotate.c   | 42 +---
 tools/perf/util/annotate.h   |  5 
 tools/perf/util/hist.h   |  1 +
 tools/perf/util/sort.c   | 42 
 tools/perf/util/sort.h   |  1 +
 tools/perf/util/symbol.h |  1 +
 8 files changed, 101 insertions(+), 6 deletions(-)

-- 
2.7.4



Re: [PATCH v2 3/3] perf report: Display average IPC and IPC coverage per symbol

2018-11-27 Thread Jin, Yao




On 11/28/2018 2:33 AM, Andi Kleen wrote:

On Tue, Nov 27, 2018 at 08:09:48PM +0800, Jin Yao wrote:

Support displaying the average IPC and IPC coverage for symbol
in perf report TUI browser. We create a new sort-key 'ipc' for
that.


Another thing that would be nice would be to automatically enable
these columns perf report when -b was enabled in the perf.data.
We can't check for timed LBR being supported without
looking into samples, but at least can check for -b.

Otherwise a lot of people won't realize it's supported.

It would show always 0.0 when timed LBR is not supported,
but that might be ok. Perhaps make both columns empty
in this case instead of 0.0.


-Andi



Hi Andi,

Yes, I agree, people will not realize a new option 'ipc' is created.

So I will enable 2 columns ("IPC" and "IPC columns") even when timed LBR 
is not supported. Just show empty or '-' in these columns.


Thanks
Jin Yao



[PATCH v3 2/3] perf annotate: Create a annotate2 flag in struct symbol

2018-11-27 Thread Jin Yao
We often use the symbol__annotate2() to annotate a specified symbol.
While annotating may take some time, so in order to avoid annotating
the same symbol repeatedly, the patch creates a new flag to indicate
the symbol has been annotated.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 1 +
 tools/perf/util/symbol.h   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 4b2b1b0..f69d8e1 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -2798,6 +2798,7 @@ int symbol__annotate2(struct symbol *sym, struct map 
*map, struct perf_evsel *ev
notes->nr_events = nr_pcnt;
 
annotation__update_column_widths(notes);
+   sym->annotate2 = true;
 
return 0;
 
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index d026d21..14d9d43 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -63,6 +63,7 @@ struct symbol {
u8  ignore:1;
u8  inlined:1;
u8  arch_sym;
+   boolannotate2;
charname[0];
 };
 
-- 
2.7.4



[PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function

2018-11-27 Thread Jin Yao
Add supporting of displaying the average IPC and IPC coverage
percentage per function.

For example,

$ perf record -b ...
$ perf report -s symbol or
  perf report -s symbol --stdio

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]
  ...

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 ...

v3:
---
Remove the sortkey "ipc" from command-line. The columns "IPC"
and "[IPC Coverage]" are automatically enabled when "symbol"
is specified.

Patch "perf report: Display average IPC and IPC coverage per symbol"
is impacted.

v2:
---
  1. Merge in Jiri's patch to support stdio mode

  2. Add a new patch "perf annotate: Create a annotate2 flag
 in struct symbol" which records if the symbol has been
 annotated yet.

  3. Minor update such as adding { } for multiline code in 'if'
 condition.

Jin Yao (3):
  perf annotate: Compute average IPC and IPC coverage per symbol
  perf annotate: Create a annotate2 flag in struct symbol
  perf report: Display average IPC and IPC coverage per symbol

 tools/perf/builtin-report.c | 26 ---
 tools/perf/util/annotate.c  | 42 ---
 tools/perf/util/annotate.h  |  5 
 tools/perf/util/hist.h  |  1 +
 tools/perf/util/sort.c  | 61 +
 tools/perf/util/sort.h  |  2 ++
 tools/perf/util/symbol.h|  1 +
 7 files changed, 132 insertions(+), 6 deletions(-)

-- 
2.7.4



[PATCH v3 3/3] perf report: Display average IPC and IPC coverage per symbol

2018-11-27 Thread Jin Yao
Support displaying the average IPC and IPC coverage per symbol
in perf report TUI and stdio modes.

For example,

$ perf record -b ...
$ perf report -s symbol

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]

Jiri Olsa  provides the patch to support the
stdio mode. I merge Jiri's code in this patch.

$ perf report -s symbol --stdio

  # Overhead  Symbol   IPC   [IPC Coverage]
  #   ...  
  #
39.60%  [.] __random   2.30  [ 54.8%]
18.02%  [.] main   0.43  [ 54.3%]
14.21%  [.] compute_flag   2.29  [100.0%]
14.16%  [.] rand   0.36  [100.0%]
 7.06%  [.] __random_r 2.57  [ 70.5%]
 6.85%  [.] rand@plt   0.00  [  0.0%]
 0.02%  [k] run_timer_softirq  1.60  [ 57.2%]

The columns "IPC" and "[IPC Coverage]" are automatically enabled
when the sort-key "symbol" is specified. If the perf.data doesn't
contain timed LBR information, columns are filled with "-".

For example,

  # Overhead  Symbol   IPC   [IPC Coverage]
  #   ...  
  #
  46.57%  [.] main -  -
  17.60%  [.] rand -  -
  15.84%  [.] __random_r   -  -
  11.90%  [.] __random -  -
   6.50%  [.] compute_flag -  -
   1.59%  [.] rand@plt -  -
   0.00%  [.] _dl_relocate_object  -  -
   0.00%  [k] tlb_flush_mmu-  -
   0.00%  [k] perf_event_mmap  -  -
   0.00%  [k] native_sched_clock   -  -
   0.00%  [k] intel_pmu_handle_irq_v4  -  -
   0.00%  [k] native_write_msr -  -

v3:
---
Removed the sortkey 'ipc' from command-line. The columns "IPC"
and "[IPC Coverage]" are automatically enabled when "symbol"
is specified.

v2:
---
Merge in Jiri's patch to support stdio mode

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-report.c | 26 ---
 tools/perf/util/hist.h  |  1 +
 tools/perf/util/sort.c  | 61 +
 tools/perf/util/sort.h  |  2 ++
 4 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..4958095 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -85,6 +85,7 @@ struct report {
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
struct branch_type_stat brtype_stat;
+   boolsymbol_ipc;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct 
hist_entry_iter *iter,
struct mem_info *mi;
struct branch_info *bi;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct 
hist_entry_iter *iter,
struct perf_evsel *evsel = iter->evsel;
int err;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -1133,6 +1134,7 @@ int cmd_report(int argc, const char **argv)
.mode  = PERF_DATA_MODE_READ,
};
int ret = hists__init();
+   char sort_tmp[128];
 
if (ret < 0)
return ret;
@@ -1284,6 +1286,24 @@ int cmd_report(int argc, const char **argv)
else
use_browser = 0;
 
+   if (sort_order && strstr(sort_order, "ipc")) {
+   parse_options_usage(report_usage, options, "s", 1);
+   goto error;
+   }
+
+   if (sort_order && strstr(sort_order, "symbol")) {
+   if (sort__mode == SORT_MODE__BRANCH) {
+   snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s",
+sort_order, "ipc_lbr");
+   report.symbol_ipc = true;
+   } else {
+   snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s",
+sort_order, "ipc_null&qu

[PATCH v3 1/3] perf annotate: Compute average IPC and IPC coverage per symbol

2018-11-27 Thread Jin Yao
Add support to perf report annotate view or perf annotate --stdio2 to
aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example,

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
  0.00  1.10 1  ↓ je 43
29:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_lock_wait_private
  add$0x80,%rsp
  0.00  3.0043:   lea__ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00  lea0xc(%rsp),%rsi
  8.49  3.00 1  → callq  __random_r
  7.91  1.94  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  0.00  1.94 1  ↓ je 68
  lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne70
↓ jmp8a
  0.00  2.0068:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
 21.56  2.00 1  ↓ je 8a
70:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_unlock_wake_private
  add$0x80,%rsp
 21.56  2.908a:   movslq 0xc(%rsp),%rax
2.90  add$0x18,%rsp
  9.03  2.90 1  ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage
is 54.8%.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 41 ++---
 tools/perf/util/annotate.h |  5 +
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6936daf..4b2b1b0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation 
*notes, u64 start, u64
 static void annotation__count_and_fill(struct annotation *notes, u64 start, 
u64 end, struct cyc_hist *ch)
 {
unsigned n_insn;
+   unsigned int cover_insn = 0;
u64 offset;
 
n_insn = annotation__count_insn(notes, start, end);
@@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct 
annotation *notes, u64 start, u64
for (offset = start; offset <= end; offset++) {
struct annotation_line *al = notes->offsets[offset];
 
-   if (al)
+   if (al && al->ipc == 0.0) {
al->ipc = ipc;
+   cover_insn++;
+   }
+   }
+
+   if (cover_insn) {
+   notes->hit_cycles += ch->cycles;
+   notes->hit_insn += n_insn * ch->num;
+   notes->cover_insn += cover_insn;
}
}
 }
 
 void annotation__compute_ipc(struct annotation *notes, size_t size)
 {
-   u64 offset;
+   s64 offset;
 
if (!notes->src || !notes->src->cycles_hist)
return;
 
+   notes->total_insn = annotation__count_insn(notes, 0, size - 1);
+   notes->hit_cycles = 0;
+   notes->hit_insn = 0;
+   notes->cover_insn = 0;
+
pthread_mutex_lock(¬es->lock);
-   for (offset = 0; offset < size; ++offset) {
+   for (offset = size - 1; offset >= 0; --offset) {
struct cyc_hist *ch;
 
ch = ¬es->src->cycles_hist[offset];
@@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, 
struct annotation *notes,
disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset);
 }
 
+static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
+{
+   double ipc = 0.0, coverage = 0.0;
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn) {
+   coverage = notes->cover_insn * 100.0 /
+   ((double)notes->total_insn);
+   }
+
+   scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)",
+ ipc, coverage);
+}
+
 static void __annotation_line__write(struct annotation_line *al, struct 
annotation *notes

Re: [PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function

2018-11-28 Thread Jin, Yao




On 11/28/2018 5:10 PM, Ingo Molnar wrote:


* Jin Yao  wrote:


Add supporting of displaying the average IPC and IPC coverage
percentage per function.

For example,

$ perf record -b ...
$ perf report -s symbol or
   perf report -s symbol --stdio

Overhead  Symbol   IPC   [IPC Coverage]
   39.60%  [.] __random 2.30  [ 54.8%]
   18.02%  [.] main 0.43  [ 54.3%]
   14.21%  [.] compute_flag 2.29  [100.0%]
   14.16%  [.] rand 0.36  [100.0%]
7.06%  [.] __random_r   2.57  [ 70.5%]
6.85%  [.] rand@plt 0.00  [  0.0%]
   ...

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

 Disassembly of section .text:

 0003aac0 :
   8.32  3.28  sub$0x18,%rsp
 3.28  mov$0x1,%esi
 3.28  xor%eax,%eax
 3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  11.57  3.28 1  ↓ je 20
   lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 ↓ jne29
 ↓ jmp43
  11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0


That's a nice feature: please add meaningful documentation, accessible
via the perf help system preferably, that outlines how the IPC metrics
should be interpreted and how they are useful when optimizing programs.

Thanks,

Ingo



Hi Ingo,

Thanks so much for your comments! I think I will add some explanations 
in perf/Documentation/perf-report.txt, maybe somewhere around the 
sort_key section (-s::).


Thanks
Jin Yao


Re: [PATCH v3 0/3] perf report/annotate: Support average IPC and IPC coverage for function

2018-11-28 Thread Jin, Yao




On 11/28/2018 6:18 PM, Jiri Olsa wrote:

On Wed, Nov 28, 2018 at 11:17:57AM +0100, Jiri Olsa wrote:

On Wed, Nov 28, 2018 at 11:14:55PM +0800, Jin Yao wrote:

Add supporting of displaying the average IPC and IPC coverage
percentage per function.

For example,

$ perf record -b ...
$ perf report -s symbol or
   perf report -s symbol --stdio

Overhead  Symbol   IPC   [IPC Coverage]
   39.60%  [.] __random 2.30  [ 54.8%]
   18.02%  [.] main 0.43  [ 54.3%]
   14.21%  [.] compute_flag 2.29  [100.0%]
   14.16%  [.] rand 0.36  [100.0%]
7.06%  [.] __random_r   2.57  [ 70.5%]
6.85%  [.] rand@plt 0.00  [  0.0%]
   ...

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

 Disassembly of section .text:

 0003aac0 :
   8.32  3.28  sub$0x18,%rsp
 3.28  mov$0x1,%esi
 3.28  xor%eax,%eax
 3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  11.57  3.28 1  ↓ je 20
   lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
 ↓ jne29
 ↓ jmp43
  11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
  ...

v3:
---
 Remove the sortkey "ipc" from command-line. The columns "IPC"
 and "[IPC Coverage]" are automatically enabled when "symbol"
 is specified.

 Patch "perf report: Display average IPC and IPC coverage per symbol"
 is impacted.

v2:
---
   1. Merge in Jiri's patch to support stdio mode

   2. Add a new patch "perf annotate: Create a annotate2 flag
  in struct symbol" which records if the symbol has been
  annotated yet.

   3. Minor update such as adding { } for multiline code in 'if'
  condition.

Jin Yao (3):
   perf annotate: Compute average IPC and IPC coverage per symbol
   perf annotate: Create a annotate2 flag in struct symbol
   perf report: Display average IPC and IPC coverage per symbol


hi,
I took he liberty and moved the annotation retrieval into
resort phase under progress bar scope. It's currently on top
of my perf/fixes branch, could you please check it?

   git://git.kernel.org/pub/scm/linux/kernel/git/jolsa/perf.git


commits:
7f3ffdb9783f perf tools: Move symbol annotation to resort
e87f7d3c4f10 perf tools: Add perf_evsel__output_resort_cb function
40012b422108 perf tools: Add argument to hists__resort_cb_t callback

jirka



Hi Jiri,

Thanks for your patches. I have tested with your repo. Now I can see 2 
progress bars. One is displayed at the events processing phase, the 
other is displayed at resorting phase.


I have only one concern that is, in my test, much of time is consumed by 
the event processing phase, for example, 90% of time. Only 10% of time 
is consumed at resorting phase.


So do we really need the second progress bar?

Thanks
Jin Yao


[PATCH v4 1/4] perf annotate: Compute average IPC and IPC coverage per symbol

2018-11-28 Thread Jin Yao
Add support to perf report annotate view or perf annotate --stdio2 to
aggregate the IPC derived from timed LBRs per symbol. We compute the
average IPC and the IPC coverage percentage.

For example,

$ perf annotate --stdio2

Percent  IPC Cycle (Average IPC: 2.30, IPC Coverage: 54.8%)

Disassembly of section .text:

0003aac0 :
  8.32  3.28  sub$0x18,%rsp
3.28  mov$0x1,%esi
3.28  xor%eax,%eax
3.28  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
 11.57  3.28 1  ↓ je 20
  lock   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne29
↓ jmp43
 11.57  1.1020:   cmpxchg %esi,__abort_msg@@GLIBC_PRIVATE+0x8a0
  0.00  1.10 1  ↓ je 43
29:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_lock_wait_private
  add$0x80,%rsp
  0.00  3.0043:   lea__ctype_b@GLIBC_2.2.5+0x38,%rdi
3.00  lea0xc(%rsp),%rsi
  8.49  3.00 1  → callq  __random_r
  7.91  1.94  cmpl   
$0x0,argp_program_version_hook@@GLIBC_2.2.5+0x1e0
  0.00  1.94 1  ↓ je 68
  lock   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
↓ jne70
↓ jmp8a
  0.00  2.0068:   decl   __abort_msg@@GLIBC_PRIVATE+0x8a0
 21.56  2.00 1  ↓ je 8a
70:   lea__abort_msg@@GLIBC_PRIVATE+0x8a0,%rdi
  sub$0x80,%rsp
→ callq  __lll_unlock_wake_private
  add$0x80,%rsp
 21.56  2.908a:   movslq 0xc(%rsp),%rax
2.90  add$0x18,%rsp
  9.03  2.90 1  ← retq

It shows for this symbol the average IPC is 2.30 and the IPC coverage
is 54.8%.

Signed-off-by: Jin Yao 
---
 tools/perf/util/annotate.c | 41 ++---
 tools/perf/util/annotate.h |  5 +
 2 files changed, 43 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/annotate.c b/tools/perf/util/annotate.c
index 6936daf..4b2b1b0 100644
--- a/tools/perf/util/annotate.c
+++ b/tools/perf/util/annotate.c
@@ -1000,6 +1000,7 @@ static unsigned annotation__count_insn(struct annotation 
*notes, u64 start, u64
 static void annotation__count_and_fill(struct annotation *notes, u64 start, 
u64 end, struct cyc_hist *ch)
 {
unsigned n_insn;
+   unsigned int cover_insn = 0;
u64 offset;
 
n_insn = annotation__count_insn(notes, start, end);
@@ -1013,21 +1014,34 @@ static void annotation__count_and_fill(struct 
annotation *notes, u64 start, u64
for (offset = start; offset <= end; offset++) {
struct annotation_line *al = notes->offsets[offset];
 
-   if (al)
+   if (al && al->ipc == 0.0) {
al->ipc = ipc;
+   cover_insn++;
+   }
+   }
+
+   if (cover_insn) {
+   notes->hit_cycles += ch->cycles;
+   notes->hit_insn += n_insn * ch->num;
+   notes->cover_insn += cover_insn;
}
}
 }
 
 void annotation__compute_ipc(struct annotation *notes, size_t size)
 {
-   u64 offset;
+   s64 offset;
 
if (!notes->src || !notes->src->cycles_hist)
return;
 
+   notes->total_insn = annotation__count_insn(notes, 0, size - 1);
+   notes->hit_cycles = 0;
+   notes->hit_insn = 0;
+   notes->cover_insn = 0;
+
pthread_mutex_lock(¬es->lock);
-   for (offset = 0; offset < size; ++offset) {
+   for (offset = size - 1; offset >= 0; --offset) {
struct cyc_hist *ch;
 
ch = ¬es->src->cycles_hist[offset];
@@ -2563,6 +2577,22 @@ static void disasm_line__write(struct disasm_line *dl, 
struct annotation *notes,
disasm_line__scnprintf(dl, bf, size, !notes->options->use_offset);
 }
 
+static void ipc_coverage_string(char *bf, int size, struct annotation *notes)
+{
+   double ipc = 0.0, coverage = 0.0;
+
+   if (notes->hit_cycles)
+   ipc = notes->hit_insn / ((double)notes->hit_cycles);
+
+   if (notes->total_insn) {
+   coverage = notes->cover_insn * 100.0 /
+   ((double)notes->total_insn);
+   }
+
+   scnprintf(bf, size, "(Average IPC: %.2f, IPC Coverage: %.1f%%)",
+ ipc, coverage);
+}
+
 static void __annotation_line__write(struct annotation_line *al, struct 
annotation *notes

[PATCH v4 4/4] perf report: Documentation average IPC and IPC coverage

2018-11-28 Thread Jin Yao
Add explanations for new columns "IPC" and "IPC coverage" in perf
documentation.

Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt | 8 
 1 file changed, 8 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt 
b/tools/perf/Documentation/perf-report.txt
index 474a494..e5a32f3 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -126,6 +126,14 @@ OPTIONS
And default sort keys are changed to comm, dso_from, symbol_from, dso_to
and symbol_to, see '--branch-stack'.
 
+   When the sort key symbol is specified, columns "IPC" and "IPC Coverage"
+   are enabled automatically. Column "IPC" reports the average IPC per 
function
+   and column "IPC coverage" reports the percentage of instructions with
+   sampled IPC in this function. IPC means Instruction Per Cycle. If it's 
low,
+   it indicates there may be performance bottleneck when the function is
+   executed, such as, memory access bottleneck. If a function has high 
overhead
+   and low IPC, it's worth further analysis for performance optimization.
+
If the --mem-mode option is used, the following sort keys are also 
available
(incompatible with --branch-stack):
symbol_daddr, dso_daddr, locked, tlb, mem, snoop, dcacheline.
-- 
2.7.4



[PATCH v4 3/4] perf report: Display average IPC and IPC coverage per symbol

2018-11-28 Thread Jin Yao
Support displaying the average IPC and IPC coverage per symbol
in perf report TUI and stdio modes.

For example,

$ perf record -b ...
$ perf report -s symbol

Overhead  Symbol   IPC   [IPC Coverage]
  39.60%  [.] __random 2.30  [ 54.8%]
  18.02%  [.] main 0.43  [ 54.3%]
  14.21%  [.] compute_flag 2.29  [100.0%]
  14.16%  [.] rand 0.36  [100.0%]
   7.06%  [.] __random_r   2.57  [ 70.5%]
   6.85%  [.] rand@plt 0.00  [  0.0%]

Jiri Olsa  provides the patch to support the
stdio mode. I merge Jiri's code in this patch.

$ perf report -s symbol --stdio

  # Overhead  Symbol   IPC   [IPC Coverage]
  #   ...  
  #
39.60%  [.] __random   2.30  [ 54.8%]
18.02%  [.] main   0.43  [ 54.3%]
14.21%  [.] compute_flag   2.29  [100.0%]
14.16%  [.] rand   0.36  [100.0%]
 7.06%  [.] __random_r 2.57  [ 70.5%]
 6.85%  [.] rand@plt   0.00  [  0.0%]
 0.02%  [k] run_timer_softirq  1.60  [ 57.2%]

The columns "IPC" and "[IPC Coverage]" are automatically enabled
when the sort-key "symbol" is specified. If the perf.data doesn't
contain timed LBR information, columns are filled with "-".

For example,

  # Overhead  Symbol   IPC   [IPC Coverage]
  #   ...  
  #
  46.57%  [.] main -  -
  17.60%  [.] rand -  -
  15.84%  [.] __random_r   -  -
  11.90%  [.] __random -  -
   6.50%  [.] compute_flag -  -
   1.59%  [.] rand@plt -  -
   0.00%  [.] _dl_relocate_object  -  -
   0.00%  [k] tlb_flush_mmu-  -
   0.00%  [k] perf_event_mmap  -  -
   0.00%  [k] native_sched_clock   -  -
   0.00%  [k] intel_pmu_handle_irq_v4  -  -
   0.00%  [k] native_write_msr -  -

v3:
---
Removed the sortkey 'ipc' from command-line. The columns "IPC"
and "[IPC Coverage]" are automatically enabled when "symbol"
is specified.

v2:
---
Merge in Jiri's patch to support stdio mode

Signed-off-by: Jin Yao 
---
 tools/perf/builtin-report.c | 26 ---
 tools/perf/util/hist.h  |  1 +
 tools/perf/util/sort.c  | 61 +
 tools/perf/util/sort.h  |  2 ++
 4 files changed, 87 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 257c9c1..4958095 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -85,6 +85,7 @@ struct report {
int socket_filter;
DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS);
struct branch_type_stat brtype_stat;
+   boolsymbol_ipc;
 };
 
 static int report__config(const char *var, const char *value, void *cb)
@@ -129,7 +130,7 @@ static int hist_iter__report_callback(struct 
hist_entry_iter *iter,
struct mem_info *mi;
struct branch_info *bi;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -174,7 +175,7 @@ static int hist_iter__branch_callback(struct 
hist_entry_iter *iter,
struct perf_evsel *evsel = iter->evsel;
int err;
 
-   if (!ui__has_annotation())
+   if (!ui__has_annotation() && !rep->symbol_ipc)
return 0;
 
hist__account_cycles(sample->branch_stack, al, sample,
@@ -1133,6 +1134,7 @@ int cmd_report(int argc, const char **argv)
.mode  = PERF_DATA_MODE_READ,
};
int ret = hists__init();
+   char sort_tmp[128];
 
if (ret < 0)
return ret;
@@ -1284,6 +1286,24 @@ int cmd_report(int argc, const char **argv)
else
use_browser = 0;
 
+   if (sort_order && strstr(sort_order, "ipc")) {
+   parse_options_usage(report_usage, options, "s", 1);
+   goto error;
+   }
+
+   if (sort_order && strstr(sort_order, "symbol")) {
+   if (sort__mode == SORT_MODE__BRANCH) {
+   snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s",
+sort_order, "ipc_lbr");
+   report.symbol_ipc = true;
+   } else {
+   snprintf(sort_tmp, sizeof(sort_tmp), "%s,%s",
+sort_order, "ipc_null&qu

  1   2   3   4   5   6   7   8   9   10   >