[tip: perf/core] perf metricgroup: Support multiple events for metricgroup
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     f01642e4912bb80a01d693f4cc6fb0897207a090
Gitweb:        https://git.kernel.org/tip/f01642e4912bb80a01d693f4cc6fb0897207a090
Author:        Jin Yao
AuthorDate:    Wed, 28 Aug 2019 13:59:32 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Sat, 31 Aug 2019 22:27:52 -03:00

perf metricgroup: Support multiple events for metricgroup

Some uncore metrics don't work as expected. For example, on
cascadelakex:

  root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1

   Performance counter stats for 'system wide':

           1841092      unc_m_pmm_rpq_inserts
           3680816      unc_m_pmm_wpq_inserts

         1.001775055 seconds time elapsed

  root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1

   Performance counter stats for 'system wide':

         860649746      unc_m_pmm_rpq_occupancy.all
           1840557      unc_m_pmm_rpq_inserts
       12790627455      unc_m_clockticks

         1.001773348 seconds time elapsed

No metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' or 'UNC_M_PMM_READ_LATENCY' are
reported.

The issue is that the case of an alias expanding to multiple events is
not supported, which is typical for uncore events (see the comments in
find_evsel_group()).

For UNC_M_PMM_BANDWIDTH.TOTAL in the above example, the expanded event
group is '{unc_m_pmm_rpq_inserts,unc_m_pmm_wpq_inserts}:W', but the
actual events passed to find_evsel_group are:

  unc_m_pmm_rpq_inserts
  unc_m_pmm_rpq_inserts
  unc_m_pmm_rpq_inserts
  unc_m_pmm_rpq_inserts
  unc_m_pmm_rpq_inserts
  unc_m_pmm_rpq_inserts
  unc_m_pmm_wpq_inserts
  unc_m_pmm_wpq_inserts
  unc_m_pmm_wpq_inserts
  unc_m_pmm_wpq_inserts
  unc_m_pmm_wpq_inserts
  unc_m_pmm_wpq_inserts

This multiple events case is not supported well.

This patch introduces a new field 'metric_leader' in struct evsel. The
first event is considered the metric leader. The remaining events with
the same name point to the first event via their metric_leader field in
struct evsel.

This design is for adding the counting results of all same events to the
first event in the group (the metric_leader).

With this patch,

  root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_BANDWIDTH.TOTAL -a -- sleep 1

   Performance counter stats for 'system wide':

           1842108      unc_m_pmm_rpq_inserts     #    337.2 MB/sec  UNC_M_PMM_BANDWIDTH.TOTAL
           3682209      unc_m_pmm_wpq_inserts

         1.001819706 seconds time elapsed

  root@lkp-csl-2sp2:~# perf stat -M UNC_M_PMM_READ_LATENCY -a -- sleep 1

   Performance counter stats for 'system wide':

         861970685      unc_m_pmm_rpq_occupancy.all #    219.4 ns  UNC_M_PMM_READ_LATENCY
           1842772      unc_m_pmm_rpq_inserts
       12790196356      unc_m_clockticks

         1.001749103 seconds time elapsed

Now we can see the correct metrics 'UNC_M_PMM_BANDWIDTH.TOTAL' and
'UNC_M_PMM_READ_LATENCY'.

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20190828055932.8269-5-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/evsel.h       |  1 +-
 tools/perf/util/metricgroup.c | 84 +-
 tools/perf/util/stat-shadow.c | 27 +--
 3 files changed, 68 insertions(+), 44 deletions(-)

diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index fd60cac..68321d1 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -168,6 +168,7 @@ struct evsel {
 	const char *		metric_expr;
 	const char *		metric_name;
 	struct evsel		**metric_events;
+	struct evsel		*metric_leader;
 	bool			collect_stat;
 	bool			weak_group;
 	bool			percore;
diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index f474a29..a7c0424 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -90,57 +90,61 @@ struct egroup {
 	const char *metric_unit;
 };
 
-static bool record_evsel(int *ind, struct evsel **start,
-			 int idnum,
-			 struct evsel **metric_events,
-			 struct evsel *ev)
-{
-	metric_events[*ind] = ev;
-	if (*ind == 0)
-		*start = ev;
-	if (++*ind == idnum) {
-		metric_events[*ind] = NULL;
-		return true;
-	}
-	return false;
-}
-
 static struct evsel *find_evsel_group(struct evlist *perf_evlist,
 				      const char **ids,
 				      int idnum,
 				      struct evsel **metric_events)
 {
-	struct evsel *ev, *start = NULL;
-	int ind = 0;
+	struct evsel *ev;
+	int i = 0;
+	bool leader_found;
 
 	evlist__for_each_entry (perf_evlist, ev) {
-
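The metric_leader idea described above can be illustrated with a small
standalone sketch. Everything in it is an assumption made for
illustration: struct toy_evsel, assign_metric_leaders() and
leader_total() are hypothetical names, not perf code, and the real patch
works on struct evsel inside find_evsel_group() and stat-shadow.c.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for perf's struct evsel. */
struct toy_evsel {
	const char *name;
	unsigned long long count;
	struct toy_evsel *metric_leader;
};

/*
 * Make the first event with a given name the leader of every later
 * event that carries the same name. A leader points at itself.
 */
void assign_metric_leaders(struct toy_evsel *evs, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		evs[i].metric_leader = &evs[i];
		for (size_t j = 0; j < i; j++) {
			if (strcmp(evs[j].name, evs[i].name) == 0) {
				evs[i].metric_leader = evs[j].metric_leader;
				break;
			}
		}
	}
}

/* Sum the counts of all events that share the given leader. */
unsigned long long leader_total(const struct toy_evsel *evs, size_t n,
				const struct toy_evsel *leader)
{
	unsigned long long sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (evs[i].metric_leader == leader)
			sum += evs[i].count;
	}
	return sum;
}
```

With four entries named rpq, rpq, wpq, wpq, the second and fourth entry
end up pointing at the first and third, and a metric expression would
then read the per-leader totals rather than a single PMU's count.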
[tip: perf/core] perf metricgroup: Scale the metric result
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     287f2649f791819dd2d8f32f0213c8c521d6dfa0
Gitweb:        https://git.kernel.org/tip/287f2649f791819dd2d8f32f0213c8c521d6dfa0
Author:        Jin Yao
AuthorDate:    Wed, 28 Aug 2019 13:59:31 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Sat, 31 Aug 2019 22:27:52 -03:00

perf metricgroup: Scale the metric result

Some metrics define the scale unit, such as

    {
        "BriefDescription": "Intel Optane DC persistent memory read latency (ns). Derived from unc_m_pmm_rpq_occupancy.all",
        "Counter": "0,1,2,3",
        "EventCode": "0xE0",
        "EventName": "UNC_M_PMM_READ_LATENCY",
        "MetricExpr": "UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS",
        "MetricName": "UNC_M_PMM_READ_LATENCY",
        "PerPkg": "1",
        "ScaleUnit": "60ns",
        "UMask": "0x1",
        "Unit": "iMC"
    },

For the above example, the ratio should be,

  ratio = (UNC_M_PMM_RPQ_OCCUPANCY.ALL / UNC_M_PMM_RPQ_INSERTS / UNC_M_CLOCKTICKS) * 60

But in the current code, the ratio is not scaled ( * 60).

With this patch, the ratio is scaled and the unit (ns) is printed.

For example,

  #    219.4 ns  UNC_M_PMM_READ_LATENCY

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20190828055932.8269-4-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/metricgroup.c |  3 +++-
 tools/perf/util/metricgroup.h |  1 +-
 tools/perf/util/stat-shadow.c | 38 --
 3 files changed, 31 insertions(+), 11 deletions(-)

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 33f5e21..f474a29 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -87,6 +87,7 @@ struct egroup {
 	const char **ids;
 	const char *metric_name;
 	const char *metric_expr;
+	const char *metric_unit;
 };
 
 static bool record_evsel(int *ind, struct evsel **start,
@@ -182,6 +183,7 @@ static int metricgroup__setup_events(struct list_head *groups,
 		}
 		expr->metric_expr = eg->metric_expr;
 		expr->metric_name = eg->metric_name;
+		expr->metric_unit = eg->metric_unit;
 		expr->metric_events = metric_events;
 		list_add(&expr->nd, &me->head);
 	}
@@ -453,6 +455,7 @@ static int metricgroup__add_metric(const char *metric, struct strbuf *events,
 			eg->idnum = idnum;
 			eg->metric_name = pe->metric_name;
 			eg->metric_expr = pe->metric_expr;
+			eg->metric_unit = pe->unit;
 			list_add_tail(&eg->nd, group_list);
 			ret = 0;
 		}
diff --git a/tools/perf/util/metricgroup.h b/tools/perf/util/metricgroup.h
index e5092f6..475c7f9 100644
--- a/tools/perf/util/metricgroup.h
+++ b/tools/perf/util/metricgroup.h
@@ -20,6 +20,7 @@ struct metric_expr {
 	struct list_head nd;
 	const char *metric_expr;
 	const char *metric_name;
+	const char *metric_unit;
 	struct evsel **metric_events;
 };
 
diff --git a/tools/perf/util/stat-shadow.c b/tools/perf/util/stat-shadow.c
index 2ed5e00..696d263 100644
--- a/tools/perf/util/stat-shadow.c
+++ b/tools/perf/util/stat-shadow.c
@@ -715,6 +715,7 @@ static void generic_metric(struct perf_stat_config *config,
 			   struct evsel **metric_events,
 			   char *name,
 			   const char *metric_name,
+			   const char *metric_unit,
 			   double avg,
 			   int cpu,
 			   struct perf_stat_output_ctx *out,
@@ -722,7 +723,7 @@ static void generic_metric(struct perf_stat_config *config,
 {
 	print_metric_t print_metric = out->print_metric;
 	struct parse_ctx pctx;
-	double ratio;
+	double ratio, scale;
 	int i;
 	void *ctxp = out->ctx;
 	char *n, *pn;
@@ -732,7 +733,6 @@ static void generic_metric(struct perf_stat_config *config,
 	for (i = 0; metric_events[i]; i++) {
 		struct saved_value *v;
 		struct stats *stats;
-		double scale;
 
 		if (!strcmp(metric_events[i]->name, "duration_time")) {
 			stats = &walltime_nsecs_stats;
@@ -762,16 +762,32 @@ static void generic_metric(struct perf_stat_config *config,
 	if (!metric_events[i]) {
 		const char *p = metric_expr;
 
-		if (expr__parse(&ratio, &pctx, &p) == 0)
-			print_metric(config, ctxp, NULL, "%8.1f",
-				metric_name ?
-				metric_name :
-				out->force_header ?  name : "",
-
[tip: perf/core] perf pmu: Change convert_scale from static to global
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a55ab7c4ca6986a542d313b02043a39ebf712a39
Gitweb:        https://git.kernel.org/tip/a55ab7c4ca6986a542d313b02043a39ebf712a39
Author:        Jin Yao
AuthorDate:    Wed, 28 Aug 2019 13:59:29 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Sat, 31 Aug 2019 22:27:51 -03:00

perf pmu: Change convert_scale from static to global

The function convert_scale() can be used to convert a string to a unit
and a scale. For example,

  s = "60ns";
  convert_scale(s, &unit, &scale);
  unit = "ns", scale = 60.

Currently this function is static. This patch renames the function to
perf_pmu__convert_scale and changes the function to global.

No functional change.

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20190828055932.8269-2-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/pmu.c | 6 +++---
 tools/perf/util/pmu.h | 2 ++
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 6b3448f..fb597fa 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -102,7 +102,7 @@ static int pmu_format(const char *name, struct list_head *format)
 	return 0;
 }
 
-static int convert_scale(const char *scale, char **end, double *sval)
+int perf_pmu__convert_scale(const char *scale, char **end, double *sval)
 {
 	char *lc;
 	int ret = 0;
@@ -165,7 +165,7 @@ static int perf_pmu__parse_scale(struct perf_pmu_alias *alias, char *dir, char *
 	else
 		scale[sret] = '\0';
 
-	ret = convert_scale(scale, NULL, &alias->scale);
+	ret = perf_pmu__convert_scale(scale, NULL, &alias->scale);
 error:
 	close(fd);
 	return ret;
@@ -373,7 +373,7 @@ static int __perf_pmu__new_alias(struct list_head *list, char *dir, char *name,
 				desc ? strdup(desc) : NULL;
 	alias->topic = topic ? strdup(topic) : NULL;
 	if (unit) {
-		if (convert_scale(unit, &unit, &alias->scale) < 0)
+		if (perf_pmu__convert_scale(unit, &unit, &alias->scale) < 0)
 			return -1;
 		snprintf(alias->unit, sizeof(alias->unit), "%s", unit);
 	}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 3f8b79b..f36ade6 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -96,4 +96,6 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu);
 
 struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu);
 
+int perf_pmu__convert_scale(const char *scale, char **end, double *sval);
+
 #endif /* __PMU_H */
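For readers who want to see what "convert a string to a unit and a
scale" amounts to, here is a minimal standalone sketch built on the
standard strtod(). It is not the perf implementation: the name
parse_scale_unit() is hypothetical, and the sketch assumes the current
locale uses '.' as the decimal separator and ignores any other detail
the real function handles.

```c
#include <stdlib.h>

/*
 * Split a string such as "60ns" into a numeric scale (60.0) and a
 * pointer to the unit suffix ("ns"). Returns 0 on success and -1 when
 * the string does not start with a number.
 *
 * Assumption: hypothetical helper for illustration only, relying on
 * strtod() under the current locale.
 */
int parse_scale_unit(const char *s, double *scale, const char **unit)
{
	char *end;
	double value = strtod(s, &end);

	if (end == s)
		return -1;	/* no digits were consumed */

	*scale = value;
	*unit = end;		/* first character after the number */
	return 0;
}
```

Called with "60ns", this yields scale 60 and unit "ns", matching the
example in the commit message. A string with no leading number, such as
"MB", is rejected.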
[tip: perf/core] perf diff: Report noisy for cycles diff
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     cebf7d51a6c3babc4d0589da7aec0de1af0a5691
Gitweb:        https://git.kernel.org/tip/cebf7d51a6c3babc4d0589da7aec0de1af0a5691
Author:        Jin Yao
AuthorDate:    Wed, 25 Sep 2019 09:14:46 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Fri, 11 Oct 2019 10:57:00 -03:00

perf diff: Report noisy for cycles diff

This patch prints the stddev and hist for the cycles diff of each
program block. It can help us to understand whether the cycles are
noisy or not.

This patch is inspired by Andi Kleen's patch:

  https://lwn.net/Articles/600471/

We add a new option '--cycles-hist'.

Example:

  perf record -b ./div
  perf record -b ./div
  perf diff -c cycles

    # Baseline  [Program Block Range]                                       Cycles Diff  Shared Object      Symbol
    # ........  ..........................................................  ...........  .................  ............................
    #
        46.72%  [div.c:40 -> div.c:40]                                                0  div                [.] main
        46.72%  [div.c:42 -> div.c:44]                                                0  div                [.] main
        46.72%  [div.c:42 -> div.c:39]                                                0  div                [.] main
        20.54%  [random_r.c:357 -> random_r.c:394]                                    1  libc-2.27.so       [.] __random_r
        20.54%  [random_r.c:357 -> random_r.c:380]                                    0  libc-2.27.so       [.] __random_r
        20.54%  [random_r.c:388 -> random_r.c:388]                                    0  libc-2.27.so       [.] __random_r
        20.54%  [random_r.c:388 -> random_r.c:391]                                    0  libc-2.27.so       [.] __random_r
        17.04%  [random.c:288 -> random.c:291]                                        0  libc-2.27.so       [.] __random
        17.04%  [random.c:291 -> random.c:291]                                        0  libc-2.27.so       [.] __random
        17.04%  [random.c:293 -> random.c:293]                                        0  libc-2.27.so       [.] __random
        17.04%  [random.c:295 -> random.c:295]                                        0  libc-2.27.so       [.] __random
        17.04%  [random.c:295 -> random.c:295]                                        0  libc-2.27.so       [.] __random
        17.04%  [random.c:298 -> random.c:298]                                        0  libc-2.27.so       [.] __random
         8.40%  [div.c:22 -> div.c:25]                                                0  div                [.] compute_flag
         8.40%  [div.c:27 -> div.c:28]                                                0  div                [.] compute_flag
         5.14%  [rand.c:26 -> rand.c:27]                                              0  libc-2.27.so       [.] rand
         5.14%  [rand.c:28 -> rand.c:28]                                              0  libc-2.27.so       [.] rand
         2.15%  [rand@plt+0 -> rand@plt+0]                                            0  div                [.] rand@plt
         0.00%                                                                           [kernel.kallsyms]  [k] __x86_indirect_thunk_rax
         0.00%  [do_mmap+714 -> do_mmap+732]                                        -10  [kernel.kallsyms]  [k] do_mmap
         0.00%  [do_mmap+737 -> do_mmap+765]                                          1  [kernel.kallsyms]  [k] do_mmap
         0.00%  [do_mmap+262 -> do_mmap+299]                                          0  [kernel.kallsyms]  [k] do_mmap
         0.00%  [__x86_indirect_thunk_r15+0 -> __x86_indirect_thunk_r15+0]            7  [kernel.kallsyms]  [k] __x86_indirect_thunk_r15
         0.00%  [native_sched_clock+0 -> native_sched_clock+119]                     -1  [kernel.kallsyms]  [k] native_sched_clock
         0.00%  [native_write_msr+0 -> native_write_msr+16]                         -13  [kernel.kallsyms]  [k] native_write_msr

When we enable the option '--cycles-hist', the output is

  perf diff -c cycles --cycles-hist

    # Baseline  [Program Block Range]                                       Cycles Diff  stddev/Hist        Shared Object      Symbol
    # ........  ..........................................................  ...........  .................  .................  ............................
    #
        46.72%  [div.c:40 -> div.c:40]                                                0  ± 37.8%  ▁█▁▁██▁█  div                [.] main
        46.72%  [div.c:42 -> div.c:44]                                                0  ± 49.4%  ▁▁▂█      div                [.] main
        46.72%  [div.c:42 -> div.c:39]                                                0  ± 24.1%  ▃█▂▄▁▃▂▁  div                [.] main
        20.54%  [random_r.c:357 -> random_r.c:394]                                    1  ± 33.5%  ▅▂▁█▃▁▂▁  libc-2.27.so       [.] __random_r
        20.54%  [random_r.c:357 -> random_r.c:380]                                    0  ± 39.4%  ▁▁█▁██▅▁  libc-2.27.so       [.] __random_r
        20.54%  [random_r.c:388 -> random_r.c:388]                                    0                     libc-2.27.so       [.] __random_r
        20.54%
[tip: perf/core] perf report: Add warning when libunwind not compiled in
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     800d3f561659b5436f8c57e7c26dd1f6928b5615
Gitweb:        https://git.kernel.org/tip/800d3f561659b5436f8c57e7c26dd1f6928b5615
Author:        Jin Yao
AuthorDate:    Fri, 11 Oct 2019 10:21:22 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Tue, 15 Oct 2019 08:36:22 -03:00

perf report: Add warning when libunwind not compiled in

We received a user report that call-graph DWARF mode was enabled in
'perf record' but 'perf report' didn't unwind the callstack correctly.
The reason was that libunwind was not compiled in.

We can use 'perf -vv' to check the compiled libraries but it would be
valuable to report a warning to the user directly (especially valuable
for a perf newbie).

The warning is:

  Warning:
  Please install libunwind development packages during the perf build.

Both TUI and stdio are supported.

Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191011022122.26369-1-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/builtin-report.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index aae0e57..7accaf8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -399,6 +399,13 @@ static int report__setup_sample_type(struct report *rep)
 				PERF_SAMPLE_BRANCH_ANY))
 		rep->nonany_branch_mode = true;
 
+#ifndef HAVE_LIBUNWIND_SUPPORT
+	if (dwarf_callchain_users) {
+		ui__warning("Please install libunwind development packages "
+			    "during the perf build.\n");
+	}
+#endif
+
 	return 0;
 }
[tip: perf/core] perf stat: Support --all-kernel/--all-user
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     dd071024bf52156eed31deaf511c6e7a82a6f57b
Gitweb:        https://git.kernel.org/tip/dd071024bf52156eed31deaf511c6e7a82a6f57b
Author:        Jin Yao
AuthorDate:    Fri, 11 Oct 2019 13:05:45 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Tue, 15 Oct 2019 08:39:42 -03:00

perf stat: Support --all-kernel/--all-user

'perf record' has supported --all-kernel / --all-user to configure all
used events to run in kernel space or run in user space. But 'perf stat'
doesn't support these options.

It would be useful to support these options in 'perf stat' too to keep
the same semantics available in both tools.

Signed-off-by: Jin Yao
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191011050545.3899-1-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/Documentation/perf-stat.txt |  6 ++
 tools/perf/builtin-stat.c              |  6 ++
 tools/perf/util/stat.c                 | 10 ++
 tools/perf/util/stat.h                 |  2 ++
 4 files changed, 24 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 930c51c..a9af4e4 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -323,6 +323,12 @@ The output is SMI cycles%, equals to (aperf - unhalted core cycles) / aperf
 
 Users who wants to get the actual value can apply --no-metric-only.
 
+--all-kernel::
+Configure all used events to run in kernel space.
+
+--all-user::
+Configure all used events to run in user space.
+
 EXAMPLES
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 468fc49..c88d4e1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -803,6 +803,12 @@ static struct option stat_options[] = {
 	OPT_CALLBACK('M', "metrics", &evsel_list, "metric/metric group list",
 		     "monitor specified metrics or metric groups (separated by ,)",
 		     parse_metric_groups),
+	OPT_BOOLEAN_FLAG(0, "all-kernel", &stat_config.all_kernel,
+			 "Configure all used events to run in kernel space.",
+			 PARSE_OPT_EXCLUSIVE),
+	OPT_BOOLEAN_FLAG(0, "all-user", &stat_config.all_user,
+			 "Configure all used events to run in user space.",
+			 PARSE_OPT_EXCLUSIVE),
 	OPT_END()
 };
 
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index ebdd130..6822e4f 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -490,6 +490,16 @@ int create_perf_stat_counter(struct evsel *evsel,
 	if (config->identifier)
 		attr->sample_type = PERF_SAMPLE_IDENTIFIER;
 
+	if (config->all_user) {
+		attr->exclude_kernel = 1;
+		attr->exclude_user = 0;
+	}
+
+	if (config->all_kernel) {
+		attr->exclude_kernel = 0;
+		attr->exclude_user = 1;
+	}
+
 	/*
 	 * Disabling all counters initially, they will be enabled
 	 * either manually by us or by kernel via enable_on_exec
diff --git a/tools/perf/util/stat.h b/tools/perf/util/stat.h
index edbeb2f..081c4a5 100644
--- a/tools/perf/util/stat.h
+++ b/tools/perf/util/stat.h
@@ -106,6 +106,8 @@ struct perf_stat_config {
 	bool			 big_num;
 	bool			 no_merge;
 	bool			 walltime_run_table;
+	bool			 all_kernel;
+	bool			 all_user;
 	FILE			*output;
 	unsigned int		 interval;
 	unsigned int		 timeout;
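The effect of the two options on an event's attributes can be sketched
outside of perf as follows. The types toy_attr and toy_config and the
function apply_space_filter() are hypothetical stand-ins introduced for
this illustration; the patch itself sets the exclude_kernel and
exclude_user bits of the event attribute inside
create_perf_stat_counter().

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two exclude bits of an event attribute. */
struct toy_attr {
	unsigned int exclude_user : 1;
	unsigned int exclude_kernel : 1;
};

/* Hypothetical stand-in for the two new perf_stat_config fields. */
struct toy_config {
	bool all_kernel;
	bool all_user;
};

/*
 * Mirror the logic added by the patch: --all-user counts only user
 * space, --all-kernel counts only kernel space, and with neither
 * option the attribute is left untouched.
 */
void apply_space_filter(const struct toy_config *config, struct toy_attr *attr)
{
	if (config->all_user) {
		attr->exclude_kernel = 1;
		attr->exclude_user = 0;
	}

	if (config->all_kernel) {
		attr->exclude_kernel = 0;
		attr->exclude_user = 1;
	}
}
```

Since both options are declared with PARSE_OPT_EXCLUSIVE in the patch,
the sketch does not try to define what happens when both flags are set.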
[tip: perf/core] perf list: Hide deprecated events by default
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     a7f6c8c81afdd6d24eb12558f2fb66901207d349
Gitweb:        https://git.kernel.org/tip/a7f6c8c81afdd6d24eb12558f2fb66901207d349
Author:        Jin Yao
AuthorDate:    Tue, 15 Oct 2019 10:53:57 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Sat, 19 Oct 2019 15:35:01 -03:00

perf list: Hide deprecated events by default

There are some deprecated events listed by perf list. But we can't
remove them from perf list with ease because some old scripts may use
them.

Deprecated events are old names of renamed events. When an event gets
renamed the old name is kept around for some time and marked with
Deprecated. The newer Intel event lists in the tree already have these
headers.

So we need to keep them in the event list, but provide a new option to
show them. The new option is "--deprecated".

With this patch, the deprecated events are hidden by default but they
can be displayed when option "--deprecated" is enabled.

Signed-off-by: Jin Yao
Acked-by: Jiri Olsa
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20191015025357.8708-1-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/Documentation/perf-list.txt |  3 +++-
 tools/perf/builtin-list.c              | 14 +
 tools/perf/pmu-events/jevents.c        | 26 +++--
 tools/perf/pmu-events/jevents.h        |  3 ++-
 tools/perf/pmu-events/pmu-events.h     |  1 +-
 tools/perf/util/parse-events.c         |  4 ++--
 tools/perf/util/parse-events.h         |  2 +-
 tools/perf/util/pmu.c                  | 17
 tools/perf/util/pmu.h                  |  4 +++-
 9 files changed, 55 insertions(+), 19 deletions(-)

diff --git a/tools/perf/Documentation/perf-list.txt b/tools/perf/Documentation/perf-list.txt
index 18ed1b0..6345db3 100644
--- a/tools/perf/Documentation/perf-list.txt
+++ b/tools/perf/Documentation/perf-list.txt
@@ -36,6 +36,9 @@ Enable debugging output.
 Print how named events are resolved internally into perf events, and also
 any extra expressions computed by perf stat.
 
+--deprecated::
+Print deprecated events. By default the deprecated events are hidden.
+
 [[EVENT_MODIFIERS]]
 EVENT MODIFIERS
 ---
diff --git a/tools/perf/builtin-list.c b/tools/perf/builtin-list.c
index 08e62ae..965ef01 100644
--- a/tools/perf/builtin-list.c
+++ b/tools/perf/builtin-list.c
@@ -26,6 +26,7 @@ int cmd_list(int argc, const char **argv)
 	int i;
 	bool raw_dump = false;
 	bool long_desc_flag = false;
+	bool deprecated = false;
 	struct option list_options[] = {
 		OPT_BOOLEAN(0, "raw-dump", &raw_dump, "Dump raw events"),
 		OPT_BOOLEAN('d', "desc", &desc_flag,
@@ -34,6 +35,8 @@ int cmd_list(int argc, const char **argv)
 			    "Print longer event descriptions."),
 		OPT_BOOLEAN(0, "details", &details_flag,
 			    "Print information on the perf event names and expressions used internally by events."),
+		OPT_BOOLEAN(0, "deprecated", &deprecated,
+			    "Print deprecated events."),
 		OPT_INCR(0, "debug", &verbose,
 			     "Enable debugging output"),
 		OPT_END()
@@ -55,7 +58,7 @@ int cmd_list(int argc, const char **argv)
 
 	if (argc == 0) {
 		print_events(NULL, raw_dump, !desc_flag, long_desc_flag,
-				details_flag);
+				details_flag, deprecated);
 		return 0;
 	}
 
@@ -78,7 +81,8 @@ int cmd_list(int argc, const char **argv)
 			print_hwcache_events(NULL, raw_dump);
 		else if (strcmp(argv[i], "pmu") == 0)
 			print_pmu_events(NULL, raw_dump, !desc_flag,
-						long_desc_flag, details_flag);
+						long_desc_flag, details_flag,
+						deprecated);
 		else if (strcmp(argv[i], "sdt") == 0)
 			print_sdt_events(NULL, NULL, raw_dump);
 		else if (strcmp(argv[i], "metric") == 0 || strcmp(argv[i], "metrics") == 0)
@@ -91,7 +95,8 @@ int cmd_list(int argc, const char **argv)
 			if (sep == NULL) {
 				print_events(argv[i], raw_dump, !desc_flag,
 							long_desc_flag,
-							details_flag);
+							details_flag,
+							deprecated);
 				continue;
 			}
 			sep_idx = sep - argv[i];
@@ -117,7 +122,8 @@ int cmd_list(int argc, co
[tip: perf/core] perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     0e0bf1ea1147fcf74eab19c2d3c853cc3740a72f
Gitweb:        https://git.kernel.org/tip/0e0bf1ea1147fcf74eab19c2d3c853cc3740a72f
Author:        Jin Yao
AuthorDate:    Thu, 09 Apr 2020 15:07:55 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Wed, 22 Apr 2020 15:51:01 -03:00

perf stat: Zero all the 'ena' and 'run' array slot stats for interval mode

As the code comments in perf_stat_process_counter() say, we calculate
the counter's data every interval, and the display code shows the
ps->res_stats avg value. We need to zero the stats for interval mode.

But the current code only zeros res_stats[0]; it doesn't zero
res_stats[1] and res_stats[2], which are for the ena and run values of
the counter.

This patch zeros the whole res_stats[] for interval mode.

Fixes: 51fd2df1e882 ("perf stat: Fix interval output values")
Signed-off-by: Jin Yao
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Jiri Olsa
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200409070755.17261-1-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/util/stat.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index 5f26137..242476e 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -368,8 +368,10 @@ int perf_stat_process_counter(struct perf_stat_config *config,
 	 * interval mode, otherwise overall avg running
 	 * averages will be shown for each interval.
 	 */
-	if (config->interval)
-		init_stats(ps->res_stats);
+	if (config->interval) {
+		for (i = 0; i < 3; i++)
+			init_stats(&ps->res_stats[i]);
+	}
 
 	if (counter->per_pkg)
 		zero_per_pkg(counter);
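The bug fixed above is a common C pitfall: passing an array name to a
function that initializes one element only touches element 0. The
following standalone sketch shows it with a hypothetical toy_stats type
(not perf's struct stats) and hypothetical helper names.

```c
#include <stddef.h>

/* Hypothetical, simplified running-statistics slot. */
struct toy_stats {
	double n;	/* number of samples seen */
	double mean;	/* running mean of the samples */
};

/* Reset a single slot. */
void toy_init_stats(struct toy_stats *s)
{
	s->n = 0.0;
	s->mean = 0.0;
}

/* Fold one more sample into the running mean. */
void toy_update_stats(struct toy_stats *s, double val)
{
	s->n += 1.0;
	s->mean += (val - s->mean) / s->n;
}

/*
 * Reset every slot of the array. Calling toy_init_stats(res) instead
 * would reset only res[0], because the array decays to a pointer to
 * its first element.
 */
void toy_reset_all(struct toy_stats *res, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		toy_init_stats(&res[i]);
}
```

In the patch the array has three slots (count, ena, run), which is why
the loop there runs from 0 to 2.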
[tip: perf/core] perf stat: Improve runtime stat for interval mode
The following commit has been merged into the perf/core branch of tip:

Commit-ID:     197ba86fdc888dc0d3d6b89b402c9c6851d4c6fb
Gitweb:        https://git.kernel.org/tip/197ba86fdc888dc0d3d6b89b402c9c6851d4c6fb
Author:        Jin Yao
AuthorDate:    Mon, 20 Apr 2020 22:54:17 +08:00
Committer:     Arnaldo Carvalho de Melo
CommitterDate: Thu, 23 Apr 2020 11:03:46 -03:00

perf stat: Improve runtime stat for interval mode

For interval mode, the metric is printed after the '#' character if it
exists. But it's not calculated from the counts generated in this
interval. See the following examples:

  root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
  #           time             counts unit events
       1.000422803            764,809      inst_retired.any          #      2.9 CPI
       1.000422803          2,234,932      cycles
       2.001464585          1,960,061      inst_retired.any          #      1.6 CPI
       2.001464585          4,022,591      cycles

The second CPI should not be 1.6 (4,022,591/1,960,061 is 2.1)

  root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
  #           time             counts unit events
       1.000429493          2,869,311      cycles
       1.000429493            816,875      instructions              #    0.28  insn per cycle
       2.001516426          9,260,973      cycles
       2.001516426          5,250,634      instructions              #    0.87  insn per cycle

The second 'insn per cycle' should not be 0.87 (5,250,634/9,260,973 is
0.57).

The current code uses a global variable 'rt_stat' for tracking and
updating the std dev of the runtime stat. Unlike the counts, which are
reset for each interval, 'rt_stat' is not reset:

  perf_stat_process_counter()
  {
          if (config->interval)
                  init_stats(ps->res_stats);
  }

So for interval mode, the 'rt_stat' variable should be reset too.

This patch resets 'rt_stat' before read_counters(), so the runtime stat
is only calculated from the counts generated in this interval.

With this patch:

  root@kbl-ppc:~# perf stat -M CPI -I1000 --interval-count 2
  #           time             counts unit events
       1.000420924          2,408,818      inst_retired.any          #      2.1 CPI
       1.000420924          5,010,111      cycles
       2.001448579          2,798,407      inst_retired.any          #      1.6 CPI
       2.001448579          4,599,861      cycles

  root@kbl-ppc:~# perf stat -e cycles,instructions -I1000 --interval-count 2
  #           time             counts unit events
       1.000428555          2,769,714      cycles
       1.000428555            774,462      instructions              #    0.28  insn per cycle
       2.001471562          3,595,904      cycles
       2.001471562          1,243,703      instructions              #    0.35  insn per cycle

Now the second 'insn per cycle' and CPI are calculated from the counts
generated in this interval.

Signed-off-by: Jin Yao
Acked-by: Jiri Olsa
Tested-By: Kajol Jain
Cc: Alexander Shishkin
Cc: Andi Kleen
Cc: Jin Yao
Cc: Kan Liang
Cc: Peter Zijlstra
Link: http://lore.kernel.org/lkml/20200420145417.6864-1-yao@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo
---
 tools/perf/Documentation/perf-stat.txt | 2 ++
 tools/perf/builtin-stat.c              | 1 +
 2 files changed, 3 insertions(+)

diff --git a/tools/perf/Documentation/perf-stat.txt b/tools/perf/Documentation/perf-stat.txt
index 4d56586..3fb5028 100644
--- a/tools/perf/Documentation/perf-stat.txt
+++ b/tools/perf/Documentation/perf-stat.txt
@@ -176,6 +176,8 @@ Print count deltas every N milliseconds (minimum: 1ms)
 The overhead percentage could be high in some cases, for instance with small, sub 100ms intervals.  Use with caution.
 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
 
+If the metric exists, it is calculated by the counts generated in this interval and the metric is printed after #.
+
 --interval-count times::
 Print count deltas for fixed number of times.
 This option should be used together with "-I" option.
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 9207b6c..3f050d8 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -359,6 +359,7 @@ static void process_interval(void)
 	clock_gettime(CLOCK_MONOTONIC, &ts);
 	diff_timespec(&rs, &ts, &ref_time);
 
+	perf_stat__reset_shadow_per_stat(&rt_stat);
 	read_counters(&rs);
 
 	if (STAT_RECORD) {
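Why a missing reset distorts a per-interval metric can be shown with a
toy accumulator. This is only a sketch under stated assumptions:
struct toy_runtime and its helpers are hypothetical, and the accumulator
is a plain sum rather than perf's rt_stat, so it will not reproduce the
exact stale values printed in the commit message. It uses the CPI
counts from the first example above.

```c
/* Hypothetical accumulator for the inputs of one metric. */
struct toy_runtime {
	double cycles;
	double instructions;
};

/* Forget everything accumulated so far. */
void toy_runtime_reset(struct toy_runtime *rt)
{
	rt->cycles = 0.0;
	rt->instructions = 0.0;
}

/* Add the counts read in one interval. */
void toy_runtime_add(struct toy_runtime *rt, double cycles, double instructions)
{
	rt->cycles += cycles;
	rt->instructions += instructions;
}

/* Cycles per instruction over whatever has been accumulated. */
double toy_cpi(const struct toy_runtime *rt)
{
	return rt->instructions > 0.0 ? rt->cycles / rt->instructions : 0.0;
}
```

Adding both intervals without a reset gives a CPI of roughly 2.30 for
the combined counts, while resetting before the second interval gives
4,022,591 / 1,960,061, about 2.05, which is the value the commit message
rounds to 2.1.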