While perf-stat has its own create_perf_stat_counter() helper to open events, dependent on target configuration, it uses perf_evlist__close() to close events.
The common perf_evlist__{open,close}() helpers don't consider the target configuration, and always evsel->cpus even where create_perf_stat_counter() would have used an empty_cpu_map. On some systems, the CPU PMU describes cpus under sysfs, and evsel->cpus may be set even when we open per-thread events. For per-thread events, we don't use evsel->cpus in create_perf_stat_counter(), though perf_evlist__close() will. Thus we try to close more events than we allocated and opened, resulting in segfaults when we go out-of-bounds: $ perf stat -e armv8_pmuv3/cpu_cycles/ true *** Error in `./old-perf': free(): invalid next size (fast): 0x00000000263a55c0 *** Aborted This problem was introduced by commit: 3df33eff2ba96be4 ("perf stat: Avoid skew when reading events") ... prior to that commit, we closed events as we read them, using the correct number of CPUs. This patch fixes the problem by adding a new close_counters() routine that mirrors create_perf_stat_counter(), ensuring that we always correctly balance open/close. Fixes: 3df33eff2ba96be4 ("perf stat: Avoid skew when reading events") Signed-off-by: Mark Rutland <mark.rutl...@arm.com> Tested-by: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com> Cc: Arnaldo Carvalho de Melo <a...@kernel.org> Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> Cc: linux-kernel@vger.kernel.org --- tools/perf/builtin-stat.c | 24 +++++++++++++++++++++++- 1 file changed, 23 insertions(+), 1 deletion(-) diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c index a935b50..ceedc0a 100644 --- a/tools/perf/builtin-stat.c +++ b/tools/perf/builtin-stat.c @@ -364,6 +364,28 @@ static void read_counters(void) } } +/* + * Close all evnt FDs we open in __run_perf_stat() and + * create_perf_stat_counter(), taking care to match the number of threads and CPUs. + * + * Note that perf_evlist__close(evsel_list) is not equivalent, as it doesn't + * take the target into account. + */ +static void close_counters(void) +{ + bool per_cpu = target__has_cpu(&target); + struct perf_evsel *evsel; + + evlist__for_each_entry(evsel_list, evsel) { + if (per_cpu) + perf_evsel__close_per_cpu(evsel, + perf_evsel__cpus(evsel)); + else + perf_evsel__close_per_thread(evsel, + evsel_list->threads); + } +} + static void process_interval(void) { struct timespec ts, rs; @@ -704,7 +726,7 @@ static int __run_perf_stat(int argc, const char **argv) * group leaders. */ read_counters(); - perf_evlist__close(evsel_list); + close_counters(); return WEXITSTATUS(status); } -- 1.9.1