While perf-stat has its own create_perf_stat_counter() helper to open
events, dependent on target configuration, it uses perf_evlist__close()
to close events.

The common perf_evlist__{open,close}() helpers don't consider the target
configuration, and always evsel->cpus even where
create_perf_stat_counter() would have used an empty_cpu_map.

On some systems, the CPU PMU describes cpus under sysfs, and evsel->cpus
may be set even when we open per-thread events. For per-thread events,
we don't use evsel->cpus in create_perf_stat_counter(), though
perf_evlist__close() will. Thus we try to close more events than we
allocated and opened, resulting in segfaults when we go out-of-bounds:

$ perf stat -e armv8_pmuv3/cpu_cycles/ true
*** Error in `./old-perf': free(): invalid next size (fast): 0x00000000263a55c0 
***
Aborted

This problem was introduced by commit:

  3df33eff2ba96be4 ("perf stat: Avoid skew when reading events")

... prior to that commit, we closed events as we read them, using the
correct number of CPUs.

This patch fixes the problem by adding a new close_counters() routine
that mirrors create_perf_stat_counter(), ensuring that we always
correctly balance open/close.

Fixes: 3df33eff2ba96be4 ("perf stat: Avoid skew when reading events")
Signed-off-by: Mark Rutland <mark.rutl...@arm.com>
Tested-by: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com>
Cc: Arnaldo Carvalho de Melo <a...@kernel.org>
Cc: Alexander Shishkin <alexander.shish...@linux.intel.com>
Cc: linux-kernel@vger.kernel.org
---
 tools/perf/builtin-stat.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index a935b50..ceedc0a 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -364,6 +364,28 @@ static void read_counters(void)
        }
 }
 
+/*
+ * Close all evnt FDs we open in __run_perf_stat() and
+ * create_perf_stat_counter(), taking care to match the number of threads and 
CPUs.
+ *
+ * Note that perf_evlist__close(evsel_list) is not equivalent, as it doesn't
+ * take the target into account.
+ */
+static void close_counters(void)
+{
+       bool per_cpu = target__has_cpu(&target);
+       struct perf_evsel *evsel;
+
+       evlist__for_each_entry(evsel_list, evsel) {
+               if (per_cpu)
+                       perf_evsel__close_per_cpu(evsel,
+                                                 perf_evsel__cpus(evsel));
+               else
+                       perf_evsel__close_per_thread(evsel,
+                                                    evsel_list->threads);
+       }
+}
+
 static void process_interval(void)
 {
        struct timespec ts, rs;
@@ -704,7 +726,7 @@ static int __run_perf_stat(int argc, const char **argv)
         * group leaders.
         */
        read_counters();
-       perf_evlist__close(evsel_list);
+       close_counters();
 
        return WEXITSTATUS(status);
 }
-- 
1.9.1

Reply via email to