Re: [PATCH v2 0/3] perf-stat: share hardware PMCs with BPF

Arnaldo Carvalho de Melo Wed, 17 Mar 2021 06:12:27 -0700

On Wed, Mar 17, 2021 at 02:29:28PM +0900, Namhyung Kim wrote:
> Hi Song,
> 
> On Wed, Mar 17, 2021 at 6:18 AM Song Liu <[email protected]> wrote:
> >
> > perf uses performance monitoring counters (PMCs) to monitor system
> > performance. The PMCs are limited hardware resources. For example,
> > Intel CPUs have 3x fixed PMCs and 4x programmable PMCs per cpu.
> >
> > Modern data center systems use these PMCs in many different ways:
> > system level monitoring, (maybe nested) container level monitoring, per
> > process monitoring, profiling (in sample mode), etc. In some cases,
> > there are more active perf_events than available hardware PMCs. To allow
> > all perf_events to have a chance to run, it is necessary to do expensive
> > time multiplexing of events.
> >
> > On the other hand, many monitoring tools count the common metrics (cycles,
> > instructions). It is a waste to have multiple tools create multiple
> > perf_events of "cycles" and occupy multiple PMCs.
> 
> Right, it'd be really helpful when the PMCs are frequently or mostly shared.
> But it'd also increase the overhead for uncontended cases as BPF programs
> need to run on every context switch.  Depending on the workload, it may
> cause a non-negligible performance impact.  So users should be aware of it.


It would be interesting to, humm, measure both cases to get a firm number
for the impact: how many instructions are added when sharing using
--bpf-counters?

I.e. compare the "expensive time multiplexing of events" with its
avoidance by using --bpf-counters.
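
To make the multiplexing side of that comparison concrete: when an event
only gets a PMC for part of the measurement window, perf stat extrapolates
the raw count by the ratio of time enabled to time running, so the result
is an estimate. A minimal sketch of that arithmetic follows. The function
name and the sample figures are made up for illustration and are not taken
from the perf sources.

```python
def scale_count(raw_count: int, time_enabled_ns: int, time_running_ns: int) -> float:
    """Estimate the full-window count of a multiplexed event.

    Hypothetical helper: extrapolates linearly, i.e. it assumes the event
    rate while the counter was scheduled matches the rate while it was not.
    """
    if time_running_ns == 0:
        # The event never got a PMC, so there is nothing to extrapolate from.
        return 0.0
    return raw_count * time_enabled_ns / time_running_ns


# Made-up sample: the event was on a PMC for a quarter of a 1 s window.
estimate = scale_count(2_500_000, 1_000_000_000, 250_000_000)
print(f"estimated count: {estimate:.0f}")
```

With a shared counter that stays scheduled for the whole window, enabled
and running times are equal and no extrapolation is needed, which is one
more thing a measurement could compare besides the instruction overhead.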

Song, have you performed such measurements?

- Arnaldo
 
> Thanks,
> Namhyung
> 
> >
> > bperf tries to reduce such waste by allowing multiple perf_events of
> > "cycles" or "instructions" (at different scopes) to share PMUs. Instead
> > of having each perf-stat session read its own perf_events, bperf uses
> > BPF programs to read the perf_events and aggregate readings to BPF maps.
> > Then, the perf-stat session(s) reads the values from these BPF maps.
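
A toy model of the scheme described above may help readers picture it: one
reader samples the shared counter, and each scope adds the delta to its own
accumulator, which stands in for a BPF map. This is a simplification under
stated assumptions (single CPU, attribution on context switch) and not the
code in the patches. The class, method, and scope names are hypothetical.

```python
from collections import defaultdict


class SharedCounter:
    """Toy model of one hardware counter shared by several readers."""

    def __init__(self):
        self.prev = 0                   # last hardware reading seen by the leader
        self.accum = defaultdict(int)   # stand-in for a BPF map: scope -> count
        self.followers = []             # (scope name, predicate on the task id)

    def add_follower(self, scope, matches):
        self.followers.append((scope, matches))

    def on_context_switch(self, hw_reading, prev_task):
        # Leader: read the counter once and compute the delta since last switch.
        delta = hw_reading - self.prev
        self.prev = hw_reading
        # Followers: each scope that covers the outgoing task adds the delta.
        for scope, matches in self.followers:
            if matches(prev_task):
                self.accum[scope] += delta


counter = SharedCounter()
counter.add_follower("system", lambda task: True)
counter.add_follower("pid-42", lambda task: task == 42)

# Made-up readings: (counter value at the switch, task being switched out).
for reading, task in [(100, 42), (250, 7), (300, 42)]:
    counter.on_context_switch(reading, task)

print(dict(counter.accum))
```

Both scopes are served by a single counter here. The price is the extra
work on every context switch, which is the overhead Namhyung points out.
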
> >
> > Changes v1 => v2:
> >   1. Add documentation.
> >   2. Add a shell test.
> >   3. Rename options, default path of the attr-map, and some variables.
> >   4. Add a separate patch that moves clock_gettime() in __run_perf_stat()
> >      to after enable_counters().
> >   5. Make perf_cpu_map for all cpus a global variable.
> >   6. Use sysfs__mountpoint() for default attr-map path.
> >   7. Use cpu__max_cpu() instead of libbpf_num_possible_cpus().
> >   8. Add flag "enabled" to the follower program. Then move follower attach
> >      to bperf__load() and simplify bperf__enable().
> >
> > Song Liu (3):
> >   perf-stat: introduce bperf, share hardware PMCs with BPF
> >   perf-stat: measure t0 and ref_time after enable_counters()
> >   perf-test: add a test for perf-stat --bpf-counters option
> >
> >  tools/perf/Documentation/perf-stat.txt        |  11 +
> >  tools/perf/Makefile.perf                      |   1 +
> >  tools/perf/builtin-stat.c                     |  20 +-
> >  tools/perf/tests/shell/stat_bpf_counters.sh   |  34 ++
> >  tools/perf/util/bpf_counter.c                 | 519 +++++++++++++++++-
> >  tools/perf/util/bpf_skel/bperf.h              |  14 +
> >  tools/perf/util/bpf_skel/bperf_follower.bpf.c |  69 +++
> >  tools/perf/util/bpf_skel/bperf_leader.bpf.c   |  46 ++
> >  tools/perf/util/bpf_skel/bperf_u.h            |  14 +
> >  tools/perf/util/evsel.h                       |  20 +-
> >  tools/perf/util/target.h                      |   4 +-
> >  11 files changed, 742 insertions(+), 10 deletions(-)
> >  create mode 100755 tools/perf/tests/shell/stat_bpf_counters.sh
> >  create mode 100644 tools/perf/util/bpf_skel/bperf.h
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_follower.bpf.c
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_leader.bpf.c
> >  create mode 100644 tools/perf/util/bpf_skel/bperf_u.h
> >
> > --
> > 2.30.2

-- 

- Arnaldo
