* Arnaldo Carvalho de Melo <a...@kernel.org> wrote:

> Hi Ingo,
> 
>       Please consider pulling into tip/perf/core,
> 
> Thanks,
> 
> - Arnaldo
> 
> The following changes since commit 10b37cb59fa1e61fec1386f324615e0e8202cd87:
> 
>   Merge tag 'perf-vendor_events-for-mingo-20161018' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core (2016-10-19 15:22:26 +0200)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-c2c-for-mingo-20161020
> 
> for you to fetch changes up to 535bbde62701b2bb298063e9dfa007e8a1ff95d1:
> 
>   perf c2c report: Add --show-all option (2016-10-19 13:18:31 -0300)
> 
> ----------------------------------------------------------------
> - The 'perf c2c' tool provides means for Shared Data C2C/HITM analysis.
> 
>   It allows you to track down cacheline contention. The tool is based
>   on x86's load latency and precise store facility events provided by
>   Intel CPUs.
> 
>   It was tested by Joe Mario and has proven to be useful, finding some
>   cacheline contentions. Joe also wrote a blog about c2c tool with
>   examples:
> 
>     https://joemario.github.io/blog/2016/09/01/c2c-blog/
> 
>   Excerpt of the content on this site:
> 
>   ---
>     At a high level, “perf c2c” will show you:
> 
>     * The cachelines where false sharing was detected.
>     * The readers and writers to those cachelines, and the offsets where those accesses occurred.
>     * The pid, tid, instruction addr, function name, binary object name for those readers and writers.
>     * The source file and line number for each reader and writer.
>     * The average load latency for the loads to those cachelines.
>     * Which numa nodes the samples a cacheline came from and which CPUs were involved.
> 
>     Using perf c2c is similar to using the Linux perf tool today.
>     First collect data with “perf c2c record” Then generate a report output with “perf c2c report”
>   ---
> 
>   There one finds extensive details on using the tool, with tips on
>   reducing the volume of samples while still capturing enough to do
>   its job. (Dick Fowles, Joe Mario, Don Zickus, Jiri Olsa)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
> 
> ----------------------------------------------------------------
> Jiri Olsa (52):
>       perf c2c: Introduce c2c_decode_stats function
>       perf c2c: Introduce c2c_add_stats function
>       perf c2c: Add c2c command
>       perf c2c: Add record subcommand
>       perf c2c: Add report subcommand
>       perf c2c report: Add dimension support
>       perf c2c report: Add sort_entry dimension support
>       perf c2c report: Fallback to standard dimensions
>       perf c2c report: Add sample processing
>       perf c2c report: Add cacheline hists processing
>       perf c2c report: Decode c2c_stats for hist entries
>       perf c2c report: Add header macros
>       perf c2c report: Add 'dcacheline' dimension key
>       perf c2c report: Add 'offset' dimension key
>       perf c2c report: Add 'iaddr' dimension key
>       perf c2c report: Add hitm related dimension keys
>       perf c2c report: Add stores related dimension keys
>       perf c2c report: Add loads related dimension keys
>       perf c2c report: Add llc and remote loads related dimension keys
>       perf c2c report: Add llc load miss dimension key
>       perf c2c report: Add total record sort key
>       perf c2c report: Add total loads sort key
>       perf c2c report: Add hitm percent sort key
>       perf c2c report: Add hitm/store percent related sort keys
>       perf c2c report: Add dram related sort keys
>       perf c2c report: Add 'pid' sort key
>       perf c2c report: Add 'tid' sort key
>       perf c2c report: Add 'symbol' and 'dso' sort keys
>       perf c2c report: Add 'node' sort key
>       perf c2c report: Add stats related sort keys
>       perf c2c report: Add 'cpucnt' sort key
>       perf c2c report: Add src line sort key
>       perf c2c report: Setup number of header lines for hists
>       perf c2c report: Set final resort fields
>       perf c2c report: Add stdio output support
>       perf c2c report: Add main TUI browser
>       perf c2c report: Add TUI cacheline browser
>       perf c2c report: Add global stats stdio output
>       perf c2c report: Add shared cachelines stats stdio output
>       perf c2c report: Add c2c related stats stdio output
>       perf c2c report: Allow to report callchains
>       perf c2c report: Limit the cachelines table entries
>       perf c2c report: Add support to choose local HITMs
>       perf c2c report: Allow to set cacheline sort fields
>       perf c2c report: Recalc width of global sort entries
>       perf c2c report: Add cacheline index entry
>       perf c2c report: Add support to manage symbol name length
>       perf c2c report: Iterate node display in browser
>       perf c2c report: Add help windows
>       perf c2c: Add man page and credits
>       perf c2c report: Add --no-source option
>       perf c2c report: Add --show-all option
> 
>  tools/perf/Build                      |    1 +
>  tools/perf/Documentation/perf-c2c.txt |  282 ++++
>  tools/perf/builtin-c2c.c              | 2754 +++++++++++++++++++++++++++++++++
>  tools/perf/builtin.h                  |    1 +
>  tools/perf/perf.c                     |    1 +
>  tools/perf/ui/browsers/hists.c        |    2 +-
>  tools/perf/ui/browsers/hists.h        |    1 +
>  tools/perf/util/hist.c                |    1 +
>  tools/perf/util/hist.h                |    1 +
>  tools/perf/util/mem-events.c          |  128 ++
>  tools/perf/util/mem-events.h          |   37 +
>  tools/perf/util/sort.c                |    2 +-
>  tools/perf/util/sort.h                |    1 +
>  13 files changed, 3210 insertions(+), 2 deletions(-)
>  create mode 100644 tools/perf/Documentation/perf-c2c.txt
>  create mode 100644 tools/perf/builtin-c2c.c

Pulled the perf-c2c-for-mingo-20161021 tag, thanks a lot Arnaldo!

I can see some teething problems. For example, if I run it on an older kernel
(v4.4 distro kernel), I get this:

 triton:~/tip> perf c2c record perf bench sched pipe
 # Running 'sched/pipe' benchmark:
 # Executed 1000000 pipe operations between two processes

     Total time: 12.001 [sec]

      12.001919 usecs/op
          83320 ops/sec
 [ perf record: Woken up 18 times to write data ]
 [ perf record: Captured and wrote 5.356 MB perf.data (69804 samples) ]

but there's no 'perf c2c report' TUI output at all:

 Shared Data Cache Line Table     (0 entries, sorted on remote HITMs)

                              Total      Rmt  ----- LLC Load Hitm -----  ---- Store Reference ----  --- Load Dram ----      LLC    Total  ----- Core Load Hit -----  -- LLC Load Hit -
 Index           Cacheline  records     Hitm    Total      Lcl      Rmt    Total    L1Hit   L1Miss       Lcl       Rmt  Ld Miss    Loads       FB       L1       L2       Llc       Rm

and just an empty screen.

If I do 'perf report' I get two events:

 Available samples
 24K cpu/mem-loads,ldlat=30/P
 45K cpu/mem-stores/P

and both have some real data.
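
For anyone wanting to reproduce this, the steps above can be sketched as a small shell helper. This is only a sketch: the function name c2c_repro and the PERF override variable are made up for illustration, and it assumes the '--stdio' switch is what the "Add stdio output support" commit exposes for 'perf c2c report'. Running the report in stdio mode helps tell a TUI-only display problem apart from a report that really has 0 entries.

```shell
# Hypothetical helper (not part of perf): replays the record/report
# sequence from this mail. Set PERF to point at a specific perf binary.
c2c_repro() {
    perf_bin="${PERF:-perf}"

    # Record load/store samples for the workload given as arguments,
    # invoked the same way as in the mail (workload follows 'record').
    "$perf_bin" c2c record "$@" || return 1

    # Text-mode c2c report (assumed flag: --stdio), bypassing the TUI.
    "$perf_bin" c2c report --stdio

    # Cross-check with plain 'perf report' that perf.data holds samples.
    "$perf_bin" report --stdio | head -n 40
}
```

Usage would be something like 'c2c_repro perf bench sched pipe', run in the directory where perf.data should be written.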

What am I missing?

        Ingo
