Hi Jiri,

On 4/27/2020 6:29 PM, Jiri Olsa wrote:
On Mon, Apr 20, 2020 at 09:04:44AM +0800, Jin Yao wrote:
Sometimes, a small change in a hot function reducing the cycles of
this function, but the overall workload doesn't get faster. It is
interesting where the cycles are moved to.

What it would like is to diff before/after streams. A stream we think
is a callchain which is aggregated by the branch records from perf
samples.

I wonder we could use this on intel_pt trace.. like compare streams
for given function call.. not sure that would be feasible, but might
be good idea to write this in a generic way and not callchain specific

jirka


Yes, that's a good idea. We should try to write the code in a generic way.

Thanks
Jin Yao


By browsing the hot streams, we can understand the hot code path.
By comparing the cycles variation of same streams between old perf
data and new perf data, we can understand if the cycles are moved
to other codes.

The before stream is the stream in perf.data.old. The after stream
is the stream in perf.data.

Diffing before/after streams compares top N hottest streams between
two perf data files.

If all entries of one stream in perf.data.old are fully matched with
all entries of another stream in perf.data, we think two streams
are matched, otherwise the streams are not matched.

For example,

    cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
--------------------------              --------------------------
              main div.c:39                           main div.c:39
              main div.c:44                           main div.c:44

The above streams are matched and we can see for the same streams the
cycles (1) are equal and the callchain hit percents are slightly changed
(26.80% vs. 27.30%). That's expected.

But that's not always true if source code is changed in perf.data
(e.g. div.c:39 is changed). If div.c:39 is changed, they are different
streams, we can't compare them. We will think the stream in perf.data
is a new stream.

The challenge is how to identify the changed source lines. The basic
idea is to use linux command "diff" to compare the source file A and
source file A* line by line (assume file A is used in perf.data.old
and file A* is used in perf.data). According to "diff" output,
we can generate a source line mapping table.

For example,

   Execute 'diff ./before/div.c ./after/div.c'

   25c25
   <       i = rand() % 2;
   ---
   >       i = rand() % 4;
   39c39
   <       for (i = 0; i < 2000000000; i++) {
   ---
   >       for (i = 0; i < 20000000001; i++) {

   div.c (after -> before) lines mapping:
   0 -> 0
   1 -> 1
   2 -> 2
   3 -> 3
   4 -> 4
   5 -> 5
   6 -> 6
   7 -> 7
   8 -> 8
   9 -> 9
   ...
   24 -> 24
   25 -> -1
   26 -> 26
   27 -> 27
   28 -> 28
   29 -> 29
   30 -> 30
   31 -> 31
   32 -> 32
   33 -> 33
   34 -> 34
   35 -> 35
   36 -> 36
   37 -> 37
   38 -> 38
   39 -> -1
   40 -> 40
   ...

 From the table, we can easily know div.c:39 is source line changed.
(it's mapped to -1). So following two streams are not matched.

    cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
--------------------------              --------------------------
              main div.c:39                           main div.c:39
              main div.c:44                           main div.c:44

Now let's see examples.

perf record -b ...      Generate perf.data.old with branch data
perf record -b ...      Generate perf.data with branch data
perf diff --stream

[ Matched hot chains between old perf data and new perf data) ]

hot chain pair 1:
             cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
         ---------------------------              --------------------------
                       main div.c:39                           main div.c:39
                       main div.c:44                           main div.c:44

hot chain pair 2:
            cycles: 35, hits: 21.43%                cycles: 33, hits: 19.37%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380
           __random_r random_r.c:357               __random_r random_r.c:357
               __random random.c:293                   __random random.c:293
               __random random.c:293                   __random random.c:293
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:288                   __random random.c:288
                      rand rand.c:27                          rand rand.c:27
                      rand rand.c:26                          rand rand.c:26
                            rand@plt                                rand@plt
                            rand@plt                                rand@plt
               compute_flag div.c:25                   compute_flag div.c:25
               compute_flag div.c:22                   compute_flag div.c:22
                       main div.c:40                           main div.c:40
                       main div.c:40                           main div.c:40
                       main div.c:39                           main div.c:39

hot chain pair 3:
             cycles: 18, hits: 6.10%                 cycles: 19, hits: 6.51%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380
           __random_r random_r.c:357               __random_r random_r.c:357
               __random random.c:293                   __random random.c:293
               __random random.c:293                   __random random.c:293
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:288                   __random random.c:288
                      rand rand.c:27                          rand rand.c:27
                      rand rand.c:26                          rand rand.c:26
                            rand@plt                                rand@plt
                            rand@plt                                rand@plt
               compute_flag div.c:25                   compute_flag div.c:25
               compute_flag div.c:22                   compute_flag div.c:22
                       main div.c:40                           main div.c:40

hot chain pair 4:
              cycles: 9, hits: 5.95%                  cycles: 8, hits: 5.03%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

[ Hot chains in old perf data only ]

hot chain 1:
              cycles: 2, hits: 4.08%
          --------------------------
                       main div.c:42
               compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
                                                     cycles: 36, hits: 3.36%
                                                  --------------------------
                                                   __random_r random_r.c:357
                                                       __random random.c:293
                                                       __random random.c:293
                                                       __random random.c:291
                                                       __random random.c:291
                                                       __random random.c:291
                                                       __random random.c:288
                                                              rand rand.c:27
                                                              rand rand.c:26
                                                                    rand@plt
                                                                    rand@plt
                                                       compute_flag div.c:25
                                                       compute_flag div.c:22
                                                               main div.c:40
                                                               main div.c:40

If we enable the source line comparison option, the output may be different.

perf record -b ...      Generate perf.data.old with branch data
perf record -b ...      Generate perf.data with branch data
perf diff --stream --before ./before --after ./after

[ Matched hot chains between old perf data and new perf data) ]

hot chain pair 1:
             cycles: 18, hits: 6.10%                 cycles: 19, hits: 6.51%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380
           __random_r random_r.c:357               __random_r random_r.c:357
               __random random.c:293                   __random random.c:293
               __random random.c:293                   __random random.c:293
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:288                   __random random.c:288
                      rand rand.c:27                          rand rand.c:27
                      rand rand.c:26                          rand rand.c:26
                            rand@plt                                rand@plt
                            rand@plt                                rand@plt
               compute_flag div.c:25                   compute_flag div.c:25
               compute_flag div.c:22                   compute_flag div.c:22
                       main div.c:40                           main div.c:40

hot chain pair 2:
              cycles: 9, hits: 5.95%                  cycles: 8, hits: 5.03%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380

[ Hot chains in old perf data but source line changed (*) in new perf data ]

hot chain pair 1:
             cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
         ---------------------------              --------------------------
                       main div.c:39                           main div.c:39*
                       main div.c:44                           main div.c:44

hot chain pair 2:
            cycles: 35, hits: 21.43%                cycles: 33, hits: 19.37%
         ---------------------------              --------------------------
           __random_r random_r.c:360               __random_r random_r.c:360
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:388               __random_r random_r.c:388
           __random_r random_r.c:380               __random_r random_r.c:380
           __random_r random_r.c:357               __random_r random_r.c:357
               __random random.c:293                   __random random.c:293
               __random random.c:293                   __random random.c:293
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:291                   __random random.c:291
               __random random.c:288                   __random random.c:288
                      rand rand.c:27                          rand rand.c:27
                      rand rand.c:26                          rand rand.c:26
                            rand@plt                                rand@plt
                            rand@plt                                rand@plt
               compute_flag div.c:25                   compute_flag div.c:25
               compute_flag div.c:22                   compute_flag div.c:22
                       main div.c:40                           main div.c:40
                       main div.c:40                           main div.c:40
                       main div.c:39                           main div.c:39*

[ Hot chains in old perf data only ]

hot chain 1:
              cycles: 2, hits: 4.08%
          --------------------------
                       main div.c:42
               compute_flag div.c:28

[ Hot chains in new perf data only ]

hot chain 1:
                                                     cycles: 36, hits: 3.36%
                                                  --------------------------
                                                   __random_r random_r.c:357
                                                       __random random.c:293
                                                       __random random.c:293
                                                       __random random.c:291
                                                       __random random.c:291
                                                       __random random.c:291
                                                       __random random.c:288
                                                              rand rand.c:27
                                                              rand rand.c:26
                                                                    rand@plt
                                                                    rand@plt
                                                       compute_flag div.c:25
                                                       compute_flag div.c:22
                                                               main div.c:40
                                                               main div.c:40

Now we can see, following streams pair is moved to another section
"[ Hot chains in old perf data but source line changed (*) in new perf data ]"

             cycles: 1, hits: 26.80%                 cycles: 1, hits: 27.30%
         ---------------------------              --------------------------
                       main div.c:39                           main div.c:39*
                       main div.c:44                           main div.c:44

v3:
---
v2 has 14 patches, it's hard to review.
v3 is only 7 patches for basic stream comparison.

Jin Yao (7):
   perf util: Create source line mapping table
   perf util: Create streams for managing top N hottest callchains
   perf util: Return per-event callchain streams
   perf util: Compare two streams
   perf util: Calculate the sum of all streams hits
   perf util: Report hot streams
   perf diff: Support hot streams comparison

  tools/perf/Documentation/perf-diff.txt |  14 +
  tools/perf/builtin-diff.c              | 170 +++++++-
  tools/perf/util/Build                  |   1 +
  tools/perf/util/callchain.c            | 495 ++++++++++++++++++++++
  tools/perf/util/callchain.h            |  32 ++
  tools/perf/util/srclist.c              | 555 +++++++++++++++++++++++++
  tools/perf/util/srclist.h              |  65 +++
  7 files changed, 1319 insertions(+), 13 deletions(-)
  create mode 100644 tools/perf/util/srclist.c
  create mode 100644 tools/perf/util/srclist.h

--
2.17.1


Reply via email to