[PATCH v4 0/7] perf: Stream comparison

2020-08-25 Thread Jin Yao
Sometimes a small change in a hot function reduces the cycles of
that function, but the overall workload doesn't get faster. It is
interesting to see where the cycles have moved to.

What we would like to do is diff the before/after streams. A stream is
a branch history aggregated from the branch records in perf samples,
for example, the callchains aggregated from the branch records.

By browsing the hot streams, we can understand the hot code path.
By comparing the cycles variation of the same streams between the old
perf data and the new perf data, we can understand whether the cycles
have moved to other code.

The before stream is the stream in perf.data.old. The after stream
is the stream in perf.data.

Diffing before/after streams compares the top N hottest streams between
the two perf data files.
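
As a rough illustration of "aggregate, then take the top N", here is a
minimal Python sketch. It is not the perf implementation: the function
name top_streams and the representation of a callchain as a list of
"symbol file:line" strings are assumptions made only for this example.

```python
from collections import Counter


def top_streams(callchains, n):
    """Aggregate identical callchains into streams and return the n
    hottest ones as (chain, hit_percent) pairs.

    Hypothetical helper: each callchain is assumed to be a sequence of
    "symbol file:line" strings taken from one sample's branch records.
    """
    # Identical callchains collapse into one stream; the count is its hits.
    counts = Counter(tuple(chain) for chain in callchains)
    total = sum(counts.values())
    # most_common() orders streams from hottest to coldest.
    return [(chain, 100.0 * hits / total)
            for chain, hits in counts.most_common(n)]
```

Calling top_streams() once per perf data file would give the two lists
of hot streams that are then compared.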

If all entries of one stream in perf.data.old fully match all entries
of another stream in perf.data, we consider the two streams matched;
otherwise the streams are not matched.

For example,

   cycles: 1, hits: 26.80%       cycles: 1, hits: 27.30%
--------------------------    --------------------------
             main div.c:39                 main div.c:39
             main div.c:44                 main div.c:44

The above streams are matched. For these two streams the cycles (1)
are equal, while the callchain hit percentages change slightly
(26.80% vs. 27.30%). That's expected.
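
The matching rule can be sketched in a few lines of Python. This is
only an illustration under stated assumptions, not the code in this
series: the Stream type and diff_streams() are hypothetical names, and
an "entry" is assumed to match when its "symbol file:line" string is
identical in both files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stream:
    entries: tuple   # callchain entries, e.g. ("main div.c:39", ...)
    cycles: int
    hits: float      # hit percentage within its perf data file


def diff_streams(old, new):
    """Split hot streams into matched pairs, old-only and new-only.

    Two streams match only if all of their entries are equal and in
    the same order (assumed interpretation of "fully matched").
    """
    new_by_entries = {s.entries: s for s in new}
    matched, old_only = [], []
    for s in old:
        partner = new_by_entries.pop(s.entries, None)
        if partner is not None:
            matched.append((s, partner))
        else:
            old_only.append(s)
    # Whatever was not claimed by an old stream exists only in new data.
    new_only = list(new_by_entries.values())
    return matched, old_only, new_only
```

For each matched pair, comparing the cycles and hits fields of the two
streams shows how that code path changed between the two runs.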

Now let's see examples.

perf record -b ...  Generate perf.data.old with branch data
perf record -b ...  Generate perf.data with branch data
perf diff --stream

[ Matched hot streams ]

hot chain pair 1:
   cycles: 1, hits: 27.77%        cycles: 1, hits: 9.24%
--------------------------    --------------------------
             main div.c:39                 main div.c:39
             main div.c:44                 main div.c:44

hot chain pair 2:
  cycles: 34, hits: 20.06%      cycles: 27, hits: 16.98%
--------------------------    --------------------------
 __random_r random_r.c:360     __random_r random_r.c:360
 __random_r random_r.c:388     __random_r random_r.c:388
 __random_r random_r.c:388     __random_r random_r.c:388
 __random_r random_r.c:380     __random_r random_r.c:380
 __random_r random_r.c:357     __random_r random_r.c:357
     __random random.c:293         __random random.c:293
     __random random.c:293         __random random.c:293
     __random random.c:291         __random random.c:291
     __random random.c:291         __random random.c:291
     __random random.c:291         __random random.c:291
     __random random.c:288         __random random.c:288
            rand rand.c:27                rand rand.c:27
            rand rand.c:26                rand rand.c:26
                  rand@plt                      rand@plt
                  rand@plt                      rand@plt
     compute_flag div.c:25         compute_flag div.c:25
     compute_flag div.c:22         compute_flag div.c:22
             main div.c:40                 main div.c:40
             main div.c:40                 main div.c:40
             main div.c:39                 main div.c:39

hot chain pair 3:
    cycles: 9, hits: 4.48%        cycles: 6, hits: 4.51%
--------------------------    --------------------------
 __random_r random_r.c:360     __random_r random_r.c:360
 __random_r random_r.c:388     __random_r random_r.c:388
 __random_r random_r.c:388     __random_r random_r.c:388
 __random_r random_r.c:380     __random_r random_r.c:380

[ Hot streams in old perf data only ]

hot chain 1:
   cycles: 18, hits: 6.75%
--------------------------
 __random_r random_r.c:360
 __random_r random_r.c:388
 __random_r random_r.c:388
 __random_r random_r.c:380
 __random_r random_r.c:357
     __random random.c:293
     __random random.c:293
     __random random.c:291
     __random random.c:291
     __random random.c:291
     __random random.c:288
            rand rand.c:27
            rand rand.c:26
                  rand@plt
                  rand@plt
     compute_flag div.c:25
     compute_flag div.c:22
             main div.c:40
hot chain 2:
cycles: 29,
