On Fri, 28 Aug 2020, Prathamesh Kulkarni via Gcc wrote:

> I wonder if that's (one of) the main factor(s) behind slowdown or it's
> not too relevant ?

Probably not. Some advice to make your search more directed:

Pass '-n' to 'perf report' so that it also shows absolute sample counts.
Relative sample ratios are hard to reason about when they are computed
against different bases; it's much easier to see that a loop is slowing down
if it went from 4000 to 4500 in absolute sample count than if it went from
90% to 91% in relative sample ratio.
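
To make the arithmetic concrete, here is a small sketch using the example
figures above. It assumes that 90% and 91% are the loop's share of all
samples in the "before" and "after" profiles respectively; that pairing is
an illustration, not a measurement from this thread.

```python
# Illustrative numbers only: the loop's absolute sample counts and its
# (assumed) share of all samples in each of the two profiles.
before_loop, before_share = 4000, 0.90
after_loop, after_share = 4500, 0.91


def total_from_share(loop_samples, share):
    """Total samples in a profile, given one symbol's count and its share."""
    return loop_samples / share


before_total = total_from_share(before_loop, before_share)
after_total = total_from_share(after_loop, after_share)

# Growth measured against a fixed base (the "before" profile).
loop_growth = after_loop / before_loop - 1
total_growth = after_total / before_total - 1

# Change in the relative ratio, in percentage points.
share_change_points = (after_share - before_share) * 100

print(f"loop samples:  {before_loop} -> {after_loop} ({loop_growth:+.1%})")
print(f"total samples: {before_total:.0f} -> {after_total:.0f} ({total_growth:+.1%})")
print(f"loop share:    {before_share:.0%} -> {after_share:.0%} "
      f"({share_change_points:+.1f} points)")
```

The loop's own sample count grows by 12.5%, yet its share moves by only one
percentage point, because the base (total samples) grew along with it.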

Before diving into 'perf report', be sure to fully account for differences
in 'perf stat' output. Do the programs execute the same number of instructions,
so that the difference is only in scheduling? Do the programs suffer the same
number of branch mispredictions? Please show the output of 'perf stat' on the
mailing list too, so everyone is on the same page about that.
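
One way to line up two 'perf stat' runs is sketched below. The counter
values are hypothetical placeholders, not numbers from this thread, and the
dictionary layout and the helper names 'derived' and 'relative_change' are
made up for illustration; substitute the figures 'perf stat' prints for
each binary.

```python
# Hypothetical counter values keyed by perf event name. These are
# placeholders for illustration, NOT measurements from this thread.
baseline = {
    "instructions": 10_000_000_000,
    "cycles": 5_000_000_000,
    "branches": 2_000_000_000,
    "branch-misses": 20_000_000,
}
patched = {
    "instructions": 10_000_000_000,
    "cycles": 6_000_000_000,
    "branches": 2_000_000_000,
    "branch-misses": 60_000_000,
}


def derived(counters):
    """Instructions per cycle and branch misprediction rate for one run."""
    return {
        "ipc": counters["instructions"] / counters["cycles"],
        "miss_rate": counters["branch-misses"] / counters["branches"],
    }


def relative_change(old, new):
    """Change of 'new' relative to 'old', as a fraction of 'old'."""
    return new / old - 1


# Compare raw counters first: same instruction count means the difference
# is in how the instructions execute, not in how many there are.
for event in baseline:
    change = relative_change(baseline[event], patched[event])
    print(f"{event:15} {baseline[event]:>15,} {patched[event]:>15,} {change:+.1%}")

base_metrics, patched_metrics = derived(baseline), derived(patched)
for name in base_metrics:
    print(f"{name:15} {base_metrics[name]:>15.4f} {patched_metrics[name]:>15.4f}")
```

With these placeholder values the instruction count is unchanged while
cycles and branch misses rise, which is the pattern that would point at
branch prediction rather than at extra work being executed.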

I also suspect that the dramatic slowdown has to do with the extra branch.
Your CPU might have some specialized counters for branch prediction; see
'perf list'.

Alexander
