https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860

--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to prathamesh3492 from comment #4)
> To check for any
> possible icache misses I used L1I_CACHE_REFILL counter, and turns out that
> there are 64% more L1 icache misses for above adrp instruction with
> a2f4be3dae0 compared to 82d6d385f97, which may (partially) explain the
> performance difference ? Although perf stat shows there are around 7% more
> L1 icache misses for whole program run with 82d6d385f97 compared to
> a2f4be3dae0.

This makes it sound like there is either a code alignment issue or a branch
misprediction issue going on.

bad alignment:  4aeae4
good alignment: 4aec44

The good alignment case is (almost) at the start of an icache line, while the
bad alignment case is in the middle. (I am assuming 64-byte cache lines, which I
think is correct.)
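
The offsets within the cache line can be checked with a bit of arithmetic.
A minimal sketch, assuming 64-byte icache lines as stated above; the helper
name cache_line_offset is made up for illustration:

```python
# Assumption: 64-byte L1 icache lines (not confirmed for the target core).
CACHE_LINE_BYTES = 64


def cache_line_offset(addr, line_bytes=CACHE_LINE_BYTES):
    """Return the byte offset of addr within its cache line."""
    return addr % line_bytes


# The two instruction addresses quoted above.
for label, addr in (("bad", 0x4AEAE4), ("good", 0x4AEC44)):
    print(f"{label:4s} {addr:#x}: offset {cache_line_offset(addr)} "
          f"of {CACHE_LINE_BYTES}")
```

Under that assumption, 4aeae4 sits at offset 36 (mid-line) and 4aec44 at
offset 4 (near the start of a line).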

Maybe look at mispredicted branches too. It might be that the branch leading to
this code is being mispredicted more often because the address of the branch is
now interfering with another branch.

It might really just have been bad luck that caused this regression in both
cases; alignment differences and/or address differences can be bad luck.
