https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #5 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
(In reply to prathamesh3492 from comment #4)
> To check for any
> possible icache misses I used L1I_CACHE_REFILL counter, and turns out that
> there are 64% more L1 icache misses for above adrp instruction with
> a2f4be3dae0 compared to 82d6d385f97, which may (partially) explain the
> performance difference ? Although perf stat shows there are around 7% more
> L1 icache misses for whole program run with 82d6d385f97 compared to
> a2f4be3dae0.

This makes it sound like there is either a code alignment issue or a branch
misprediction issue going on.

bad alignment:  4aeae4
good alignment: 4aec44

The good alignment case is (almost) at the start of an icache line, while the
bad alignment case is in the middle. (I am assuming 64-byte cache lines, which
I think is correct.)

Maybe look at mispredicted branches too. It might be that the branch leading to
this code is being mispredicted more often because the address of the branch is
now interfering with another branch.

It might just have been bad luck that caused this regression in both cases
really; alignment differences and/or address differences can be bad luck.
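
As a quick check of the alignment claim, here is a minimal sketch that computes
where each of the two addresses falls within its cache line. It assumes the
values are hexadecimal instruction addresses and that the line size is 64 bytes,
as guessed above; the helper name line_offset is only illustrative.

```python
# Assumption: 64-byte icache lines, as guessed in the comment; verify for the target core.
CACHE_LINE_BYTES = 64


def line_offset(addr, line_bytes=CACHE_LINE_BYTES):
    """Return the byte offset of addr within its cache line."""
    return addr % line_bytes


# Addresses quoted in the comment, interpreted as hex.
for label, addr in (("bad", 0x4AEAE4), ("good", 0x4AEC44)):
    print(f"{label:4s} 0x{addr:x}: offset {line_offset(addr)} of {CACHE_LINE_BYTES}")
```

Under those assumptions the bad case sits at offset 36 and the good case at
offset 4, which matches "in the middle" versus "(almost) at the start" of a line.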