https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114860
--- Comment #9 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
(In reply to prathamesh3492 from comment #8)
> Hi Tamar,
> Using -falign-loops=5 indeed brings back the performance.
> The adrp instruction has same address (0x4ae784) by setting -falign-loops=5
> (which reduces misalignment to 4) with/without a2f4be3dae0. So I guess this
> is really code-alignment issue ?
>

Indeed, we don't aggressively align loops if they require too much padding to
not bloat the binaries too much.  That's why sometimes you just get unlucky and
the hot loop gets misaligned.
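
To make the padding trade-off concrete, here is a rough sketch in C of the kind of decision involved. It is not GCC's actual code: the names padding_needed, should_align and the max_skip parameter are hypothetical, and the real thresholds are target-specific. It only shows that the same requested alignment can be honoured at one address and skipped at another, depending on how many padding bytes it would cost.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bytes of padding needed to move addr up to the next multiple of align.
   align must be a power of two.  Returns 0 if addr is already aligned. */
static uint64_t padding_needed(uint64_t addr, uint64_t align)
{
    return (align - (addr & (align - 1))) & (align - 1);
}

/* Hypothetical policy: only align when the padding cost is at most
   max_skip bytes; otherwise leave the loop where it falls. */
static bool should_align(uint64_t addr, uint64_t align, uint64_t max_skip)
{
    return padding_needed(addr, align) <= max_skip;
}
```

With made-up addresses: a loop head at 0x1004 needs 12 bytes of padding to reach a 16-byte boundary, so should_align(0x1004, 16, 7) says no and the loop stays misaligned, while the same loop at 0x100c needs only 4 bytes and gets aligned. A small unrelated change earlier in the function can move a loop from one case to the other.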