https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63503
--- Comment #21 from Evandro <e.menezes at samsung dot com> ---
(In reply to ramana.radhakrish...@arm.com from comment #20)
> What's the kind of performance delta you see if you managed to unroll
> the loop just a wee bit ? Probably not much looking at the code produced
> here.

Comparing the cycle counts on Juno when running the program from the matrix
multiplication test above built with -Ofast and unrolling:

-fno-unroll-loops: 592000
-funroll-loops --param max-unroll-times=2: 594000
-funroll-loops --param max-unroll-times=4: 592000
-funroll-loops: 590000 (implies --param max-unroll-times=8)
-funroll-loops --param max-unroll-times=16: 581000

It seems to me that without effective iv-opt in place, loops have to be
unrolled too aggressively to make any difference in this case, greatly
sacrificing code size.
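
For reference, the cycle counts above can be turned into relative deltas against the -fno-unroll-loops baseline. This is only a small sketch: the numbers are copied verbatim from the list above and nothing is re-measured. The names `cycles`, `delta_percent` and `deltas` are made up for illustration.

```python
# Cycle counts copied from the comment above (Juno, -Ofast).
# Nothing here is measured; this only does the arithmetic.
BASELINE = "-fno-unroll-loops"

cycles = {
    "-fno-unroll-loops": 592000,
    "-funroll-loops --param max-unroll-times=2": 594000,
    "-funroll-loops --param max-unroll-times=4": 592000,
    "-funroll-loops": 590000,  # default max-unroll-times, per the comment
    "-funroll-loops --param max-unroll-times=16": 581000,
}


def delta_percent(count, base):
    """Relative change in cycles versus the baseline; negative means faster."""
    return 100.0 * (count - base) / base


deltas = {
    flags: delta_percent(count, cycles[BASELINE])
    for flags, count in cycles.items()
}

for flags, delta in deltas.items():
    print(f"{delta:+6.2f}%  {flags}")
```

On these figures every setting up to the default stays within about a third of a percent of the baseline, and only max-unroll-times=16 gets to roughly 1.9% fewer cycles.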