https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88760
Wilco <wilco at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |wilco at gcc dot gnu.org

--- Comment #4 from Wilco <wilco at gcc dot gnu.org> ---
(In reply to ktkachov from comment #2)
> Created attachment 45386 [details]
> aarch64-llvm output with -Ofast -mcpu=cortex-a57
>
> I'm attaching the full LLVM aarch64 output.
>
> The output you quoted is with -funroll-loops. If that's not given, GCC
> doesn't seem to unroll by default at all (on aarch64 or x86_64 from my
> testing).
>
> Is there anything we can do to make the default unrolling a bit more
> aggressive?

I don't think the RTL unroller works at all. It doesn't have the right
settings, and doesn't understand how to unroll, so we always get inefficient
and bloated code.

To do unrolling correctly it has to be integrated at tree level - for example
when vectorization isn't possible/beneficial, unrolling might still be a good
idea.
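
For illustration only, here is a minimal C sketch of the kind of loop the last point could apply to. It is not taken from the bug report: the histogram update and the names `hist_rolled` and `hist_unrolled4` are hypothetical, and the unroll factor of 4 is an arbitrary assumption. Two keys in the same group can be equal, so the read-modify-write of `hist[...]` may conflict across iterations, which usually makes such a loop hard to vectorize. Unrolling by hand still removes three out of every four counter updates and loop branches.

```c
#include <stddef.h>

/* Hypothetical example: straightforward histogram update.
   Iterations may hit the same bucket, so they are not independent. */
void hist_rolled(unsigned *hist, const unsigned char *keys, size_t n)
{
    for (size_t i = 0; i < n; i++)
        hist[keys[i]]++;
}

/* The same loop unrolled by 4 (assumed factor), in the original order,
   followed by a remainder loop for the last n % 4 elements. */
void hist_unrolled4(unsigned *hist, const unsigned char *keys, size_t n)
{
    size_t i = 0;

    /* Main body: one loop test and one increment per four updates. */
    for (; i + 4 <= n; i += 4) {
        hist[keys[i]]++;
        hist[keys[i + 1]]++;
        hist[keys[i + 2]]++;
        hist[keys[i + 3]]++;
    }

    /* Remainder: handle the leftover elements one at a time. */
    for (; i < n; i++)
        hist[keys[i]]++;
}
```

Both functions produce the same histogram; whether the unrolled form is actually faster depends on the target and would need to be measured.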