http://gcc.gnu.org/bugzilla/show_bug.cgi?id=51078

--- Comment #11 from Grygoriy Fuchedzhy <grygoriy.fuchedzhy at gmail dot com> 
2011-11-11 21:53:50 UTC ---
I've tried different optimization options:
1. -march=native -O2,
2. -march=native -O2 -funroll-loops,
3. -march=native -O2 -funroll-all-loops,
4. -march=native -O3,
5. -march=native -O3 -funroll-loops,
6. -march=native -O3 -funroll-all-loops

And got the following ordering, from fastest to slowest: 5, 6, 1, 4, 2, 3

You can see that code with automatic loop unrolling sometimes performs worse
than code without it. Variants 2 and 3 give a result 1.5 times worse than
variant 1.

Variants 5 and 6 are better than variant 1, but manual loop unrolling still
performs better (by more than 25%).

Also, I've tried changing the number of unrolled iterations from 2 to 8 and got
the best performance for 4 and 5 on a Core 2 CPU.
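
For readers unfamiliar with the term, here is a minimal sketch of what manual
unrolling by a factor of 4 can look like. It is not the test case from this bug
report: the summation loop and the names sum_simple and sum_unrolled4 are
hypothetical, chosen only to illustrate the technique.

```c
#include <stddef.h>

/* Baseline: one element per iteration (hypothetical example loop). */
long sum_simple(const int *a, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += a[i];
    return s;
}

/* The same loop manually unrolled by 4, with one accumulator per lane
 * so the four additions do not depend on each other. */
long sum_unrolled4(const int *a, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;

    /* Main body: four elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        s0 += a[i];
        s1 += a[i + 1];
        s2 += a[i + 2];
        s3 += a[i + 3];
    }

    /* Remainder: the last n % 4 elements. */
    for (; i < n; ++i)
        s0 += a[i];

    return s0 + s1 + s2 + s3;
}
```

Changing the unroll factor means changing the number of statements in the main
body and its step. Whether this beats -funroll-loops depends on the loop, the
compiler version and the CPU, so it has to be measured.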
