https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95784

--- Comment #6 from Gabriel Ravier <gabravier at gmail dot com> ---
Created attachment 48761
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48761&action=edit
File for benchmarking this function, with everything aligned properly.

I've changed the source file slightly. It looks like the LLVM version was
faster than the "do nothing" version because the loop was misaligned. These are
the test results I get with the loops aligned (I've also adjusted the
number of iterations):

$ gcc test.S -O3 -ggdb3 -DGCC_VERSION && time ./a.out &&
  gcc test.S -O3 -ggdb3 -DLLVM_VERSION && time ./a.out &&
  gcc test.S -O3 -ggdb3 && time ./a.out

real    0m3.130s # GCC version
user    0m3.122s
sys     0m0.001s

real    0m2.599s # LLVM version
user    0m2.593s
sys     0m0.001s

real    0m2.597s # version that does nothing
user    0m2.591s
sys     0m0.000s

The LLVM version is now almost as fast as doing nothing at all (2.599s vs.
2.597s real), so it looks much better than the GCC version (3.130s), at least
to me.
