On Wed, Jul 20, 2011 at 10:47 AM, Robin Gareus <ro...@gareus.org> wrote:
>> On gcc/Linux, (gcc 4.5.2) the same code produce a *slow down* of around
>> 2.5x.
>>
>> Well, anybody have an idea of why ?
>>
>> I am actually running linux (Ubuntu 11.04) under a VMWare virtual
>> machine, i do not know is this may have any implications.
>
> Maybe. A better comparison would be: clang/Linux vs. gcc/Linux and
> clang/MacOSX vs gcc/MacOSX compiled binaries.
>
> Also as Dan already pointed out: gcc has a whole lot of optimization
> flags which are not enabled by default. try '-O3 -msse2 -ffast-math'.
> '-ftree-vectorizer-verbose=2' is handy while optimizing code.
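To have something concrete to try those flags on, here is a minimal sketch in C. It is my own illustration, not code from this thread: the name `multiply_buffers` and its signature are hypothetical, and it assumes a C99 compiler (for `restrict` and the loop-scoped counter).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not from the thread: multiplies two buffers of
 * single-precision floats element by element and stores the products
 * in dst. */
void multiply_buffers(float *restrict dst,
                      const float *restrict a,
                      const float *restrict b,
                      size_t n)
{
    /* Basic precondition check; compiled out when NDEBUG is defined. */
    assert(dst != NULL && a != NULL && b != NULL);

    /* `restrict` promises the compiler that the buffers do not overlap,
     * which removes one common obstacle to auto-vectorization. */
    for (size_t i = 0; i < n; ++i)
        dst[i] = a[i] * b[i];
}
```

Saved as multiply.c (the file name is just an example), it can be compiled with the flags quoted above, e.g. `gcc -std=c99 -O3 -msse2 -ffast-math -c multiply.c`. Whether the loop actually gets vectorized depends on the compiler version and target, so treat it as a test case rather than a guarantee.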
In addition... inspecting the disassembly is helpful (-S -o myprogram.s).
As a rule of thumb, you should see `movaps` (MOVe Aligned Packed
Single-precision) and `mulps` (MULtiply Packed Single-precision)
instructions where vectors of single-precision floats are multiplied.

Profiling with valgrind/callgrind is also helpful (esp. if you have it
dump instructions/assembly):

    $ valgrind --tool=callgrind --dump-instr=yes ./myprogram

Open the output file with kcachegrind and it'll save you a lot of time.

-gabriel
_______________________________________________
Linux-audio-dev mailing list
Linux-audio-dev@lists.linuxaudio.org
http://lists.linuxaudio.org/listinfo/linux-audio-dev