- vector version is about 3% faster than above instead of 10% slower - wow!
So why is GCC 4.0 producing worse code when using Intel-style intrinsics, and why isn't the union version using builtins as fast as the vector version?

I can answer why unions are slower: they are spilled to memory on every assignment. GCC 4.0 knows how to replace a struct with separate scalar variables (one per field), but it cannot do the same for a union. GCC 3.4 could do neither.
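
To make the difference concrete, here is a minimal sketch of the two styles. It is not the code from this thread: the names vec4_union, sum_union and sum_vector are made up for illustration, and it assumes GCC's generic vector_size extension rather than the Intel-style intrinsics. In the first function every value goes through the union, which is the pattern that may be spilled as described above. In the second, the values stay in plain vector variables and the union is touched only once, at the end.

```c
#include <assert.h>

/* GCC generic vector type: four packed floats (16 bytes). */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical union wrapper giving per-element access to a vector. */
typedef union {
    v4sf  v;
    float f[4];
} vec4_union;

/* Union style: the accumulator and each loaded value live in a union. */
static float sum_union(const float *data, int n)
{
    vec4_union acc;
    int i;

    assert(n % 4 == 0);              /* sketch only handles whole vectors */
    acc.v = (v4sf){0.0f, 0.0f, 0.0f, 0.0f};
    for (i = 0; i < n; i += 4) {
        vec4_union x;
        x.f[0] = data[i];
        x.f[1] = data[i + 1];
        x.f[2] = data[i + 2];
        x.f[3] = data[i + 3];
        acc.v = acc.v + x.v;         /* assignment through the union */
    }
    return acc.f[0] + acc.f[1] + acc.f[2] + acc.f[3];
}

/* Vector style: plain vector variables inside the loop. */
static float sum_vector(const float *data, int n)
{
    v4sf acc = {0.0f, 0.0f, 0.0f, 0.0f};
    vec4_union out;
    int i;

    assert(n % 4 == 0);
    for (i = 0; i < n; i += 4) {
        v4sf x = {data[i], data[i + 1], data[i + 2], data[i + 3]};
        acc = acc + x;
    }
    out.v = acc;                     /* the union is used only here */
    return out.f[0] + out.f[1] + out.f[2] + out.f[3];
}
```

Both functions compute the same sum, so comparing the assembly GCC emits for each (for example with -O2 -S) is one way to check whether the union version really goes through memory on your compiler version.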


As for why vectors are faster: a lot of the vector support was rewritten in GCC 4.0, so that may be the reason.

I do not know exactly why builtins are still slower, but you may want to create a PR and add me on the CC list ([EMAIL PROTECTED]).

Paolo
