One problem I have with turning the vectorizer on by default is that it also enables tree loop unrolling, which sometimes generates quite bloated or weird code, and it's unclear whether it helps.
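
A rough way to check this on a given target is to compile a small kernel both ways and compare the generated assembly. The sketch below is only an illustration: the `saxpy` function and the choice of loop are assumptions, not taken from any real benchmark. The pointers have no known alignment, so any vectorized version may need extra setup and remainder code around the main loop.

```c
#include <stddef.h>

/*
 * Hypothetical test kernel (name and shape are assumptions for
 * illustration): y[i] += a * x[i] over n elements.
 *
 * The compiler knows nothing about the alignment of x and y or about n,
 * so a vectorized version may carry extra code besides the vector loop.
 *
 * One way to compare code size, using existing GCC options:
 *   gcc -O2 -fno-tree-vectorize -S saxpy.c -o scalar.s
 *   gcc -O2 -ftree-vectorize    -S saxpy.c -o vector.s
 */
void saxpy(float *restrict y, const float *restrict x, float a, size_t n)
{
    for (size_t i = 0; i < n; i++)
        y[i] += a * x[i];
}
```

How much the two outputs differ depends on the GCC version and the target, so this says nothing general; it only makes the size difference visible for one loop.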
Would it be possible to only do the unrolling when vectorizing?

Also, I suspect the trade-off for vectorizing differs between architectures that support unaligned vector accesses well and those that don't. With the extra code needed to handle misalignment, vectorized code often seems to be very bloated.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only