Another thing worth noting: I believe Intel has put some effort
into upcoming (?) LLVM/Clang releases for autovectorizing to
AVX2. It might be worth looking into, since it uses a mask that
lets the CPU skip computations that would lead to no change,
but I think that is only available on the latest generation of
Intel CPUs.
Also worth keeping in mind is that future versions of LLVM will
have to deal with GCC extensions and perhaps also Clang pragmas.
So maybe take a look at these two?
http://clang.llvm.org/docs/LanguageExtensions.html#vectors-and-extended-vectors
and
http://clang.llvm.org/docs/LanguageExtensions.html#extensions-for-loop-hint-optimizations
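
To give a rough idea of what those two pages cover, here is a small sketch in C. It is only an illustration, not anything taken from an existing code base: the names float4, fma4 and scale_add are made up for this post. The typedef uses the GCC-style vector_size attribute, which Clang also accepts, and the loop carries a Clang loop hint that is guarded so other compilers simply skip it. Whether the compiler actually emits AVX2 for either function depends on the target flags and the optimizer, so treat the hint as a request rather than a guarantee.

```c
#include <stddef.h>

/* Four packed floats (16 bytes). vector_size is the GCC-style vector
   extension; Clang accepts it too. The name float4 is hypothetical. */
typedef float float4 __attribute__((vector_size(16)));

/* Element-wise a * b + c, written with ordinary operators on the
   vector type. The compiler picks the SIMD instructions, if any. */
static float4 fma4(float4 a, float4 b, float4 c)
{
    return a * b + c;
}

/* Plain scalar loop with a Clang loop hint asking the vectorizer to
   vectorize and interleave it. The pragma is guarded so that other
   compilers never see it. */
static void scale_add(float *dst, const float *src, float k, size_t n)
{
#if defined(__clang__)
#pragma clang loop vectorize(enable) interleave(enable)
#endif
    for (size_t i = 0; i < n; ++i)
        dst[i] += k * src[i];
}
```

The first function makes the vector width explicit in the type, while the second keeps the code scalar and leaves the width to the autovectorizer, which is probably the easier route if the same source also has to build with compilers that lack these extensions.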