On Saturday, 2 April 2016 at 06:13:24 UTC, Iain Buclaw wrote:
I would just let the compiler optimize / vectorize the operation, but then again, it is probably just me who thinks these things.
It's intended to replace the array ops in druntime. Relying on vectorizers won't suffice; e.g. your example already stops working when I pass dynamic instead of static arrays.
I'm not aware of any intrinsic to load unaligned data. Only to assume alignment.
__builtin_ia32_loadups and __builtin_ia32_storeups
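To illustrate what unaligned loads and stores buy you for array ops, here is a rough sketch in C rather than D. It is not the druntime implementation. It assumes an x86 target with SSE and uses `_mm_loadu_ps` / `_mm_storeu_ps` from `<xmmintrin.h>`, the documented intrinsics that typically compile to the same `movups` instruction as the two builtins named above. The helper name `add_floats` is made up for this example, and the scalar loop doubles as a fallback when SSE is not available.

```c
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* _mm_loadu_ps, _mm_storeu_ps, _mm_add_ps */
#endif

/* Hypothetical helper: dst[i] = a[i] + b[i] for i in [0, n).
 * No alignment is assumed for any of the three pointers. */
static void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

#if defined(__SSE__)
    /* Process four floats at a time using unaligned loads and stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
#endif

    /* Scalar tail (and the whole loop when SSE is unavailable). */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

Because the length is a runtime value and the pointers may start at any float boundary, this covers the dynamic-array case where an aligned load would fault or the auto-vectorizer might give up.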