Manu:
They must be aligned, and multiples of N elements.
The D GC currently allocates them 16-byte aligned (but if you
slice the array, the slice can start at an address that is no
longer 16-byte aligned). On some newer CPUs the penalty for
misaligned accesses is small.
You often have "n" values, where n is variable. If n is large
enough and you are using D vector ops, the handling of the head
and tail doesn't waste too much time. If you have very few values
it's much better to use the SIMD code.
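To make the head/tail point concrete, here is a rough sketch in plain C rather than D. It only illustrates the idea and is not what the D vector ops actually generate. The names add_arrays, is_aligned16, LANES and ALIGN are made up for the example, and I'm assuming 4 floats per 16-byte vector:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption for this sketch: one vector holds 4 floats (16 bytes). */
enum { LANES = 4, ALIGN = 16 };

/* Returns nonzero if p sits on a 16-byte boundary. */
static int is_aligned16(const void *p) {
    return ((uintptr_t)p % ALIGN) == 0;
}

/* dst[i] += src[i] for i in [0, n), split into head, body, and tail. */
static void add_arrays(float *dst, const float *src, size_t n) {
    assert(n == 0 || (dst != NULL && src != NULL));
    size_t i = 0;

    /* Head: scalar steps until dst + i reaches a 16-byte boundary. */
    while (i < n && !is_aligned16(dst + i)) {
        dst[i] += src[i];
        i++;
    }

    /* Body: whole blocks of LANES elements; this is the part a
       compiler or a vector op can map onto SIMD instructions. */
    for (; i + LANES <= n; i += LANES) {
        for (size_t k = 0; k < LANES; k++)
            dst[i + k] += src[i + k];
    }

    /* Tail: the leftover elements, fewer than LANES of them. */
    for (; i < n; i++)
        dst[i] += src[i];
}
```

The head and tail together cost at most 2 * (LANES - 1) scalar steps, which is noise for a large n but can be most of the work for a tiny n. Passing a pointer like buf + 1 plays the role of a slice that has lost the alignment of the original allocation.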
Well, each is a valid comparison in different situations. I'm
not sure how syntax could clearly select the one you want.
Maybe later we'll look for some syntax sugar for this.
Do the D intrinsics offer instructions to perform prefetching?
Well, GCC does at least. If you're worried about performance at
this level, you're probably already using GCC :)
I think D SIMD programmers will expect something functionally
equivalent to __builtin_prefetch to be available in D too:
http://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html#index-g_t_005f_005fbuiltin_005fprefetch-3396
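For reference, this is roughly how the GCC builtin is used from C. It is only a sketch: sum_with_prefetch and PREFETCH_AHEAD are names I made up, and the distance of 16 elements is an arbitrary guess that would need to be tuned by measurement on the target CPU:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical prefetch distance, in elements; not a recommended value. */
enum { PREFETCH_AHEAD = 16 };

/* Sums n floats while hinting the cache about data needed a bit later. */
static float sum_with_prefetch(const float *a, size_t n) {
    assert(n == 0 || a != NULL);
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Second argument 0 = prefetch for reading,
           third argument 3 = keep in all cache levels if possible.
           Both must be compile-time constants. */
        if (i + PREFETCH_AHEAD < n)
            __builtin_prefetch(&a[i + PREFETCH_AHEAD], 0, 3);
        total += a[i];
    }
    return total;
}
```

The builtin is only a hint, so the result is the same with or without it. On a simple linear scan like this one the hardware prefetcher usually does the job already; the builtin is more interesting for irregular access patterns.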
Thank you,
bye,
bearophile