On Wednesday, 7 September 2016 at 11:53:00 UTC, Marc Schütz wrote:
Would a `vectorize` range adapter be feasible that prepares the input to make it SIMD compatible? That is, force alignment, process left-over elements at the end, etc.? As far as I understand, the problems with auto vectorization stem from a difficulty of compilers to recognize vectorizing opportunities, and (as Manu described) from incompatible semantics of scalar and vector types that the compiler needs to preserve. But if that hypothetical `vectorize` helper forces the input data into one of a handful of well-known formats and types, wouldn't it be possible to make the compilers recognize those (especially if they are accompanied by suitable pragma or other compiler hints)?


Contrary to popular belief, alignment is not a showstopper for SIMD code. Both Intel and ARM processors have instructions for accessing data at unaligned addresses, and on Intel processors there is no speed penalty for using them on addresses that happen to be aligned. That means you can either forget about alignment entirely (on Intel), or check the data alignment once before you start and pick an optimal specialization of the main loop.
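
To make that concrete, here's a rough sketch in C with SSE intrinsics (names and structure are just illustrative, not taken from anyone's actual code): check the pointer's alignment once, then run an aligned-load or unaligned-load version of the loop, with a scalar tail for the left-over elements.

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Sum an array of floats, dispatching on alignment once up front. */
float sum_floats(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    if (((uintptr_t)p & 15) == 0) {
        /* 16-byte aligned: aligned loads (movaps) */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    } else {
        /* unaligned loads (movups); on modern Intel cores these cost the
           same as aligned loads when the address happens to be aligned */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }

    /* horizontal add of the 4 lanes */
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];

    /* scalar tail for the left-over elements */
    for (; i < n; i++)
        sum += p[i];
    return sum;
}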

However, regarding auto-vectorization, I'm with Manu. I wouldn't bet on auto-vectorization, because I have never seen non-trivial auto-vectorized code that comes even close to hand-tuned SIMD code. The compiler always has to play it conservatively. It has no idea that you are only using 10 bits of each 16-bit component in a vector, and it can't even help you shuffle RGBARGBARGBARGBA into RRRRGGGGBBBBAAAA. The best we can do is create something that makes writing SIMD kernels easy/reusable/composable.
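
For example, that RGBA de-interleave is a single SSSE3 pshufb with the right control mask once you write it by hand, but it is not something an auto-vectorizer will find for you. A rough C sketch (purely illustrative, compile with -mssse3):

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 16 interleaved bytes: R0 G0 B0 A0 R1 G1 B1 A1 ... R3 G3 B3 A3 */
    uint8_t rgba[16] = { 0,1,2,3, 10,11,12,13, 20,21,22,23, 30,31,32,33 };

    /* pshufb control mask: gather all R bytes first, then G, B, A */
    const __m128i mask = _mm_setr_epi8(0, 4,  8, 12,   /* R0 R1 R2 R3 */
                                       1, 5,  9, 13,   /* G0 G1 G2 G3 */
                                       2, 6, 10, 14,   /* B0 B1 B2 B3 */
                                       3, 7, 11, 15);  /* A0 A1 A2 A3 */

    __m128i v = _mm_loadu_si128((const __m128i *)rgba);
    __m128i planar = _mm_shuffle_epi8(v, mask);  /* one instruction does the whole de-interleave */

    uint8_t out[16];
    _mm_storeu_si128((__m128i *)out, planar);
    for (int i = 0; i < 16; i++)
        printf("%d ", out[i]);   /* prints: 0 10 20 30 1 11 21 31 2 12 22 32 3 13 23 33 */
    return 0;
}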
