On Wednesday, 7 September 2016 at 11:53:00 UTC, Marc Schütz wrote:
Would a `vectorize` range adapter be feasible that prepares the input to make it SIMD compatible? That is, force alignment, process left-over elements at the end, etc.? As far as I understand, the problems with auto vectorization stem from a difficulty of compilers to recognize vectorizing opportunities, and (as Manu described) from incompatible semantics of scalar and vector types that the compiler needs to preserve. But if that hypothetical `vectorize` helper forces the input data into one of a handful of well-known formats and types, wouldn't it be possible to make the compilers recognize those (especially if they are accompanied by suitable pragma or other compiler hints)?


Contrary to popular belief, alignment is not a showstopper for SIMD code. Both Intel and ARM processors have instructions for accessing data at unaligned addresses, and on Intel processors there is no speed penalty for using them on addresses that happen to be aligned. That means you can either forget about alignment entirely (on Intel), or check the data alignment once before you start and pick an optimal specialization of the main loop.
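
To make that concrete, here's a rough sketch in C with SSE intrinsics (names and structure are just illustrative, not taken from anyone's actual code): check the pointer's alignment once, then run an aligned-load or unaligned-load version of the loop, with a scalar tail for the left-over elements.

#include <immintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Sum an array of floats, dispatching on alignment once up front. */
float sum_floats(const float *p, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;

    if (((uintptr_t)p & 15) == 0) {
        /* 16-byte aligned: aligned loads (movaps) */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_load_ps(p + i));
    } else {
        /* unaligned loads (movups); on modern Intel cores these cost the
           same as aligned loads when the address happens to be aligned */
        for (; i + 4 <= n; i += 4)
            acc = _mm_add_ps(acc, _mm_loadu_ps(p + i));
    }

    /* horizontal add of the 4 lanes */
    float tmp[4];
    _mm_storeu_ps(tmp, acc);
    float sum = tmp[0] + tmp[1] + tmp[2] + tmp[3];

    /* scalar tail for the left-over elements */
    for (; i < n; i++)
        sum += p[i];
    return sum;
}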

However, regarding auto-vectorization, I'm with Manu. I wouldn't bet on auto-vectorization, because I have never seen non-trivial auto-vectorized code that comes even close to hand-tuned SIMD code. The compiler always has to play it conservatively. It has no idea that you are only using 10 bits of each 16-bit component in a vector, and it can't even help you shuffle RGBARGBARGBARGBA into RRRRGGGGBBBBAAAA. The best we can do is create something that makes writing SIMD kernels easy/reusable/composable.
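
For example, that RGBA de-interleave is a single SSSE3 pshufb with the right control mask once you write it by hand, but it is not something an auto-vectorizer will find for you. A rough C sketch (purely illustrative, compile with -mssse3):

#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* 16 interleaved bytes: R0 G0 B0 A0 R1 G1 B1 A1 ... R3 G3 B3 A3 */
    uint8_t rgba[16] = { 0,1,2,3, 10,11,12,13, 20,21,22,23, 30,31,32,33 };

    /* pshufb control mask: gather all R bytes first, then G, B, A */
    const __m128i mask = _mm_setr_epi8(0, 4,  8, 12,   /* R0 R1 R2 R3 */
                                       1, 5,  9, 13,   /* G0 G1 G2 G3 */
                                       2, 6, 10, 14,   /* B0 B1 B2 B3 */
                                       3, 7, 11, 15);  /* A0 A1 A2 A3 */

    __m128i v = _mm_loadu_si128((const __m128i *)rgba);
    __m128i planar = _mm_shuffle_epi8(v, mask);  /* one instruction does the whole de-interleave */

    uint8_t out[16];
    _mm_storeu_si128((__m128i *)out, planar);
    for (int i = 0; i < 16; i++)
        printf("%d ", out[i]);   /* prints: 0 10 20 30 1 11 21 31 2 12 22 32 3 13 23 33 */
    return 0;
}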
