Manu:

The compiler would have to do some serious magic to optimise that; flattening both sides of the if into parallel expressions, and then applying the mask to combine...

I think it's a small amount of magic.

The simple features shown in that paper are fully focused on SIMD programming, so they don't introduce anything that is clearly inefficient.
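To make the "flatten both sides of the if, then apply the mask" idea concrete, here is a minimal sketch in plain C. It is not code from the paper; the function names and the arithmetic in the two branches are made up for illustration. Both sides are computed for every element and a per-element mask picks the result, which is the shape a compiler (or a SIMD blend instruction) needs:

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: an ordinary branch per element.
   (Hypothetical example computation, chosen only for illustration.) */
void branchy(const int32_t *a, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (a[i] > 0)
            out[i] = a[i] * 2;
        else
            out[i] = a[i] - 1;
    }
}

/* Flattened form: evaluate BOTH sides unconditionally, then combine
   them with a mask that is all ones where the condition holds and
   all zeros where it does not. */
void flattened(const int32_t *a, int32_t *out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        int32_t then_val = a[i] * 2;           /* "if" side   */
        int32_t else_val = a[i] - 1;           /* "else" side */
        int32_t mask = -(int32_t)(a[i] > 0);   /* -1 (all ones) or 0 */
        out[i] = (then_val & mask) | (else_val & ~mask);
    }
}
```

The transformation is only valid when neither side has side effects or can fault, since both sides now always run. That restriction is most of the "magic" the compiler has to check.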


I'm personally not in favour of SIMD constructs that are anything less than
optimal (but I appreciate I'm probably in the minority here).


(The simple benchmarks of the paper show a 5-15% performance loss compared
to handwritten SIMD code.)


Right, as I suspected.

15% is a very small performance loss if the programmer's alternative is writing scalar code that is 2 or 3 times slower :-)

The SIMD programmers who can't stand a 1% loss of performance use the intrinsics manually (or write in asm), and they ignore everything else.

A much larger population of system programmers wishes to use modern CPUs efficiently, but they don't have the time (or the skill, which means their programs are too often buggy) for assembly-level programming. Currently they use smart numerical C++ libraries, use modern Fortran versions, and/or write scalar C/C++ (or Fortran) code, add "restrict" annotations, and look at the generated asm hoping the modern compiler back-ends will vectorize it. This is not good enough, and the resulting loss is far larger than 15%.
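For reference, the "scalar code plus restrict, then hope" style looks roughly like this in C99. This is only a sketch; the function name and signature are made up for illustration. The restrict qualifiers promise the compiler that the three arrays never overlap, which removes one obstacle to auto-vectorization, but nothing in the source guarantees that vector code is actually generated:

```c
#include <stddef.h>

/* Hypothetical example: dst[i] = a * x[i] + y[i].
   "restrict" asserts that x, y and dst do not alias each other, so the
   compiler is free to load and store several elements at once.
   Whether it really does so depends on the compiler and its flags. */
void scaled_add(size_t n, float a,
                const float *restrict x,
                const float *restrict y,
                float *restrict dst) {
    for (size_t i = 0; i < n; i++)
        dst[i] = a * x[i] + y[i];
}
```

The only way to know whether the loop was vectorized is to read the generated asm or a compiler vectorization report, which is the weak point of this approach.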

This paper shows a third way, making this kind of programming simpler and more approachable for a wider audience, with a small performance loss compared to handwritten code. This is what language designers have been doing for 60+ years :-)

Bye,
bearophile
