Re: intel-intrinsics v1.0.0

Guillaume Piolat via Digitalmars-d-announce Thu, 14 Feb 2019 06:07:03 -0800

On Wednesday, 13 February 2019 at 23:26:48 UTC, Crayo List wrote:

On Wednesday, 13 February 2019 at 19:55:05 UTC, Guillaume Piolat wrote:
On Wednesday, 13 February 2019 at 04:57:29 UTC, Crayo List wrote:
On Wednesday, 6 February 2019 at 01:05:29 UTC, Guillaume Piolat wrote:
"intel-intrinsics" is a DUB package for people interested in x86 performance that want neither to write assembly, nor a LDC-specific snippet... and still have fastest possible code.
This is really cool and I appreciate your efforts!
However (for those who are unaware) there is an alternative way that is (arguably) better;
https://ispc.github.io/index.html
You can write portable vectorized code that can be trivially invoked from D.
ispc is another compiler in your build, and you'd write in another language, so it's not really the same thing.
That's mostly what I said, except that I did not say it's the same thing. It's an alternative way to produce vectorized code in a deterministic and portable way.
This is NOT an auto-vectorizing compiler!
I haven't used it (nor do I know anyone who does) so don't really know why it would be any better
And that's precisely why I posted here; for those people that have interest in vectorizing their code in a portable way to be aware that there is another (arguably) better way.
I highly recommend browsing through the walkthrough example;
https://ispc.github.io/example.html
For example, I have code that I can run on my Xeon Phi 7250 Knights Landing CPU by compiling with --target=avx512knl-i32x16, then I can run the exact same code with no change at all on my i7-5820k by compiling with --target=avx2-i32x8. Each time I get optimal code. This is not something you can easily do with intrinsics!

I don't disagree, but ispc sounds more like a host-only OpenCL to me, rather than a replacement/competition for intel-intrinsics.

Intrinsics are easy: if calling another compiler with another source language might be trivial, then importing a DUB package and starting to use it within the same source code is even more trivial!
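
As a rough sketch of what that looks like, assuming a DUB project that lists "intel-intrinsics" as a dependency: the helper name addFloats is made up for illustration, while the intrinsics keep the names and semantics of their Intel C counterparts.

```d
// Assumption: the DUB recipe of the project depends on "intel-intrinsics".
import inteli.xmmintrin; // SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps, _mm_storeu_ps

/// Hypothetical helper: dst[i] = a[i] + b[i], four floats per iteration.
void addFloats(float[] dst, const(float)[] a, const(float)[] b)
{
    assert(a.length == dst.length && b.length == dst.length);

    size_t i = 0;

    // Vector body: a single _mm_add_ps adds 4 lanes at once.
    for (; i + 4 <= dst.length; i += 4)
    {
        __m128 va = _mm_loadu_ps(&a[i]); // unaligned load of 4 floats
        __m128 vb = _mm_loadu_ps(&b[i]);
        _mm_storeu_ps(&dst[i], _mm_add_ps(va, vb));
    }

    // Scalar tail for the remaining 0 to 3 elements.
    for (; i < dst.length; ++i)
        dst[i] = a[i] + b[i];
}
```

This lives in the same D module as the calling code, with no extra build step beyond the DUB dependency.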

I take issue with the claim that Single Program Multiple Data yields much more performance than well-written intrinsics code: when your compiler auto-vectorizes (or you vectorized using SIMD semantics) you _also_ have one instruction for multiple data. The only gain I can see for SPMD would be use of non-temporal writes, since they are so hard to use effectively in practice.

I also take some issue with "portability": SIMD intrinsics optimize quite deterministically (some instructions get generated since LDC 1.0.0 -O0), also LLVM IR is portable to ARM, whereas ispc likely never will be, as admitted by its author:
https://pharr.org/matt/blog/2018/04/29/ispc-retrospective.html

My interest in AVX-512 is subnormal: it can _slow down_ things on some x86 CPUs:
https://gist.github.com/rygorous/32bc3ea8301dba09358fd2c64e02d774
In general the latest instruction sets are increasingly hard to apply, and have lower yield.

The newer Intel instruction sets are basically a scam for the performance-minded. Sponsored work on x265 yields really abnormally low results, rewriting things with AVX-512:
https://software.intel.com/en-us/articles/accelerating-x265-with-intel-advanced-vector-extensions-512-intel-avx-512

As to compiling precisely for the host target: we are building B2C software here, so we don't control the host machine. Thankfully the ancient SIMD instruction sets yield most of the value, since a lot of the time memory throughput is the bottleneck.

I can see ispc being more useful when you know the precise model of your target Intel CPU. I would also like to see it compared to Intel's own software OpenCL: it seems it started its life as internal competition.
