Re: Any usable SIMD implementation?

Joe Duarte via Digitalmars-d Sun, 17 Apr 2016 17:31:57 -0700

On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote:

Besides, I think it's a poor design to customize the app for only one SIMD type. A better idea (I've repeated this ad nauseum over the years) is to have n modules, one for each supported SIMD type. Compile and link all of them in, then detect the SIMD type at runtime and call the corresponding module. (This is how the D array ops are currently implemented.)

There are many organizations in the world that are building software in-house, where such software is targeted to modern CPU SIMD types, most typically AVX/AVX2 and crypto instructions.

In these settings -- many of them scientific compute or big data center operators -- they know what servers they have, what CPU platforms they have. They don't care about portability to the past, older computers and so forth. A runtime check would make no sense for them, not for their baseline, and it would probably be a waste of time for them to design code to run on pre-AVX silicon. (AVX is not new anymore -- it's been around for a few years.)
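To make the two approaches concrete, here is a rough sketch. It is in C rather than D because it relies on GCC/Clang builtins (`__builtin_cpu_supports`, the `target` attribute) on x86, and every function name in it is made up for illustration. `sum_dispatch` follows the compile-everything-and-detect-at-runtime scheme from the quote above. `sum_baseline` is what a shop with a known AVX fleet might do instead: build with something like `-mavx` and let the choice be made at compile time, with no check at all.

```c
#include <stddef.h>

/* Portable fallback: plain scalar loop. */
static float sum_scalar(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++) s += x[i];
    return s;
}

/* Assumption: GCC or Clang targeting x86. Elsewhere only the scalar path exists. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>
#define HAVE_X86_SIMD 1

/* AVX variant; the target attribute lets it compile without -mavx. */
__attribute__((target("avx")))
static float sum_avx(const float *x, size_t n) {
    __m256 acc = _mm256_setzero_ps();
    size_t i = 0;
    for (; i + 8 <= n; i += 8)               /* 8 floats per iteration */
        acc = _mm256_add_ps(acc, _mm256_loadu_ps(x + i));
    float lanes[8];
    _mm256_storeu_ps(lanes, acc);
    float s = 0.0f;
    for (int k = 0; k < 8; k++) s += lanes[k];
    for (; i < n; i++) s += x[i];            /* leftover tail */
    return s;
}
#endif

typedef float (*sum_fn)(const float *, size_t);

/* Approach 1: link all variants in, pick one at runtime. */
static sum_fn pick_sum(void) {
#ifdef HAVE_X86_SIMD
    if (__builtin_cpu_supports("avx")) return sum_avx;
#endif
    return sum_scalar;
}

float sum_dispatch(const float *x, size_t n) {
    static sum_fn impl = NULL;               /* cached after the first call; not synchronized */
    if (!impl) impl = pick_sum();
    return impl(x, n);
}

/* Approach 2: fixed baseline. Built with -mavx, __AVX__ is defined and
   the AVX path is chosen at compile time, with no runtime check. */
float sum_baseline(const float *x, size_t n) {
#if defined(HAVE_X86_SIMD) && defined(__AVX__)
    return sum_avx(x, n);
#else
    return sum_scalar(x, n);
#endif
}
```

The trade-off is the one described above: `sum_dispatch` runs anywhere but pays for an indirect call and carries every variant in the binary, while a `sum_baseline` built for AVX will simply fault on a pre-AVX machine. That is acceptable only if you know your hardware.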

Good examples can be found on Cloudflare's blog, especially Vlad Krasnov's posts. Here's one where he accelerates Golang's crypto libraries: https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/

Companies like CF probably spend millions of dollars on electricity, and there are some workloads where AVX-optimized code can yield tangible monetary savings.

Someone else talked about marking "Broadwell" and other generation names. As others have said, it's better to specify features. I wanted to chime in with a couple of additional examples. Intel's transactional memory instructions (TSX) are only available on some Broadwell parts because there was a bug in the original implementation (Haswell and early Broadwell) and it's disabled on most. But the new Broadwell server chips have it, and it's a big deal for some DB workloads. Similarly, only some Skylake chips have the Software Guard Extensions (SGX) instructions, which are very powerful for creating secure enclaves on an untrusted host.
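As a rough illustration of asking for features instead of a generation name, the sketch below reads the CPUID feature bits directly. It assumes GCC or Clang on x86 (for `<cpuid.h>` and `__get_cpuid_count`), and the struct and function names are hypothetical. The bit positions are the documented ones for CPUID leaf 7, subleaf 0, register EBX: bit 2 is SGX, bit 4 is HLE and bit 11 is RTM (HLE and RTM being the two halves of TSX).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86; elsewhere every feature reports false. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID 1
#endif

/* Hypothetical feature record: one flag per capability, no generation names. */
struct cpu_features {
    bool sgx;  /* Software Guard Extensions */
    bool hle;  /* TSX: Hardware Lock Elision */
    bool rtm;  /* TSX: Restricted Transactional Memory */
};

/* Decode EBX of CPUID leaf 7, subleaf 0. Kept separate from the query
   so the decoding can be checked without the hardware. */
struct cpu_features decode_leaf7_ebx(uint32_t ebx) {
    struct cpu_features f;
    f.sgx = (ebx >> 2) & 1u;
    f.hle = (ebx >> 4) & 1u;
    f.rtm = (ebx >> 11) & 1u;
    return f;
}

/* Ask the CPU we are actually running on. */
struct cpu_features query_cpu_features(void) {
    uint32_t ebx = 0;
#ifdef HAVE_CPUID
    unsigned int a, b, c, d;
    if (__get_cpuid_count(7, 0, &a, &b, &c, &d))
        ebx = b;
#endif
    return decode_leaf7_ebx(ebx);
}
```

Two Broadwell parts can return different answers from `query_cpu_features()`, which is exactly why a "Broadwell" label is too coarse. The bits only say what the CPU advertises: SGX, for instance, may still need to be enabled in firmware before it is usable.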

On the broader SIMD-as-first-class-citizen issue, I think it would be worth thinking about how to bake SIMD into the language instead of bolting it on. If I were designing a new language in 2016, I would take a fresh look at how SIMD could be baked into a language's core constructs. I'd think about new loop abstractions that could make SIMD easier to exploit, and how to nudge programmers away from serial monotonic mindsets and into more of a SIMD/FMA way of reasoning.
