On Tuesday, 5 April 2016 at 10:27:46 UTC, Walter Bright wrote:
Besides, I think it's a poor design to customize the app for only one SIMD type. A better idea (I've repeated this ad nauseam over the years) is to have n modules, one for each supported SIMD type. Compile and link all of them in, then detect the SIMD type at runtime and call the corresponding module. (This is how the D array ops are currently implemented.)

There are many organizations in the world that are building software in-house, where such software is targeted to modern CPU SIMD types, most typically AVX/AVX2 and crypto instructions.

In these settings -- many of them scientific compute or big data center operators -- they know what servers they have, what CPU platforms they have. They don't care about portability to the past, older computers and so forth. A runtime check would make no sense for them, not for their baseline, and it would probably be a waste of time for them to design code to run on pre-AVX silicon. (AVX is not new anymore -- it's been around for a few years.)
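To make the contrast concrete, here is a rough sketch of both schemes. It is written in Rust rather than D purely for illustration, and it is not how the D array ops are actually implemented. The kernel names (sum_scalar, sum_avx2, sum_runtime_dispatch, sum_fixed_baseline) are made up for this example, and the "AVX2" kernel is just the same loop compiled with AVX2 enabled, not hand-tuned intrinsics.

```rust
/// Portable fallback: plain scalar loop.
fn sum_scalar(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Same kernel, but the compiler may use AVX2 instructions inside this one
/// function even if the rest of the program targets an older baseline.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_avx2(xs: &[f32]) -> f32 {
    xs.iter().sum()
}

/// Runtime dispatch: every variant is compiled and linked in, and the
/// CPU is queried before choosing one.
fn sum_runtime_dispatch(xs: &[f32]) -> f32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // Sound only because the check above confirmed AVX2 support.
            return unsafe { sum_avx2(xs) };
        }
    }
    sum_scalar(xs)
}

/// Fixed baseline: the build itself is told the fleet has AVX2
/// (for example with `-C target-feature=+avx2`), so no runtime check remains.
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn sum_fixed_baseline(xs: &[f32]) -> f32 {
    // Sound only if the binary is never run on a pre-AVX2 machine.
    unsafe { sum_avx2(xs) }
}

/// Without an AVX2 baseline the same entry point falls back to scalar code.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn sum_fixed_baseline(xs: &[f32]) -> f32 {
    sum_scalar(xs)
}

fn main() {
    let data = [1.0_f32, 2.0, 3.0, 4.0];
    println!("runtime dispatch: {}", sum_runtime_dispatch(&data));
    println!("fixed baseline:   {}", sum_fixed_baseline(&data));
}
```

The second variant is the one I have in mind for in-house builds: the decision moves from the hot path to the build configuration, at the cost of a binary that must never be deployed on older silicon.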

Good examples can be found on Cloudflare's blog, especially Vlad Krasnov's posts. Here's one where he accelerates Golang's crypto libraries: https://blog.cloudflare.com/go-crypto-bridging-the-performance-gap/

Companies like CF probably spend millions of dollars on electricity, and there are some workloads where AVX-optimized code can yield tangible monetary savings.

Someone else talked about marking "Broadwell" and other generation names. As others have said, it's better to specify features. I wanted to chime in with a couple of additional examples. Intel's transactional memory instructions (TSX) are only available on some Broadwell parts: the original implementation (Haswell and early Broadwell) had a bug, so TSX is disabled on most of those chips. But the new Broadwell server chips have it, and it's a big deal for some DB workloads. Similarly, only some Skylake chips have the Software Guard Extensions (SGX), which are very powerful for creating secure enclaves on an untrusted host.
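As a rough illustration of keying on features rather than generation names, here is a small sketch, again in Rust purely as an example. The CpuFeatures record and the pick_gcm_backend and can_use_lock_elision helpers are hypothetical names invented for this post. "rtm" is the name Rust's standard feature detection uses for the TSX transactional instructions. SGX is left out of the sketch.

```rust
/// Hypothetical capability record: one flag per instruction-set feature,
/// with no notion of "Haswell", "Broadwell" or "Skylake".
#[derive(Debug, Clone, Copy, Default, PartialEq)]
struct CpuFeatures {
    aes: bool,       // AES-NI
    pclmulqdq: bool, // carry-less multiply, used by GCM
    rtm: bool,       // TSX restricted transactional memory
}

/// On x86-64, ask the CPU which features it actually has.
#[cfg(target_arch = "x86_64")]
fn detect() -> CpuFeatures {
    CpuFeatures {
        aes: is_x86_feature_detected!("aes"),
        pclmulqdq: is_x86_feature_detected!("pclmulqdq"),
        rtm: is_x86_feature_detected!("rtm"),
    }
}

/// On other architectures, report none of these x86 features.
#[cfg(not(target_arch = "x86_64"))]
fn detect() -> CpuFeatures {
    CpuFeatures::default()
}

/// Choose an AES-GCM code path from the features it needs.
fn pick_gcm_backend(f: CpuFeatures) -> &'static str {
    if f.aes && f.pclmulqdq {
        "aes-ni"
    } else {
        "portable"
    }
}

/// Lock elision depends on the RTM bit, not on the chip's marketing name.
fn can_use_lock_elision(f: CpuFeatures) -> bool {
    f.rtm
}

fn main() {
    let f = detect();
    println!("{:?}", f);
    println!("GCM backend: {}", pick_gcm_backend(f));
    println!("lock elision: {}", can_use_lock_elision(f));
}
```

Two chips sold under the same generation name can produce different CpuFeatures records here, which is exactly why the generation name is the wrong key.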

On the broader SIMD-as-first-class-citizen issue, I think it would be worth thinking about how to bake SIMD into the language instead of bolting it on. If I were designing a new language in 2016, I would take a fresh look at how SIMD could fit into its core constructs. I'd think about new loop abstractions that make SIMD easier to exploit, and about how to nudge programmers away from a serial, one-element-at-a-time mindset and toward a SIMD/FMA way of reasoning.
