https://issues.dlang.org/show_bug.cgi?id=15873
Marco Leise <marco.le...@gmx.de> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |marco.le...@gmx.de

--- Comment #7 from Marco Leise <marco.le...@gmx.de> ---
My concern is with "fast.json" where the call site reads

  auto json = parseJSON(...);

and I feel that

  import core.cpuid;

  if (sse42)
    handleJson!true();
  else
    handleJson!false();

  void handleJson(bool sse42)()
  {
    auto json = parseJSON!sse42(...);
  }

is just not palatable. ('handleJson' being needed, since the return value
would be a RAII struct with compile-time specialization.)
Importing core.cpuid, figuring out which flag to use as a template argument,
and writing a switch-case or if-else is not economically reasonable, so to
speak, when you could enable SSE4 globally and often implicitly
(-march=native).
Also, in my case DMD won't profit, because its inline assembly doesn't inline
(making it too slow), and GDC won't profit because it is not supported by
core.cpuid, leaving only LDC - but that's another story.

My argument here is that the one writing SIMD code is not necessarily the one
calling it. Compile-time information about the (implied) target enables us to
reduce the cognitive load for library users, and still make use of the latest
CPU features.
This is working to great benefit with intrinsics in other compilers (for
popcnt, memcpy, etc.), but we can't imitate that. So we ended up with runtime
checks against a global variable in popcnt for what should be a single
instruction on recent CPUs and an additional "SSE4 only" _popcnt in
http://dlang.org/phobos/core_bitop.html#.popcnt
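
For illustration, the runtime dispatch described in the comment could be spelled out roughly as follows. This is only a sketch: the `Json` struct, the `useSse42` parameter name, and this `parseJSON` are hypothetical stand-ins and not the actual fast.json API; only `core.cpuid.sse42` is taken from druntime.

```d
import core.cpuid : sse42;
import std.stdio : writeln;

// Hypothetical stand-in for a parser result that is specialized at
// compile time (the real fast.json type is not shown in the comment).
struct Json(bool useSse42)
{
    string text;
    enum bool vectorized = useSse42;
}

// Hypothetical stand-in for a parseJSON that takes the CPU feature
// as an explicit template argument.
Json!useSse42 parseJSON(bool useSse42)(string text)
{
    return Json!useSse42(text);
}

// The wrapper is needed because the two instantiations return
// different types, so one 'auto json' cannot hold both.
bool handleJson(bool useSse42)(string text)
{
    auto json = parseJSON!useSse42(text);
    return json.vectorized;
}

void main()
{
    immutable text = `{"a": 1}`;
    // Every call site would have to repeat this runtime check.
    immutable usedSse42 = sse42 ? handleJson!true(text)
                                : handleJson!false(text);
    writeln("SSE4.2 path taken: ", usedSse42);
}
```

The point of the sketch is the amount of ceremony at the call site: the import, the branch, and the wrapper template all exist only to pass along a fact the compiler could already know from the target settings.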