On Mon, 4 Apr 2016 11:43:58 -0700, Walter Bright <newshou...@digitalmars.com> wrote:
> On 4/4/2016 9:21 AM, Marco Leise wrote:
> > To put this to good use, we need a reliable way - basically
> > a global variable - to check for SSE4 (or POPCNT, etc.). What
> > we have now does not work across all compilers.
>
> http://dlang.org/phobos/core_cpuid.html

That is what I was referring to with "what we have now":

    import core.cpuid : mmx;
    import std.stdio : writeln;

    // InlineAsm_X86_Any is private to core.cpuid, so define it here.
    version (D_InlineAsm_X86) version = InlineAsm_X86_Any;
    else version (D_InlineAsm_X86_64) version = InlineAsm_X86_Any;

    void main()
    {
        writeln(mmx); // prints 'false' with GDC

        version (InlineAsm_X86_Any)
            writeln("DMD and LDC support the Dlang inline assembler");
        else
            writeln("GDC has the GCC extended inline assembler");
    }

Both LLVM and GCC have moved to "extended inline assemblers" that
require you to describe the input, output and scratch registers as
well as the memory locations an asm block uses, so the compiler can
see through the block for register allocation and inlining. It is
harder to get right, but also more rewarding: it lets you write
zero-overhead "one-liners" and "intrinsics" while the compiler still
handles the calling convention. An example for GDC:

    struct DblWord { ulong lo, hi; }

    /// Multiplies two machine words and returns a double
    /// machine word.
    DblWord bigMul(ulong x, ulong y)
    {
        DblWord tmp = void;
        // '=a' and '=d' are outputs to RAX and RDX
        // respectively that are bound to the two
        // fields of 'tmp'.
        // '"a" x' means that we want 'x' as input in
        // RAX and '"rm" y' places 'y' wherever it
        // suits the compiler (any general purpose
        // register or memory location).
        // 'mulq %3' multiplies with the ulong
        // represented by the argument at index 3 (y).
        asm {
            "mulq %3"
            : "=a" tmp.lo, "=d" tmp.hi
            : "a" x, "rm" y;
        }
        return tmp;
    }

In the above example the compiler has enough information to inline
the function, or to return the result directly in RAX and RDX
without writing it to memory first. The same thing in DMD would
likely have turned out slower than emulating the multiply with
several uint->ulong multiplies, since DMD does not inline functions
that contain asm blocks. Although Dlang inline assembly is less
powerful, the LDC team implemented it according to the spec, so
core.cpuid works with LDC.
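For comparison with the GDC version, here is a rough sketch of the kind of pure-D fallback the remark about "several uint->ulong multiplies" points at: the 128-bit product is assembled from four 32x32->64-bit partial products (schoolbook multiplication on 32-bit halves). The name `bigMulPortable` is hypothetical, and this is only an illustration of the technique, not what DMD or any library actually emits.

```d
struct DblWord { ulong lo, hi; }

/// Hypothetical portable fallback: 64x64 -> 128-bit multiply built
/// from four 32x32 -> 64-bit partial products, no inline asm needed.
DblWord bigMulPortable(ulong x, ulong y) pure nothrow @nogc @safe
{
    enum ulong mask = 0xFFFF_FFFF;

    // Split both operands into 32-bit halves.
    immutable ulong xLo = x & mask, xHi = x >> 32;
    immutable ulong yLo = y & mask, yHi = y >> 32;

    // Four partial products; each fits in a ulong.
    immutable ulong ll = xLo * yLo;
    immutable ulong lh = xLo * yHi;
    immutable ulong hl = xHi * yLo;
    immutable ulong hh = xHi * yHi;

    // Middle 32-bit column: each term is below 2^32, so the sum
    // cannot overflow a ulong.
    immutable ulong mid = (ll >> 32) + (lh & mask) + (hl & mask);

    DblWord r;
    // Low word: low half of the middle column above the low half of ll.
    r.lo = (mid << 32) | (ll & mask);
    // High word: hh plus all carries out of the lower columns.
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return r;
}
```

For example, `bigMulPortable(ulong.max, ulong.max)` yields `hi == 0xFFFF_FFFF_FFFF_FFFE` and `lo == 1`, since (2^64 - 1)^2 = 2^128 - 2^65 + 1. Whether this beats a non-inlined asm call depends on the compiler, which is the trade-off being weighed here.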
GDC, on the other hand, is out of the picture until either

  1) GDC adds Dlang inline assembly, or
  2) core.cpuid duplicates most of its assembly code to support
     the GCC extended inline assembler.

I would prefer a common extended inline assembler, though: when you
use asm for performance reasons you typically cannot go with
non-inlinable Dlang asm, so you end up with pure D for DMD, GCC asm
for GDC and LDC asm for LDC: three code paths.

-- 
Marco