On Mon, 4 Apr 2016 11:43:58 -0700,
Walter Bright <newshou...@digitalmars.com> wrote:

> On 4/4/2016 9:21 AM, Marco Leise wrote:
> >    To put this to good use, we need a reliable way - basically
> >    a global variable - to check for SSE4 (or POPCNT, etc.). What
> >    we have now does not work across all compilers.  
> 
> http://dlang.org/phobos/core_cpuid.html

That's what I implied in "what we have now":

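        import core.cpuid : mmx;
        import std.stdio : writeln;

        // InlineAsm_X86_Any is not predefined by the compiler; derive
        // it from the predefined D_InlineAsm_* identifiers.
        version(D_InlineAsm_X86)    version = InlineAsm_X86_Any;
        version(D_InlineAsm_X86_64) version = InlineAsm_X86_Any;

        void main()
        {
                writeln( mmx );  // prints 'false' with GDC
                version(InlineAsm_X86_Any)
                        writeln("DMD and LDC support the Dlang inline assembler");
                else
                        writeln("GDC has the GCC extended inline assembler");
        }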

Both LLVM and GCC have moved to "extended inline assemblers"
that require you to provide information about input, output
and scratch registers as well as memory locations, so the
compiler can see through the asm-block for register allocation
and inlining purposes. It's more difficult to get right, but
also more rewarding, as it enables you to write no-overhead
"one-liners" and "intrinsics" while having calling conventions
still handled by the compiler. An example for GDC:

        struct DblWord { ulong lo, hi; }

        /// Multiplies two machine words and returns a double
        /// machine word.
        DblWord bigMul(ulong x, ulong y)
        {
                DblWord tmp = void;
                // '=a' and '=d' are outputs to RAX and RDX
                // respectively that are bound to the two
                // fields of 'tmp'.
                // '"a" x' means that we want 'x' as input in
                // RAX and '"rm" y' places 'y' wherever it
                // suits the compiler (any general purpose
                // register or memory location).
                // 'mulq %3' multiplies with the ulong
                // represented by the argument at index 3 (y). 
                asm {
                        "mulq %3"
                         : "=a" tmp.lo, "=d" tmp.hi
                         : "a" x, "rm" y;
                }
                return tmp;
        }

In the above example the compiler has enough information to
inline the function or directly return the result in RAX:RDX
without writing to memory first. The same thing written with
DMD's non-inlinable inline assembler would likely have turned
out slower than emulating it with several uint->ulong
multiplies.
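
For comparison, the pure D emulation mentioned above could look
roughly like the sketch below. It is only an illustration: the name
'bigMulPortable' is made up for this example, and 'DblWord' is
repeated so that the snippet stands on its own. It splits both
operands into 32-bit halves and combines the four partial products.

```d
struct DblWord { ulong lo, hi; }

/// Hypothetical pure D fallback: 64x64 -> 128 bit multiply built
/// from four 32x32 -> 64 bit partial products.
DblWord bigMulPortable(ulong x, ulong y) pure nothrow @nogc @safe
{
    immutable ulong xl = x & 0xFFFF_FFFF, xh = x >> 32;
    immutable ulong yl = y & 0xFFFF_FFFF, yh = y >> 32;

    immutable ulong ll = xl * yl;  // contributes to bits   0..63
    immutable ulong lh = xl * yh;  // contributes to bits  32..95
    immutable ulong hl = xh * yl;  // contributes to bits  32..95
    immutable ulong hh = xh * yh;  // contributes to bits 64..127

    // Sum everything that lands in bits 32..63. Each term is below
    // 2^32, so the sum cannot overflow a ulong; its upper half is the
    // carry into the high word.
    immutable ulong mid = (ll >> 32) + (lh & 0xFFFF_FFFF) + (hl & 0xFFFF_FFFF);

    DblWord r;
    r.lo = (mid << 32) | (ll & 0xFFFF_FFFF);
    r.hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    return r;
}

unittest
{
    assert(bigMulPortable(3, 4) == DblWord(12, 0));
    assert(bigMulPortable(1UL << 32, 1UL << 32) == DblWord(0, 1));
    // (2^64 - 1)^2 = 2^128 - 2^65 + 1
    assert(bigMulPortable(ulong.max, ulong.max) == DblWord(1, ulong.max - 1));
}
```

That is four multiplies plus shifts, masks and adds where the
hardware needs a single 'mulq', but unlike a Dlang asm block the
compiler is free to inline it.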

The LDC team implemented the less powerful Dlang inline
assembly according to the spec, so core.cpuid works for them.
GDC, on the other hand, is out of the picture until either
1) GDC adds Dlang inline assembly, or
2) core.cpuid duplicates most of its assembly code to support
   the GCC extended inline assembler.
I would prefer a common extended inline assembler though,
because when you use it for performance reasons you typically
cannot go with non-inlinable Dlang asm, so you end up with pure
D for DMD, GCC asm for GDC and LDC asm - three code paths.

-- 
Marco
