I looked for a workaround.  Using a union instead of memcpy seemed to
be enough to convince GCC to optimize the function down to a single
"mov".

    struct alpha unpack(uint64_t x)
    {
        union {
            struct alpha r;
            uint64_t i;
        } u;
        u.i = x;
        return u.r;
    }
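
For comparison, here is a minimal sketch that puts the memcpy form and
the union form side by side.  The layout of struct alpha below is only
an assumption for illustration, and the names unpack_memcpy and
unpack_union are hypothetical.  Both functions should produce the same
bytes; any difference would be only in the code GCC generates.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical layout, assumed only for illustration: 8 bytes, no padding. */
struct alpha {
    uint32_t a;
    uint16_t b;
    uint8_t  c;
    uint8_t  d;
};

/* Both forms copy exactly sizeof(uint64_t) bytes, so the sizes must match. */
static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 64 bits wide");

/* memcpy form: copy the bytes of x into the struct. */
struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Union form: store through one member, read back through the other. */
struct alpha unpack_union(uint64_t x)
{
    union {
        struct alpha r;
        uint64_t i;
    } u;
    u.i = x;
    return u.r;
}
```

Reading a union member other than the one last written is allowed in C
(the bytes are reinterpreted as the new type) but not in C++, so this
sketch is C only.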

But that trick turned out to be short-lived.  If I wrap the wrapper
with another function:

    struct alpha wrapperwrapper(uint64_t y)
    {
        return wrapper(y);
    }

I get the same 37-line assembly for this function.  Stranger still is
what happens if I define two identical wrappers in the same
translation unit:

    struct alpha wrapper(uint64_t y)
    {
        return unpack(y);
    }

    struct alpha wrapper2(uint64_t y)
    {
        return unpack(y);
    }

One of them gets optimized perfectly, while the other fails, even
though the bodies of the two functions are completely identical!
