I looked for a workaround. Using a union instead of memcpy seemed to be enough to convince GCC to optimize the function down to a single "mov":

    struct alpha unpack(uint64_t x)
    {
        union { struct alpha r; uint64_t i; } u;
        u.i = x;
        return u.r;
    }

But that trick turned out to be short-lived. If I wrap the wrapper with another function:

    struct alpha wrapperwrapper(uint64_t y)
    {
        return wrapper(y);
    }

I get the same 37-line assembly generated for this function. What's even more strange is that if I just define two identical wrappers in the same translation unit:

    struct alpha wrapper(uint64_t y) { return unpack(y); }
    struct alpha wrapper2(uint64_t y) { return unpack(y); }

One of them gets optimized perfectly, while the other fails, even though the bodies of the two functions are completely identical!
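
For anyone who wants to reproduce the comparison, a self-contained file along these lines should be enough. It is only a sketch: the layout of struct alpha here is a stand-in (two 32-bit fields, chosen so the struct is exactly 8 bytes), and the names unpack_memcpy and unpack_union are hypothetical, introduced just to keep both variants in one file.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: any 8-byte struct without padding would do. */
struct alpha {
    uint32_t lo;
    uint32_t hi;
};

static_assert(sizeof(struct alpha) == sizeof(uint64_t),
              "struct alpha must be exactly 8 bytes");

/* Variant 1: copy the bytes of x into the struct with memcpy. */
struct alpha unpack_memcpy(uint64_t x)
{
    struct alpha r;
    memcpy(&r, &x, sizeof r);
    return r;
}

/* Variant 2: write one union member, read the other.
   This kind of type punning is permitted in C, but not in C++. */
struct alpha unpack_union(uint64_t x)
{
    union { struct alpha r; uint64_t i; } u;
    u.i = x;
    return u.r;
}

/* Two wrappers with identical bodies, to compare their generated code. */
struct alpha wrapper(uint64_t y)  { return unpack_union(y); }
struct alpha wrapper2(uint64_t y) { return unpack_union(y); }
```

Compiling with something like `gcc -O2 -S` and comparing the assembly for each function shows which ones collapse to a single move. The result may well differ between GCC versions and targets, so treat it as a way to test your own setup and not as a prediction of what you will see.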