On Monday, 19 July 2021 at 10:21:58 UTC, kinke wrote:
What works reliably is a manual mov:
```
int4 _mm_add_int4(int4 a, int4 b)
{
int4 r;
asm { "paddd %1, %2; movdqa %2, %0" : "=x" (r) : "x" (a),
"x" (b); }
return r;
}
```
This workaround is actually missing the clobber constraint for
`%2` (the input operand is overwritten by `paddd`), which might be
problematic after inlining.
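For illustration, here is a sketch of a variant that instead marks the
destination operand as read-write, assuming LDC's GCC-style asm accepts
the `+` modifier the way GCC does; it avoids both the extra `movdqa` and
the missing clobber:
```
import core.simd;

int4 _mm_add_int4(int4 a, int4 b)
{
    // "+x" ties %0 to `a` as both input and output, so LLVM knows the
    // register is overwritten; no separate movdqa or clobber needed.
    asm { "paddd %1, %0" : "+x" (a) : "x" (b); }
    return a;
}
```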
You can also specify the registers explicitly like so (here
exploiting ABI knowledge about `a` being passed in XMM1, and `b`
in XMM0 for extern(D)):
```
int4 _mm_add_int4(int4 a, int4 b)
{
asm { "paddd %1, %0" : "=xmm0" (b) : "xmm1" (a), "xmm0" (b); }
return b;
}
```
=>
```
paddd xmm0, xmm1
ret
```
But this is likely to interfere with LLVM's register allocation
after inlining...