On Monday, 19 July 2021 at 10:21:58 UTC, kinke wrote:
What works reliably is a manual mov:

```
int4 _mm_add_int4(int4 a, int4 b)
{
    int4 r;
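    // paddd adds into %2 (the register holding b); movdqa then copies that result to %0 (r)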
asm { "paddd %1, %2; movdqa %2, %0" : "=x" (r) : "x" (a), "x" (b); }
    return r;
}
```

This workaround is actually missing the clobber constraint for `%2`: `paddd` writes its result into `%2`, yet `b` is declared as a pure input, so the compiler assumes its register is left untouched. That might be problematic after inlining.
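For completeness, a minimal sketch of sidestepping that without the extra `movdqa`, assuming GDC-style extended asm with GCC constraint semantics: tying `b` to a read-write `"+x"` operand tells the compiler that `b`'s register is both consumed and overwritten:

```
int4 _mm_add_int4(int4 a, int4 b)
{
    // "+x" marks b's register as read-write, so paddd's result
    // can land there directly; no separate mov or clobber needed.
    asm { "paddd %1, %0" : "+x" (b) : "x" (a); }
    return b;
}
```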

You can also specify the registers explicitly, like so (here exploiting ABI knowledge: with extern(D), `a` is passed in XMM1 and `b` in XMM0):

```
int4 _mm_add_int4(int4 a, int4 b)
{
    asm { "paddd %1, %0" : "=xmm0" (b) : "xmm1" (a), "xmm0" (b); }
    return b;
}
```

=>

```
paddd   xmm0, xmm1
ret
```

But this will likely interfere with LLVM's register allocation optimizations after inlining...
