https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121099
Bug ID: 121099
Summary: GCC doesn't optimize `_mm_set_ps()` very well
Product: gcc
Version: 15.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: lh_mouse at 126 dot com
Target Milestone: ---
For this 4-way comparison function:
(https://gcc.godbolt.org/z/sY4vdcjdq)
```
#include <immintrin.h>

// Returns (angles are in degrees)
// - 0b1110 for 0 - 45 where x > y > 0 > -x
// - 0b1111 for 45 - 90 where y > x > 0 > -x
// - 0b0111 for 90 - 135 where y > -x > 0 > x
// - 0b0011 for 135 - 180 where -x > y > 0 > x
// - 0b0001 for 180 - 225 where -x > 0 > y > x
// - 0b0000 for 225 - 270 where -x > 0 > x > y
// - 0b1000 for 270 - 315 where x > 0 > -x > y
// - 0b1100 for 315 - 360 where x > 0 > y > -x
int
octant_of_angle(float y, float x)
{
  // _mm_set_ps() fills lanes from the highest down, so lanes 0..3 compare
  // y > x, y > 0, x > -y and x > 0; bit i of the movemask is lane i.
  __m128 ps = _mm_cmpgt_ps(_mm_set_ps(x, x, y, y), _mm_set_ps(0, -y, 0, x));
  return _mm_movemask_ps(ps);
}
```
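The lane order is easy to misread: `_mm_set_ps(e3, e2, e1, e0)` puts its last argument in lane 0, so lane 0 of the first vector is `y` and lane 0 of the second is `x`. A scalar sketch that computes the same four bits may make the table easier to check; the names `octant_of_angle_scalar` and `check_octant_table` are hypothetical and not part of the report, and masks are written in hex because `0b` literals are only standard from C23.

```c
#include <assert.h>

/* Hypothetical scalar reference for octant_of_angle(): bit i is the
   comparison performed in lane i of the SSE version. */
int
octant_of_angle_scalar(float y, float x)
{
  return (y > x)          /* lane 0: y > x  */
         | (y > 0) << 1   /* lane 1: y > 0  */
         | (x > -y) << 2  /* lane 2: x > -y */
         | (x > 0) << 3;  /* lane 3: x > 0  */
}

/* One sample (y, x) point per octant, checked against the comment table
   in octant_of_angle(). */
void
check_octant_table(void)
{
  assert(octant_of_angle_scalar( 1.0f,  2.0f) == 0xE); /*   0 -  45 */
  assert(octant_of_angle_scalar( 2.0f,  1.0f) == 0xF); /*  45 -  90 */
  assert(octant_of_angle_scalar( 2.0f, -1.0f) == 0x7); /*  90 - 135 */
  assert(octant_of_angle_scalar( 1.0f, -2.0f) == 0x3); /* 135 - 180 */
  assert(octant_of_angle_scalar(-1.0f, -2.0f) == 0x1); /* 180 - 225 */
  assert(octant_of_angle_scalar(-2.0f, -1.0f) == 0x0); /* 225 - 270 */
  assert(octant_of_angle_scalar(-2.0f,  1.0f) == 0x8); /* 270 - 315 */
  assert(octant_of_angle_scalar(-1.0f,  2.0f) == 0xC); /* 315 - 360 */
}
```

Both versions use ordered `>` comparisons, so they should also agree on NaN inputs (every comparison is false).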
GCC emits 8 instructions for the two `_mm_set_ps()` intrinsics:
```
vunpcklps xmm2, xmm0, xmm0
vxorps xmm0, xmm0, XMMWORD PTR .LC0[rip]
vxorps xmm4, xmm4, xmm4
vunpcklps xmm3, xmm1, xmm1
vunpcklps xmm1, xmm1, xmm4
vunpcklps xmm0, xmm0, xmm4
vmovlhps xmm2, xmm2, xmm3
vmovlhps xmm1, xmm1, xmm0
```
while Clang only emits 3:
```
vshufps xmm2, xmm0, xmm1, 0
vxorps xmm0, xmm0, xmmword ptr [rip + .LCPI0_0]
vinsertps xmm0, xmm1, xmm0, 42
```