Issue 174009
Summary Improve AMDGPU codegen for copysign(x, fneg(y))
Labels backend:AMDGPU, missed-optimization, floating-point
Assignees
Reporter arsenm
InstCombine tries to replace some fmuls that only change the sign bit with copysign. When the sign bit needs to be flipped, it inserts an fneg on the sign-carrying operand. This results in poor AMDGPU codegen.

e.g., running this through instcombine:
```
define float @fmul_nnan_pos_zero(float %x) {
  %fmul = fmul nnan float %x, 0.0
  ret float %fmul
}

define float @fmul_nnan_neg_zero(float %x) {
  %fmul = fmul nnan float %x, -0.0
  ret float %fmul
}
```

Yields
```
define float @fmul_nnan_pos_zero(float %x) {
  %fmul = call nnan float @llvm.copysign.f32(float 0.000000e+00, float %x)
  ret float %fmul
}

define float @fmul_nnan_neg_zero(float %x) {
  %1 = fneg nnan float %x
  %fmul = call nnan float @llvm.copysign.f32(float 0.000000e+00, float %1)
  ret float %fmul
}
```

Running this through codegen:

```
fmul_nnan_pos_zero:                     ; @fmul_nnan_pos_zero
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	s_brev_b32 s4, -2
	v_bfi_b32 v0, s4, 0, v0
	s_setpc_b64 s[30:31]
```

```
fmul_nnan_neg_zero:                     ; @fmul_nnan_neg_zero
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_xor_b32_e32 v0, 0x80000000, v0
	s_brev_b32 s4, -2
	v_bfi_b32 v0, s4, 0, v0
	s_setpc_b64 s[30:31]
```

We can do better in both of these cases. We shouldn't produce v_bfi_b32 with a constant 0 input: with the 0x7fffffff mask (the s_brev_b32 s4, -2) and a zero magnitude, the bfi only keeps the sign bit of v0, so a plain sign-bit mask suffices, and in the second case the xor for the fneg can be folded into that masking as well.
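
For reference, a hand-written sketch of what the lowering could look like (this is not actual compiler output for any particular subtarget, just the underlying bit math spelled out as instructions):

```
fmul_nnan_pos_zero:                     ; copysign(0.0, x) == bitcast(x & 0x80000000)
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_and_b32_e32 v0, 0x80000000, v0
	s_setpc_b64 s[30:31]

fmul_nnan_neg_zero:                     ; copysign(0.0, -x) == bitcast(~x & 0x80000000)
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_not_b32_e32 v0, v0
	v_and_b32_e32 v0, 0x80000000, v0
	s_setpc_b64 s[30:31]
```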
