Issue 174009
Summary Improve AMDGPU codegen for copysign(x, fneg(y))
Labels backend:AMDGPU, missed-optimization, floating-point
Assignees
Reporter arsenm
InstCombine tries to replace some fmuls that only change the sign bit with copysign. When the sign bit needs to be flipped, it inserts an fneg on the sign-carrying operand. This results in poor AMDGPU codegen.

e.g., running this through instcombine:
```
define float @fmul_nnan_pos_zero(float %x) {
  %fmul = fmul nnan float %x, 0.0
  ret float %fmul
}

define float @fmul_nnan_neg_zero(float %x) {
  %fmul = fmul nnan float %x, -0.0
  ret float %fmul
}
```

Yields
```
define float @fmul_nnan_pos_zero(float %x) {
  %fmul = call nnan float @llvm.copysign.f32(float 0.000000e+00, float %x)
  ret float %fmul
}

define float @fmul_nnan_neg_zero(float %x) {
  %1 = fneg nnan float %x
  %fmul = call nnan float @llvm.copysign.f32(float 0.000000e+00, float %1)
  ret float %fmul
}
```

Running this through codegen:

```
fmul_nnan_pos_zero:                     ; @fmul_nnan_pos_zero
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	s_brev_b32 s4, -2
	v_bfi_b32 v0, s4, 0, v0
	s_setpc_b64 s[30:31]
```

```
fmul_nnan_neg_zero:                     ; @fmul_nnan_neg_zero
; %bb.0:
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_xor_b32_e32 v0, 0x80000000, v0
	s_brev_b32 s4, -2
	v_bfi_b32 v0, s4, 0, v0
	s_setpc_b64 s[30:31]
```

We can do better in both of these cases. We shouldn't produce v_bfi_b32 with a constant 0 input: with the 0x7fffffff mask (the s_brev_b32 s4, -2) and a zero magnitude, the bfi only keeps the sign bit of v0, so a plain sign-bit mask suffices, and in the second case the xor for the fneg can be folded into that masking as well.
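
For reference, a hand-written sketch of what the lowering could look like (this is not actual compiler output for any particular subtarget, just the underlying bit math spelled out as instructions):

```
fmul_nnan_pos_zero:                     ; copysign(0.0, x) == bitcast(x & 0x80000000)
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_and_b32_e32 v0, 0x80000000, v0
	s_setpc_b64 s[30:31]

fmul_nnan_neg_zero:                     ; copysign(0.0, -x) == bitcast(~x & 0x80000000)
	s_waitcnt vmcnt(0) expcnt(0) lgkmcnt(0)
	v_not_b32_e32 v0, v0
	v_and_b32_e32 v0, 0x80000000, v0
	s_setpc_b64 s[30:31]
```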
