| Issue | 180630 |
|---|---|
| Summary | Suboptimal lowering of float bitwise ops on targets without hardware support |
| Labels | |
| Assignees | |
| Reporter | tgross35 |
Demo: https://llvm.godbolt.org/z/7E6xxn47b. Input:
```llvm
define zeroext i1 @foo(half %x) unnamed_addr {
start:
%i = bitcast half %x to i16
%masked = and i16 %i, 32767
%r = icmp eq i16 %masked, 0
ret i1 %r
}
```
InstCombine turns this into:
```llvm
define zeroext i1 @foo(half %x) unnamed_addr {
start:
%r = fcmp oeq half %x, 0xH0000
ret i1 %r
}
```
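The rewrite is valid because clearing the sign bit of a `half` and comparing the result against zero gives the same answer as an ordered equality compare against +0.0, for every bit pattern. Only +0 and -0 pass either test, and NaNs fail both. Below is a minimal sketch that checks this exhaustively. It assumes Python's `struct` format `'e'` (IEEE 754 binary16) is an acceptable reference decoder for `half`. The helper names are hypothetical.

```python
import struct

# `and i16 %i, 32767` from the input IR: keep every bit except the sign bit.
SIGN_CLEAR_MASK = 0x7FFF


def bits_is_zero(bits):
    """Original integer form: clear the sign bit, then `icmp eq ... 0`."""
    return (bits & SIGN_CLEAR_MASK) == 0


def fcmp_oeq_zero(bits):
    """Form after InstCombine: `fcmp oeq half %x, 0xH0000`.

    Python's float == is an ordered compare (False if either side is NaN),
    which matches `oeq`; -0.0 == 0.0 is True, as in IEEE 754.
    """
    (value,) = struct.unpack("<e", struct.pack("<H", bits))
    return value == 0.0


def mismatches():
    # Exhaustively compare both forms over all 2**16 half bit patterns.
    return [b for b in range(1 << 16) if bits_is_zero(b) != fcmp_oeq_zero(b)]


print(len(mismatches()))  # prints 0
```

Only 0x0000 (+0) and 0x8000 (-0) satisfy either form, so the two functions agree on all inputs.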
Then, on x86-64, the following is generated:
```asm
foo:
push rax
call __extendhfsf2@PLT
xorps xmm1, xmm1
cmpeqss xmm1, xmm0
movd eax, xmm1
and eax, 1
pop rcx
ret
```
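Keeping the original bitwise ops would take about four instructions. The generated code is significantly worse because it pays for a libcall to `__extendhfsf2`, which widens the `half` to `float` just to compare it against zero.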
The same problem shows up for every float type that lacks hardware support on the target. For example, https://rust.godbolt.org/z/oYWYnja4q emits an unnecessary libcall for each of `half`, `float`, `double`, and `fp128`.
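The same identity holds for any IEEE 754 binary format, because the sign is always the top bit: a value is ±0 exactly when all the remaining bits are zero. Below is a minimal sketch that spot-checks edge-case bit patterns for the three widths Python's `struct` module can decode (`'e'`, `'f'`, `'d'`). `fp128` has no `struct` format, so it is not covered here. The format table and helper names are hypothetical, chosen for illustration.

```python
import struct

# name -> (float format, same-width unsigned int format, total bits, exponent bits)
FORMATS = {
    "half": ("<e", "<H", 16, 5),
    "float": ("<f", "<I", 32, 8),
    "double": ("<d", "<Q", 64, 11),
}


def is_zero_via_bits(bits, width):
    # Integer-only test: clear the sign (top) bit, then compare with zero.
    return (bits & ((1 << (width - 1)) - 1)) == 0


def is_zero_via_float(bits, float_fmt, int_fmt):
    # Reference: decode the bits and do an ordered compare against 0.0
    # (False for NaN, like `fcmp oeq`).
    (value,) = struct.unpack(float_fmt, struct.pack(int_fmt, bits))
    return value == 0.0


def edge_patterns(width, exp_bits):
    sign = 1 << (width - 1)
    frac_bits = width - 1 - exp_bits
    inf = ((1 << exp_bits) - 1) << frac_bits
    one = ((1 << (exp_bits - 1)) - 1) << frac_bits  # exponent field == bias -> 1.0
    # +0, smallest subnormal, 1.0, +inf, a NaN, all-ones magnitude (also NaN)
    base = [0, 1, one, inf, inf | 1, sign - 1]
    return base + [b | sign for b in base]  # plus the negated patterns


def mismatches():
    found = []
    for name, (float_fmt, int_fmt, width, exp_bits) in FORMATS.items():
        for bits in edge_patterns(width, exp_bits):
            if is_zero_via_bits(bits, width) != is_zero_via_float(bits, float_fmt, int_fmt):
                found.append((name, hex(bits)))
    return found


print(mismatches())  # prints []
```

Only the mask width changes between formats, so the integer-only lowering needs no float support at all.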
_______________________________________________
llvm-bugs mailing list
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs