jayfoad wrote:
> That's right: +-0 and +-inf are still handled by full division.
v_rcp itself handles these cases correctly. Only the NR fixup messes them up.
So how about a structure like:
```
y = rcp(x);
if (x is neither +-0 nor +-inf) {
  do the NR fixup
}
```
This avoids having the complete 12-instruction fdiv expansion bloating the code.
> I fell for the "let's remove the if-then-else and handle everything with
> selects!" idea a couple of times already. When I do that I see a performance
> regression in an arithmetic-intensive kernel (~30%): full division is more
> efficient than Newton-Raphson + a bunch of selects.
What architecture are you benchmarking on? In the codegen with selects
(gfx1010) I see:
```
v_cmp_lt_f32_e64 s1, 0x7e800000, |s0|
v_cmp_ngt_f32_e64 vcc_lo, 0x800000, |s0|
v_cndmask_b32_e64 v0, 1.0, 0x2f800000, s1
v_cndmask_b32_e32 v0, 0x4f800000, v0, vcc_lo
```
This is suspicious because GFX10+ has a fast path for v_cmp immediately
followed by v_cndmask, so I would expect this code to be ordered more like:
```
v_cmp_lt_f32_e64 vcc_lo, 0x7e800000, |s0|
v_cndmask_b32_e64 v0, 1.0, 0x2f800000, vcc_lo
v_cmp_ngt_f32_e64 vcc_lo, 0x800000, |s0|
v_cndmask_b32_e32 v0, 0x4f800000, v0, vcc_lo
```
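For reference, this is my reading of what those four instructions compute, written as a C sketch. `select_scale` is a hypothetical name, and I am assuming the constants decode as 0x7e800000 = 2^126, 0x00800000 = 2^-126, 0x2f800000 = 2^-32 and 0x4f800000 = 2^32. Both orderings compute the same value; only the scheduling differs.

```c
#include <math.h>

/* Scale factor chosen by the two compare/select pairs (a sketch of my
   reading of the ISA, not taken from the patch). */
static float select_scale(float x) {
    float ax = fabsf(x);
    /* v_cmp_lt + v_cndmask: pick 2^-32 when |x| > 2^126, else 1.0. */
    float scale = (0x1p126f < ax) ? 0x1p-32f : 1.0f;
    /* v_cmp_ngt + v_cndmask: keep scale unless 2^-126 > |x|, i.e. x is
       zero or denormal, in which case pick 2^32. The negated compare
       is true for NaN, so NaN keeps the previous value. */
    scale = !(0x1p-126f > ax) ? scale : 0x1p32f;
    return scale;
}
```

The second select depends on the first one's result only through v0, so pairing each v_cmp with its v_cndmask does not change the outcome.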
> 1ULP for normals, 0 for everything else.
Isn't there some existing metadata or similar that specifies the max ULP error
the user will tolerate? You could piggyback on that instead of inventing a new
command-line option to control this.
https://github.com/llvm/llvm-project/pull/194716