jayfoad wrote:

> That's right: ±0 and ±inf are still handled by full division.

v_rcp itself handles these cases correctly. Only the NR fixup messes them up. 
So how about a structure like:
```
y = rcp(x);
if (x is neither 0 nor inf) {
  do the NR fixup
}
```
This avoids having the complete 12-instruction fdiv expansion bloating the code.
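
To make the structure above concrete, here is a rough C sketch. All names (`rcp_approx`, `nr_fixup`, `rcp_refined`) are hypothetical, and `rcp_approx` is only a stand-in for `v_rcp_f32` that uses an ordinary IEEE division. The refinement is written with separate multiply and add for portability; a real lowering would presumably use FMAs.

```c
#include <math.h>

/* Hypothetical stand-in for v_rcp_f32. Assumes IEEE semantics (no
   -ffast-math), so 1/0 gives inf and 1/inf gives 0. */
static float rcp_approx(float x) { return 1.0f / x; }

/* One Newton-Raphson step for y ~= 1/x: y' = y + y * (1 - x * y).
   For x = 0 or x = inf the product x * y is 0 * inf, which is NaN. */
static float nr_fixup(float x, float y) {
    float e = 1.0f - x * y;
    return y + e * y;
}

/* Refine only when x is neither zero nor infinite, so that the
   already-correct rcp result survives for those inputs. */
static float rcp_refined(float x) {
    float y = rcp_approx(x);
    if (x != 0.0f && !isinf(x))
        y = nr_fixup(x, y);
    return y;
}
```

Calling `nr_fixup` unguarded on `x = 0` turns the correct `inf` into NaN, which is the failure the guard is there to avoid.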

> I fell for the "let's remove the if-then-else and handle everything with 
> selects!" idea a couple of times already. When I do that I see a performance 
> regression in an arithmetic-intensive kernel (~30%): full division is more 
> efficient than Newton-Raphson + a bunch of selects.

What architecture are you benchmarking on? In the codegen with selects 
(gfx1010) I see:
```
        v_cmp_lt_f32_e64 s1, 0x7e800000, |s0|
        v_cmp_ngt_f32_e64 vcc_lo, 0x800000, |s0|
        v_cndmask_b32_e64 v0, 1.0, 0x2f800000, s1
        v_cndmask_b32_e32 v0, 0x4f800000, v0, vcc_lo
```
This is suspicious because GFX10+ has a fast path for v_cmp immediately 
followed by v_cndmask, so I would expect this code to be ordered more like:
```
        v_cmp_lt_f32_e64 vcc_lo, 0x7e800000, |s0|
        v_cndmask_b32_e64 v0, 1.0, 0x2f800000, vcc_lo
        v_cmp_ngt_f32_e64 vcc_lo, 0x800000, |s0|
        v_cndmask_b32_e32 v0, 0x4f800000, v0, vcc_lo
```
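
For reference, this is my reading of what the two compare/select pairs compute, written as C. It is a sketch based only on the four instructions quoted above; `select_scale` is a made-up name, and the hex constants are decoded as 0x7e800000 = 2^126, 0x2f800000 = 2^-32, 0x00800000 = 2^-126 (the smallest normal) and 0x4f800000 = 2^32.

```c
#include <math.h>

/* Hypothetical helper mirroring the quoted v_cmp/v_cndmask sequence. */
static float select_scale(float x) {
    float ax = fabsf(x);

    /* v_cmp_lt_f32 + v_cndmask: very large |x| gets scaled down by 2^-32. */
    float scale = (0x1p126f < ax) ? 0x1p-32f : 1.0f;

    /* v_cmp_ngt_f32 + v_cndmask: the mask is !(2^-126 > |x|), and the
       select keeps the previous value when the mask is set, so inputs
       below the smallest normal (including zero) get 2^32 instead. */
    if (0x1p-126f > ax)
        scale = 0x1p32f;

    return scale;
}
```

The two selects are independent of each other's compare, so either ordering of the instructions gives the same result; only the scheduling differs.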

> 1ULP for normals, 0 for everything else.

Isn't there some existing metadata or similar that specifies the max ULP error 
the user will tolerate? You could piggyback on that instead of inventing a new 
command line option to control this.

https://github.com/llvm/llvm-project/pull/194716
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
