On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <ito...@igalia.com> wrote:
> The 16-bit polynomial execution doesn't meet Khronos precision > requirements. > Also, the half-float denorm range starts at 2^(-14) and with asin taking > input > values in the range [0, 1], polynomial approximations can lead to flushing > relatively easy. > > An alternative is to use the atan2 formula to compute asin, which is the > reference taken by Khronos to determine precision requirements, but that > ends up generating too many additional instructions when compared to the > polynomial approximation. Specifically, for the Intel case, doing this > adds +41 instructions to the program for each asin/acos call, which looks > like an undesirable trade off. > > So for now we take the easy way out and fallback to using the 32-bit > polynomial approximation, which is better (faster) than the 16-bit atan2 > implementation and gives us better precision that matches Khronos > requirements. > --- > src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++-- > 1 file changed, 19 insertions(+), 2 deletions(-) > > diff --git a/src/compiler/spirv/vtn_glsl450.c > b/src/compiler/spirv/vtn_glsl450.c > index bb340c87416..64a1431ae14 100644 > --- a/src/compiler/spirv/vtn_glsl450.c > +++ b/src/compiler/spirv/vtn_glsl450.c > @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x) > * in each case. > */ > static nir_ssa_def * > -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1) > +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1) > { > + /* The polynomial approximation isn't precise enough to meet half-float > + * precision requirements. Alternatively, we could implement this using > + * the formula: > This isn't surprising. It's possible we could restructure the floating-point calculation to be more stable but just doing 32-bit seems reasonable. > + * > + * asin(x) = atan2(x, sqrt(1 - x*x)) > + * > + * But that is very expensive, so instead we just do the polynomial > + * approximation in 32-bit math and then we convert the result back to > + * 16-bit. > + */ > + nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x; > Mind restructuring this as follows? if (x->bit_size == 16) { /* Comment goes here */ return f2f16(b, build_asin(b, f2f32(b, x), p0, p1)); } I find a bit of recursion easier to read than having two bits at the beginning and end. > + > nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size); > nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size); > nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size); > @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, > float _p1) > nir_ssa_def *m_pi_4_minus_one = > nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size); > nir_ssa_def *abs_x = nir_fabs(b, x); > - return nir_fmul(b, nir_fsign(b, x), > + nir_ssa_def *result = > + nir_fmul(b, nir_fsign(b, x), > nir_fsub(b, m_pi_2, > nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one, > abs_x)), > nir_fadd(b, m_pi_2, > @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, > float _p1) > > nir_fadd(b, p0, > > nir_fmul(b, abs_x, > > p1))))))))); > + if (_x->bit_size == 16) > + result = nir_f2f16(b, result); > + > + return result; > } > > /** > -- > 2.17.1 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev