On Fri, 2018-12-07 at 09:26 -0600, Jason Ekstrand wrote: > On Tue, Dec 4, 2018 at 1:18 AM Iago Toral Quiroga <ito...@igalia.com> > wrote: > > The 16-bit polynomial execution doesn't meet Khronos precision > > requirements. > > > > Also, the half-float denorm range starts at 2^(-14) and with asin > > taking input > > > > values in the range [0, 1], polynomial approximations can lead to > > flushing > > > > relatively easy. > > > > > > > > An alternative is to use the atan2 formula to compute asin, which > > is the > > > > reference taken by Khronos to determine precision requirements, but > > that > > > > ends up generating too many additional instructions when compared > > to the > > > > polynomial approximation. Specifically, for the Intel case, doing > > this > > > > adds +41 instructions to the program for each asin/acos call, which > > looks > > > > like an undesirable trade off. > > > > > > > > So for now we take the easy way out and fallback to using the 32- > > bit > > > > polynomial approximation, which is better (faster) than the 16-bit > > atan2 > > > > implementation and gives us better precision that matches Khronos > > > > requirements. > > > > --- > > > > src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++-- > > > > 1 file changed, 19 insertions(+), 2 deletions(-) > > > > > > > > diff --git a/src/compiler/spirv/vtn_glsl450.c > > b/src/compiler/spirv/vtn_glsl450.c > > > > index bb340c87416..64a1431ae14 100644 > > > > --- a/src/compiler/spirv/vtn_glsl450.c > > > > +++ b/src/compiler/spirv/vtn_glsl450.c > > > > @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x) > > > > * in each case. > > > > */ > > > > static nir_ssa_def * > > > > -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1) > > > > +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1) > > > > { > > > > + /* The polynomial approximation isn't precise enough to meet > > half-float > > > > + * precision requirements. Alternatively, we could implement > > this using > > > > + * the formula: > > This isn't surprising. It's possible we could restructure the > floating-point calculation to be more stable but just doing 32-bit > seems reasonable. > > > + * > > > > + * asin(x) = atan2(x, sqrt(1 - x*x)) > > > > + * > > > > + * But that is very expensive, so instead we just do the > > polynomial > > > > + * approximation in 32-bit math and then we convert the result > > back to > > > > + * 16-bit. > > > > + */ > > > > + nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x; > > Mind restructuring this as follows? > > if (x->bit_size == 16) { > /* Comment goes here */ > return f2f16(b, build_asin(b, f2f32(b, x), p0, p1)); > }
Yes, this actually looks better to me as well. Fixed locally. Thanks! > I find a bit of recursion easier to read than having two bits at the > beginning and end. > > > + > > > > nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size); > > > > nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size); > > > > nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size); > > > > @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, > > float _p0, float _p1) > > > > nir_ssa_def *m_pi_4_minus_one = > > > > nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size); > > > > nir_ssa_def *abs_x = nir_fabs(b, x); > > > > - return nir_fmul(b, nir_fsign(b, x), > > > > + nir_ssa_def *result = > > > > + nir_fmul(b, nir_fsign(b, x), > > > > nir_fsub(b, m_pi_2, > > > > nir_fmul(b, nir_fsqrt(b, nir_fsub(b, > > one, abs_x)), > > > > nir_fadd(b, m_pi_2, > > > > @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, > > float _p0, float _p1) > > > > > > nir_fadd(b, p0, > > > > > > nir_fmul(b, abs_x, > > > > > > p1))))))))); > > > > + if (_x->bit_size == 16) > > > > + result = nir_f2f16(b, result); > > > > + > > > > + return result; > > > > } > > > > > > > > /** > >
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev