The 16-bit polynomial execution doesn't meet Khronos precision requirements. Also, the half-float denorm range starts at 2^(-14) and with asin taking input values in the range [0, 1], polynomial approximations can lead to flushing relatively easy.
An alternative is to use the atan2 formula to compute asin, which is the reference taken by Khronos to determine precision requirements, but that ends up generating too many additional instructions when compared to the polynomial approximation. Specifically, for the Intel case, doing this adds +41 instructions to the program for each asin/acos call, which looks like an undesirable trade off. So for now we take the easy way out and fallback to using the 32-bit polynomial approximation, which is better (faster) than the 16-bit atan2 implementation and gives us better precision that matches Khronos requirements. --- src/compiler/spirv/vtn_glsl450.c | 21 +++++++++++++++++++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/src/compiler/spirv/vtn_glsl450.c b/src/compiler/spirv/vtn_glsl450.c index bb340c87416..64a1431ae14 100644 --- a/src/compiler/spirv/vtn_glsl450.c +++ b/src/compiler/spirv/vtn_glsl450.c @@ -201,8 +201,20 @@ build_log(nir_builder *b, nir_ssa_def *x) * in each case. */ static nir_ssa_def * -build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1) +build_asin(nir_builder *b, nir_ssa_def *_x, float _p0, float _p1) { + /* The polynomial approximation isn't precise enough to meet half-float + * precision requirements. Alternatively, we could implement this using + * the formula: + * + * asin(x) = atan2(x, sqrt(1 - x*x)) + * + * But that is very expensive, so instead we just do the polynomial + * approximation in 32-bit math and then we convert the result back to + * 16-bit. + */ + nir_ssa_def *x = _x->bit_size == 16 ? nir_f2f32(b, _x) : _x; + nir_ssa_def *p0 = nir_imm_floatN_t(b, _p0, x->bit_size); nir_ssa_def *p1 = nir_imm_floatN_t(b, _p1, x->bit_size); nir_ssa_def *one = nir_imm_floatN_t(b, 1.0f, x->bit_size); @@ -210,7 +222,8 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1) nir_ssa_def *m_pi_4_minus_one = nir_imm_floatN_t(b, M_PI_4f - 1.0f, x->bit_size); nir_ssa_def *abs_x = nir_fabs(b, x); - return nir_fmul(b, nir_fsign(b, x), + nir_ssa_def *result = + nir_fmul(b, nir_fsign(b, x), nir_fsub(b, m_pi_2, nir_fmul(b, nir_fsqrt(b, nir_fsub(b, one, abs_x)), nir_fadd(b, m_pi_2, @@ -220,6 +233,10 @@ build_asin(nir_builder *b, nir_ssa_def *x, float _p0, float _p1) nir_fadd(b, p0, nir_fmul(b, abs_x, p1))))))))); + if (_x->bit_size == 16) + result = nir_f2f16(b, result); + + return result; } /** -- 2.17.1 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev