Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code
On Mon, Feb 8, 2016 at 12:33 AM, Ilia Mirkin wrote: > On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy wrote: >> On 08/02/2016 00:21, Ilia Mirkin wrote: >>> >>> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy wrote: SQRT is not supported everywhere, so replace it by RSQ + RCP Signed-off-by: Axel Davy --- src/gallium/state_trackers/nine/nine_ff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/nine/nine_ff.c b/src/gallium/state_trackers/nine/nine_ff.c index a5466a7..894fc63 100644 --- a/src/gallium/state_trackers/nine/nine_ff.c +++ b/src/gallium/state_trackers/nine/nine_ff.c @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct vs_build_ctx *vs) struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27); ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1])); -ureg_SQRT(ureg, tmp_y, _X(tmp)); +ureg_RSQ(ureg, tmp_y, _X(tmp)); +ureg_RCP(ureg, tmp_y, _Y(tmp)); >>> >>> I'd recommend doing >>> >>> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp)) >>> >>> instead. That should be (a) more numerically stable (rcp doesn't have >>> great precision), and (b) not blow up for 0. >> >> Ok for the precision, but I'm not sure for 0 >> >> With the mul version, with 0, it ends up computing inf * 0 = NaN, >> whereas with the rcp version, it does 1/inf == 0 (as far as I know), >> which is the expected result. > > Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN. > What you really want is a "flush nan to 0" option on the mul like nvc0 > has, but there's no way to express that in TGSI. > > Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and > otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does: > > emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]); > emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]); > /* For incoming channels <= 0, set the result to 0. */ > op[0].negate = ~op[0].negate; > emit_asm(ir, TGSI_OPCODE_CMP, result_dst, > op[0], result_src, st_src_reg_for_float(0.0)); FWIW, "NaN to 0" is always enabled on radeon. We also enable it for compute. Marek ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code
On Sun, Feb 7, 2016 at 6:26 PM, Axel Davy wrote: > On 08/02/2016 00:21, Ilia Mirkin wrote: >> >> On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy wrote: >>> >>> SQRT is not supported everywhere, so replace >>> it by RSQ + RCP >>> >>> Signed-off-by: Axel Davy >>> --- >>> src/gallium/state_trackers/nine/nine_ff.c | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/gallium/state_trackers/nine/nine_ff.c >>> b/src/gallium/state_trackers/nine/nine_ff.c >>> index a5466a7..894fc63 100644 >>> --- a/src/gallium/state_trackers/nine/nine_ff.c >>> +++ b/src/gallium/state_trackers/nine/nine_ff.c >>> @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct >>> vs_build_ctx *vs) >>> struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27); >>> >>> ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1])); >>> -ureg_SQRT(ureg, tmp_y, _X(tmp)); >>> +ureg_RSQ(ureg, tmp_y, _X(tmp)); >>> +ureg_RCP(ureg, tmp_y, _Y(tmp)); >> >> I'd recommend doing >> >> ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp)) >> >> instead. That should be (a) more numerically stable (rcp doesn't have >> great precision), and (b) not blow up for 0. > > Ok for the precision, but I'm not sure for 0 > > With the mul version, with 0, it ends up computing inf * 0 = NaN, > whereas with the rcp version, it does 1/inf == 0 (as far as I know), > which is the expected result. Hmmm... not sure what RSQ(0) returns actually. I assumed it was NaN. What you really want is a "flush nan to 0" option on the mul like nvc0 has, but there's no way to express that in TGSI. Perhaps you can keep the SQRT if PIPE_CAP_TGSI_SQRT is exposed, and otherwise do the MUL or the RCP. FWIW this is what glsl_to_tgsi does: emit_scalar(ir, TGSI_OPCODE_RSQ, result_dst, op[0]); emit_asm(ir, TGSI_OPCODE_MUL, result_dst, result_src, op[0]); /* For incoming channels <= 0, set the result to 0. */ op[0].negate = ~op[0].negate; emit_asm(ir, TGSI_OPCODE_CMP, result_dst, op[0], result_src, st_src_reg_for_float(0.0)); ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code
On 08/02/2016 00:21, Ilia Mirkin wrote: On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy wrote: SQRT is not supported everywhere, so replace it by RSQ + RCP Signed-off-by: Axel Davy --- src/gallium/state_trackers/nine/nine_ff.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/gallium/state_trackers/nine/nine_ff.c b/src/gallium/state_trackers/nine/nine_ff.c index a5466a7..894fc63 100644 --- a/src/gallium/state_trackers/nine/nine_ff.c +++ b/src/gallium/state_trackers/nine/nine_ff.c @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct vs_build_ctx *vs) struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27); ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1])); -ureg_SQRT(ureg, tmp_y, _X(tmp)); +ureg_RSQ(ureg, tmp_y, _X(tmp)); +ureg_RCP(ureg, tmp_y, _Y(tmp)); I'd recommend doing ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp)) instead. That should be (a) more numerically stable (rcp doesn't have great precision), and (b) not blow up for 0. Ok for the precision, but I'm not sure for 0 With the mul version, with 0, it ends up computing inf * 0 = NaN, whereas with the rcp version, it does 1/inf == 0 (as far as I know), which is the expected result. ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2)); ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1)); ureg_RCP(ureg, tmp_x, ureg_src(tmp)); -- 2.7.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 10/12] st/nine: Remove usage of SQRT in ff code
On Sun, Feb 7, 2016 at 6:13 PM, Axel Davy wrote: > SQRT is not supported everywhere, so replace > it by RSQ + RCP > > Signed-off-by: Axel Davy > --- > src/gallium/state_trackers/nine/nine_ff.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/gallium/state_trackers/nine/nine_ff.c > b/src/gallium/state_trackers/nine/nine_ff.c > index a5466a7..894fc63 100644 > --- a/src/gallium/state_trackers/nine/nine_ff.c > +++ b/src/gallium/state_trackers/nine/nine_ff.c > @@ -563,7 +563,8 @@ nine_ff_build_vs(struct NineDevice9 *device, struct > vs_build_ctx *vs) > struct ureg_src cPsz2 = ureg_DECL_constant(ureg, 27); > > ureg_DP3(ureg, tmp_x, ureg_src(r[1]), ureg_src(r[1])); > -ureg_SQRT(ureg, tmp_y, _X(tmp)); > +ureg_RSQ(ureg, tmp_y, _X(tmp)); > +ureg_RCP(ureg, tmp_y, _Y(tmp)); I'd recommend doing ureg_MUL(ureg, tmp_y, _Y(tmp), _X(tmp)) instead. That should be (a) more numerically stable (rcp doesn't have great precision), and (b) not blow up for 0. > ureg_MAD(ureg, tmp_x, _Y(tmp), _(cPsz2), _(cPsz2)); > ureg_MAD(ureg, tmp_x, _Y(tmp), _X(tmp), _(cPsz1)); > ureg_RCP(ureg, tmp_x, ureg_src(tmp)); > -- > 2.7.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev