Re: [Mesa-dev] NaN behavior in GLSL (was Re: [PATCH] glsl: always do sqrt(abs()) and inversesqrt(abs()))

Axel Davy Sat, 14 Jan 2017 14:48:06 -0800

On 13/01/2017 19:06, Nicolai Hähnle wrote:

On 13.01.2017 18:53, Jason Ekstrand wrote:

On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák <mar...@gmail.com
<mailto:mar...@gmail.com>> wrote:


    On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand
    <ja...@jlekstrand.net <mailto:ja...@jlekstrand.net>> wrote:
    > On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák <mar...@gmail.com
    <mailto:mar...@gmail.com>> wrote:
    >>
    >> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin
    <imir...@alum.mit.edu <mailto:imir...@alum.mit.edu>> wrote:
    >> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand
    <ja...@jlekstrand.net <mailto:ja...@jlekstrand.net>>
    >> > wrote:
    >> >> Unless, of course, it's controlled by the same hardware bit...
    Clearly,
    >> >> we
    >> >> can give you abs on rsq without denorm flushing (easy
    shader hacks)
    >> >> but
    >> >> not the other way around.
    >> >
    >> > OK, so somehow I missed that earlier. However there's an
    interesting
    >> > section in the PRM:
    >> >
    >> >
    >> >
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
    >> >
    >> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
    >> > suggested IEEE 754 deviations for DX9. One of them is indeed
    that 0 *
    >> > x = 0, but another is that input NaNs be propagated with certain
    >> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax.
    Interesting.
    >> >
    >> > So at this point, the zero_wins thing is pretty much blown. i965
    >> > appears to have an all-or-nothing approach, and additionally that
    >> > approach doesn't match up exactly to what NVIDIA does (or at
    least I'm
    >> > not aware of a clamp-everything mode).
    >> >
    >> > This will take some thought to figure out how something can be
    >> > specified so that a single spec works for both i965 and nv/amd.
    OTOH
    >> > we could have two different specs that just expose different
    things -
    >> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever
    which
    >> > is spec'd to do the things that the PRM says, and nv/amd have the
    >> > MESA_shader_float_zero_wins ext which does what we were talking
    about
    >> > earlier.
    >> >
    >> > I'm open to other suggestions too.
    >>
    >> There is also the "small" problem that it would take a non-trivial
    >> effort for us on the LLVM side. You guys can flip a switch. We can't.
    >
    >
    > Don't you have to expend that effort for ARB programs anyway?  I
    thought
    > they weren't supposed to generate NaN either.

    No, we don't, because st/mesa adds abs before RSQ and the driver
    implements POW as log+mul+exp, where mul follows the rule
    0*anything=0. I don't think any other opcode follows that rule though.


Ah.  That makes sense.  Do you also implement DIV as MUL+RCP?

For single-precision, yes. For double-precision, it seems we need to move away from that due to precision issues (which is itself a bit odd, since you don't seem to have encountered that?).


Nicolai

 If so,
the two of those should take care of NaN getting generated in the
shader.  We'd still have to do something about inf and maybe denorms.


I did some tests on Ivy Bridge and amd 7730m on Windows 10.

======= The tests ========

With an sm3 pixel shader (writing to an fp32 render target and reading the result):

Intel: Things seem to match the ALT mode (log, rcp and rsq are clamped; no NaN is generated unless a NaN constant is used as input).


Amd: log is clamped, rcp and rsq do produce INF. NaN is propagated.

Matteo did test on NVidia: log, rcp and rsq do produce INF. NaN is propagated.


Common to all cards:

0*NaN/Inf/-Inf = 0

nrm(inf, inf, inf) = (0, 0, 0) (probably comes from 0*anything = 0)
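
To illustrate how the nrm result could follow from the 0*anything = 0 rule, here is a minimal Python sketch. The helper names (legacy_mul, legacy_rsq, legacy_nrm) are made up for this illustration, and expanding nrm as v * rsq(dot(v, v)) is an assumption about the hardware, not something the tests confirm. Python floats are doubles, so this only models the special-value behaviour, not fp32 rounding.

```python
import math

def legacy_mul(a: float, b: float) -> float:
    """Multiply where a zero operand wins over NaN and +-Inf (0*anything = 0)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b

def legacy_rsq(x: float) -> float:
    """Unclamped rsq(abs(x)): rsq(0) = +Inf and rsq(Inf) = 0."""
    x = abs(x)
    if x == 0.0:
        return math.inf  # avoid Python's ZeroDivisionError
    return 1.0 / math.sqrt(x)

def legacy_nrm(v):
    """Assumed expansion: nrm(v) = v * rsq(dot(v, v)), using legacy_mul throughout."""
    d = sum(legacy_mul(c, c) for c in v)
    inv_len = legacy_rsq(d)
    return tuple(legacy_mul(c, inv_len) for c in v)

inf = math.inf
print(legacy_mul(0.0, math.nan))    # 0.0, where IEEE 754 would give NaN
print(legacy_nrm((inf, inf, inf)))  # (0.0, 0.0, 0.0)
```

With all-infinite input the dot product is Inf, rsq(Inf) is 0, and Inf * 0 then collapses to 0 under the zero-wins rule, which is consistent with the (0, 0, 0) observed above.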

I tested the same thing with an sm2 pixel shader, and the results were not affected on Intel and Amd.

I also adapted wine's initial fp_special_test to test what happens in vertex shaders (the output is not written to an fp32 render target; instead 3 unorm values are produced to try to guess what happens, so the results are harder to interpret, and the following may contain some errors).

Intel: log, rcp and rsq are clamped. One of the unorm values changes when using vs/ps 3 instead of vs/ps 2 for the part of the test where the vs outputs a shader constant containing NaN or Inf, so perhaps there is a slight change in the rasterizer behaviour there. The results are not enough to deduce whether the ALT mode is used or not.

Amd: log is clamped, rcp and rsq are clamped when using vs 2 and produce INF when using vs 3. 0 * rcp(0) = 0 * rsq(0) = 0.

The filled nvidia results for the vs 2 version of the test say log, rcp and rsq are not clamped.

The fact that 0*inf = 0 instead of NaN contradicts the r500 docs and the geforce 6 docs.

The intel, r500 and geforce 6 docs seem to indicate there are also specific NaN behaviours with CMP, MIN and MAX.

For MIN and MAX, it is written that in case one of the two terms is NaN, the second term is always returned (whereas apparently for dx10 it is the non-NaN term). I haven't done tests for MIN/MAX.
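
The two MIN conventions described above can be contrasted with a small Python sketch. This only encodes my reading of the docs (untested on hardware, as noted); legacy_min and dx10_min are hypothetical names, and the behaviour when both operands are NaN is simply whatever falls out of the rule as written.

```python
import math

def legacy_min(a: float, b: float) -> float:
    """DX9-era docs as read above: if either operand is NaN, return the second."""
    if math.isnan(a) or math.isnan(b):
        return b
    return a if a < b else b

def dx10_min(a: float, b: float) -> float:
    """dx10-style: if exactly one operand is NaN, return the other one."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b

nan = math.nan
print(legacy_min(nan, 1.0), legacy_min(1.0, nan))  # 1.0 nan
print(dx10_min(nan, 1.0), dx10_min(1.0, nan))      # 1.0 1.0
```

The two only differ when the NaN is in the second operand, so a test would need to cover both operand orders to tell them apart.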



====== Conclusion ======

There seems to be a lot of variation between what vendors do. It seems that having access to either the ALT mode or the 0*anything=0 behaviour would make wine and nine happy.

The ALT mode sounds like something that can be emulated (basically by clamping everything that can produce inf), but I think some apps are not happy with that (please confirm, Matteo?). Perhaps the unknown things intel seems to do differently for vs2 and vs3, or the other specific things the ALT mode does, fix these apps. It is probably still interesting for wine and nine to have these (ALT and 0*inf = 0) as extensions.
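
As a rough idea of what "clamping everything that can produce inf" could look like, here is a Python sketch. It is not the actual ALT mode: the function names are invented, taking abs() of the log input is an assumption, and only the inf-producing cases (log, rcp, rsq) are covered, not the other things the PRM lists for ALT mode.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value

def clamp_finite(x: float) -> float:
    """Clamp +-Inf (and anything beyond fp32 range) to +-FLT_MAX; NaN passes through."""
    if math.isnan(x):
        return x
    return max(-FLT_MAX, min(FLT_MAX, x))

def alt_rcp(x: float) -> float:
    """rcp with rcp(+-0) = +-FLT_MAX instead of +-Inf."""
    if x == 0.0:
        return math.copysign(FLT_MAX, x)
    return clamp_finite(1.0 / x)

def alt_rsq(x: float) -> float:
    """rsq(abs(x)) with rsq(0) = FLT_MAX instead of +Inf."""
    x = abs(x)
    if x == 0.0:
        return FLT_MAX
    return clamp_finite(1.0 / math.sqrt(x))

def alt_log2(x: float) -> float:
    """log2(abs(x)) with log2(0) = -FLT_MAX instead of -Inf (abs is an assumption)."""
    x = abs(x)
    if x == 0.0:
        return -FLT_MAX
    return clamp_finite(math.log2(x))

print(alt_rcp(0.0), alt_rsq(0.0), alt_log2(0.0))
```

In a shader the same thing would be a min/max pair after each of these opcodes, which is why it looks cheap to emulate.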



It probably is ok to not specify the behaviours around NaN.

With 0*anything = 0, the only way to have NaN is either to feed NaN via the vertex inputs or the constants (something which the intel spec says is forbidden in dx9), or to have inf in a vertex shader output (it becomes NaN as pixel shader input). If some games are hit by a problem due to that, it can probably be fixed by an app workaround doing more clamping. That said, apps that require 0*inf = 0 could also be fixed by an app workaround doing more clamping and avoiding inf generation.

Since it seems easy to have an intel ALT mode extension and a 0*(+-inf)=0 extension, I would think it is a good idea to have them, but what the tests show is that they may not be required.



Axel


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
