On 13/01/2017 19:06, Nicolai Hähnle wrote:
On 13.01.2017 18:53, Jason Ekstrand wrote:
On Fri, Jan 13, 2017 at 8:43 AM, Marek Olšák <mar...@gmail.com
<mailto:mar...@gmail.com>> wrote:

    On Fri, Jan 13, 2017 at 5:25 PM, Jason Ekstrand
    <ja...@jlekstrand.net <mailto:ja...@jlekstrand.net>> wrote:
    > On Fri, Jan 13, 2017 at 4:05 AM, Marek Olšák <mar...@gmail.com
    <mailto:mar...@gmail.com>> wrote:
    >>
    >> On Fri, Jan 13, 2017 at 3:37 AM, Ilia Mirkin
    <imir...@alum.mit.edu <mailto:imir...@alum.mit.edu>> wrote:
    >> > On Thu, Jan 12, 2017 at 9:13 PM, Jason Ekstrand
    <ja...@jlekstrand.net <mailto:ja...@jlekstrand.net>>
    >> > wrote:
    >> >> Unless, of course, it's controlled by the same hardware bit...
    Clearly,
    >> >> we
    >> >> can give you abs on rsq without denorm flushing (easy
    shader hacks)
    >> >> but
    >> >> not the other way around.
    >> >
    >> > OK, so somehow I missed that earlier. However there's an
    interesting
    >> > section in the PRM:
    >> >
    >> >
    >> >
https://01.org/sites/default/files/documentation/intel-gfx-prm-osrc-skl-vol07-3d_media_gpgpu.pdf
    >> >
>> > on PDF page 854, "Dismissed Legacy Behaviors" which has a list of
    >> > suggested IEEE 754 deviations for DX9. One of them is indeed
    that 0 *
>> > x = 0, but another is that input NaNs be propagated with certain
    >> > exceptions. Also they suggest that RCP(0)/RSQ(0) = fmax.
    Interesting.
    >> >
    >> > So at this point, the zero_wins thing is pretty much blown. i965
    >> > appears to have an all-or-nothing approach, and additionally that
    >> > approach doesn't match up exactly to what NVIDIA does (or at
    least I'm
    >> > not aware of a clamp-everything mode).
    >> >
    >> > This will take some thought to figure out how something can be
    >> > specified so that a single spec works for both i965 and nv/amd.
    OTOH
    >> > we could have two different specs that just expose different
    things -
    >> > e.g. i965 could expose a MESA_shader_float_alt_mode or whatever
    which
>> > is spec'd to do the things that the PRM says, and nv/amd have the
    >> > MESA_shader_float_zero_wins ext which does what we were talking
    about
    >> > earlier.
    >> >
    >> > I'm open to other suggestions too.
    >>
    >> There is also the "small" problem that it would take a non-trivial
    >> effort for us on the LLVM side. You guys can flip a switch. We can't.
    >
    >
    > Don't you have to expend that effort for ARB programs anyway?  I
    thought
    > they weren't supposed to generate NaN either.

    No, we don't, because st/mesa adds abs before RSQ and the driver
    implements POW as log+mul+exp, where mul follows the rule
0*anything=0. I don't think any other opcode follows that rule though.


Ah.  That makes sense.  Do you also implement DIV as MUL+RCP?

For single-precision, yes. For double-precision, it seems we need to move away from that due to precision issues (which is itself a bit odd, since you don't seem to have encountered that?).

Nicolai

 If so,
the two of those should take care of NaN getting generated in the
shader.  We'd still have to do something about inf and maybe denorms.


I did some tests on Ivy Bridge and amd 7730m on Windows 10.

======= The tests ========

With a sm3 pixel shader (writing to a fp32 render target and reading the result):

Intel: Things seem to match the ALT mode (log, rcp, rsq clamped. No NaN generated except if using a NaN constant as input.)

Amd: log is clamped, rcp and rsq do produce INF. NaN is propagated.

Matteo tested on NVIDIA: log, rcp and rsq do produce INF. NaN is propagated.

Common to all cards:

0*NaN/Inf/-Inf = 0

nrm(inf, inf, inf) = (0, 0, 0) (probably comes from 0*anything = 0)
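
The nrm result can be reproduced with a small Python model. This is only a sketch of the arithmetic, not driver or hardware code: `legacy_mul`, `rsq` and `nrm` are hypothetical helper names, and it assumes nrm is computed as v * rsq(dot(v, v)) using multiplies where a zero operand always wins.

```python
import math


def legacy_mul(a, b):
    """Multiply where a zero operand always wins (0 * anything = 0)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


def rsq(x):
    """Unclamped reciprocal square root: rsq(0) = inf, rsq(inf) = 0."""
    if x == 0.0:
        return math.inf
    return 1.0 / math.sqrt(x)


def nrm(v):
    """Normalize v as v * rsq(dot(v, v)) (assumed lowering, see above)."""
    d = sum(legacy_mul(c, c) for c in v)   # dot(v, v) is inf for an inf input
    scale = rsq(d)                         # rsq(inf) = 0
    return tuple(legacy_mul(c, scale) for c in v)  # inf * 0 = 0 under the rule


print(nrm((math.inf, math.inf, math.inf)))  # all components are zero
print(math.inf * 0.0)                       # plain IEEE multiply gives nan
```

Under these assumptions the zero scale factor wins over the inf components, which would explain the (0, 0, 0) result; with IEEE multiplication the same computation yields NaN in every component.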


I tested the same thing with a sm2 pixel shader, and the results were not affected on Intel and Amd.


I adapted wine's initial fp_special_test to test what happens in vertex shaders. The output is not written to an fp32 render target; instead, 3 unorm values are produced to try to guess what happens, so the results are harder to interpret and the following may contain some errors.

Intel: log, rcp and rsq are clamped. One of the unorm values changes when using vs/ps 3 instead of vs/ps 2 for the part of the test where the vs outputs a shader constant containing NaN or Inf, thus perhaps there is a slight change there in the rasterizer behaviour. The results are not enough to deduce whether the ALT mode is used or not.

Amd: log is clamped, rcp and rsq are clamped when using vs 2 and produce INF when using vs 3. 0 * rcp(0) = 0 * rsq(0) = 0.

The filled nvidia results for the vs 2 version of the test say log, rcp and rsq are not clamped.


The fact that 0*inf = 0 instead of NaN contradicts the r500 docs and the geforce 6 docs.


The intel, r500 and geforce 6 docs seem to indicate there are also specific NaN behaviours with CMP, MIN and MAX.

For MIN and MAX, the docs state that if one of the two terms is NaN, the second term is always returned (whereas apparently for dx10 it is the non-NaN term). I haven't done tests for MIN/MAX.
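
To make the difference concrete, here is a Python sketch of the two MIN behaviours as described. It is untested against hardware: `legacy_min` and `dx10_min` are hypothetical names, and modelling the legacy behaviour as a plain compare-and-select is an assumption about how "the second term is always returned" could arise.

```python
import math


def legacy_min(a, b):
    """Compare-and-select MIN: any comparison with NaN is false,
    so the second term b is returned whenever either term is NaN."""
    return a if a < b else b


def dx10_min(a, b):
    """MIN that returns the non-NaN term when exactly one term is NaN."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b


print(legacy_min(math.nan, 1.0))  # second term: 1.0
print(legacy_min(1.0, math.nan))  # second term: nan
print(dx10_min(1.0, math.nan))    # non-NaN term: 1.0
```

The two only disagree when the second term is the NaN one, so a test for this would need to try both operand orders.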


====== Conclusion ======

There seems to be a lot of variation between what vendors do. It seems that having access to either the ALT mode or 0*anything=0 would make wine and nine happy.


The ALT mode sounds like something that can be emulated (basically clamping everything that can produce inf), but I think some apps are not happy with that (Matteo, please confirm?). Perhaps the unknown things intel seems to do differently for vs2 and vs3, or the other specific things the ALT mode does, fix these apps. It is probably still interesting for wine and nine to have these (ALT and 0*inf = 0) as extensions.
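
A rough idea of what such an emulation could look like, as a Python model rather than shader or driver code. All function names are hypothetical. The sign handling for rcp(-0), the abs on the log input, and the abs on the rsq input (mirroring the abs st/mesa adds before RSQ) are assumptions, not something the tests above established.

```python
import math

FLT_MAX = 3.4028234663852886e38  # largest finite float32 value


def clamp_finite(x):
    """Clamp +-inf (or anything beyond float32 range) to +-FLT_MAX.
    NaN fails both comparisons and passes through unchanged."""
    if x > FLT_MAX:
        return FLT_MAX
    if x < -FLT_MAX:
        return -FLT_MAX
    return x


def rcp_clamped(x):
    if x == 0.0:
        # Assumption: the sign of the zero picks the sign of the result.
        return math.copysign(FLT_MAX, x)
    return clamp_finite(1.0 / x)


def rsq_clamped(x):
    x = abs(x)  # assumption: abs applied before RSQ
    if x == 0.0:
        return FLT_MAX
    return clamp_finite(1.0 / math.sqrt(x))


def log_clamped(x):
    x = abs(x)  # assumption: log2 of the absolute value
    if x == 0.0:
        return -FLT_MAX
    return clamp_finite(math.log2(x))


print(rcp_clamped(0.0), rsq_clamped(0.0), log_clamped(0.0))
```

This only covers the three opcodes mentioned in the test results; a real emulation would also have to clamp any other operation that can overflow to inf.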


It probably is ok to not specify the behaviours around NaN.
With 0*anything = 0, the only way to have NaN is either to feed NaN via the vertex inputs or the constants (something which the intel spec says is forbidden in dx9), or to have inf in a vertex shader output (it becomes NaN as pixel shader input). If some games are hit by a problem due to that, it can probably be fixed by an app workaround doing more clamping. That said, apps that require 0*inf = 0 could also be fixed by an app workaround doing more clamping and avoiding inf generation.
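
As an illustration of how 0*anything = 0 keeps NaN from being generated, here is a Python model of the POW lowering Marek described earlier in the thread (log + mul + exp). It is a sketch only: `log2_ieee`, `exp2_ieee` and `pow_lowered` are hypothetical helpers written so that Python returns inf where it would otherwise raise an exception.

```python
import math
import operator


def legacy_mul(a, b):
    """Multiply where a zero operand always wins (0 * anything = 0)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return a * b


def log2_ieee(x):
    """log2 returning -inf for 0 instead of raising (x assumed >= 0)."""
    if x == 0.0:
        return -math.inf
    return math.log2(x)


def exp2_ieee(x):
    """2**x returning inf on overflow instead of raising."""
    try:
        return 2.0 ** x
    except OverflowError:
        return math.inf


def pow_lowered(x, y, mul):
    """POW(x, y) lowered to exp2(mul(y, log2(x)))."""
    return exp2_ieee(mul(y, log2_ieee(x)))


print(pow_lowered(0.0, 0.0, legacy_mul))    # 0 * -inf = 0, so exp2(0) = 1.0
print(pow_lowered(0.0, 0.0, operator.mul))  # 0 * -inf = nan with IEEE multiply
```

With the legacy multiply, the -inf coming out of log2(0) is absorbed by the zero exponent; with an IEEE multiply the same expression produces NaN.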


Since it seems easy to have an intel ALT mode extension and a 0*(+-inf)=0 extension, I would think it is a good idea to have them, but what the tests show is that they may not be required.


Axel


_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
