On Fri, Mar 20, 2015 at 1:58 PM, Ian Romanick <i...@freedesktop.org> wrote:
> From: Ian Romanick <ian.d.roman...@intel.com>
>
> On SNB+, the Boolean result is always 0 or ~0, so MOV.nz produces the
> same effect as AND.nz.  However, later cmod propagation passes can
> handle the MOV.nz, but they cannot handle the AND.nz because the source
> is not generated by a CMP.
>
> It's worth noting that this commit was a lot more effective before
> commit bb22aa0 (i965/fs: Ignore type in cmod prop if scan_inst is CMP.).
> Without that commit, this commit improved ~2,500 shaders on each
> affected platform, including Sandy Bridge.
>
> Ivy Bridge (0x0166):
> total instructions in shared programs: 6291794 -> 6291668 (-0.00%)
> instructions in affected programs:     41207 -> 41081 (-0.31%)
> helped:                                154
> HURT:                                  28
>
> Haswell (0x0426):
> total instructions in shared programs: 5779180 -> 5779054 (-0.00%)
> instructions in affected programs:     37210 -> 37084 (-0.34%)
> helped:                                154
> HURT:                                  28
>
> Broadwell (0x162E):
> total instructions in shared programs: 6823014 -> 6822848 (-0.00%)
> instructions in affected programs:     40195 -> 40029 (-0.41%)
> helped:                                164
> HURT:                                  28
>
> No change on GM45, Iron Lake, Sandy Bridge, Ivy Bridge with NIR, or
> Haswell with NIR.
>
> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
> ---

I looked at some helped shaders. They seem to be doing this:

const vec4 ps_c0 = vec4(1.0, -1.0, 0.0, -0.0);
...
        t0_ps.x = (gl_FrontFacing ? ps_c0.x : ps_c0.y);
        t0_ps.y = (gl_FrontFacing ? ps_c0.w : ps_c0.y);
        t0_ps.x = ((-t0_ps.x >= 0.0) ? ps_c0.z : ps_c0.x);

so before this patch we hit the
fs_visitor::try_opt_frontfacing_ternary path for t0_ps.x and not for
t0_ps.y, generating:

asr(8)          g26<1>D         -g0<0,1,0>W     15D
or(8)           g36.1<2>W       g0<0,1,0>W      0x3f80UW
mov(1)          g25<1>F         [0F, 0F, 0F, 0F]VF
and.nz.f0(8)    null            g26<8,8,1>D     1D    <--- this gets removed with this patch
and(8)          g35<1>D         g36<8,8,1>D     0xbf800000UD
mov(8)          g38<1>F         -g25<0,1,0>F
mov(8)          g40<1>F         g25<0,1,0>F
(+f0) sel(8)    g37<1>F         g38<8,8,1>F     -1F
cmp.ge.f0(8)    null            -g35<8,8,1>F    g25<0,1,0>F
(+f0) sel(8)    g39<1>F         g40<8,8,1>F     1F

After this patch we generate:

asr.nz.f0(8)    null            -g0<0,1,0>W     15D
or(8)           g35.1<2>W       g0<0,1,0>W      0x3f80UW
mov(1)          g25<1>F         [0F, 0F, 0F, 0F]VF
and(8)          g34<1>D         g35<8,8,1>D     0xbf800000UD
mov(8)          g37<1>F         -g25<0,1,0>F
mov(8)          g39<1>F         g25<0,1,0>F
(+f0) sel(8)    g36<1>F         g37<8,8,1>F     -1F
cmp.ge.f0(8)    null            -g34<8,8,1>F    g25<0,1,0>F
(+f0) sel(8)    g38<1>F         g39<8,8,1>F     1F

10 instructions to 9. That's an annoying amount of assembly to digest,
but basically we're just benefiting because of the order of the uses
of the flag. If we could simply rearrange the flag writes and reads, we
would generate better code, and...

If we could recognize that there are multiple gl_FrontFacing ? ... :
... expressions, we probably would have just emitted asr.nz.f0 and a
couple of SELs.

So I don't really think this patch is helping anything except by accident. :)
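
For reference, the equivalence the commit message relies on (a Boolean
that is always 0 or ~0 sets the flag identically under AND.nz with 1 and
under MOV.nz) can be checked with a rough sketch like the one below. This
is only a model written for illustration: asr16, flag_and_nz and
flag_mov_nz are hypothetical helper names, not Mesa functions, and the
source negation on g0 in the listings is not modeled.

```python
ALL_ONES = 0xFFFFFFFF  # ~0 as a 32-bit value


def asr16(word, shift):
    """Arithmetic shift right of a 16-bit signed word.

    Hypothetical helper: models an ASR with a W source and a D
    destination, returning the result as a 32-bit two's complement value.
    """
    word &= 0xFFFF
    if word & 0x8000:        # reinterpret as a signed 16-bit value
        word -= 0x10000
    return (word >> shift) & ALL_ONES


def flag_and_nz(x):
    """Flag result of and.nz with an immediate of 1."""
    return (x & 1) != 0


def flag_mov_nz(x):
    """Flag result of mov.nz."""
    return x != 0


def flags_agree_for_booleans():
    """True if both conditional modifiers agree on 0 and ~0."""
    return all(flag_and_nz(b) == flag_mov_nz(b) for b in (0, ALL_ONES))


if __name__ == "__main__":
    # Shifting a word right by 15 leaves only copies of the sign bit,
    # so the result is always 0 or ~0.
    print(all(asr16(w, 15) in (0, ALL_ONES) for w in range(0x10000)))
    print(flags_agree_for_booleans())
    # For a value that is not 0 or ~0 the two disagree, which is why the
    # replacement depends on the 0/~0 Boolean representation.
    print(flag_and_nz(2), flag_mov_nz(2))
```

The last check is the interesting one: with an arbitrary integer such as
2 the two conditional modifiers give different flag values, so the
transformation is only valid when the source is known to be 0 or ~0.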
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
