On Fri, Mar 20, 2015 at 1:58 PM, Ian Romanick <i...@freedesktop.org> wrote:
> From: Ian Romanick <ian.d.roman...@intel.com>
>
> On SNB+, the Boolean result is always 0 or ~0, so MOV.nz produces the
> same effect as AND.nz. However, later cmod propagation passes can
> handle the MOV.nz, but they cannot handle the AND.nz because the source
> is not generated by a CMP.
>
> It's worth noting that this commit was a lot more effective before
> commit bb22aa0 (i965/fs: Ignore type in cmod prop if scan_inst is CMP.).
> Without that commit, this commit improved ~2,500 shaders on each
> affected platform, including Sandy Bridge.
>
> Ivy Bridge (0x0166):
> total instructions in shared programs: 6291794 -> 6291668 (-0.00%)
> instructions in affected programs:     41207 -> 41081 (-0.31%)
> helped:                                154
> HURT:                                  28
>
> Haswell (0x0426):
> total instructions in shared programs: 5779180 -> 5779054 (-0.00%)
> instructions in affected programs:     37210 -> 37084 (-0.34%)
> helped:                                154
> HURT:                                  28
>
> Broadwell (0x162E):
> total instructions in shared programs: 6823014 -> 6822848 (-0.00%)
> instructions in affected programs:     40195 -> 40029 (-0.41%)
> helped:                                164
> HURT:                                  28
>
> No change on GM45, Iron Lake, Sandy Bridge, Ivy Bridge with NIR, or
> Haswell with NIR.
>
> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
> ---
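
For reference, the equivalence claimed in the commit message can be checked with a small model. This is only a sketch in Python, not Mesa code: the helper names (nz_flag, mov_nz, and_nz) are made up for illustration, and the .nz conditional modifier is modeled simply as "set the flag when the 32-bit result is nonzero". The value 2 is an arbitrary stand-in for a Boolean that is not canonical 0 or ~0.

```python
MASK32 = 0xFFFFFFFF  # model registers as unsigned 32-bit values


def nz_flag(result):
    """Model of the .nz conditional modifier: flag is set if the result is nonzero."""
    return (result & MASK32) != 0


def mov_nz(src):
    """MOV.nz: the result is the source itself."""
    return nz_flag(src)


def and_nz(src, imm=1):
    """AND.nz with an immediate: the result is src & imm."""
    return nz_flag(src & imm)


# Canonical Booleans (0 or ~0): both instructions set the same flag value.
for boolean in (0x00000000, 0xFFFFFFFF):
    print(hex(boolean), mov_nz(boolean), and_nz(boolean))

# A non-canonical value with the low bit clear: the two instructions disagree,
# so the substitution relies on the 0 / ~0 guarantee.
print(hex(2), mov_nz(2), and_nz(2))
```

In this model the two instructions agree only because every bit of a true value is set, which is why the substitution is tied to the 0 or ~0 guarantee the commit message states for SNB+.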

I looked at some helped shaders. They seem to be doing this:

   const vec4 ps_c0 = vec4(1.0, -1.0, 0.0, -0.0);
   ...
   t0_ps.x = (gl_FrontFacing ? ps_c0.x : ps_c0.y);
   t0_ps.y = (gl_FrontFacing ? ps_c0.w : ps_c0.y);
   t0_ps.x = ((-t0_ps.x >= 0.0) ? ps_c0.z : ps_c0.x);

So before this patch we hit the fs_visitor::try_opt_frontfacing_ternary
path for t0_ps.x and not for t0_ps.y, generating:

   asr(8)          g26<1>D    -g0<0,1,0>W   15D
   or(8)           g36.1<2>W  g0<0,1,0>W    0x3f80UW
   mov(1)          g25<1>F    [0F, 0F, 0F, 0F]VF
   and.nz.f0(8)    null       g26<8,8,1>D   1D    <--- this gets removed with this patch
   and(8)          g35<1>D    g36<8,8,1>D   0xbf800000UD
   mov(8)          g38<1>F    -g25<0,1,0>F
   mov(8)          g40<1>F    g25<0,1,0>F
   (+f0) sel(8)    g37<1>F    g38<8,8,1>F   -1F
   cmp.ge.f0(8)    null       -g35<8,8,1>F  g25<0,1,0>F
   (+f0) sel(8)    g39<1>F    g40<8,8,1>F   1F

After this patch we generate:

   asr.nz.f0(8)    null       -g0<0,1,0>W   15D
   or(8)           g35.1<2>W  g0<0,1,0>W    0x3f80UW
   mov(1)          g25<1>F    [0F, 0F, 0F, 0F]VF
   and(8)          g34<1>D    g35<8,8,1>D   0xbf800000UD
   mov(8)          g37<1>F    -g25<0,1,0>F
   mov(8)          g39<1>F    g25<0,1,0>F
   (+f0) sel(8)    g36<1>F    g37<8,8,1>F   -1F
   cmp.ge.f0(8)    null       -g34<8,8,1>F  g25<0,1,0>F
   (+f0) sel(8)    g38<1>F    g39<8,8,1>F   1F

10 instructions to 9.

That's an annoying amount of assembly to digest, but basically we're
just benefiting because of the order of the uses of the flag. If we
could simply rearrange the flag writes and reads, we would generate
better code, and...

If we could recognize that there are multiple gl_FrontFacing ? ... : ...
expressions, we probably would have just emitted asr.nz.f0 and a
couple of SELs.

So I don't really think this patch is helping anything except by accident. :)