On 04/06/2015 11:35 AM, Matt Turner wrote:
> On Fri, Mar 20, 2015 at 1:58 PM, Ian Romanick <i...@freedesktop.org> wrote:
>> From: Ian Romanick <ian.d.roman...@intel.com>
>>
>> On SNB+, the Boolean result is always 0 or ~0, so MOV.nz produces the
>> same effect as AND.nz. However, later cmod propagation passes can
>> handle the MOV.nz, but they cannot handle the AND.nz because the source
>> is not generated by a CMP.
>>
>> It's worth noting that this commit was a lot more effective before
>> commit bb22aa0 (i965/fs: Ignore type in cmod prop if scan_inst is CMP.).
>> Without that commit, this commit improved ~2,500 shaders on each
>> affected platform, including Sandy Bridge.
>>
>> Ivy Bridge (0x0166):
>> total instructions in shared programs: 6291794 -> 6291668 (-0.00%)
>> instructions in affected programs: 41207 -> 41081 (-0.31%)
>> helped: 154
>> HURT: 28
>>
>> Haswell (0x0426):
>> total instructions in shared programs: 5779180 -> 5779054 (-0.00%)
>> instructions in affected programs: 37210 -> 37084 (-0.34%)
>> helped: 154
>> HURT: 28
>>
>> Broadwell (0x162E):
>> total instructions in shared programs: 6823014 -> 6822848 (-0.00%)
>> instructions in affected programs: 40195 -> 40029 (-0.41%)
>> helped: 164
>> HURT: 28
>>
>> No change on GM45, Iron Lake, Sandy Bridge, Ivy Bridge with NIR, or
>> Haswell with NIR.
>>
>> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com>
>> ---
>
> I looked at some helped shaders. They seem to be doing this:
>
> const vec4 ps_c0 = vec4(1.0, -1.0, 0.0, -0.0);
> ...
> t0_ps.x = (gl_FrontFacing ? ps_c0.x : ps_c0.y);
> t0_ps.y = (gl_FrontFacing ? ps_c0.w : ps_c0.y);
> t0_ps.x = ((-t0_ps.x >= 0.0) ? ps_c0.z : ps_c0.x);
>
> so before this patch we hit the
> fs_visitor::try_opt_frontfacing_ternary path for t0_ps.x and not for
> t0_ps.y, generating:
>
> asr(8)          g26<1>D    -g0<0,1,0>W    15D
> or(8)           g36.1<2>W  g0<0,1,0>W     0x3f80UW
> mov(1)          g25<1>F    [0F, 0F, 0F, 0F]VF
> and.nz.f0(8)    null       g26<8,8,1>D    1D    <--- this gets removed with this patch
> and(8)          g35<1>D    g36<8,8,1>D    0xbf800000UD
> mov(8)          g38<1>F    -g25<0,1,0>F
> mov(8)          g40<1>F    g25<0,1,0>F
> (+f0) sel(8)    g37<1>F    g38<8,8,1>F    -1F
> cmp.ge.f0(8)    null       -g35<8,8,1>F   g25<0,1,0>F
> (+f0) sel(8)    g39<1>F    g40<8,8,1>F    1F
>
> After this patch we generate
>
> asr.nz.f0(8)    null       -g0<0,1,0>W    15D
> or(8)           g35.1<2>W  g0<0,1,0>W     0x3f80UW
> mov(1)          g25<1>F    [0F, 0F, 0F, 0F]VF
> and(8)          g34<1>D    g35<8,8,1>D    0xbf800000UD
> mov(8)          g37<1>F    -g25<0,1,0>F
> mov(8)          g39<1>F    g25<0,1,0>F
> (+f0) sel(8)    g36<1>F    g37<8,8,1>F    -1F
> cmp.ge.f0(8)    null       -g34<8,8,1>F   g25<0,1,0>F
> (+f0) sel(8)    g38<1>F    g39<8,8,1>F    1F
>
> 10 instructions to 9. That's an annoying amount of assembly to digest,
> but basically we're just benefiting because of the order of the uses of
> the flag. If we could simply rearrange the flag writes and reads, we
> would generate better code, and...
>
> If we could recognize that there are multiple gl_FrontFacing ? ... :
> ... expressions, we probably would have just emitted asr.nz.f0 and a
> couple of SELs.
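
For anyone following along, the commit message's claim that MOV.nz and AND.nz set the same flag can be checked with a small model. This is only a sketch in Python, not Mesa code: the helper names (flag_and_nz, flag_mov_nz, flags_agree) are made up for illustration, and it assumes the value being tested is a 32-bit Boolean that is exactly 0 or ~0. The same reasoning would apply to the ASR result above only if that result is itself 0 or ~0.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit D register as an unsigned bit pattern

BOOL_FALSE = 0
BOOL_TRUE = MASK32   # ~0, the assumed SNB+ representation of "true"


def flag_and_nz(value):
    """Flag from a modeled AND.nz with immediate 1: (value & 1) != 0."""
    return ((value & MASK32) & 1) != 0


def flag_mov_nz(value):
    """Flag from a modeled MOV.nz: value != 0."""
    return (value & MASK32) != 0


def flags_agree(value):
    """True when both modeled instructions would write the same flag."""
    return flag_and_nz(value) == flag_mov_nz(value)


if __name__ == "__main__":
    # 0 and ~0 are the Boolean cases; 2 is a non-Boolean value included
    # to show where the two tests stop agreeing.
    for v in (BOOL_FALSE, BOOL_TRUE, 2):
        print(hex(v), flag_and_nz(v), flag_mov_nz(v), flags_agree(v))
```

The two tests agree for 0 and ~0 but not for an arbitrary value such as 2, which is why the substitution depends on the 0/~0 guarantee stated for SNB+.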

Right... I wonder what happens to these shaders after patch 14. The
t0_ps.y calculation will change to 't0_ps.y = -float(gl_FrontFacing)'
after patch 10. The final t0_ps.x calculation will get changed to
't0_ps.x = float(t0_ps.x == 0)' after patch 12. With patches 13 and 14,
tree grafting will enable some other changes.

> So I don't really think this patch is helping anything except by accident. :)

That is definitely possible. Before some of the changes to the cmod
propagation pass, this patch helped a couple thousand shaders. I'll
test the series with this patch reverted and see if there are still any
benefits.

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev