On Wed, Sep 23, 2015 at 12:29 PM, Alejandro Piñeiro <apinhe...@igalia.com> wrote: > Without this commit, an uniform is bailed out if the instruction
"an uniform" -> "a uniform" (because the 'u' actually makes a consonant sound) I'm not sure what is meant by "bailed out" > has 3 sources. This commit allow to go on if the combined > swizzle is a single value. Nice find! This is spot-on. Indeed, 3-src instructions can access a scalar value and we were mistakenly ignoring that capability. > WIP: it can be clearly cleaned, as the condition is somewhat > ugly right now. > > Shader-db results for vec4 programs on Haswell: > total instructions in shared programs: 1666043 -> 1648378 (-1.06%) > instructions in affected programs: 812755 -> 795090 (-2.17%) > helped: 7930 > HURT: 0 I get Without NIR: total instructions in shared programs: 6717457 -> 6713423 (-0.06%) instructions in affected programs: 296630 -> 292596 (-1.36%) helped: 2533 With NIR: total instructions in shared programs: 6529329 -> 6511760 (-0.27%) instructions in affected programs: 811929 -> 794360 (-2.16%) helped: 7865 I find providing filtered results (i.e., only results from vertex shaders) to be confusing, especially when that fact is withheld. > --- > > Some background: in order to find places to optimize the instruction > count, I'm running shader-db comparing the outcome using IR vs using > NIR. Currently one of the HURT shaders is > unity/57-DeferredDirectional.shader_test, that I simplified to the > following shader with the same issue: > > uniform float myFloat; > > void main () > { > gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat)); > } > > Note that it is an uniform, and a float, used to recreate a vec4 on an > lrp. That example doesn't make too much sense, but again, it is a > synthetic shader that shows the same issue that the original one that > made more sense. > > This shader requires one instruction more on NIR. There is one > major difference at the starting point. On IR we have this: > > 0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F > 1: mov m2:D, 0D > 2: mov m3:F, vgrf0.xyzw:F > 3: vs_urb_write (null):UD > > While on NIR we have this: > > 0: mov vgrf0.0:F, attr3.xyzw:F > 1: mov vgrf1.0:F, attr0.xyzw:F > 2: mov vgrf2.0:UD, u0.xyzw:UD > 3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F > 4: mov m2:D, 0D > 5: mov m3:F, vgrf3.xyzw:F > 6: vs_urb_write (null):UD > > So NIR initially emits three extra MOVs. Two of them are not a problem > because are properly propagated and dead-code eliminated. But the third > one no, as it get bailed out from the original condition, that bailed > out when it was an uniform on a 3-sourced instruction. > > After that I need to confess that I got to the solution based on the > assumption that this MOV can be copy-propagated (as IR doesn't have > it) and the outcome when I just removed that condition. When I just > hacked out that condition, one assertion at > brw_eu_emit.c::get_3src_subreg_nr was triggered: > > assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle)); > > That "inspired me" to add that check on the condition. But at this > point I can't clearly justify it, even if it doesn't show any piglit > regression, and shows a shader-db numbers so positive. > > But when I was about to start to research this, I remembered that > meanwhile I have focused myself on trying to help on the backend side, > other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have > been working on the NIR side, including commits like > "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way > to solve this issue is going to the root, and avoid those extra MOVs > as originally IR did, and someone is already working on that. So > before deeping on researching why this is working. Is someone else > working to solve this issue on the NIR side? Do someone thinks that > this should be solved there (or even on the MOV emission)? > > > src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > index d3f0ddd..400a13a 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo, > inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) > return false; > > - if (inst->is_3src() && value.file == UNIFORM) > + if (inst->is_3src() && value.file == UNIFORM && > + > !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, > value.swizzle))) I think just linewrap this and remove the WIP from the commit message, and Reviewed-by: Matt Turner <matts...@gmail.com> Excellent work! _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev