Without this commit, an uniform is bailed out if the instruction has 3 sources. This commit allow to go on if the combined swizzle is a single value.
WIP: it can be clearly cleaned, as the condition is somewhat ugly right now. Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1666043 -> 1648378 (-1.06%) instructions in affected programs: 812755 -> 795090 (-2.17%) helped: 7930 HURT: 0 --- Some background: in order to find places to optimize the instruction count, I'm running shader-db comparing the outcome using IR vs using NIR. Currently one of the HURT shaders is unity/57-DeferredDirectional.shader_test, that I simplified to the following shader with the same issue: uniform float myFloat; void main () { gl_Position = mix (gl_Color, gl_Vertex, vec4(myFloat)); } Note that it is an uniform, and a float, used to recreate a vec4 on an lrp. That example doesn't make too much sense, but again, it is a synthetic shader that shows the same issue that the original one that made more sense. This shader requires one instruction more on NIR. There is one major difference at the starting point. On IR we have this: 0: lrp vgrf0.0:F, u0.xxxx:F, attr0.xyzw:F, attr3.xyzw:F 1: mov m2:D, 0D 2: mov m3:F, vgrf0.xyzw:F 3: vs_urb_write (null):UD While on NIR we have this: 0: mov vgrf0.0:F, attr3.xyzw:F 1: mov vgrf1.0:F, attr0.xyzw:F 2: mov vgrf2.0:UD, u0.xyzw:UD 3: lrp vgrf3.0:F, vgrf2.xxxx:F, vgrf1.xyzw:F, vgrf0.xyzw:F 4: mov m2:D, 0D 5: mov m3:F, vgrf3.xyzw:F 6: vs_urb_write (null):UD So NIR initially emits three extra MOVs. Two of them are not a problem because are properly propagated and dead-code eliminated. But the third one no, as it get bailed out from the original condition, that bailed out when it was an uniform on a 3-sourced instruction. After that I need to confess that I got to the solution based on the assumption that this MOV can be copy-propagated (as IR doesn't have it) and the outcome when I just removed that condition. When I just hacked out that condition, one assertion at brw_eu_emit.c::get_3src_subreg_nr was triggered: assert(brw_is_single_value_swizzle(reg.dw1.bits.swizzle)); That "inspired me" to add that check on the condition. But at this point I can't clearly justify it, even if it doesn't show any piglit regression, and shows a shader-db numbers so positive. But when I was about to start to research this, I remembered that meanwhile I have focused myself on trying to help on the backend side, other people, like Matt Turner, Eduardo Lima, and Jason Ekstrand have been working on the NIR side, including commits like "nir/lower_vec_to_movs: Don't emit unneeded movs". So perhaps the way to solve this issue is going to the root, and avoid those extra MOVs as originally IR did, and someone is already working on that. So before deeping on researching why this is working. Is someone else working to solve this issue on the NIR side? Do someone thinks that this should be solved there (or even on the MOV emission)? src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index d3f0ddd..400a13a 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -325,7 +325,8 @@ try_copy_propagate(const struct brw_device_info *devinfo, inst->opcode == SHADER_OPCODE_GEN4_SCRATCH_WRITE) return false; - if (inst->is_3src() && value.file == UNIFORM) + if (inst->is_3src() && value.file == UNIFORM && + !brw_is_single_value_swizzle(brw_compose_swizzle(inst->src[arg].swizzle, value.swizzle))) return false; if (inst->is_send_from_grf()) -- 2.1.4 _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev