On Wed, Apr 15, 2015 at 9:50 AM, Ilia Mirkin <imir...@alum.mit.edu> wrote: > On Wed, Apr 15, 2015 at 12:37 PM, Ian Romanick <i...@freedesktop.org> wrote: >> From: Ian Romanick <ian.d.roman...@intel.com> >> >> Shader-db results: >> >> GM45 NIR: >> total instructions in shared programs: 4082044 -> 4081919 (-0.00%) >> instructions in affected programs: 27609 -> 27484 (-0.45%) >> helped: 44 >> >> Iron Lake NIR: >> total instructions in shared programs: 5678776 -> 5678646 (-0.00%) >> instructions in affected programs: 27406 -> 27276 (-0.47%) >> helped: 45 >> >> Sandy Bridge NIR: >> total instructions in shared programs: 7329995 -> 7329096 (-0.01%) >> instructions in affected programs: 142035 -> 141136 (-0.63%) >> helped: 406 >> HURT: 19 >> >> Ivy Bridge NIR: >> total instructions in shared programs: 6769314 -> 6768359 (-0.01%) >> instructions in affected programs: 140820 -> 139865 (-0.68%) >> helped: 423 >> HURT: 2 >> >> Haswell NIR: >> total instructions in shared programs: 6183693 -> 6183298 (-0.01%) >> instructions in affected programs: 96538 -> 96143 (-0.41%) >> helped: 303 >> HURT: 4 >> >> Broadwell NIR: >> total instructions in shared programs: 7501711 -> 7498170 (-0.05%) >> instructions in affected programs: 266403 -> 262862 (-1.33%) >> helped: 705 >> HURT: 5 >> GAINED: 4 >> >> v2: Rebase on top of Connor's fix. >> >> Signed-off-by: Ian Romanick <ian.d.roman...@intel.com> >> Reviewed-by: Jordan Justen <jordan.l.jus...@intel.com> [v1] >> Cc: Jason Ekstrand <jason.ekstr...@intel.com> >> Cc: Connor Abbott <cwabbo...@gmail.com> >> --- >> >> The v2 changes were a little more intrusive than before, so it seemed >> worth sending out again. >> >> src/glsl/nir/nir_opt_cse.c | 27 ++++++++++++++++++--------- >> 1 file changed, 18 insertions(+), 9 deletions(-) >> >> diff --git a/src/glsl/nir/nir_opt_cse.c b/src/glsl/nir/nir_opt_cse.c >> index 56d491c..3c5ece7 100644 >> --- a/src/glsl/nir/nir_opt_cse.c >> +++ b/src/glsl/nir/nir_opt_cse.c >> @@ -37,18 +37,19 @@ struct cse_state { >> }; >> >> static bool >> -nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src) >> +nir_alu_srcs_equal(nir_alu_instr *alu1, nir_alu_instr *alu2, unsigned src1, >> + unsigned src2) >> { >> - if (alu1->src[src].abs != alu2->src[src].abs || >> - alu1->src[src].negate != alu2->src[src].negate) >> + if (alu1->src[src1].abs != alu2->src[src2].abs || >> + alu1->src[src1].negate != alu2->src[src2].negate) >> return false; >> >> - for (unsigned i = 0; i < nir_ssa_alu_instr_src_components(alu1, src); >> i++) { >> - if (alu1->src[src].swizzle[i] != alu2->src[src].swizzle[i]) >> + for (unsigned i = 0; i < nir_ssa_alu_instr_src_components(alu1, src1); >> i++) { >> + if (alu1->src[src1].swizzle[i] != alu2->src[src2].swizzle[i]) >> return false; >> } >> >> - return nir_srcs_equal(alu1->src[src].src, alu2->src[src].src); >> + return nir_srcs_equal(alu1->src[src1].src, alu2->src[src2].src); >> } >> >> static bool >> @@ -71,9 +72,17 @@ nir_instrs_equal(nir_instr *instr1, nir_instr *instr2) >> if (alu1->dest.dest.ssa.num_components != >> alu2->dest.dest.ssa.num_components) >> return false; >> >> - for (unsigned i = 0; i < nir_op_infos[alu1->op].num_inputs; i++) { >> - if (!nir_alu_srcs_equal(alu1, alu2, i)) >> - return false; >> + if (nir_op_infos[alu1->op].num_inputs == 2 && >> + (nir_op_infos[alu1->op].algebraic_properties & >> NIR_OP_IS_COMMUTATIVE)) { > > Just a couple of thoughts (could well go into a later patch, or not at all) -- > > (a) is the NIR_OP_IS_COMMUTATIVE flag ever set when num_inputs != 2?
No it's not. It would probably be reasonable to turn that into an assert. With that changed (and piglitted to make sure I'm right), Reviewed-by: Jason Ekstrand <jason.ekstr...@intel.com> > (b) I see that there is an isub/fsub/etc, which are (hopefully) not > marked as commutative. Would it make sense to first get rid of them > (and introduce negate flags) before CSE is done? I guess it's an > argument that expressions should have a clear canonical form for > maximal CSE, after which various negs can be propagated through. We (i965) do at the moment. Eric doesn't for vc4 but that was his choice. >> + return (nir_alu_srcs_equal(alu1, alu2, 0, 0) && >> + nir_alu_srcs_equal(alu1, alu2, 1, 1)) || >> + (nir_alu_srcs_equal(alu1, alu2, 0, 1) && >> + nir_alu_srcs_equal(alu1, alu2, 1, 0)); >> + } else { >> + for (unsigned i = 0; i < nir_op_infos[alu1->op].num_inputs; i++) { >> + if (!nir_alu_srcs_equal(alu1, alu2, i, i)) >> + return false; >> + } >> } >> return true; >> } >> -- >> 2.1.0 >> >> _______________________________________________ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev