On Tue, 2015-09-22 at 15:52 -0700, Matt Turner wrote: > total instructions in shared programs: 6596689 -> 6595563 (-0.02%) > instructions in affected programs: 103154 -> 102028 (-1.09%) > helped: 253 > HURT: 217 > > It's kind of a wash in terms of programs helped/hurt, but of the > programs helped 169 are by more than 10%.
Yeah, that is a very significant gain for the helped cases. Out of curiosity, how bad are hurt programs in comparison? > --- > I tried values of 2-6, and 4 seemed to be the best. I can provide > full shader-db result files if other people want to investigate. I was wondering if shader-db metrics are the best option for selecting thresholds such as these, meaning that when you are replacing instructions that have different latencies simply comparing instruction counts may not be the best thing to do. However, if ffma operations are more expensive that fadd, a higher instruction count should always lead to worse performance, so that should be ok. Reviewed-by: Iago Toral Quiroga <ito...@igalia.com> > src/glsl/nir/nir_opt_peephole_ffma.c | 20 ++++++++++++-------- > 1 file changed, 12 insertions(+), 8 deletions(-) > > diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c > b/src/glsl/nir/nir_opt_peephole_ffma.c > index 4f0f0da..3e8a34f 100644 > --- a/src/glsl/nir/nir_opt_peephole_ffma.c > +++ b/src/glsl/nir/nir_opt_peephole_ffma.c > @@ -39,7 +39,7 @@ struct peephole_ffma_state { > }; > > static inline bool > -are_all_uses_fadd(nir_ssa_def *def) > +are_all_uses_fadd(nir_ssa_def *def, unsigned *num_uses) > { > if (!list_empty(&def->if_uses)) > return false; > @@ -53,6 +53,7 @@ are_all_uses_fadd(nir_ssa_def *def) > nir_alu_instr *use_alu = nir_instr_as_alu(use_instr); > switch (use_alu->op) { > case nir_op_fadd: > + (*num_uses)++; > break; /* This one's ok */ > > case nir_op_imov: > @@ -60,7 +61,7 @@ are_all_uses_fadd(nir_ssa_def *def) > case nir_op_fneg: > case nir_op_fabs: > assert(use_alu->dest.dest.is_ssa); > - if (!are_all_uses_fadd(&use_alu->dest.dest.ssa)) > + if (!are_all_uses_fadd(&use_alu->dest.dest.ssa, num_uses)) > return false; > break; > > @@ -101,15 +102,18 @@ get_mul_for_src(nir_alu_src *src, int num_components, > *abs = true; > break; > > - case nir_op_fmul: > - /* Only absorb a fmul into a ffma if the fmul is is only used in fadd > - * operations. This prevents us from being too aggressive with our > - * fusing which can actually lead to more instructions. > + case nir_op_fmul: { > + /* Only fuse an fmul into an ffma if its result is used by not more > than > + * four fadd operations. This prevents us from too aggressively fusing > + * operations which can actually lead to more instructions or many ffma > + * operations performing the same multiply. > */ > - if (!are_all_uses_fadd(&alu->dest.dest.ssa)) > + > + unsigned num_uses = 0; > + if (!are_all_uses_fadd(&alu->dest.dest.ssa, &num_uses) || num_uses > 4) > return NULL; > break; > - > + } > default: > return NULL; > } _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev