On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke <kenn...@whitecape.org> wrote: > On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand <ja...@jlekstrand.net> wrote: >> > Shader-db results for fragment shaders on Broadwell: >> > >> > total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >> > instructions in affected programs: 43242 -> 42918 (-0.75%) >> > helped: 142 >> > HURT: 0 >> > >> > Shader-db results for vertex shaders on Broadwell: >> > >> > total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >> > instructions in affected programs: 1418720 -> 1373448 (-3.19%) >> > helped: 6139 >> > HURT: 0 >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 ++++++++++++ >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 555987d..161a262 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -21,6 +21,8 @@ >> > * IN THE SOFTWARE. >> > */ >> > >> > +#include <algorithm> >> > + >> > #include "glsl/ir.h" >> > #include "glsl/ir_optimization.h" >> > #include "glsl/nir/glsl_to_nir.h" >> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >> > op[i] = offset(op[i], instr->src[i].swizzle[channel]); >> > } >> > >> > + /* Immediates can only be used as the second source of two-source >> > + * instructions. We have code in opt_algebraic to flip them as needed >> > + * for most instructions. However, it doesn't hurt anything to just do >> > + * the right thing if we can detect it at the NIR level. >> > + */ >> > + if ((nir_op_infos[instr->op].algebraic_properties & >> > NIR_OP_IS_COMMUTATIVE) && >> > + nir_src_as_const_value(instr->src[0].src)) { >> > + std::swap(op[0], op[1]); >> > + } >> > + >> >> The real problem is that we haven't lifted the restriction about >> non-commutative integer multiply on Broadwell: >> >> brw_fs_copy_propagation.cpp: >> >> /* Fit this constant in by commuting the operands. >> * Exception: we can't do this for 32-bit integer MUL/MACH >> * because it's asymmetric. >> */ >> if ((inst->opcode == BRW_OPCODE_MUL || >> inst->opcode == BRW_OPCODE_MACH) && >> (inst->src[1].type == BRW_REGISTER_TYPE_D || >> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >> break; >> inst->src[0] = inst->src[1]; >> inst->src[1] = val; >> progress = true; > > Yeah. I also wrote a patch to do that, adding > > (brw->gen < 8 || brw->is_cherryview) &&
In that case, shouldn't Cherry View take the same path as hsw when emitting multiplies and look for 15-bit constants? > which also solves the problem. But it won't help on Cherryview, which I > believe still has the asymmetrical multiply, while switching to shifts > will. I suppose another alternative to NIR late optimizations is to > have brw_fs_nir's handling of imul check for power of two sizes and emit > a SHL. That would be easy. I really don't think SHL is the issue here. It's that we're being stupid about multiplies. SHL is a nice hack but unless it is actually faster, I think it's hacking around the problem. > I do think we really need to make logical IMUL opcodes and lower them to > MUL/MACH as needed later, so we don't need to worry about breaking MACH > in cases like this. _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev