Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:15 AM, Matt Turner wrote: > On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand wrote: >> On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke >> wrote: >>> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as > needed > +* for most instructions. However, it doesn't hurt anything to just > do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); > + } > + The real problem is that we haven't lifted the restriction about non-commutative integer multiply on Broadwell: brw_fs_copy_propagation.cpp: /* Fit this constant in by commuting the operands. * Exception: we can't do this for 32-bit integer MUL/MACH * because it's asymmetric. */ if ((inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MACH) && (inst->src[1].type == BRW_REGISTER_TYPE_D || inst->src[1].type == BRW_REGISTER_TYPE_UD)) break; inst->src[0] = inst->src[1]; inst->src[1] = val; progress = true; >>> >>> Yeah. I also wrote a patch to do that, adding >>> >>>(brw->gen < 8 || brw->is_cherryview) && >> >> In that case, shouldn't Cherry View take the same path as hsw when >> emitting multiplies and look for 15-bit constants? > > Almost... that path needs to set one of the MUL's source types to W/UW > and the stride to 2, like in commit 0c06d019. I'm planning to rip out > all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp > and move it to a common lowering pass, so I'll fix that at the same > time. Awesome! Thanks for working on that! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke > wrote: >> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand >>> wrote: >>> > Shader-db results for fragment shaders on Broadwell: >>> > >>> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >>> >instructions in affected programs: 43242 -> 42918 (-0.75%) >>> >helped:142 >>> >HURT: 0 >>> > >>> > Shader-db results for vertex shaders on Broadwell: >>> > >>> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >>> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >>> >helped:6139 >>> >HURT: 0 >>> > --- >>> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >>> > 1 file changed, 12 insertions(+) >>> > >>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > index 555987d..161a262 100644 >>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > @@ -21,6 +21,8 @@ >>> > * IN THE SOFTWARE. >>> > */ >>> > >>> > +#include >>> > + >>> > #include "glsl/ir.h" >>> > #include "glsl/ir_optimization.h" >>> > #include "glsl/nir/glsl_to_nir.h" >>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >>> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >>> > } >>> > >>> > + /* Immediates can only be used as the second source of two-source >>> > +* instructions. We have code in opt_algebraic to flip them as needed >>> > +* for most instructions. However, it doesn't hurt anything to just >>> > do >>> > +* the right thing if we can detect it at the NIR level. >>> > +*/ >>> > + if ((nir_op_infos[instr->op].algebraic_properties & >>> > NIR_OP_IS_COMMUTATIVE) && >>> > + nir_src_as_const_value(instr->src[0].src)) { >>> > + std::swap(op[0], op[1]); >>> > + } >>> > + >>> >>> The real problem is that we haven't lifted the restriction about >>> non-commutative integer multiply on Broadwell: >>> >>> brw_fs_copy_propagation.cpp: >>> >>>/* Fit this constant in by commuting the operands. >>> * Exception: we can't do this for 32-bit integer MUL/MACH >>> * because it's asymmetric. >>> */ >>>if ((inst->opcode == BRW_OPCODE_MUL || >>> inst->opcode == BRW_OPCODE_MACH) && >>>(inst->src[1].type == BRW_REGISTER_TYPE_D || >>> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >>> break; >>>inst->src[0] = inst->src[1]; >>>inst->src[1] = val; >>>progress = true; >> >> Yeah. I also wrote a patch to do that, adding >> >>(brw->gen < 8 || brw->is_cherryview) && > > In that case, shouldn't Cherry View take the same path as hsw when > emitting multiplies and look for 15-bit constants? Almost... that path needs to set one of the MUL's source types to W/UW and the stride to 2, like in commit 0c06d019. I'm planning to rip out all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp and move it to a common lowering pass, so I'll fix that at the same time. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke wrote: > On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: >> > Shader-db results for fragment shaders on Broadwell: >> > >> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >> >instructions in affected programs: 43242 -> 42918 (-0.75%) >> >helped:142 >> >HURT: 0 >> > >> > Shader-db results for vertex shaders on Broadwell: >> > >> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >> >helped:6139 >> >HURT: 0 >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 555987d..161a262 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -21,6 +21,8 @@ >> > * IN THE SOFTWARE. >> > */ >> > >> > +#include >> > + >> > #include "glsl/ir.h" >> > #include "glsl/ir_optimization.h" >> > #include "glsl/nir/glsl_to_nir.h" >> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >> > } >> > >> > + /* Immediates can only be used as the second source of two-source >> > +* instructions. We have code in opt_algebraic to flip them as needed >> > +* for most instructions. However, it doesn't hurt anything to just do >> > +* the right thing if we can detect it at the NIR level. >> > +*/ >> > + if ((nir_op_infos[instr->op].algebraic_properties & >> > NIR_OP_IS_COMMUTATIVE) && >> > + nir_src_as_const_value(instr->src[0].src)) { >> > + std::swap(op[0], op[1]); >> > + } >> > + >> >> The real problem is that we haven't lifted the restriction about >> non-commutative integer multiply on Broadwell: >> >> brw_fs_copy_propagation.cpp: >> >>/* Fit this constant in by commuting the operands. >> * Exception: we can't do this for 32-bit integer MUL/MACH >> * because it's asymmetric. >> */ >>if ((inst->opcode == BRW_OPCODE_MUL || >> inst->opcode == BRW_OPCODE_MACH) && >>(inst->src[1].type == BRW_REGISTER_TYPE_D || >> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >> break; >>inst->src[0] = inst->src[1]; >>inst->src[1] = val; >>progress = true; > > Yeah. I also wrote a patch to do that, adding > >(brw->gen < 8 || brw->is_cherryview) && In that case, shouldn't Cherry View take the same path as hsw when emitting multiplies and look for 15-bit constants? > which also solves the problem. But it won't help on Cherryview, which I > believe still has the asymmetrical multiply, while switching to shifts > will. I suppose another alternative to NIR late optimizations is to > have brw_fs_nir's handling of imul check for power of two sizes and emit > a SHL. That would be easy. I really don't think SHL is the issue here. It's that we're being stupid about multiplies. SHL is a nice hack but unless it is actually faster, I think it's hacking around the problem. > I do think we really need to make logical IMUL opcodes and lower them to > MUL/MACH as needed later, so we don't need to worry about breaking MACH > in cases like this. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke wrote: > On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: >> > Shader-db results for fragment shaders on Broadwell: >> > >> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >> >instructions in affected programs: 43242 -> 42918 (-0.75%) >> >helped:142 >> >HURT: 0 >> > >> > Shader-db results for vertex shaders on Broadwell: >> > >> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >> >helped:6139 >> >HURT: 0 >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 555987d..161a262 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -21,6 +21,8 @@ >> > * IN THE SOFTWARE. >> > */ >> > >> > +#include >> > + >> > #include "glsl/ir.h" >> > #include "glsl/ir_optimization.h" >> > #include "glsl/nir/glsl_to_nir.h" >> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >> > } >> > >> > + /* Immediates can only be used as the second source of two-source >> > +* instructions. We have code in opt_algebraic to flip them as needed >> > +* for most instructions. However, it doesn't hurt anything to just do >> > +* the right thing if we can detect it at the NIR level. >> > +*/ >> > + if ((nir_op_infos[instr->op].algebraic_properties & >> > NIR_OP_IS_COMMUTATIVE) && >> > + nir_src_as_const_value(instr->src[0].src)) { >> > + std::swap(op[0], op[1]); >> > + } >> > + >> >> The real problem is that we haven't lifted the restriction about >> non-commutative integer multiply on Broadwell: >> >> brw_fs_copy_propagation.cpp: >> >>/* Fit this constant in by commuting the operands. >> * Exception: we can't do this for 32-bit integer MUL/MACH >> * because it's asymmetric. >> */ >>if ((inst->opcode == BRW_OPCODE_MUL || >> inst->opcode == BRW_OPCODE_MACH) && >>(inst->src[1].type == BRW_REGISTER_TYPE_D || >> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >> break; >>inst->src[0] = inst->src[1]; >>inst->src[1] = val; >>progress = true; > > Yeah. I also wrote a patch to do that, adding > >(brw->gen < 8 || brw->is_cherryview) && > > which also solves the problem. But it won't help on Cherryview, which I > believe still has the asymmetrical multiply, while switching to shifts > will. I suppose another alternative to NIR late optimizations is to > have brw_fs_nir's handling of imul check for power of two sizes and emit > a SHL. That would be easy. > > I do think we really need to make logical IMUL opcodes and lower them to > MUL/MACH as needed later, so we don't need to worry about breaking MACH > in cases like this. Indeed. I mentioned yesterday I've been planning to do it for a while, so I'll go ahead and take care of it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: > On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > > Shader-db results for fragment shaders on Broadwell: > > > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) > >instructions in affected programs: 43242 -> 42918 (-0.75%) > >helped:142 > >HURT: 0 > > > > Shader-db results for vertex shaders on Broadwell: > > > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) > >instructions in affected programs: 1418720 -> 1373448 (-3.19%) > >helped:6139 > >HURT: 0 > > --- > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > index 555987d..161a262 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > @@ -21,6 +21,8 @@ > > * IN THE SOFTWARE. > > */ > > > > +#include > > + > > #include "glsl/ir.h" > > #include "glsl/ir_optimization.h" > > #include "glsl/nir/glsl_to_nir.h" > > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) > >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > > } > > > > + /* Immediates can only be used as the second source of two-source > > +* instructions. We have code in opt_algebraic to flip them as needed > > +* for most instructions. However, it doesn't hurt anything to just do > > +* the right thing if we can detect it at the NIR level. > > +*/ > > + if ((nir_op_infos[instr->op].algebraic_properties & > > NIR_OP_IS_COMMUTATIVE) && > > + nir_src_as_const_value(instr->src[0].src)) { > > + std::swap(op[0], op[1]); > > + } > > + > > The real problem is that we haven't lifted the restriction about > non-commutative integer multiply on Broadwell: > > brw_fs_copy_propagation.cpp: > >/* Fit this constant in by commuting the operands. > * Exception: we can't do this for 32-bit integer MUL/MACH > * because it's asymmetric. > */ >if ((inst->opcode == BRW_OPCODE_MUL || > inst->opcode == BRW_OPCODE_MACH) && >(inst->src[1].type == BRW_REGISTER_TYPE_D || > inst->src[1].type == BRW_REGISTER_TYPE_UD)) > break; >inst->src[0] = inst->src[1]; >inst->src[1] = val; >progress = true; Yeah. I also wrote a patch to do that, adding (brw->gen < 8 || brw->is_cherryview) && which also solves the problem. But it won't help on Cherryview, which I believe still has the asymmetrical multiply, while switching to shifts will. I suppose another alternative to NIR late optimizations is to have brw_fs_nir's handling of imul check for power of two sizes and emit a SHL. That would be easy. I do think we really need to make logical IMUL opcodes and lower them to MUL/MACH as needed later, so we don't need to worry about breaking MACH in cases like this. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as needed > +* for most instructions. However, it doesn't hurt anything to just do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); > + } > + The real problem is that we haven't lifted the restriction about non-commutative integer multiply on Broadwell: brw_fs_copy_propagation.cpp: /* Fit this constant in by commuting the operands. * Exception: we can't do this for 32-bit integer MUL/MACH * because it's asymmetric. */ if ((inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MACH) && (inst->src[1].type == BRW_REGISTER_TYPE_D || inst->src[1].type == BRW_REGISTER_TYPE_UD)) break; inst->src[0] = inst->src[1]; inst->src[1] = val; progress = true; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as needed > +* for most instructions. However, it doesn't hurt anything to just do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); We don't use STL. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev