Re: [Mesa-dev] [PATCH 0/9] NIR dead control-flow removal
On Fri, May 8, 2015 at 10:03 PM, Connor Abbott wrote: > This series implements a bunch of related optimizations that run at > once as part of the same pass to eliminate control flow that is > useless. This includes both unreachable code and useless loops, i.e. > loops that don't compute a result used by the rest of the code. To do > this, I needed to change some low-level things that change the behavior > of nir_instr_remove() (see patch 2 for details), but I've tested the > series on piglit and there are no regressions, and there are no > shader-db changes either. I tried to get you shader-db results, but it crashes part-way through. In particular, it looks like unigen heaven on high will crash it. Either that or a couple of synmark shaders but I doubt those are the culprets. I think it's also worth noting that piglit probably isn't a good test of whether or not this works. I doubt there are many piglit shaders with dead control-flow and I think GLSL IR can get rid of it in some of the common cases. If you wanted to "generate NIR" to test with, you could always hack up my SPIRV -> NIR branch to run an optimization loop after converting. Then you can see what happens without GLSL IR in the way. --Jason > This is rebased on Jason's series to use lists for use-def sets; it > turns out that the changes required were pretty minimal. > > The series is also available at: > > git://people.freedesktop.org/~cwabbott0/mesa nir-dead-cf-v3 > > Connor Abbott (9): > nir/vars_to_ssa: don't rewrite removed instructions > nir: insert ssa_undef instructions when cleanup up defs/uses > nir: add an optimization for removing dead control flow > nir/validate: validate successors at the end of a loop > nir/dead_cf: delete code that's unreachable due to jumps > nir: add nir_block_get_following_loop() helper > nir: add a helper for iterating over blocks in a cf node > nir/dead_cf: add support for removing useless loops > i965/fs/nir: enable the dead control flow optimization > > src/glsl/Makefile.sources| 1 + > src/glsl/nir/nir.c | 94 -- > src/glsl/nir/nir.h | 12 ++ > src/glsl/nir/nir_lower_vars_to_ssa.c | 3 +- > src/glsl/nir/nir_opt_dead_cf.c | 332 > +++ > src/glsl/nir/nir_validate.c | 21 +++ > src/mesa/drivers/dri/i965/brw_nir.c | 2 + > 7 files changed, 452 insertions(+), 13 deletions(-) > create mode 100644 src/glsl/nir/nir_opt_dead_cf.c > > -- > 2.1.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses
On Sat, May 9, 2015 at 1:52 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 10:03 PM, Connor Abbott wrote: >> The point of cleanup_defs_uses() is to make an instruction safe to >> remove by removing any references that the rest of the shader may have >> to it. Previously, it was handling register use/def sets and removing >> the instruction from the use sets of any SSA sources it had, but if the >> instruction defined an SSA value that was used by other instructions it >> wasn't doing anything. This was ok, since we were always careful to make >> sure that no removed instruction ever had any uses, but now we want to >> start removing unreachable instruction which might still be used in >> reachable parts of the code. In that case, the value that any uses get >> is undefined since the instruction never actually executes, so we can >> just replace the instruction with an ssa_undef_instr. >> >> Signed-off-by: Connor Abbott >> --- >> src/glsl/nir/nir.c | 47 ++- >> 1 file changed, 34 insertions(+), 13 deletions(-) >> >> diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c >> index f03e80a..362ac30 100644 >> --- a/src/glsl/nir/nir.c >> +++ b/src/glsl/nir/nir.c >> @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after) >> } >> >> static void >> -remove_defs_uses(nir_instr *instr); >> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl); >> >> static void >> -cleanup_cf_node(nir_cf_node *node) >> +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl) >> { >> switch (node->type) { >> case nir_cf_node_block: { >>nir_block *block = nir_cf_node_as_block(node); >>/* We need to walk the instructions and clean up defs/uses */ >> - nir_foreach_instr(block, instr) >> - remove_defs_uses(instr); >> + nir_foreach_instr(block, instr) { >> + if (instr->type == nir_instr_type_jump) >> +handle_remove_jump(block, nir_instr_as_jump(instr)->type); >> + remove_defs_uses(instr, impl); >> + } > > This looks like an unrelated change. Maybe split that out? The rest > of the patch looks plausible. > --Jason Yeah, right... part of this change is just passing impl through to remove_defs_uses() so it doesn't have to recompute it, but the other part is fixing another bug where we would forget to fixup the successors/predecessors when removing jumps. I'll split out the latter part. > >>break; >> } >> >> case nir_cf_node_if: { >>nir_if *if_stmt = nir_cf_node_as_if(node); >>foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list) >> - cleanup_cf_node(child); >> + cleanup_cf_node(child, impl); >>foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list) >> - cleanup_cf_node(child); >> + cleanup_cf_node(child, impl); >> >>list_del(&if_stmt->condition.use_link); >>break; >> @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node) >> case nir_cf_node_loop: { >>nir_loop *loop = nir_cf_node_as_loop(node); >>foreach_list_typed(nir_cf_node, child, node, &loop->body) >> - cleanup_cf_node(child); >> + cleanup_cf_node(child, impl); >>break; >> } >> case nir_cf_node_function: { >> - nir_function_impl *impl = nir_cf_node_as_function(node); >>foreach_list_typed(nir_cf_node, child, node, &impl->body) >> - cleanup_cf_node(child); >> + cleanup_cf_node(child, impl); >>break; >> } >> default: >> @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node) >> nir_function_impl *impl = nir_cf_node_get_function(node); >> nir_metadata_preserve(impl, nir_metadata_none); >> >> + cleanup_cf_node(node, impl); >> + >> if (node->type == nir_cf_node_block) { >>/* >> * Basic blocks can't really be removed by themselves, since they act >> as >> @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node) >>exec_node_remove(&node->node); >>stitch_blocks(before_block, after_block); >> } >> - >> - cleanup_cf_node(node); >> } >> >> static bool >> @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state) >> return true; >> } >> >> +static bool >> +remove_ssa_def_cb(nir_ssa_def *def, void *state) >> +{ >> + nir_function_impl *impl = state; >> + nir_shader *shader = impl->overload->function->shader; >> + >> + if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) { >> + nir_ssa_undef_instr *undef = >> + nir_ssa_undef_instr_create(shader, def->num_components); >> + nir_instr_insert_before_cf_list(&impl->body, &undef->instr); >> + nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader); >> + } >> + >> + return true; >> +} >> + >> + >> static void >> -remove_defs_uses(nir_instr *instr) >> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl) >> { >> nir_foreach_dest(instr, remove_def_cb, instr);
Re: [Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses
On Fri, May 8, 2015 at 10:03 PM, Connor Abbott wrote: > The point of cleanup_defs_uses() is to make an instruction safe to > remove by removing any references that the rest of the shader may have > to it. Previously, it was handling register use/def sets and removing > the instruction from the use sets of any SSA sources it had, but if the > instruction defined an SSA value that was used by other instructions it > wasn't doing anything. This was ok, since we were always careful to make > sure that no removed instruction ever had any uses, but now we want to > start removing unreachable instruction which might still be used in > reachable parts of the code. In that case, the value that any uses get > is undefined since the instruction never actually executes, so we can > just replace the instruction with an ssa_undef_instr. > > Signed-off-by: Connor Abbott > --- > src/glsl/nir/nir.c | 47 ++- > 1 file changed, 34 insertions(+), 13 deletions(-) > > diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c > index f03e80a..362ac30 100644 > --- a/src/glsl/nir/nir.c > +++ b/src/glsl/nir/nir.c > @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after) > } > > static void > -remove_defs_uses(nir_instr *instr); > +remove_defs_uses(nir_instr *instr, nir_function_impl *impl); > > static void > -cleanup_cf_node(nir_cf_node *node) > +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl) > { > switch (node->type) { > case nir_cf_node_block: { >nir_block *block = nir_cf_node_as_block(node); >/* We need to walk the instructions and clean up defs/uses */ > - nir_foreach_instr(block, instr) > - remove_defs_uses(instr); > + nir_foreach_instr(block, instr) { > + if (instr->type == nir_instr_type_jump) > +handle_remove_jump(block, nir_instr_as_jump(instr)->type); > + remove_defs_uses(instr, impl); > + } This looks like an unrelated change. Maybe split that out? The rest of the patch looks plausible. --Jason >break; > } > > case nir_cf_node_if: { >nir_if *if_stmt = nir_cf_node_as_if(node); >foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list) > - cleanup_cf_node(child); > + cleanup_cf_node(child, impl); >foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list) > - cleanup_cf_node(child); > + cleanup_cf_node(child, impl); > >list_del(&if_stmt->condition.use_link); >break; > @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node) > case nir_cf_node_loop: { >nir_loop *loop = nir_cf_node_as_loop(node); >foreach_list_typed(nir_cf_node, child, node, &loop->body) > - cleanup_cf_node(child); > + cleanup_cf_node(child, impl); >break; > } > case nir_cf_node_function: { > - nir_function_impl *impl = nir_cf_node_as_function(node); >foreach_list_typed(nir_cf_node, child, node, &impl->body) > - cleanup_cf_node(child); > + cleanup_cf_node(child, impl); >break; > } > default: > @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node) > nir_function_impl *impl = nir_cf_node_get_function(node); > nir_metadata_preserve(impl, nir_metadata_none); > > + cleanup_cf_node(node, impl); > + > if (node->type == nir_cf_node_block) { >/* > * Basic blocks can't really be removed by themselves, since they act > as > @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node) >exec_node_remove(&node->node); >stitch_blocks(before_block, after_block); > } > - > - cleanup_cf_node(node); > } > > static bool > @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state) > return true; > } > > +static bool > +remove_ssa_def_cb(nir_ssa_def *def, void *state) > +{ > + nir_function_impl *impl = state; > + nir_shader *shader = impl->overload->function->shader; > + > + if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) { > + nir_ssa_undef_instr *undef = > + nir_ssa_undef_instr_create(shader, def->num_components); > + nir_instr_insert_before_cf_list(&impl->body, &undef->instr); > + nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader); > + } > + > + return true; > +} > + > + > static void > -remove_defs_uses(nir_instr *instr) > +remove_defs_uses(nir_instr *instr, nir_function_impl *impl) > { > nir_foreach_dest(instr, remove_def_cb, instr); > nir_foreach_src(instr, remove_use_cb, instr); > + nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl); > } > > void nir_instr_remove(nir_instr *instr) > { > - remove_defs_uses(instr); > + nir_function_impl *impl = > nir_cf_node_get_function(&instr->block->cf_node); > + remove_defs_uses(instr, impl); > exec_node_remove(&instr->node); > > if (instr->type == nir_instr_type_jump) { > -- > 2.1.0 > >
Re: [Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions
Reviewed-by: Jason Ekstrand wrote: > We were rewriting the uses of the intrinsic instruction to point to > something else after removing it, which only happened to work since we > were lax about having dangling uses when removing instructions. Fix that > up. > > Signed-off-by: Connor Abbott > --- > src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c > b/src/glsl/nir/nir_lower_vars_to_ssa.c > index ccb8f99..8d0ae1b 100644 > --- a/src/glsl/nir/nir_lower_vars_to_ssa.c > +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c > @@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct > lower_variables_state *state) >intrin->num_components, NULL); > > nir_instr_insert_before(&intrin->instr, &mov->instr); > -nir_instr_remove(&intrin->instr); > > nir_ssa_def_rewrite_uses(&intrin->dest.ssa, > nir_src_for_ssa(&mov->dest.dest.ssa), > state->shader); > + > +nir_instr_remove(&intrin->instr); > break; > } > > -- > 2.1.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction
Signed-off-by: Ilia Mirkin --- Pretty sure there's nothing wrong with it, but it looks odd in the code. src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++ src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +-- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 2 ++ 3 files changed, 9 insertions(+), 2 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index 6bb9620..28081fa 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -1316,6 +1316,8 @@ CodeEmitterGK110::emitFlow(const Instruction *i) } else if (mask & 2) { int32_t pcRel = f->target.bb->binPos - (codeSize + 8); + if (writeIssueDelays && !(f->target.bb->binPos & 0x3f)) + pcRel += 8; // currently we don't want absolute branches assert(!f->absolute); code[0] |= (pcRel & 0x1ff) << 23; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index 22db368..442cedf 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -509,10 +509,13 @@ CodeEmitterGM107::emitBRA() emitCond5(0x00, CC_TR); if (!insn->srcExists(0) || insn->src(0).getFile() != FILE_MEMORY_CONST) { + int32_t pos = insn->target.bb->binPos; + if (writeIssueDelays && !(pos & 0x1f)) + pos += 8; if (!insn->absolute) - emitField(0x14, 24, insn->target.bb->binPos - (codeSize + 8)); + emitField(0x14, 24, pos - (codeSize + 8)); else - emitField(0x14, 32, insn->target.bb->binPos); + emitField(0x14, 32, pos); } else { emitCBUF (0x24, gpr, 20, 16, 0, insn->src(0)); emitField(0x05, 1, 1); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index d9aed34..c241973 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -1406,6 +1406,8 @@ CodeEmitterNVC0::emitFlow(const Instruction *i) } else if (mask & 2) { int32_t pcRel = f->target.bb->binPos - (codeSize + 8); + if (writeIssueDelays && !(f->target.bb->binPos & 0x3f)) + pcRel += 8; // currently we don't want absolute branches assert(!f->absolute); code[0] |= (pcRel & 0x3f) << 26; -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg
This covers the pattern where a KILL_IF is used, which triggers a comparison of -x to 0. This can usually be folded into the comparison whose result is being compared to 0, however it may, itself, have already been combined with another comparison. That shouldn't impact the logic of this pass however. With this and the & 1.0 change, code like 0020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e80 0028: 001c 201fc000 and b32 $r0 $r0 0x3f80 0030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0 0038: 003c 1980 $p0 discard becomes 0020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e80 0028: 003c 1980 $p0 discard Signed-off-by: Ilia Mirkin --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 51 ++ 1 file changed, 33 insertions(+), 18 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index d8af19a..43a2fe9 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -278,7 +278,6 @@ private: void tryCollapseChainedMULs(Instruction *, const int s, ImmediateValue&); - // TGSI 'true' is converted to -1 by F2I(NEG(SET)), track back to SET CmpInstruction *findOriginForTestWithZero(Value *); unsigned int foldCount; @@ -337,16 +336,10 @@ ConstantFolding::findOriginForTestWithZero(Value *value) return NULL; Instruction *insn = value->getInsn(); - while (insn && insn->op != OP_SET) { + while (insn && insn->op != OP_SET && insn->op != OP_SET_AND && + insn->op != OP_SET_OR && insn->op != OP_SET_XOR) { Instruction *next = NULL; switch (insn->op) { - case OP_NEG: - case OP_ABS: - case OP_CVT: - next = insn->getSrc(0)->getInsn(); - if (insn->sType != next->dType) -return NULL; - break; case OP_MOV: next = insn->getSrc(0)->getInsn(); break; @@ -946,29 +939,51 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) case OP_SET: // TODO: SET_AND,OR,XOR { + /* This optimizes the case where the output of a set is being compared + * to zero. Since the set can only produce 0/-1 (int) or 0/1 (float), we + * can be a lot cleverer in our comparison. + */ CmpInstruction *si = findOriginForTestWithZero(i->getSrc(t)); CondCode cc, ccZ; - if (i->src(t).mod != Modifier(0)) - return; - if (imm0.reg.data.u32 != 0 || !si || si->op != OP_SET) + if (imm0.reg.data.u32 != 0 || !si) return; cc = si->setCond; ccZ = (CondCode)((unsigned int)i->asCmp()->setCond & ~CC_U); + // We do everything assuming var (cmp) 0, reverse the condition if 0 is + // first. if (s == 0) ccZ = reverseCondCode(ccZ); + // If there is a negative modifier, we need to undo that, by flipping + // the comparison to zero. + if (i->src(t).mod.neg()) + ccZ = reverseCondCode(ccZ); + // If this is a signed comparison, we expect the input to be a regular + // boolean, i.e. 0/-1. However the rest of the logic assumes that true + // is positive, so just flip the sign. + if (i->sType == TYPE_S32) { + assert(!isFloatType(si->dType)); + ccZ = reverseCondCode(ccZ); + } switch (ccZ) { - case CC_LT: cc = CC_FL; break; - case CC_GE: cc = CC_TR; break; - case CC_EQ: cc = inverseCondCode(cc); break; - case CC_LE: cc = inverseCondCode(cc); break; - case CC_GT: break; - case CC_NE: break; + case CC_LT: cc = CC_FL; break; // bool < 0 -- this is never true + case CC_GE: cc = CC_TR; break; // bool >= 0 -- this is always true + case CC_EQ: cc = inverseCondCode(cc); break; // bool == 0 -- !bool + case CC_LE: cc = inverseCondCode(cc); break; // bool <= 0 -- !bool + case CC_GT: break; // bool > 0 -- bool + case CC_NE: break; // bool != 0 -- bool default: return; } + + // Update the condition of this SET to be identical to the origin set, + // but with the updated condition code. The original SET should get + // DCE'd, ideally. + i->op = si->op; i->asCmp()->setCond = cc; i->setSrc(0, si->src(0)); i->setSrc(1, si->src(1)); + if (si->srcExists(2)) + i->setSrc(2, si->src(2)); i->sType = si->sType; } break; -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] nvc0/ir: allow iset to produce a boolean float
Signed-off-by: Ilia Mirkin --- src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 12 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 1 + src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp | 8 +++- 3 files changed, 16 insertions(+), 5 deletions(-) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp index 28081fa..ab8bf2e 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp @@ -967,8 +967,8 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i) code[0] = (code[0] & ~0xfc) | ((code[0] << 3) & 0xe0); if (i->defExists(1)) defId(i->def(1), 2); - else - code[0] |= 0x1c; + else + code[0] |= 0x1c; } else { switch (i->sType) { case TYPE_F32: op2 = 0x000; op1 = 0x800; break; @@ -990,8 +990,12 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i) } FTZ_(3a); - if (i->dType == TYPE_F32) - code[1] |= 1 << 23; + if (i->dType == TYPE_F32) { + if (isFloatType(i->sType)) +code[1] |= 1 << 23; + else +code[1] |= 1 << 15; + } } if (i->sType == TYPE_S32) code[1] |= 1 << 19; diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp index 442cedf..399a6f1 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp @@ -1830,6 +1830,7 @@ CodeEmitterGM107::emitISET() emitCond3(0x31, insn->setCond); emitField(0x30, 1, isSignedType(insn->sType)); emitCC (0x2f); + emitField(0x2c, 1, insn->dType == TYPE_F32); emitX(0x2b); emitGPR (0x08, insn->src(0)); emitGPR (0x00, insn->def(0)); diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp index c241973..f5992bc 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp @@ -1078,8 +1078,14 @@ CodeEmitterNVC0::emitSET(const CmpInstruction *i) if (!isFloatType(i->sType)) lo = 0x3; - if (isFloatType(i->dType) || isSignedIntType(i->sType)) + if (isSignedIntType(i->sType)) lo |= 0x20; + if (isFloatType(i->dType)) { + if (isFloatType(i->sType)) + lo |= 0x20; + else + lo |= 0x80; + } switch (i->op) { case OP_SET_AND: hi = 0x1000; break; -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] nvc0/ir: optimize set & 1.0 to produce boolean-float sets
This has started to happen more now that the backend is producing KILL_IF more often. Signed-off-by: Ilia Mirkin --- .../drivers/nouveau/codegen/nv50_ir_peephole.cpp | 29 ++ .../nouveau/codegen/nv50_ir_target_nv50.cpp| 2 ++ 2 files changed, 31 insertions(+) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp index 14446b6..d8af19a 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp @@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue &imm0, int s) } break; + case OP_AND: + { + CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp(); + if (!cmp || cmp->op == OP_SLCT) + return; + if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32)) + return; + if (imm0.reg.data.f32 != 1.0) + return; + if (cmp == NULL) + return; + if (i->getSrc(t)->getInsn()->dType != TYPE_U32) + return; + + i->getSrc(t)->getInsn()->dType = TYPE_F32; + if (i->src(t).mod != Modifier(0)) { + assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT)); + i->src(t).mod = Modifier(0); + cmp->setCond = reverseCondCode(cmp->setCond); + } + i->op = OP_MOV; + i->setSrc(s, NULL); + if (t) { + i->setSrc(0, i->getSrc(t)); + i->setSrc(t, NULL); + } + } + break; + case OP_SHL: { if (s != 1 || i->src(0).mod != Modifier(0)) diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp index 178a167..70180eb 100644 --- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp +++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp @@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const return false; case OP_SAD: return ty == TYPE_S32; + case OP_SET: + return !isFloatType(ty); default: return true; } -- 2.3.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 9/9] i965/fs/nir: enable the dead control flow optimization
Doesn't do anything on the public shader-db. Signed-off-by: Connor Abbott --- src/glsl/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_nir.c | 2 ++ 2 files changed, 3 insertions(+) diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index d784a81..7bdd895 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -50,6 +50,7 @@ NIR_FILES = \ nir/nir_opt_copy_propagate.c \ nir/nir_opt_cse.c \ nir/nir_opt_dce.c \ + nir/nir_opt_dead_cf.c \ nir/nir_opt_gcm.c \ nir/nir_opt_global_to_local.c \ nir/nir_opt_peephole_ffma.c \ diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index de4d7aa..4001190 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -52,6 +52,8 @@ nir_optimize(nir_shader *nir) nir_validate_shader(nir); progress |= nir_opt_constant_folding(nir); nir_validate_shader(nir); + progress |= nir_opt_dead_cf(nir); + nir_validate_shader(nir); progress |= nir_opt_remove_phis(nir); nir_validate_shader(nir); } while (progress); -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/9] nir: add nir_block_get_following_loop() helper
Signed-off-by: Connor Abbott --- src/glsl/nir/nir.c | 16 src/glsl/nir/nir.h | 2 ++ 2 files changed, 18 insertions(+) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index efc3f01..e3c37dd 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -2097,6 +2097,22 @@ nir_block_get_following_if(nir_block *block) return nir_cf_node_as_if(next_node); } +nir_loop * +nir_block_get_following_loop(nir_block *block) +{ + if (exec_node_is_tail_sentinel(&block->cf_node.node)) + return NULL; + + if (nir_cf_node_is_last(&block->cf_node)) + return NULL; + + nir_cf_node *next_node = nir_cf_node_next(&block->cf_node); + + if (next_node->type != nir_cf_node_loop) + return NULL; + + return nir_cf_node_as_loop(next_node); +} static bool index_block(nir_block *block, void *state) { diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index d2bf3d7..69daa14 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1607,6 +1607,8 @@ bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb cb, */ nir_if *nir_block_get_following_if(nir_block *block); +nir_loop *nir_block_get_following_loop(nir_block *block); + void nir_index_local_regs(nir_function_impl *impl); void nir_index_global_regs(nir_shader *shader); void nir_index_ssa_defs(nir_function_impl *impl); -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 8/9] nir/dead_cf: add support for removing useless loops
Signed-off-by: Connor Abbott --- src/glsl/nir/nir_opt_dead_cf.c | 96 -- 1 file changed, 84 insertions(+), 12 deletions(-) diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c index 17eadbd..f96bf63 100644 --- a/src/glsl/nir/nir_opt_dead_cf.c +++ b/src/glsl/nir/nir_opt_dead_cf.c @@ -28,9 +28,9 @@ #include "nir.h" /* - * This file implements an optimization that deletes statically unreachable - * code. In NIR, one way this can happen if if an if statement has a constant - * condition: + * This file implements an optimization that deletes statically + * unreachable/dead code. In NIR, one way this can happen if if an if + * statement has a constant condition: * * if (true) { *... @@ -40,7 +40,7 @@ * branch into the surrounding control flow, possibly removing more code if * the branch had a jump at the end. * - * The other way is that control flow can end in a jump so that code after it + * Another way is that control flow can end in a jump so that code after it * never gets executed. In particular, this can happen after optimizing * something like: * @@ -59,6 +59,12 @@ * } * ... * + * Finally, we also handle removing useless loops, i.e. loops with no side + * effects and without any definitions that are used elsewhere. This case is a + * little different from the first two in that the code is actually run (it + * just never does anything), but there are similar issues with needing to + * be careful with restarting after deleting the cf_node (see dead_cf_list()) + * so this is a convenient place to remove them. */ /* deletes all the control flow nodes after 'start' in the control flow list. @@ -126,19 +132,82 @@ opt_constant_if(nir_if *if_stmt, bool condition) } static bool +block_has_no_side_effects(nir_block *block, void *state) +{ + (void) state; + + nir_foreach_instr(block, instr) { + if (instr->type == nir_instr_type_call) + return false; + + if (instr->type != nir_instr_type_intrinsic) + continue; + + nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr); + if (!nir_intrinsic_infos[intrin->intrinsic].flags & + NIR_INTRINSIC_CAN_ELIMINATE) + return false; + } + + return true; +} + +static bool +dest_not_live_out(nir_dest *dest, void *state) +{ + nir_block *after = state; + + assert(dest->is_ssa); + return !BITSET_TEST(after->live_in, dest->ssa.live_index); +} + +static bool +loop_is_dead(nir_loop *loop) +{ + nir_block *before = nir_cf_node_as_block(nir_cf_node_prev(&loop->cf_node)); + nir_block *after = nir_cf_node_as_block(nir_cf_node_next(&loop->cf_node)); + + if (!exec_list_is_empty(&after->instr_list) && + nir_block_last_instr(after)->type == nir_instr_type_phi) + return false; + + if (!nir_foreach_block_in_cf_node(&loop->cf_node, block_has_no_side_effects, + NULL)) + return false; + + for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) { + nir_foreach_instr(cur, instr) { + if (!nir_foreach_dest(instr, dest_not_live_out, after)) +return false; + } + } + + return true; +} + +static bool dead_cf_block(nir_block *block) { nir_if *following_if = nir_block_get_following_if(block); - if (!following_if) - return false; + if (following_if) { + nir_const_value *const_value = +nir_src_as_const_value(following_if->condition); + + if (!const_value) +return false; + + opt_constant_if(following_if, const_value->u[0] != 0); + return true; + } - nir_const_value *const_value = - nir_src_as_const_value(following_if->condition); + nir_loop *following_loop = nir_block_get_following_loop(block); + if (!following_loop) + return false; - if (!const_value) - return false; + if (!loop_is_dead(following_loop)) + return false; - opt_constant_if(following_if, const_value->u[0] != 0); + nir_cf_node_remove(&following_loop->cf_node); return true; } @@ -167,7 +236,7 @@ dead_cf_list(struct exec_list *list, bool *list_ends_in_jump) case nir_cf_node_block: { nir_block *block = nir_cf_node_as_block(cur); if (dead_cf_block(block)) { -/* We just deleted the if after this block, so we may have +/* We just deleted the if or loop after this block, so we may have * deleted the block before or after it -- which one is an * implementation detail. Therefore, to recover the place we were * at, we have to use the previous cf_node. @@ -238,6 +307,9 @@ dead_cf_list(struct exec_list *list, bool *list_ends_in_jump) static bool opt_dead_cf_impl(nir_function_impl *impl) { + nir_metadata_require(impl, nir_metadata_live_variables | + nir_metadata_dominance); + bool dummy; bool progress = dead_cf_list(&impl->body, &dummy); -- 2.1.0 _
[Mesa-dev] [PATCH 3/9] nir: add an optimization for removing dead control flow
I'm not so sure about where to put the helper currently in nir.c... on the one hand, it's pretty specific to this pass, but on the other hand it needs to do some very fiddly low-level things to the control flow which is why it needs access to a static function in nir.c (stitch_blocks()) that I'd rather not expose publically. Signed-off-by: Connor Abbott --- src/glsl/nir/nir.c | 26 +++ src/glsl/nir/nir.h | 8 +++ src/glsl/nir/nir_opt_dead_cf.c | 150 + 3 files changed, 184 insertions(+) create mode 100644 src/glsl/nir/nir_opt_dead_cf.c diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index 362ac30..efc3f01 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -1281,6 +1281,32 @@ nir_cf_node_remove(nir_cf_node *node) } } +/* Takes a control flow list 'cf_list,' presumed to be a child of the control + * flow node 'node,' pastes cf_list after node, and then deletes node. + */ + +void +nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list) +{ + nir_cf_node *after = nir_cf_node_next(node); + assert(after->type == nir_cf_node_block); + nir_block *after_block = nir_cf_node_as_block(after); + + foreach_list_typed(nir_cf_node, child, node, cf_list) { + child->parent = node->parent; + } + + nir_cf_node *last = exec_node_data(nir_cf_node, exec_list_get_tail(cf_list), + node); + assert(last->type == nir_cf_node_block); + nir_block *last_block = nir_cf_node_as_block(last); + + exec_node_insert_list_before(&after->node, cf_list); + stitch_blocks(last_block, after_block); + + nir_cf_node_remove(node); +} + static bool add_use_cb(nir_src *src, void *state) { diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 697d37e..d2bf3d7 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1518,6 +1518,12 @@ void nir_cf_node_insert_end(struct exec_list *list, nir_cf_node *node); /** removes a control flow node, doing any cleanup necessary */ void nir_cf_node_remove(nir_cf_node *node); + +/** Takes a control flow list 'cf_list,' presumed to be a child of the control + * flow node 'node,' pastes cf_list after node, and then deletes node. + */ +void nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list); + /** requests that the given pieces of metadata be generated */ void nir_metadata_require(nir_function_impl *impl, nir_metadata required); /** dirties all but the preserved metadata */ @@ -1692,6 +1698,8 @@ bool nir_opt_cse(nir_shader *shader); bool nir_opt_dce_impl(nir_function_impl *impl); bool nir_opt_dce(nir_shader *shader); +bool nir_opt_dead_cf(nir_shader *shader); + void nir_opt_gcm(nir_shader *shader); bool nir_opt_peephole_select(nir_shader *shader); diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c new file mode 100644 index 000..3fbb794 --- /dev/null +++ b/src/glsl/nir/nir_opt_dead_cf.c @@ -0,0 +1,150 @@ +/* + * Copyright © 2014 Connor Abbott + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Connor Abbott (cwabbo...@gmail.com) + * + */ + +#include "nir.h" + +/* + * This file implements an optimization that deletes statically unreachable + * code. In NIR, one way this can happen if if an if statement has a constant + * condition: + * + * if (true) { + *... + * } + * + * We delete the if statement and paste the contents of the always-executed + * branch into the surrounding control flow, possibly removing more code if + * the branch had a jump at the end. + */ + +/* deletes all the control flow nodes after 'start' in the control flow list. + */ +static void +delete_unreachable_after(nir_cf_node *start) +{ + nir_cf_node *current = start; + while (!nir_cf_node_is_last(current)) { + current = nir_cf_node_next(current); + nir_cf_node_remove(current); + } +} + +static void +
[Mesa-dev] [PATCH 5/9] nir/dead_cf: delete code that's unreachable due to jumps
Signed-off-by: Connor Abbott --- src/glsl/nir/nir_opt_dead_cf.c | 126 ++--- 1 file changed, 118 insertions(+), 8 deletions(-) diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c index 3fbb794..17eadbd 100644 --- a/src/glsl/nir/nir_opt_dead_cf.c +++ b/src/glsl/nir/nir_opt_dead_cf.c @@ -39,6 +39,26 @@ * We delete the if statement and paste the contents of the always-executed * branch into the surrounding control flow, possibly removing more code if * the branch had a jump at the end. + * + * The other way is that control flow can end in a jump so that code after it + * never gets executed. In particular, this can happen after optimizing + * something like: + * + * if (true) { + *... + *break; + * } + * ... + * + * We also consider the case where both branches of an if end in a jump, e.g.: + * + * if (...) { + *break; + * } else { + *continue; + * } + * ... + * */ /* deletes all the control flow nodes after 'start' in the control flow list. @@ -106,30 +126,120 @@ opt_constant_if(nir_if *if_stmt, bool condition) } static bool -dead_cf_cb(nir_block *block, void *state) +dead_cf_block(nir_block *block) { - bool *progress = state; - nir_if *following_if = nir_block_get_following_if(block); if (!following_if) - return true; + return false; nir_const_value *const_value = nir_src_as_const_value(following_if->condition); if (!const_value) - return true; + return false; opt_constant_if(following_if, const_value->u[0] != 0); - *progress = true; return true; } static bool -opt_dead_cf_impl(nir_function_impl *impl) +ends_in_jump(nir_block *block) +{ + if (exec_list_is_empty(&block->instr_list)) + return false; + + nir_instr *instr = nir_block_last_instr(block); + return instr->type == nir_instr_type_jump; +} + +static bool +dead_cf_list(struct exec_list *list, bool *list_ends_in_jump) { bool progress = false; - nir_foreach_block(impl, dead_cf_cb, &progress); + *list_ends_in_jump = false; + + nir_cf_node *cur = exec_node_data(nir_cf_node, exec_list_get_head(list), + node); + nir_cf_node *prev = NULL; + + while (cur != NULL) { + switch (cur->type) { + case nir_cf_node_block: { + nir_block *block = nir_cf_node_as_block(cur); + if (dead_cf_block(block)) { +/* We just deleted the if after this block, so we may have + * deleted the block before or after it -- which one is an + * implementation detail. Therefore, to recover the place we were + * at, we have to use the previous cf_node. + */ + +if (prev) { + cur = nir_cf_node_next(prev); +} else { + cur = exec_node_data(nir_cf_node, exec_list_get_head(list), +node); +} + +block = nir_cf_node_as_block(cur); + +progress = true; + } + + if (ends_in_jump(block)) { +*list_ends_in_jump = true; + +if (!exec_node_is_tail_sentinel(cur->node.next)) { + delete_unreachable_after(cur); + return true; +} + } + + break; + } + + case nir_cf_node_if: { + nir_if *if_stmt = nir_cf_node_as_if(cur); + bool then_ends_in_jump, else_ends_in_jump; + progress |= dead_cf_list(&if_stmt->then_list, &then_ends_in_jump); + progress |= dead_cf_list(&if_stmt->else_list, &else_ends_in_jump); + + if (then_ends_in_jump && else_ends_in_jump) { +*list_ends_in_jump = true; +nir_block *next = nir_cf_node_as_block(nir_cf_node_next(cur)); +if (!exec_list_is_empty(&next->instr_list) || +!exec_node_is_tail_sentinel(next->cf_node.node.next)) { + delete_unreachable_after(cur); + return true; +} + } + + break; + } + + case nir_cf_node_loop: { + nir_loop *loop = nir_cf_node_as_loop(cur); + bool dummy; + progress |= dead_cf_list(&loop->body, &dummy); + + break; + } + + default: + unreachable("unknown cf node type"); + } + + prev = cur; + cur = nir_cf_node_next(cur); + } + + return progress; +} + +static bool +opt_dead_cf_impl(nir_function_impl *impl) +{ + bool dummy; + bool progress = dead_cf_list(&impl->body, &dummy); if (progress) nir_metadata_preserve(impl, nir_metadata_none); -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions
We were rewriting the uses of the intrinsic instruction to point to something else after removing it, which only happened to work since we were lax about having dangling uses when removing instructions. Fix that up. Signed-off-by: Connor Abbott --- src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c b/src/glsl/nir/nir_lower_vars_to_ssa.c index ccb8f99..8d0ae1b 100644 --- a/src/glsl/nir/nir_lower_vars_to_ssa.c +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c @@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct lower_variables_state *state) intrin->num_components, NULL); nir_instr_insert_before(&intrin->instr, &mov->instr); -nir_instr_remove(&intrin->instr); nir_ssa_def_rewrite_uses(&intrin->dest.ssa, nir_src_for_ssa(&mov->dest.dest.ssa), state->shader); + +nir_instr_remove(&intrin->instr); break; } -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses
The point of cleanup_defs_uses() is to make an instruction safe to remove by removing any references that the rest of the shader may have to it. Previously, it was handling register use/def sets and removing the instruction from the use sets of any SSA sources it had, but if the instruction defined an SSA value that was used by other instructions it wasn't doing anything. This was ok, since we were always careful to make sure that no removed instruction ever had any uses, but now we want to start removing unreachable instruction which might still be used in reachable parts of the code. In that case, the value that any uses get is undefined since the instruction never actually executes, so we can just replace the instruction with an ssa_undef_instr. Signed-off-by: Connor Abbott --- src/glsl/nir/nir.c | 47 ++- 1 file changed, 34 insertions(+), 13 deletions(-) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index f03e80a..362ac30 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after) } static void -remove_defs_uses(nir_instr *instr); +remove_defs_uses(nir_instr *instr, nir_function_impl *impl); static void -cleanup_cf_node(nir_cf_node *node) +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl) { switch (node->type) { case nir_cf_node_block: { nir_block *block = nir_cf_node_as_block(node); /* We need to walk the instructions and clean up defs/uses */ - nir_foreach_instr(block, instr) - remove_defs_uses(instr); + nir_foreach_instr(block, instr) { + if (instr->type == nir_instr_type_jump) +handle_remove_jump(block, nir_instr_as_jump(instr)->type); + remove_defs_uses(instr, impl); + } break; } case nir_cf_node_if: { nir_if *if_stmt = nir_cf_node_as_if(node); foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list) - cleanup_cf_node(child); + cleanup_cf_node(child, impl); foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list) - cleanup_cf_node(child); + cleanup_cf_node(child, impl); list_del(&if_stmt->condition.use_link); break; @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node) case nir_cf_node_loop: { nir_loop *loop = nir_cf_node_as_loop(node); foreach_list_typed(nir_cf_node, child, node, &loop->body) - cleanup_cf_node(child); + cleanup_cf_node(child, impl); break; } case nir_cf_node_function: { - nir_function_impl *impl = nir_cf_node_as_function(node); foreach_list_typed(nir_cf_node, child, node, &impl->body) - cleanup_cf_node(child); + cleanup_cf_node(child, impl); break; } default: @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node) nir_function_impl *impl = nir_cf_node_get_function(node); nir_metadata_preserve(impl, nir_metadata_none); + cleanup_cf_node(node, impl); + if (node->type == nir_cf_node_block) { /* * Basic blocks can't really be removed by themselves, since they act as @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node) exec_node_remove(&node->node); stitch_blocks(before_block, after_block); } - - cleanup_cf_node(node); } static bool @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state) return true; } +static bool +remove_ssa_def_cb(nir_ssa_def *def, void *state) +{ + nir_function_impl *impl = state; + nir_shader *shader = impl->overload->function->shader; + + if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) { + nir_ssa_undef_instr *undef = + nir_ssa_undef_instr_create(shader, def->num_components); + nir_instr_insert_before_cf_list(&impl->body, &undef->instr); + nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader); + } + + return true; +} + + static void -remove_defs_uses(nir_instr *instr) +remove_defs_uses(nir_instr *instr, nir_function_impl *impl) { nir_foreach_dest(instr, remove_def_cb, instr); nir_foreach_src(instr, remove_use_cb, instr); + nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl); } void nir_instr_remove(nir_instr *instr) { - remove_defs_uses(instr); + nir_function_impl *impl = nir_cf_node_get_function(&instr->block->cf_node); + remove_defs_uses(instr, impl); exec_node_remove(&instr->node); if (instr->type == nir_instr_type_jump) { -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/9] nir/validate: validate successors at the end of a loop
I found this useful while debugging some control flow bugs while working on the dead control flow pass. Signed-off-by: Connor Abbott --- src/glsl/nir/nir_validate.c | 21 + 1 file changed, 21 insertions(+) diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c index da92ed9..0332c7d 100644 --- a/src/glsl/nir/nir_validate.c +++ b/src/glsl/nir/nir_validate.c @@ -78,6 +78,9 @@ typedef struct { /* the parent of the current cf node being visited */ nir_cf_node *parent_node; + /* whether the current loop we're visiting has a break statement */ + bool has_break; + /* the current function implementation being validated */ nir_function_impl *impl; @@ -513,6 +516,8 @@ validate_instr(nir_instr *instr, validate_state *state) break; case nir_instr_type_jump: + if (nir_instr_as_jump(instr)->type == nir_jump_break) + state->has_break = true; break; default: @@ -641,6 +646,9 @@ validate_if(nir_if *if_stmt, validate_state *state) static void validate_loop(nir_loop *loop, validate_state *state) { + bool old_has_break = state->has_break; + state->has_break = false; + assert(!exec_node_is_head_sentinel(loop->cf_node.node.prev)); nir_cf_node *prev_node = nir_cf_node_prev(&loop->cf_node); assert(prev_node->type == nir_cf_node_block); @@ -663,7 +671,20 @@ validate_loop(nir_loop *loop, validate_state *state) validate_cf_node(cf_node, state); } + nir_block *last_block = nir_cf_node_as_block(nir_loop_last_cf_node(loop)); + if (exec_list_is_empty(&last_block->instr_list) || + nir_block_last_instr(last_block)->type != nir_instr_type_jump) { + assert(&last_block->successors[0]->cf_node == nir_loop_first_cf_node(loop)); + } + if (state->has_break) { + assert(last_block->successors[1] == NULL); + } else { + nir_block *next = nir_cf_node_as_block(nir_cf_node_next(&loop->cf_node)); + assert(last_block->successors[1] == next); + } + state->parent_node = old_parent; + state->has_break = old_has_break; } static void -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 7/9] nir: add a helper for iterating over blocks in a cf node
We were already doing this internally for iterating over a function implementation, so just expose it directly. Signed-off-by: Connor Abbott --- src/glsl/nir/nir.c | 7 +++ src/glsl/nir/nir.h | 2 ++ 2 files changed, 9 insertions(+) diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c index e3c37dd..3cabdf7 100644 --- a/src/glsl/nir/nir.c +++ b/src/glsl/nir/nir.c @@ -2055,6 +2055,13 @@ foreach_cf_node(nir_cf_node *node, nir_foreach_block_cb cb, } bool +nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb, + void *state) +{ + return foreach_cf_node(node, cb, false, state); +} + +bool nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb, void *state) { foreach_list_typed_safe(nir_cf_node, node, node, &impl->body) { diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 69daa14..4284374 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1601,6 +1601,8 @@ bool nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb, void *state); bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb cb, void *state); +bool nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb, + void *state); /* If the following CF node is an if, this function returns that if. * Otherwise, it returns NULL. -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/9] NIR dead control-flow removal
This series implements a bunch of related optimizations that run at once as part of the same pass to eliminate control flow that is useless. This includes both unreachable code and useless loops, i.e. loops that don't compute a result used by the rest of the code. To do this, I needed to change some low-level things that change the behavior of nir_instr_remove() (see patch 2 for details), but I've tested the series on piglit and there are no regressions, and there are no shader-db changes either. This is rebased on Jason's series to use lists for use-def sets; it turns out that the changes required were pretty minimal. The series is also available at: git://people.freedesktop.org/~cwabbott0/mesa nir-dead-cf-v3 Connor Abbott (9): nir/vars_to_ssa: don't rewrite removed instructions nir: insert ssa_undef instructions when cleanup up defs/uses nir: add an optimization for removing dead control flow nir/validate: validate successors at the end of a loop nir/dead_cf: delete code that's unreachable due to jumps nir: add nir_block_get_following_loop() helper nir: add a helper for iterating over blocks in a cf node nir/dead_cf: add support for removing useless loops i965/fs/nir: enable the dead control flow optimization src/glsl/Makefile.sources| 1 + src/glsl/nir/nir.c | 94 -- src/glsl/nir/nir.h | 12 ++ src/glsl/nir/nir_lower_vars_to_ssa.c | 3 +- src/glsl/nir/nir_opt_dead_cf.c | 332 +++ src/glsl/nir/nir_validate.c | 21 +++ src/mesa/drivers/dri/i965/brw_nir.c | 2 + 7 files changed, 452 insertions(+), 13 deletions(-) create mode 100644 src/glsl/nir/nir_opt_dead_cf.c -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [xf86-video-nouveau] dri2: Enable BufferAge support
On 01/19/2015 12:00 PM, Chris Wilson wrote: For enable BufferAge support, we just have to be not using the DRI2Buffer->flags field for any purpose (i.e. it is always expected to be 0, as it is now) and to be sure to swap the flags field whenever we exchange buffers. As nouveau does not exactly support TripleBuffer, we don't have to worry about setting the copying the flags field when injecting the third buffer. Signed-off-by: Chris Wilson Cc: Maarten Lankhorst Cc: Mario Kleiner --- src/nouveau_dri2.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c index e3445b2..428ef92 100644 --- a/src/nouveau_dri2.c +++ b/src/nouveau_dri2.c @@ -711,6 +711,7 @@ nouveau_dri2_finish_swap(DrawablePtr draw, unsigned int frame, } SWAP(s->dst->name, s->src->name); + SWAP(s->dst->flags, s->src->flags); SWAP(nouveau_pixmap(dst_pix)->bo, nouveau_pixmap(src_pix)->bo); DamageRegionProcessPending(draw); @@ -1003,6 +1004,12 @@ nouveau_dri2_init(ScreenPtr pScreen) dri2.DestroyBuffer2 = nouveau_dri2_destroy_buffer2; dri2.CopyRegion2 = nouveau_dri2_copy_region2; #endif + +#if DRI2INFOREC_VERSION >= 10 + dri2.version = 10; + dri2.bufferAge = 1; +#endif + return DRI2ScreenInit(pScreen, &dri2); } Seems ok to me. Reviewed-by: Mario Kleiner -mario ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.
This was not fully baked. I'll send out a fixed version later. On 05/08/2015 07:05 PM, Ian Romanick wrote: > From: Ian Romanick > > Shader-db results: > > GM45 and Iron Lake: > total instructions in shared programs: 7888585 -> 7888585 (0.00%) > instructions in affected programs: 0 -> 0 > > Sandy Bridge, Ivy Bridge, Haswell, and Broadwell: > total instructions in shared programs: 9598608 -> 9598572 (-0.00%) > instructions in affected programs: 6506 -> 6470 (-0.55%) > helped:36 > > Signed-off-by: Ian Romanick > --- > src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > index 469f2ea..d72a83a 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp > @@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block) > >if ((inst->opcode != BRW_OPCODE_AND && > inst->opcode != BRW_OPCODE_CMP && > - inst->opcode != BRW_OPCODE_MOV) || > + inst->opcode != BRW_OPCODE_MOV && > + inst->opcode != BRW_OPCODE_NOT) || >inst->predicate != BRW_PREDICATE_NONE || >!inst->dst.is_null() || >inst->src[0].file != GRF || > @@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block) >inst->conditional_mod != BRW_CONDITIONAL_NZ) > continue; > > + if (inst->opcode == BRW_OPCODE_NOT && > + inst->conditional_mod != BRW_CONDITIONAL_Z && > + inst->conditional_mod != BRW_CONDITIONAL_NZ) > + continue; > + >bool read_flag = false; >foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, >block) { > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.
From: Ian Romanick Shader-db results: GM45 and Iron Lake: total instructions in shared programs: 7888585 -> 7888585 (0.00%) instructions in affected programs: 0 -> 0 Sandy Bridge, Ivy Bridge, Haswell, and Broadwell: total instructions in shared programs: 9598608 -> 9598572 (-0.00%) instructions in affected programs: 6506 -> 6470 (-0.55%) helped:36 Signed-off-by: Ian Romanick --- src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp index 469f2ea..d72a83a 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp @@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block) if ((inst->opcode != BRW_OPCODE_AND && inst->opcode != BRW_OPCODE_CMP && - inst->opcode != BRW_OPCODE_MOV) || + inst->opcode != BRW_OPCODE_MOV && + inst->opcode != BRW_OPCODE_NOT) || inst->predicate != BRW_PREDICATE_NONE || !inst->dst.is_null() || inst->src[0].file != GRF || @@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block) inst->conditional_mod != BRW_CONDITIONAL_NZ) continue; + if (inst->opcode == BRW_OPCODE_NOT && + inst->conditional_mod != BRW_CONDITIONAL_Z && + inst->conditional_mod != BRW_CONDITIONAL_NZ) + continue; + bool read_flag = false; foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst, block) { -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On Fri, May 8, 2015 at 3:32 PM, Ian Romanick wrote: > On 05/08/2015 03:31 PM, Ian Romanick wrote: >> On 05/08/2015 03:25 PM, Jason Ekstrand wrote: >>> On Fri, May 8, 2015 at 3:20 PM, Ian Romanick wrote: On 05/08/2015 11:55 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand > wrote: >> total instructions in shared programs: 7152330 -> 7137006 (-0.21%) >> instructions in affected programs: 1330548 -> 1315224 (-1.15%) >> helped:5797 >> HURT: 76 > > I'm doing some looking into the hurt programs. It seems as if there > are some very strange interatctions between flrp and ffma. I'm still > working out exactly how to fix it up. Yes, I noticed this too. Did you see my reply to Ken earlier today? The problem I noticed /seems/ restricted to cases where the would-be interpolant is scalar but the other values are vector. >>> >>> It would surprise me a lot of that mattered. At the point where we do >>> opt_algebraic in NIR, we've already scalarized things. That said, >>> there is a *lot* going on in an optimization loop so anything's >>> possible. >> >> If I take the shader_runner test that I included in the e-mail to Ken >> and s/float alpha/vec3 alpha/ I get LRPs. I made some obvious tweaks to >> opt_algebraic to handle the mix of scalar and vector, and something like >> 150 shaders were helped. Ok, that's weird. I really have no idea why that would make a difference. Unless, of course, it affects the optimizations that GLSL IR is able to do which then, in turn, affects NIR. That would make some sense. >> I spent about an hour digging into it, and I came up dry. I have tried >> adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to >> flrp, and I couldn't get a break point at the nir_opt_ffma to trigger. >> I was going to ask about it at the office on Monday, but it came up on >> the list first. > > FWIW, those rules were: > >(('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, > c), '!options->lower_flrp'), >(('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, > c), '!options->lower_flrp'), Did you add them to optimizations or late_optimizations? The late ones get run after opt_ffma and the others don't. I poked around for an hour or two today with similar optimizations (I actually had 6) but I wasn't able to get it to do any better. --Jason ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On 05/08/2015 03:31 PM, Ian Romanick wrote: > On 05/08/2015 03:25 PM, Jason Ekstrand wrote: >> On Fri, May 8, 2015 at 3:20 PM, Ian Romanick wrote: >>> On 05/08/2015 11:55 AM, Jason Ekstrand wrote: On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand wrote: > total instructions in shared programs: 7152330 -> 7137006 (-0.21%) > instructions in affected programs: 1330548 -> 1315224 (-1.15%) > helped:5797 > HURT: 76 I'm doing some looking into the hurt programs. It seems as if there are some very strange interatctions between flrp and ffma. I'm still working out exactly how to fix it up. >>> >>> Yes, I noticed this too. Did you see my reply to Ken earlier today? >>> The problem I noticed /seems/ restricted to cases where the would-be >>> interpolant is scalar but the other values are vector. >> >> It would surprise me a lot of that mattered. At the point where we do >> opt_algebraic in NIR, we've already scalarized things. That said, >> there is a *lot* going on in an optimization loop so anything's >> possible. > > If I take the shader_runner test that I included in the e-mail to Ken > and s/float alpha/vec3 alpha/ I get LRPs. I made some obvious tweaks to > opt_algebraic to handle the mix of scalar and vector, and something like > 150 shaders were helped. > > I spent about an hour digging into it, and I came up dry. I have tried > adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to > flrp, and I couldn't get a break point at the nir_opt_ffma to trigger. > I was going to ask about it at the office on Monday, but it came up on > the list first. FWIW, those rules were: (('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), '!options->lower_flrp'), (('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), '!options->lower_flrp'), > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On 05/08/2015 03:25 PM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 3:20 PM, Ian Romanick wrote: >> On 05/08/2015 11:55 AM, Jason Ekstrand wrote: >>> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand >>> wrote: total instructions in shared programs: 7152330 -> 7137006 (-0.21%) instructions in affected programs: 1330548 -> 1315224 (-1.15%) helped:5797 HURT: 76 >>> >>> I'm doing some looking into the hurt programs. It seems as if there >>> are some very strange interatctions between flrp and ffma. I'm still >>> working out exactly how to fix it up. >> >> Yes, I noticed this too. Did you see my reply to Ken earlier today? >> The problem I noticed /seems/ restricted to cases where the would-be >> interpolant is scalar but the other values are vector. > > It would surprise me a lot of that mattered. At the point where we do > opt_algebraic in NIR, we've already scalarized things. That said, > there is a *lot* going on in an optimization loop so anything's > possible. If I take the shader_runner test that I included in the e-mail to Ken and s/float alpha/vec3 alpha/ I get LRPs. I made some obvious tweaks to opt_algebraic to handle the mix of scalar and vector, and something like 150 shaders were helped. I spent about an hour digging into it, and I came up dry. I have tried adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to flrp, and I couldn't get a break point at the nir_opt_ffma to trigger. I was going to ask about it at the office on Monday, but it came up on the list first. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On Fri, May 8, 2015 at 3:20 PM, Ian Romanick wrote: > On 05/08/2015 11:55 AM, Jason Ekstrand wrote: >> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand wrote: >>> total instructions in shared programs: 7152330 -> 7137006 (-0.21%) >>> instructions in affected programs: 1330548 -> 1315224 (-1.15%) >>> helped:5797 >>> HURT: 76 >> >> I'm doing some looking into the hurt programs. It seems as if there >> are some very strange interatctions between flrp and ffma. I'm still >> working out exactly how to fix it up. > > Yes, I noticed this too. Did you see my reply to Ken earlier today? > The problem I noticed /seems/ restricted to cases where the would-be > interpolant is scalar but the other values are vector. It would surprise me a lot of that mattered. At the point where we do opt_algebraic in NIR, we've already scalarized things. That said, there is a *lot* going on in an optimization loop so anything's possible. --Jason >> --Jason >> >>> GAINED:0 >>> LOST: 8 >>> --- >>> src/glsl/nir/nir_search.c | 11 +++ >>> 1 file changed, 11 insertions(+) >>> >>> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c >>> index d86655b..4c83349 100644 >>> --- a/src/glsl/nir/nir_search.c >>> +++ b/src/glsl/nir/nir_search.c >>> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, >>> nir_alu_instr *instr, >>>} >>> } >>> >>> + /* Stash off the current variables_seen bitmask. This way we can >>> +* restore it prior to matching in the commutative case below. */ >>> + unsigned variables_seen_stash = state->variables_seen; >>> + >>> bool matched = true; >>> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { >>>/* If the source is an explicitly sized source, then we need to reset >>> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, >>> nir_alu_instr *instr, >>> >>> if (nir_op_infos[instr->op].algebraic_properties & >>> NIR_OP_IS_COMMUTATIVE) { >>>assert(nir_op_infos[instr->op].num_inputs == 2); >>> + >>> + /* Restore the variables_seen bitmask. If we don't do this, then we >>> + * could end up with an erroneous failure due to variables found in >>> the >>> + * first match attempt above not matching those in the second. >>> + */ >>> + state->variables_seen = variables_seen_stash; >>> + >>>if (!match_value(expr->srcs[0], instr, 1, num_components, >>> swizzle, state)) >>> return false; >>> -- >>> 2.4.0 >>> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On 05/08/2015 11:55 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand wrote: >> total instructions in shared programs: 7152330 -> 7137006 (-0.21%) >> instructions in affected programs: 1330548 -> 1315224 (-1.15%) >> helped:5797 >> HURT: 76 > > I'm doing some looking into the hurt programs. It seems as if there > are some very strange interatctions between flrp and ffma. I'm still > working out exactly how to fix it up. Yes, I noticed this too. Did you see my reply to Ken earlier today? The problem I noticed /seems/ restricted to cases where the would-be interpolant is scalar but the other values are vector. > --Jason > >> GAINED:0 >> LOST: 8 >> --- >> src/glsl/nir/nir_search.c | 11 +++ >> 1 file changed, 11 insertions(+) >> >> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c >> index d86655b..4c83349 100644 >> --- a/src/glsl/nir/nir_search.c >> +++ b/src/glsl/nir/nir_search.c >> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, >> nir_alu_instr *instr, >>} >> } >> >> + /* Stash off the current variables_seen bitmask. This way we can >> +* restore it prior to matching in the commutative case below. */ >> + unsigned variables_seen_stash = state->variables_seen; >> + >> bool matched = true; >> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { >>/* If the source is an explicitly sized source, then we need to reset >> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, >> nir_alu_instr *instr, >> >> if (nir_op_infos[instr->op].algebraic_properties & >> NIR_OP_IS_COMMUTATIVE) { >>assert(nir_op_infos[instr->op].num_inputs == 2); >> + >> + /* Restore the variables_seen bitmask. If we don't do this, then we >> + * could end up with an erroneous failure due to variables found in >> the >> + * first match attempt above not matching those in the second. >> + */ >> + state->variables_seen = variables_seen_stash; >> + >>if (!match_value(expr->srcs[0], instr, 1, num_components, >> swizzle, state)) >> return false; >> -- >> 2.4.0 >> > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/5] clover: Add a mutex to guard queue::queued_events
Tom Stellard writes: > This fixes a potential crash where on a sequence like this: > > Thread 0: Check if queue is not empty. > Thread 1: Remove item from queue, making it empty. > Thread 0: Do something assuming queue is not empty. > --- > src/gallium/state_trackers/clover/core/queue.cpp | 2 ++ > src/gallium/state_trackers/clover/core/queue.hpp | 2 ++ > 2 files changed, 4 insertions(+) > > diff --git a/src/gallium/state_trackers/clover/core/queue.cpp > b/src/gallium/state_trackers/clover/core/queue.cpp > index 24f9326..87f9dcc 100644 > --- a/src/gallium/state_trackers/clover/core/queue.cpp > +++ b/src/gallium/state_trackers/clover/core/queue.cpp > @@ -44,6 +44,7 @@ command_queue::flush() { > pipe_screen *screen = device().pipe; > pipe_fence_handle *fence = NULL; > > + std::lock_guard lock(queued_events_mutex); > if (!queued_events.empty()) { >pipe->flush(pipe, &fence, 0); > > @@ -69,6 +70,7 @@ command_queue::profiling_enabled() const { > > void > command_queue::sequence(hard_event &ev) { > + std::lock_guard lock(queued_events_mutex); > if (!queued_events.empty()) >queued_events.back()().chain(ev); > > diff --git a/src/gallium/state_trackers/clover/core/queue.hpp > b/src/gallium/state_trackers/clover/core/queue.hpp > index b7166e6..bddb86c 100644 > --- a/src/gallium/state_trackers/clover/core/queue.hpp > +++ b/src/gallium/state_trackers/clover/core/queue.hpp > @@ -24,6 +24,7 @@ > #define CLOVER_CORE_QUEUE_HPP > > #include > +#include > > #include "core/object.hpp" > #include "core/context.hpp" > @@ -69,6 +70,7 @@ namespace clover { > >cl_command_queue_properties props; >pipe_context *pipe; > + std::mutex queued_events_mutex; >std::deque> queued_events; > }; > } > -- > 2.0.4 Thanks, this patch is: Reviewed-by: Francisco Jerez signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range
Reviewed-by: Connor Abbott On Fri, May 8, 2015 at 3:13 PM, Jason Ekstrand wrote: > --- > src/glsl/nir/nir_search.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c > index 5ba0160..d86655b 100644 > --- a/src/glsl/nir/nir_search.c > +++ b/src/glsl/nir/nir_search.c > @@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr > *instr, unsigned src, > > case nir_search_value_variable: { >nir_search_variable *var = nir_search_value_as_variable(value); > + assert(var->variable < NIR_SEARCH_MAX_VARIABLES); > >if (state->variables_seen & (1 << var->variable)) { > if (!nir_srcs_equal(state->variables[var->variable].src, > -- > 2.4.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On Fri, May 8, 2015 at 2:53 PM, Jason Ekstrand wrote: > total instructions in shared programs: 7152330 -> 7137006 (-0.21%) > instructions in affected programs: 1330548 -> 1315224 (-1.15%) > helped:5797 > HURT: 76 > GAINED:0 > LOST: 8 > --- > src/glsl/nir/nir_search.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c > index d86655b..4c83349 100644 > --- a/src/glsl/nir/nir_search.c > +++ b/src/glsl/nir/nir_search.c > @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, > nir_alu_instr *instr, >} > } > > + /* Stash off the current variables_seen bitmask. This way we can > +* restore it prior to matching in the commutative case below. */ Puth the */ on the next line, and this is Reviewed-by: Connor Abbott > + unsigned variables_seen_stash = state->variables_seen; > + > bool matched = true; > for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { >/* If the source is an explicitly sized source, then we need to reset > @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, > nir_alu_instr *instr, > > if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) > { >assert(nir_op_infos[instr->op].num_inputs == 2); > + > + /* Restore the variables_seen bitmask. If we don't do this, then we > + * could end up with an erroneous failure due to variables found in the > + * first match attempt above not matching those in the second. > + */ > + state->variables_seen = variables_seen_stash; > + >if (!match_value(expr->srcs[0], instr, 1, num_components, > swizzle, state)) > return false; > -- > 2.4.0 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Stable libEGL ABI ?
Hi all, Just had a quick look at Debian's repo and noticed something rather worrying - the ABI of our libEGL is not stable. Stable in the sense of - we provide a growing list of static symbols for each new function an EGL extension adds. This comes as we set EGL_EGLEXT_PROTOTYPES, prior to including eglext.h. Which in turn exposes the the function prototypes which are explicitly annotated with default visibility. This change was introduced with commit 1ed1027e886980b9b0f48fa6bfcf3d6e209c7787 Author: Brian Paul Date: Tue May 27 13:45:41 2008 -0600 assorted changes to compile with new EGL 1.4 headers (untested) From going through the EGL 1.3 to 1.5 spec, I cannot see any mention that one should export EGL extension functions from their implementation. On the contrary, it is explicitly mentioned that some implementations may do so, but relying on it in your program is not portable. Quick look at the Nvidia official driver shows only core EGL functions, which aligns with various "undefined symbol eglCreateImageKHR" issues that I've seen not too long ago - mir, gstreamer, piglit. So the question here is: Can we "break" the already non-portable ABI, and hide everything that is not core EGL, or alternatively how can we rework things so that as we add new entry points, required by extensions, they are available only dynamically ? Personally I would opt for the former and address any issues in piglit/waffle/foo accordingly. I would suspect that libepoxy is OK, although I've never looked too closely at the code. I would love to get this in some shape or form for 10.6, and avoid exporting the two new functions from EGL_MESA_image_dma_buf_export :-\ Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/23] glsl/types: add new subroutine type
Patches 11-13 are: Reviewed-by: Chris Forbes On Fri, Apr 24, 2015 at 1:42 PM, Dave Airlie wrote: > From: Dave Airlie > > This type will be used to store the name of subroutine types > > as in subroutine void myfunc(void); > will store myfunc into a subroutine type. > > This is required to the parser can identify a subroutine > type in a uniform decleration as a valid type, and also for > looking up the type later. > > Also add contains_subroutine method. > > Signed-off-by: Dave Airlie > --- > src/glsl/glsl_types.cpp| 63 > ++ > src/glsl/glsl_types.h | 19 ++ > src/glsl/ir_clone.cpp | 1 + > src/glsl/link_uniform_initializers.cpp | 1 + > 4 files changed, 84 insertions(+) > > diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp > index 9c9b7ef..37b5c62 100644 > --- a/src/glsl/glsl_types.cpp > +++ b/src/glsl/glsl_types.cpp > @@ -32,6 +32,7 @@ mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP; > hash_table *glsl_type::array_types = NULL; > hash_table *glsl_type::record_types = NULL; > hash_table *glsl_type::interface_types = NULL; > +hash_table *glsl_type::subroutine_types = NULL; > void *glsl_type::mem_ctx = NULL; > > void > @@ -159,6 +160,22 @@ glsl_type::glsl_type(const glsl_struct_field *fields, > unsigned num_fields, > mtx_unlock(&glsl_type::mutex); > } > > +glsl_type::glsl_type(const char *subroutine_name) : > + gl_type(0), > + base_type(GLSL_TYPE_SUBROUTINE), > + sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), > + sampler_type(0), interface_packing(0), > + vector_elements(0), matrix_columns(0), > + length(0) > +{ > + mtx_lock(&glsl_type::mutex); > + > + init_ralloc_type_ctx(); > + assert(subroutine_name != NULL); > + this->name = ralloc_strdup(this->mem_ctx, subroutine_name); > + this->vector_elements = 1; > + mtx_unlock(&glsl_type::mutex); > +} > > bool > glsl_type::contains_sampler() const > @@ -229,6 +246,22 @@ glsl_type::contains_opaque() const { > } > } > > +bool > +glsl_type::contains_subroutine() const > +{ > + if (this->is_array()) { > + return this->fields.array->contains_subroutine(); > + } else if (this->is_record()) { > + for (unsigned int i = 0; i < this->length; i++) { > +if (this->fields.structure[i].type->contains_subroutine()) > + return true; > + } > + return false; > + } else { > + return this->is_subroutine(); > + } > +} > + > gl_texture_index > glsl_type::sampler_index() const > { > @@ -826,6 +859,34 @@ glsl_type::get_interface_instance(const > glsl_struct_field *fields, > return t; > } > > +const glsl_type * > +glsl_type::get_subroutine_instance(const char *subroutine_name) > +{ > + const glsl_type key(subroutine_name); > + > + mtx_lock(&glsl_type::mutex); > + > + if (subroutine_types == NULL) { > + subroutine_types = hash_table_ctor(64, record_key_hash, > record_key_compare); > + } > + > + const glsl_type *t = (glsl_type *) hash_table_find(subroutine_types, & > key); > + if (t == NULL) { > + mtx_unlock(&glsl_type::mutex); > + t = new glsl_type(subroutine_name); > + mtx_lock(&glsl_type::mutex); > + > + hash_table_insert(subroutine_types, (void *) t, t); > + } > + > + assert(t->base_type == GLSL_TYPE_SUBROUTINE); > + assert(strcmp(t->name, subroutine_name) == 0); > + > + mtx_unlock(&glsl_type::mutex); > + > + return t; > +} > + > > const glsl_type * > glsl_type::get_mul_type(const glsl_type *type_a, const glsl_type *type_b) > @@ -958,6 +1019,7 @@ glsl_type::component_slots() const > case GLSL_TYPE_SAMPLER: > case GLSL_TYPE_ATOMIC_UINT: > case GLSL_TYPE_VOID: > + case GLSL_TYPE_SUBROUTINE: > case GLSL_TYPE_ERROR: >break; > } > @@ -1330,6 +1392,7 @@ glsl_type::count_attribute_slots() const > case GLSL_TYPE_IMAGE: > case GLSL_TYPE_ATOMIC_UINT: > case GLSL_TYPE_VOID: > + case GLSL_TYPE_SUBROUTINE: > case GLSL_TYPE_ERROR: >break; > } > diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h > index d383dd5..078adaf 100644 > --- a/src/glsl/glsl_types.h > +++ b/src/glsl/glsl_types.h > @@ -59,6 +59,7 @@ enum glsl_base_type { > GLSL_TYPE_INTERFACE, > GLSL_TYPE_ARRAY, > GLSL_TYPE_VOID, > + GLSL_TYPE_SUBROUTINE, > GLSL_TYPE_ERROR > }; > > @@ -276,6 +277,11 @@ struct glsl_type { > const char *block_name); > > /** > +* Get the instance of an subroutine type > +*/ > + static const glsl_type *get_subroutine_instance(const char > *subroutine_name); > + > + /** > * Get the type resulting from a multiplication of \p type_a * \p type_b > */ > static const glsl_type *get_mul_type(const glsl_type *type_a, > @@ -526,6 +532,13 @@ struct glsl_type { > /** > * Query if a type is unnamed/anonymous (named by the parser) > */ > + > + bool is_subroutine() const > +
Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays
Over-all, I think this is on the right track, but I still don't think it's 100% correct. On Fri, May 8, 2015 at 12:04 AM, Tapani Pälli wrote: > > > On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote: >> >> On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote: >>> >>> This fixes bugs with special cases where we have arrays of >>> structures containing samplers or arrays of samplers. >>> >>> I've verified that patch results in calculating same index value as >>> returned by _mesa_get_sampler_uniform_value for IR. Patch makes >>> following ES3 conformance test pass: >>> >>> ES3-CTS.shaders.struct.uniform.sampler_array_fragment >>> >>> Signed-off-by: Tapani Pälli >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114 >>> --- >>> src/glsl/nir/nir_lower_samplers.cpp | 6 +- >>> 1 file changed, 5 insertions(+), 1 deletion(-) >>> >>> diff --git a/src/glsl/nir/nir_lower_samplers.cpp >>> b/src/glsl/nir/nir_lower_samplers.cpp >>> index cf8ab83..9859cc0 100644 >>> --- a/src/glsl/nir/nir_lower_samplers.cpp >>> +++ b/src/glsl/nir/nir_lower_samplers.cpp >>> @@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct >>> gl_shader_program *shader_progr >>>instr->sampler_index *= glsl_get_length(deref->type); We really should get rid of the multiply since the sampler index is zero up until the end and the multiply does nothing but confuse people. >>>switch (deref_array->deref_array_type) { >>>case nir_deref_array_type_direct: >>> -instr->sampler_index += deref_array->base_offset; >>> + >>> +/* If this is an array of samplers. */ >> >> >> Above the case is for arrays and below you check for the sampler. This >> comment does not tell much extra :) > > > Yeah, not sure how to change it. What I want to state here is that only for > arrays of samplers we need to do this, otherwise we don't. The only other > case really is array of structs that contain sampler so maybe I should state > that instead? > > > >>> +if (deref->child->type->base_type == GLSL_TYPE_SAMPLER) >>> + instr->sampler_index += deref_array->base_offset; >>> + >>> if (deref_array->deref.child) >>> ralloc_asprintf_append(&name, "[%u]", >>> deref_array->base_offset); The two conditionals above should be combined. If the deref has a child, it should not have type SAMPLER and vice-versa. A better way to do it would be if (deref_array->deref.child) { ralloc_asprintf_append(&name, "[%u]", deref_array->base_offset); } else { assert(deref->child->type->bbase_type == GLSL_TYPE_SAMPLER); instr->sampler_index = deref_array->base_offset; } Also, it may be better to do that outside of the switch and turn the switch into an "if (deref_array->deref_array_type == deref_array_type_indirect)" to handle indirects. Right now, I don't think that we correctly handle an indirect with a base offset other than 0. Does that make sense? --Jason >>> break; >>> -- >>> 2.1.0 >>> >>> ___ >>> mesa-dev mailing list >>> mesa-dev@lists.freedesktop.org >>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
Ilia Mirkin writes: > On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke wrote: >> + # Multiplication by 4 comes up fairly often in indirect offset >> calculations. >> + # Some GPUs have weird integer multiplication limitations, but shifts >> should work >> + # equally well everywhere. >> + (('imul', 4, a), ('ishl', a, 2)), > > Not sure what the cost of doing it this way, but really you want all > powers of 2... and also udiv -> shr. Since this is python, should be > easy enough to append onto that list. AFAIK all GPU's prefer a shift > over a mul. Adreno doen't have 32-bit imul in the first place (and no > idiv either). I can confirm that I'd love shifts instead of imul/udiv on vc4. signature.asc Description: PGP signature ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range
--- src/glsl/nir/nir_search.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 5ba0160..d86655b 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, case nir_search_value_variable: { nir_search_variable *var = nir_search_value_as_variable(value); + assert(var->variable < NIR_SEARCH_MAX_VARIABLES); if (state->variables_seen & (1 << var->variable)) { if (!nir_srcs_equal(state->variables[var->variable].src, -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
On Fri, May 8, 2015 at 11:11 AM, Ian Romanick wrote: > On 05/08/2015 03:36 AM, Kenneth Graunke wrote: >> According to Glenn, shifts on R600 have 5x the throughput as multiplies. >> >> Intel GPUs have strange integer multiplication restrictions - on most >> hardware, MUL actually only does a 32-bit x 16-bit multiply. This >> means the arguments aren't commutative, which can limit our constant >> propagation options. SHL has no such restrictions. >> >> Shifting is probably reasonable on most people's hardware, so let's just >> do that. >> >> i965 shader-db results (using NIR for VS): >> total instructions in shared programs: 7432587 -> 7388982 (-0.59%) >> instructions in affected programs: 1360411 -> 1316806 (-3.21%) >> helped:5772 >> HURT: 0 >> >> Signed-off-by: Kenneth Graunke >> Cc: matts...@gmail.com >> Cc: ja...@jlekstrand.net >> --- >> src/glsl/nir/nir_opt_algebraic.py | 5 + >> 1 file changed, 5 insertions(+) >> >> So...I found a bizarre issue with this patch. >> >>(('imul', 4, a), ('ishl', a, 2)), >> >> totally optimizes things. However... >> >>(('imul', a, 4), ('ishl', a, 2)), >> >> doesn't seem to do anything, even though imul is commutative, and nir_search >> should totally handle that... >> >> ▄▄ ▄▄▄▄ ▄ ▄▄ >> ██ ██ ▀▀▀██▀▀▀ ███ ██ >> ▀█▄ ██ ▄█▀ ██ ▄█▀ ██ >> ██ ██ ██ ██ ██ ██ ▄██▀ ██ >> ███▀▀███ ██ ██ ██ ▀▀ >> ███ ███ ▄██ ██▄ ██ ▄▄ ▄▄ >> ▀▀▀ ▀▀▀ ▀▀▀▀ ▀▀ ▀▀ ▀▀ >> >> If you know why, let me know, otherwise I may have to look into it when more >> awake. > > I've noticed a couple other weird things that I have been unable to > understand. Shaders like the one below end with fmul/ffma instaed of > flrp, for example. I understand why that happens from GLSL IR > opt_algebraic, but it seems like nir_opt_algebraic should handle it. Just a guess, but it's quite possibly due to the commutative operations bug I just sent a patch for. --Jason > [require] > GLSL >= 1.30 > > [vertex shader] > in vec4 v; > in vec2 tc_in; > > out vec2 tc; > > void main() { > gl_Position = v; > tc = tc_in; > } > > [fragment shader] > in vec2 tc; > > out vec4 color; > > uniform sampler2D s; > uniform float a; > uniform vec3 base_color; > > void main() { > vec3 tex_color = texture(s, tc).xyz; > > color.xyz = (base_color * a) + (tex_color * (1.0 - a)); > color.a = 1.0; > } > > > >> diff --git a/src/glsl/nir/nir_opt_algebraic.py >> b/src/glsl/nir/nir_opt_algebraic.py >> index 400d60e..350471f 100644 >> --- a/src/glsl/nir/nir_opt_algebraic.py >> +++ b/src/glsl/nir/nir_opt_algebraic.py >> @@ -247,6 +247,11 @@ late_optimizations = [ >> (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))), >> (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), >> (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))), >> + >> + # Multiplication by 4 comes up fairly often in indirect offset >> calculations. >> + # Some GPUs have weird integer multiplication limitations, but shifts >> should work >> + # equally well everywhere. >> + (('imul', 4, a), ('ishl', a, 2)), > > This should be conditionalized on whether the platform has native integers. > >> ] >> >> print nir_algebraic.AlgebraicPass("nir_opt_algebraic", >> optimizations).render() >> > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand wrote: > total instructions in shared programs: 7152330 -> 7137006 (-0.21%) > instructions in affected programs: 1330548 -> 1315224 (-1.15%) > helped:5797 > HURT: 76 I'm doing some looking into the hurt programs. It seems as if there are some very strange interatctions between flrp and ffma. I'm still working out exactly how to fix it up. --Jason > GAINED:0 > LOST: 8 > --- > src/glsl/nir/nir_search.c | 11 +++ > 1 file changed, 11 insertions(+) > > diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c > index d86655b..4c83349 100644 > --- a/src/glsl/nir/nir_search.c > +++ b/src/glsl/nir/nir_search.c > @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, > nir_alu_instr *instr, >} > } > > + /* Stash off the current variables_seen bitmask. This way we can > +* restore it prior to matching in the commutative case below. */ > + unsigned variables_seen_stash = state->variables_seen; > + > bool matched = true; > for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { >/* If the source is an explicitly sized source, then we need to reset > @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, > nir_alu_instr *instr, > > if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) > { >assert(nir_op_infos[instr->op].num_inputs == 2); > + > + /* Restore the variables_seen bitmask. If we don't do this, then we > + * could end up with an erroneous failure due to variables found in the > + * first match attempt above not matching those in the second. > + */ > + state->variables_seen = variables_seen_stash; > + >if (!match_value(expr->srcs[0], instr, 1, num_components, > swizzle, state)) > return false; > -- > 2.4.0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching
total instructions in shared programs: 7152330 -> 7137006 (-0.21%) instructions in affected programs: 1330548 -> 1315224 (-1.15%) helped:5797 HURT: 76 GAINED:0 LOST: 8 --- src/glsl/nir/nir_search.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index d86655b..4c83349 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, } } + /* Stash off the current variables_seen bitmask. This way we can +* restore it prior to matching in the commutative case below. */ + unsigned variables_seen_stash = state->variables_seen; + bool matched = true; for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { /* If the source is an explicitly sized source, then we need to reset @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) { assert(nir_op_infos[instr->op].num_inputs == 2); + + /* Restore the variables_seen bitmask. If we don't do this, then we + * could end up with an erroneous failure due to variables found in the + * first match attempt above not matching those in the second. + */ + state->variables_seen = variables_seen_stash; + if (!match_value(expr->srcs[0], instr, 1, num_components, swizzle, state)) return false; -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather
I don't have CHV or SKL hw or docs to try and confirm this, but this does what it claims to. Reviewed-by: Chris Forbes On Sat, May 9, 2015 at 5:10 AM, Neil Roberts wrote: > The opt_sampler_eot optimisation seems to break when the last > instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing > this so it causes a lot of regressions. I can't find any documentation > or known workarounds to indicate that this is expected behaviour, but > considering that this is probably a pretty unlikely situation in a > real use case we might as well disable it in order to avoid the > regressions. In total this fixes 451 tests. > > Reviewed-by: Ben Widawsky > --- > > See here for some more discussion of this: > > http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html > > As far as I can tell the Jenkins run mentioned in that email doesn't > seem to have any tests on Cherryview or Skylake so that probably > explains why it didn't pick up the regression. > > src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++ > 1 file changed, 10 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 8dd680e..e9528e0 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot() > if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex()) >return false; > > + /* This optimisation doesn't seem to work for textureGather for some > +* reason. I can't find any documentation or known workarounds to indicate > +* that this is expected, but considering that it is probably pretty > +* unlikely that a shader would directly write out the results from > +* textureGather we might as well just disable it. > +*/ > + if (tex_inst->opcode == SHADER_OPCODE_TG4 || > + tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET) > + return false; > + > /* If there's no header present, we need to munge the LOAD_PAYLOAD as > well. > * It's very likely to be the previous instruction. > */ > -- > 1.9.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling
On Friday, May 08, 2015 07:04:31 PM Neil Roberts wrote: > Previously whenever a primitive is drawn the driver would call > _mesa_check_conditional_render which blocks waiting for the result of > the query to determine whether to render. On Gen7+ there is a bit in > the 3DPRIMITIVE command which can be used to disable the primitive > based on the value of a state bit. This state bit can be set based on > whether two registers have different values using the MI_PREDICATE > command. We can load these two registers with the pixel count values > stored in the query begin and end to implement conditional rendering > without stalling. > > Unfortunately these two source registers were not in the whitelist of > available registers in the kernel driver until v3.19. This patch uses > the command parser version from intel_screen to detect whether to > attempt to set the predicate data registers. > > The predicate enable bit is currently only used for drawing 3D > primitives. For blits, clears, bitmaps, copypixels and drawpixels it > still causes a stall. For most of these it would probably just work to > call the new brw_check_conditional_render function instead of > _mesa_check_conditional_render because they already work in terms of > rendering primitives. However it's a bit trickier for blits because it > can use the BLT ring or the blorp codepath. I think these operations > are less useful for conditional rendering than rendering primitives so > it might be best to leave it for a later patch. > > v2: Use the command parser version to detect whether we can write to > the predicate data registers instead of trying to execute a > register load command. > v3: Simple rebase > --- > src/mesa/drivers/dri/i965/Makefile.sources | 1 + > src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 > + > src/mesa/drivers/dri/i965/brw_context.c| 4 + > src/mesa/drivers/dri/i965/brw_context.h| 21 +++ > src/mesa/drivers/dri/i965/brw_defines.h| 1 + > src/mesa/drivers/dri/i965/brw_draw.c | 16 +- > src/mesa/drivers/dri/i965/brw_queryobj.c | 17 ++- > src/mesa/drivers/dri/i965/intel_extensions.c | 5 + > src/mesa/drivers/dri/i965/intel_reg.h | 23 +++ > 9 files changed, 243 insertions(+), 12 deletions(-) > create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c > > diff --git a/src/mesa/drivers/dri/i965/Makefile.sources > b/src/mesa/drivers/dri/i965/Makefile.sources > index 1ae93e1..a24c20a 100644 > --- a/src/mesa/drivers/dri/i965/Makefile.sources > +++ b/src/mesa/drivers/dri/i965/Makefile.sources > @@ -18,6 +18,7 @@ i965_FILES = \ > brw_clip_unfilled.c \ > brw_clip_util.c \ > brw_compute.c \ > + brw_conditional_render.c \ > brw_context.c \ > brw_context.h \ > brw_cs.cpp \ > diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c > b/src/mesa/drivers/dri/i965/brw_conditional_render.c > new file mode 100644 > index 000..e676aa0 > --- /dev/null > +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c > @@ -0,0 +1,167 @@ > +/* > + * Copyright © 2014 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > DEALINGS > + * IN THE SOFTWARE. > + * > + * Authors: > + *Neil Roberts > + */ > + > +/** @file brw_conditional_render.c > + * > + * Support for conditional rendering based on query objects > + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+. > + */ > + > +#include "main/imports.h" > +#include "main/condrender.h" > + > +#include "brw_context.h" > +#include "brw_defines.h" > +#include "intel_batchbuffer.h" > + > +static void > +set_predicate_enable(struct brw_context *brw, > + bool value) > +{ > + if (value) > + brw->predicate.state = BRW_PREDICATE_STATE_RENDER; > + else > + brw->predicate.stat
Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context
Hi Tom, On 8 May 2015 at 00:36, Tom Stellard wrote: > Events can be added to an OpenCL command queue concurrently from > multiple threads, but pipe_context and pipe_screen objects > are not threadsafe. The threadsafe wrappers protect all pipe_screen > and pipe_context function calls with a mutex, so we can safely use > them with multiple threads. > --- > src/gallium/state_trackers/clover/Makefile.am | 6 +- > src/gallium/state_trackers/clover/Makefile.sources | 4 + > src/gallium/state_trackers/clover/core/device.cpp | 2 + > .../clover/core/pipe_threadsafe_context.c | 272 > + > .../clover/core/pipe_threadsafe_screen.c | 184 ++ > .../state_trackers/clover/core/threadsafe.h| 39 +++ > src/gallium/targets/opencl/Makefile.am | 3 +- > 7 files changed, 508 insertions(+), 2 deletions(-) > create mode 100644 > src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c > create mode 100644 > src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c > create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h > > --- a/src/gallium/state_trackers/clover/Makefile.sources > +++ b/src/gallium/state_trackers/clover/Makefile.sources > @@ -53,6 +53,10 @@ CPP_SOURCES := \ > util/range.hpp \ > util/tuple.hpp > > +C_SOURCES := \ > + core/pipe_threadsafe_context.c \ > + core/pipe_threadsafe_screen.c > + Regardless of the approach (re Francisco's comment), please keep the header (core/threadsafe.h) within the sources list. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke wrote: > + # Multiplication by 4 comes up fairly often in indirect offset > calculations. > + # Some GPUs have weird integer multiplication limitations, but shifts > should work > + # equally well everywhere. > + (('imul', 4, a), ('ishl', a, 2)), Not sure what the cost of doing it this way, but really you want all powers of 2... and also udiv -> shr. Since this is python, should be easy enough to append onto that list. AFAIK all GPU's prefer a shift over a mul. Adreno doen't have 32-bit imul in the first place (and no idiv either). In nouveau/codegen we just have a single check for whether the immediate is a power of 2, perhaps that can be encoded here in some way. Cheers, -ilia ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:15 AM, Matt Turner wrote: > On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand wrote: >> On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke >> wrote: >>> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as > needed > +* for most instructions. However, it doesn't hurt anything to just > do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); > + } > + The real problem is that we haven't lifted the restriction about non-commutative integer multiply on Broadwell: brw_fs_copy_propagation.cpp: /* Fit this constant in by commuting the operands. * Exception: we can't do this for 32-bit integer MUL/MACH * because it's asymmetric. */ if ((inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MACH) && (inst->src[1].type == BRW_REGISTER_TYPE_D || inst->src[1].type == BRW_REGISTER_TYPE_UD)) break; inst->src[0] = inst->src[1]; inst->src[1] = val; progress = true; >>> >>> Yeah. I also wrote a patch to do that, adding >>> >>>(brw->gen < 8 || brw->is_cherryview) && >> >> In that case, shouldn't Cherry View take the same path as hsw when >> emitting multiplies and look for 15-bit constants? > > Almost... that path needs to set one of the MUL's source types to W/UW > and the stride to 2, like in commit 0c06d019. I'm planning to rip out > all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp > and move it to a common lowering pass, so I'll fix that at the same > time. Awesome! Thanks for working on that! ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke > wrote: >> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand >>> wrote: >>> > Shader-db results for fragment shaders on Broadwell: >>> > >>> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >>> >instructions in affected programs: 43242 -> 42918 (-0.75%) >>> >helped:142 >>> >HURT: 0 >>> > >>> > Shader-db results for vertex shaders on Broadwell: >>> > >>> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >>> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >>> >helped:6139 >>> >HURT: 0 >>> > --- >>> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >>> > 1 file changed, 12 insertions(+) >>> > >>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > index 555987d..161a262 100644 >>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >>> > @@ -21,6 +21,8 @@ >>> > * IN THE SOFTWARE. >>> > */ >>> > >>> > +#include >>> > + >>> > #include "glsl/ir.h" >>> > #include "glsl/ir_optimization.h" >>> > #include "glsl/nir/glsl_to_nir.h" >>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >>> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >>> > } >>> > >>> > + /* Immediates can only be used as the second source of two-source >>> > +* instructions. We have code in opt_algebraic to flip them as needed >>> > +* for most instructions. However, it doesn't hurt anything to just >>> > do >>> > +* the right thing if we can detect it at the NIR level. >>> > +*/ >>> > + if ((nir_op_infos[instr->op].algebraic_properties & >>> > NIR_OP_IS_COMMUTATIVE) && >>> > + nir_src_as_const_value(instr->src[0].src)) { >>> > + std::swap(op[0], op[1]); >>> > + } >>> > + >>> >>> The real problem is that we haven't lifted the restriction about >>> non-commutative integer multiply on Broadwell: >>> >>> brw_fs_copy_propagation.cpp: >>> >>>/* Fit this constant in by commuting the operands. >>> * Exception: we can't do this for 32-bit integer MUL/MACH >>> * because it's asymmetric. >>> */ >>>if ((inst->opcode == BRW_OPCODE_MUL || >>> inst->opcode == BRW_OPCODE_MACH) && >>>(inst->src[1].type == BRW_REGISTER_TYPE_D || >>> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >>> break; >>>inst->src[0] = inst->src[1]; >>>inst->src[1] = val; >>>progress = true; >> >> Yeah. I also wrote a patch to do that, adding >> >>(brw->gen < 8 || brw->is_cherryview) && > > In that case, shouldn't Cherry View take the same path as hsw when > emitting multiplies and look for 15-bit constants? Almost... that path needs to set one of the MUL's source types to W/UW and the stride to 2, like in commit 0c06d019. I'm planning to rip out all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp and move it to a common lowering pass, so I'll fix that at the same time. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.
On Tuesday, May 05, 2015 11:29:52 PM Francisco Jerez wrote: > --- > src/glsl/nir/nir_intrinsics.h | 27 +++ > 1 file changed, 27 insertions(+) > > diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h > index 8e28765..4b13c75 100644 > --- a/src/glsl/nir/nir_intrinsics.h > +++ b/src/glsl/nir/nir_intrinsics.h > @@ -89,6 +89,33 @@ ATOMIC(inc, 0) > ATOMIC(dec, 0) > ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE) > > +/* > + * Image load, store and atomic intrinsics. > + * > + * All image intrinsics take an image target passed as a nir_variable. Image > + * variables contain a number of memory and layout qualifiers that influence > + * the semantics of the intrinsic. > + * > + * All image intrinsics take a four-coordinate vector and a sample index as > + * first two sources, determining the location within the image that will be > + * accessed by the intrinsic. Components not applicable to the image target > + * in use are equal to zero by convention. Image store takes an additional > + * four-component argument with the value to be written, and image atomic > + * operations take either one or two additional scalar arguments with the > same > + * meaning as in the ARB_shader_image_load_store specification. > + */ > +INTRINSIC(image_load, 2, ARR(4, 1), true, 4, 1, 0, > + NIR_INTRINSIC_CAN_ELIMINATE) > +INTRINSIC(image_store, 3, ARR(4, 1, 4), false, 0, 1, 0, 0) > +INTRINSIC(image_atomic_add, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_min, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_max, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_and, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_or, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_xor, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_exchange, 3, ARR(4, 1, 1), true, 1, 1, 0, 0) > +INTRINSIC(image_atomic_comp_swap, 4, ARR(4, 1, 1, 1), true, 1, 1, 0, 0) > + > #define SYSTEM_VALUE(name, components) \ > INTRINSIC(load_##name, 0, ARR(), true, components, 0, 0, \ > NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER) > Looks great, Curro. Expanding the coordinate to a vec4 and always having the parameters present for e.g. sample_index does reduce the combinatorial explosion a lot. FWIW, I also prefer undefined. This should work out fine for SPIR-V too, it's pretty trivial to go from their style to this (just combine the two - it's SSA after all). These 5 are: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke wrote: > On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: >> > Shader-db results for fragment shaders on Broadwell: >> > >> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >> >instructions in affected programs: 43242 -> 42918 (-0.75%) >> >helped:142 >> >HURT: 0 >> > >> > Shader-db results for vertex shaders on Broadwell: >> > >> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >> >helped:6139 >> >HURT: 0 >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 555987d..161a262 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -21,6 +21,8 @@ >> > * IN THE SOFTWARE. >> > */ >> > >> > +#include >> > + >> > #include "glsl/ir.h" >> > #include "glsl/ir_optimization.h" >> > #include "glsl/nir/glsl_to_nir.h" >> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >> > } >> > >> > + /* Immediates can only be used as the second source of two-source >> > +* instructions. We have code in opt_algebraic to flip them as needed >> > +* for most instructions. However, it doesn't hurt anything to just do >> > +* the right thing if we can detect it at the NIR level. >> > +*/ >> > + if ((nir_op_infos[instr->op].algebraic_properties & >> > NIR_OP_IS_COMMUTATIVE) && >> > + nir_src_as_const_value(instr->src[0].src)) { >> > + std::swap(op[0], op[1]); >> > + } >> > + >> >> The real problem is that we haven't lifted the restriction about >> non-commutative integer multiply on Broadwell: >> >> brw_fs_copy_propagation.cpp: >> >>/* Fit this constant in by commuting the operands. >> * Exception: we can't do this for 32-bit integer MUL/MACH >> * because it's asymmetric. >> */ >>if ((inst->opcode == BRW_OPCODE_MUL || >> inst->opcode == BRW_OPCODE_MACH) && >>(inst->src[1].type == BRW_REGISTER_TYPE_D || >> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >> break; >>inst->src[0] = inst->src[1]; >>inst->src[1] = val; >>progress = true; > > Yeah. I also wrote a patch to do that, adding > >(brw->gen < 8 || brw->is_cherryview) && In that case, shouldn't Cherry View take the same path as hsw when emitting multiplies and look for 15-bit constants? > which also solves the problem. But it won't help on Cherryview, which I > believe still has the asymmetrical multiply, while switching to shifts > will. I suppose another alternative to NIR late optimizations is to > have brw_fs_nir's handling of imul check for power of two sizes and emit > a SHL. That would be easy. I really don't think SHL is the issue here. It's that we're being stupid about multiplies. SHL is a nice hack but unless it is actually faster, I think it's hacking around the problem. > I do think we really need to make logical IMUL opcodes and lower them to > MUL/MACH as needed later, so we don't need to worry about breaking MACH > in cases like this. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke wrote: > On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: >> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: >> > Shader-db results for fragment shaders on Broadwell: >> > >> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >> >instructions in affected programs: 43242 -> 42918 (-0.75%) >> >helped:142 >> >HURT: 0 >> > >> > Shader-db results for vertex shaders on Broadwell: >> > >> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >> >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >> >helped:6139 >> >HURT: 0 >> > --- >> > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 >> > 1 file changed, 12 insertions(+) >> > >> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > index 555987d..161a262 100644 >> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp >> > @@ -21,6 +21,8 @@ >> > * IN THE SOFTWARE. >> > */ >> > >> > +#include >> > + >> > #include "glsl/ir.h" >> > #include "glsl/ir_optimization.h" >> > #include "glsl/nir/glsl_to_nir.h" >> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >> >op[i] = offset(op[i], instr->src[i].swizzle[channel]); >> > } >> > >> > + /* Immediates can only be used as the second source of two-source >> > +* instructions. We have code in opt_algebraic to flip them as needed >> > +* for most instructions. However, it doesn't hurt anything to just do >> > +* the right thing if we can detect it at the NIR level. >> > +*/ >> > + if ((nir_op_infos[instr->op].algebraic_properties & >> > NIR_OP_IS_COMMUTATIVE) && >> > + nir_src_as_const_value(instr->src[0].src)) { >> > + std::swap(op[0], op[1]); >> > + } >> > + >> >> The real problem is that we haven't lifted the restriction about >> non-commutative integer multiply on Broadwell: >> >> brw_fs_copy_propagation.cpp: >> >>/* Fit this constant in by commuting the operands. >> * Exception: we can't do this for 32-bit integer MUL/MACH >> * because it's asymmetric. >> */ >>if ((inst->opcode == BRW_OPCODE_MUL || >> inst->opcode == BRW_OPCODE_MACH) && >>(inst->src[1].type == BRW_REGISTER_TYPE_D || >> inst->src[1].type == BRW_REGISTER_TYPE_UD)) >> break; >>inst->src[0] = inst->src[1]; >>inst->src[1] = val; >>progress = true; > > Yeah. I also wrote a patch to do that, adding > >(brw->gen < 8 || brw->is_cherryview) && > > which also solves the problem. But it won't help on Cherryview, which I > believe still has the asymmetrical multiply, while switching to shifts > will. I suppose another alternative to NIR late optimizations is to > have brw_fs_nir's handling of imul check for power of two sizes and emit > a SHL. That would be easy. > > I do think we really need to make logical IMUL opcodes and lower them to > MUL/MACH as needed later, so we don't need to worry about breaking MACH > in cases like this. Indeed. I mentioned yesterday I've been planning to do it for a while, so I'll go ahead and take care of it. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
On 05/08/2015 03:36 AM, Kenneth Graunke wrote: > According to Glenn, shifts on R600 have 5x the throughput as multiplies. > > Intel GPUs have strange integer multiplication restrictions - on most > hardware, MUL actually only does a 32-bit x 16-bit multiply. This > means the arguments aren't commutative, which can limit our constant > propagation options. SHL has no such restrictions. > > Shifting is probably reasonable on most people's hardware, so let's just > do that. > > i965 shader-db results (using NIR for VS): > total instructions in shared programs: 7432587 -> 7388982 (-0.59%) > instructions in affected programs: 1360411 -> 1316806 (-3.21%) > helped:5772 > HURT: 0 > > Signed-off-by: Kenneth Graunke > Cc: matts...@gmail.com > Cc: ja...@jlekstrand.net > --- > src/glsl/nir/nir_opt_algebraic.py | 5 + > 1 file changed, 5 insertions(+) > > So...I found a bizarre issue with this patch. > >(('imul', 4, a), ('ishl', a, 2)), > > totally optimizes things. However... > >(('imul', a, 4), ('ishl', a, 2)), > > doesn't seem to do anything, even though imul is commutative, and nir_search > should totally handle that... > > ▄▄ ▄▄▄▄ ▄ ▄▄ > ██ ██ ▀▀▀██▀▀▀ ███ ██ > ▀█▄ ██ ▄█▀ ██ ▄█▀ ██ > ██ ██ ██ ██ ██ ██ ▄██▀ ██ > ███▀▀███ ██ ██ ██ ▀▀ > ███ ███ ▄██ ██▄ ██ ▄▄ ▄▄ > ▀▀▀ ▀▀▀ ▀▀▀▀ ▀▀ ▀▀ ▀▀ > > If you know why, let me know, otherwise I may have to look into it when more > awake. I've noticed a couple other weird things that I have been unable to understand. Shaders like the one below end with fmul/ffma instaed of flrp, for example. I understand why that happens from GLSL IR opt_algebraic, but it seems like nir_opt_algebraic should handle it. [require] GLSL >= 1.30 [vertex shader] in vec4 v; in vec2 tc_in; out vec2 tc; void main() { gl_Position = v; tc = tc_in; } [fragment shader] in vec2 tc; out vec4 color; uniform sampler2D s; uniform float a; uniform vec3 base_color; void main() { vec3 tex_color = texture(s, tc).xyz; color.xyz = (base_color * a) + (tex_color * (1.0 - a)); color.a = 1.0; } > diff --git a/src/glsl/nir/nir_opt_algebraic.py > b/src/glsl/nir/nir_opt_algebraic.py > index 400d60e..350471f 100644 > --- a/src/glsl/nir/nir_opt_algebraic.py > +++ b/src/glsl/nir/nir_opt_algebraic.py > @@ -247,6 +247,11 @@ late_optimizations = [ > (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))), > (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), > (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))), > + > + # Multiplication by 4 comes up fairly often in indirect offset > calculations. > + # Some GPUs have weird integer multiplication limitations, but shifts > should work > + # equally well everywhere. > + (('imul', 4, a), ('ishl', a, 2)), This should be conditionalized on whether the platform has native integers. > ] > > print nir_algebraic.AlgebraicPass("nir_opt_algebraic", > optimizations).render() > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen
In order to detect whether the predicate source registers can be used in a later patch we will need to know the version number for the command parser. This patch just adds a member to intel_screen and does an ioctl to get the version. Reviewed-by: Kenneth Graunke --- src/mesa/drivers/dri/i965/intel_screen.c | 7 +++ src/mesa/drivers/dri/i965/intel_screen.h | 8 +++- 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/intel_screen.c b/src/mesa/drivers/dri/i965/intel_screen.c index dda1638..896a125 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.c +++ b/src/mesa/drivers/dri/i965/intel_screen.c @@ -1407,6 +1407,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp) (ret != -1 || errno != EINVAL); } + struct drm_i915_getparam getparam; + getparam.param = I915_PARAM_CMD_PARSER_VERSION; + getparam.value = &intelScreen->cmd_parser_version; + const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam); + if (ret == -1) + intelScreen->cmd_parser_version = 0; + psp->extensions = !intelScreen->has_context_reset_notification ? intelScreenExtensions : intelRobustScreenExtensions; diff --git a/src/mesa/drivers/dri/i965/intel_screen.h b/src/mesa/drivers/dri/i965/intel_screen.h index e7a1490..742b3d3 100644 --- a/src/mesa/drivers/dri/i965/intel_screen.h +++ b/src/mesa/drivers/dri/i965/intel_screen.h @@ -72,7 +72,13 @@ struct intel_screen * Configuration cache with default values for all contexts */ driOptionCache optionCache; -}; + + /** +* Version of the command parser reported by the +* I915_PARAM_CMD_PARSER_VERSION parameter +*/ + int cmd_parser_version; + }; extern void intelDestroyContext(__DRIcontext * driContextPriv); -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote: > On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > > Shader-db results for fragment shaders on Broadwell: > > > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) > >instructions in affected programs: 43242 -> 42918 (-0.75%) > >helped:142 > >HURT: 0 > > > > Shader-db results for vertex shaders on Broadwell: > > > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) > >instructions in affected programs: 1418720 -> 1373448 (-3.19%) > >helped:6139 > >HURT: 0 > > --- > > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > > 1 file changed, 12 insertions(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > index 555987d..161a262 100644 > > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > > @@ -21,6 +21,8 @@ > > * IN THE SOFTWARE. > > */ > > > > +#include > > + > > #include "glsl/ir.h" > > #include "glsl/ir_optimization.h" > > #include "glsl/nir/glsl_to_nir.h" > > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) > >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > > } > > > > + /* Immediates can only be used as the second source of two-source > > +* instructions. We have code in opt_algebraic to flip them as needed > > +* for most instructions. However, it doesn't hurt anything to just do > > +* the right thing if we can detect it at the NIR level. > > +*/ > > + if ((nir_op_infos[instr->op].algebraic_properties & > > NIR_OP_IS_COMMUTATIVE) && > > + nir_src_as_const_value(instr->src[0].src)) { > > + std::swap(op[0], op[1]); > > + } > > + > > The real problem is that we haven't lifted the restriction about > non-commutative integer multiply on Broadwell: > > brw_fs_copy_propagation.cpp: > >/* Fit this constant in by commuting the operands. > * Exception: we can't do this for 32-bit integer MUL/MACH > * because it's asymmetric. > */ >if ((inst->opcode == BRW_OPCODE_MUL || > inst->opcode == BRW_OPCODE_MACH) && >(inst->src[1].type == BRW_REGISTER_TYPE_D || > inst->src[1].type == BRW_REGISTER_TYPE_UD)) > break; >inst->src[0] = inst->src[1]; >inst->src[1] = val; >progress = true; Yeah. I also wrote a patch to do that, adding (brw->gen < 8 || brw->is_cherryview) && which also solves the problem. But it won't help on Cherryview, which I believe still has the asymmetrical multiply, while switching to shifts will. I suppose another alternative to NIR late optimizations is to have brw_fs_nir's handling of imul check for power of two sizes and emit a SHL. That would be easy. I do think we really need to make logical IMUL opcodes and lower them to MUL/MACH as needed later, so we don't need to worry about breaking MACH in cases like this. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling
Previously whenever a primitive is drawn the driver would call _mesa_check_conditional_render which blocks waiting for the result of the query to determine whether to render. On Gen7+ there is a bit in the 3DPRIMITIVE command which can be used to disable the primitive based on the value of a state bit. This state bit can be set based on whether two registers have different values using the MI_PREDICATE command. We can load these two registers with the pixel count values stored in the query begin and end to implement conditional rendering without stalling. Unfortunately these two source registers were not in the whitelist of available registers in the kernel driver until v3.19. This patch uses the command parser version from intel_screen to detect whether to attempt to set the predicate data registers. The predicate enable bit is currently only used for drawing 3D primitives. For blits, clears, bitmaps, copypixels and drawpixels it still causes a stall. For most of these it would probably just work to call the new brw_check_conditional_render function instead of _mesa_check_conditional_render because they already work in terms of rendering primitives. However it's a bit trickier for blits because it can use the BLT ring or the blorp codepath. I think these operations are less useful for conditional rendering than rendering primitives so it might be best to leave it for a later patch. v2: Use the command parser version to detect whether we can write to the predicate data registers instead of trying to execute a register load command. v3: Simple rebase --- src/mesa/drivers/dri/i965/Makefile.sources | 1 + src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 + src/mesa/drivers/dri/i965/brw_context.c| 4 + src/mesa/drivers/dri/i965/brw_context.h| 21 +++ src/mesa/drivers/dri/i965/brw_defines.h| 1 + src/mesa/drivers/dri/i965/brw_draw.c | 16 +- src/mesa/drivers/dri/i965/brw_queryobj.c | 17 ++- src/mesa/drivers/dri/i965/intel_extensions.c | 5 + src/mesa/drivers/dri/i965/intel_reg.h | 23 +++ 9 files changed, 243 insertions(+), 12 deletions(-) create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c diff --git a/src/mesa/drivers/dri/i965/Makefile.sources b/src/mesa/drivers/dri/i965/Makefile.sources index 1ae93e1..a24c20a 100644 --- a/src/mesa/drivers/dri/i965/Makefile.sources +++ b/src/mesa/drivers/dri/i965/Makefile.sources @@ -18,6 +18,7 @@ i965_FILES = \ brw_clip_unfilled.c \ brw_clip_util.c \ brw_compute.c \ + brw_conditional_render.c \ brw_context.c \ brw_context.h \ brw_cs.cpp \ diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c b/src/mesa/drivers/dri/i965/brw_conditional_render.c new file mode 100644 index 000..e676aa0 --- /dev/null +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c @@ -0,0 +1,167 @@ +/* + * Copyright © 2014 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + * + * Authors: + *Neil Roberts + */ + +/** @file brw_conditional_render.c + * + * Support for conditional rendering based on query objects + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+. + */ + +#include "main/imports.h" +#include "main/condrender.h" + +#include "brw_context.h" +#include "brw_defines.h" +#include "intel_batchbuffer.h" + +static void +set_predicate_enable(struct brw_context *brw, + bool value) +{ + if (value) + brw->predicate.state = BRW_PREDICATE_STATE_RENDER; + else + brw->predicate.state = BRW_PREDICATE_STATE_DONT_RENDER; +} + +static void +load_64bit_register(struct brw_context *brw, +uint32_t reg, +drm_intel_bo *bo, +uint32_t offset) +{ + int i; + + for (i = 0; i < 2; i++) { +
[Mesa-dev] [PATCH 0/2] i965: Do conditional rendering in hardware
I thought it might be a good idea to try posting these patches again since it's been 6 months since they were originally posted. The patches are a lot more useful now since the command parser in the kernel is working correctly for Haswell. This means the functionality is no longer restricted to only Ivybridge. I haven't changed the patches apart from some minor rebasing. I've rerun it through Piglit on Ivybridge and it doesn't cause any regressions. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] clover: add --with-icd-file-dir option
On 8 May 2015 at 03:10, Michel Dänzer wrote: > On 08.05.2015 03:24, Tom Stellard wrote: >> For this particular situation, I'm happy with any solution that: >> >> 1. Allows a user to install the icd file to /etc if he or she wants to. > > --sysconfdir=/etc > > That covers drirc as well. > > >> 2. Does not require the user to read the spec to know that /etc is the >> correct place to install it. > > I think the above is pretty standard for autotools projects. I think it > would be better to document this in the appropriate place(s) for OpenCL > users than to add another convoluted option which doesn't really add any > flexibility. > Very well said Michel. I'm suspecting that the latest approach will lead to quite a bit of confusion. Even amongst experienced package maintainers. If in doubt one can ping the debian, arch and/or fedora guys (i.e the ones shipping opencl) for their 2c :-) -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE
On 8 May 2015 at 11:36, Tapani Pälli wrote: > That is strange, it was introduced in fb370f89d but then has gone missing .. > Seems like it broke shortly after it was introduced (around 10.1.0-devel) commit 902f9df36bec7d67a2d8bc4c24d89d9d57964903 Author: Francisco Jerez Date: Mon Nov 25 10:11:59 2013 -0800 mesa: Add image parameter queries for ARB_shader_image_load_store. The fix looks great - curious that we don't have piglits for these. Whomever pushes this, please add Cc: "10.4 10.5" Signed-off-by: Emil Velikov -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context
Tom Stellard writes: > Events can be added to an OpenCL command queue concurrently from > multiple threads, but pipe_context and pipe_screen objects > are not threadsafe. The threadsafe wrappers protect all pipe_screen > and pipe_context function calls with a mutex, so we can safely use > them with multiple threads. > --- > src/gallium/state_trackers/clover/Makefile.am | 6 +- > src/gallium/state_trackers/clover/Makefile.sources | 4 + > src/gallium/state_trackers/clover/core/device.cpp | 2 + > .../clover/core/pipe_threadsafe_context.c | 272 > + > .../clover/core/pipe_threadsafe_screen.c | 184 ++ > .../state_trackers/clover/core/threadsafe.h| 39 +++ > src/gallium/targets/opencl/Makefile.am | 3 +- > 7 files changed, 508 insertions(+), 2 deletions(-) > create mode 100644 > src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c > create mode 100644 > src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c > create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h > > diff --git a/src/gallium/state_trackers/clover/Makefile.am > b/src/gallium/state_trackers/clover/Makefile.am > index f46d9ef..8b615ae 100644 > --- a/src/gallium/state_trackers/clover/Makefile.am > +++ b/src/gallium/state_trackers/clover/Makefile.am > @@ -1,5 +1,6 @@ > AUTOMAKE_OPTIONS = subdir-objects > > +include $(top_srcdir)/src/gallium/Automake.inc > include Makefile.sources > > AM_CPPFLAGS = \ > @@ -32,6 +33,9 @@ cl_HEADERS = \ > $(top_srcdir)/include/CL/opencl.h > endif > > +AM_CFLAGS = \ > + $(GALLIUM_CFLAGS) > + > noinst_LTLIBRARIES = libclover.la libcltgsi.la libclllvm.la > > libcltgsi_la_CXXFLAGS = \ > @@ -58,6 +62,6 @@ libclover_la_CXXFLAGS = \ > libclover_la_LIBADD = \ > libcltgsi.la libclllvm.la > > -libclover_la_SOURCES = $(CPP_SOURCES) > +libclover_la_SOURCES = $(CPP_SOURCES) $(C_SOURCES) > > EXTRA_DIST = Doxyfile > diff --git a/src/gallium/state_trackers/clover/Makefile.sources > b/src/gallium/state_trackers/clover/Makefile.sources > index 10bbda0..90e6b7e 100644 > --- a/src/gallium/state_trackers/clover/Makefile.sources > +++ b/src/gallium/state_trackers/clover/Makefile.sources > @@ -53,6 +53,10 @@ CPP_SOURCES := \ > util/range.hpp \ > util/tuple.hpp > > +C_SOURCES := \ > + core/pipe_threadsafe_context.c \ > + core/pipe_threadsafe_screen.c > + > LLVM_SOURCES := \ > llvm/invocation.cpp > > diff --git a/src/gallium/state_trackers/clover/core/device.cpp > b/src/gallium/state_trackers/clover/core/device.cpp > index 42b45b7..b145027 100644 > --- a/src/gallium/state_trackers/clover/core/device.cpp > +++ b/src/gallium/state_trackers/clover/core/device.cpp > @@ -22,6 +22,7 @@ > > #include "core/device.hpp" > #include "core/platform.hpp" > +#include "core/threadsafe.h" > #include "pipe/p_screen.h" > #include "pipe/p_state.h" > > @@ -47,6 +48,7 @@ device::device(clover::platform &platform, > pipe_loader_device *ldev) : > pipe->destroy(pipe); >throw error(CL_INVALID_DEVICE); > } > + pipe = pipe_threadsafe_screen(pipe); > } > > device::~device() { > diff --git a/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c > b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c > new file mode 100644 > index 000..f08f56c > --- /dev/null > +++ b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c > @@ -0,0 +1,272 @@ > +/* > + * Copyright 2015 Advanced Micro Devices, Inc. > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > FROM, > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN > THE > + * SOFTWARE. > + * > + * Authors: Tom Stellard > + * > + */ > + > +#include > + > +/** > + * \file > + * > + * threadsafe_context is a wrapper around a pipe_context to make it thread > + * safe. > + */ > + > +#include "os/os_thread.h" > +#include "pipe/p_context.h" >
Re: [Mesa-dev] [PATCH 7/9] egl/wayland: assume EGL_WINDOW_BIT
On 8 May 2015 at 14:34, Axel Davy wrote: > Le 06/05/2015 03:00, Dave Airlie a écrit : >> >> On 2 May 2015 at 20:15, Axel Davy wrote: >>> >>> Only EGL_WINDOW_BIT is supported. Remove tests related. >> >> Is this there no plans to support pixmap/pbuffer/ or any of the other >> bits? >> >> Seems like a step in the wrong direction if we really should be supporting >> other things than WINDOW_BIT in the future. >> >> Dave. >> > I double checked, and it really seems pbuffers are not supported by the > current mesa implementation for wayland. > However the spec doesn't tell they shouldn't be supported. > > What about changing the commit message to: > egl/wayland: simplify dri2_wl_create_surface > > This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden > for Wayland, and PBuffers are unimplemented. > Fwiw, the new commit message looks great. I'm not sure how optimistic one can get about pbuffer support based on what Daniel said. Not to mention the relative lack of interest. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support
On 8 May 2015 at 18:06, Axel Davy wrote: > On Fri, 8 May 2015, Emil Velikov wrote: > >> Shouldn't we authenticate with the correct gpu or master/render node ? >> This implementation will auth with GPU1, and then use GPU2 which seems >> a bit odd. I might be missing something ? >> >> > > The original patches did do differently: when GPU1 was discovered to not be > the wanted gpu, it was not authenticating to it. > > However loader_get_user_preferred_fd was introduced and thus that leads to > something different (which I think is cleaner): > > We authenticate to GPU1, so we have one node we can render to. > > loader_get_user_preferred_fd takes the GPU1 fd, and eventually replaces > it by GPU2 fd if user wants it and it is possible. > > Given this way of doing, we it makes sense to auth to GPU1, even if we end > up rendering to GPU2. No objections against loader_get_user_preferred_fd. I'm thinking that if we push it into drm_handle_device() we can avoid a needless (in some cases) auth. Not sure how much it's worth, but it seems like a good idea imho. -Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 9/9] egl/swrast: enable config extension for swrast
On 2 May 2015 at 11:15, Axel Davy wrote: > Enables to use dri config for swrast, like vblank_mode. > > Signed-off-by: Axel Davy > --- > src/egl/drivers/dri2/egl_dri2.c| 21 ++--- > src/gallium/state_trackers/dri/drisw.c | 1 + > src/mesa/drivers/dri/swrast/swrast.c | 1 + > 3 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c > index 689d5ec..1434580 100644 > --- a/src/egl/drivers/dri2/egl_dri2.c > +++ b/src/egl/drivers/dri2/egl_dri2.c > @@ -559,6 +559,7 @@ dri2_create_screen(_EGLDisplay *disp) > { > const __DRIextension **extensions; > struct dri2_egl_display *dri2_dpy; > + unsigned i; > > dri2_dpy = disp->DriverData; > > @@ -599,25 +600,23 @@ dri2_create_screen(_EGLDisplay *disp) > extensions = dri2_dpy->core->getExtensions(dri2_dpy->dri_screen); > > if (dri2_dpy->dri2) { > - unsigned i; > - >if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions)) > goto cleanup_dri_screen; > - > - for (i = 0; extensions[i]; i++) { > -if (strcmp(extensions[i]->name, __DRI2_ROBUSTNESS) == 0) { > -dri2_dpy->robustness = (__DRIrobustnessExtension *) > extensions[i]; > -} > -if (strcmp(extensions[i]->name, __DRI2_CONFIG_QUERY) == 0) { > - dri2_dpy->config = (__DRI2configQueryExtension *) extensions[i]; > -} > - } > } else { >assert(dri2_dpy->swrast); >if (!dri2_bind_extensions(dri2_dpy, swrast_core_extensions, > extensions)) > goto cleanup_dri_screen; > } > > + for (i = 0; extensions[i]; i++) { > + if (strcmp(extensions[i]->name, __DRI2_ROBUSTNESS) == 0) { > + dri2_dpy->robustness = (__DRIrobustnessExtension *) extensions[i]; > + } > + if (strcmp(extensions[i]->name, __DRI2_CONFIG_QUERY) == 0) { > + dri2_dpy->config = (__DRI2configQueryExtension *) extensions[i]; > + } > + } > + Side note: Applying this hunk might lead to conflict due to the newly introduced __DRI2_FENCE. Afaict one can move it as well similarly to robustness and config_query. Cheers Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+
On Fri, May 8, 2015 at 10:08 AM, Jason Ekstrand wrote: > On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke wrote: >> On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote: >>> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand wrote: >>> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: >>> > >>> >total instructions in shared programs: 2724483 -> 2711790 (-0.47%) >>> >instructions in affected programs: 1860859 -> 1848166 (-0.68%) >>> >helped:4387 >>> >HURT: 4758 >>> >GAINED:1499 >>> > >>> > The gained programs are ARB vertext programs that were previously going >>> > through the vec4 backend. Now that we have prog_to_nir, ARB vertex >>> > programs can go through the scalar backend so they show up as "gained" in >>> > the shader-db results. >>> >>> Again, I'm kind of confused and disappointed that we're just okay with >>> hurting 4700 programs without more analysis. I guess I'll go do >>> that... >> >> I took a stab at that tonight. The good news is, the majority of the >> hurt is pretty stupid. Indirect uniform address calculations involve >> a lot of integer multiplication by 4. >> >> For whatever reason, we're getting 4*x instead of x*4, which doesn't >> support immediates. So we get: >> >> MOV tmp 4 >> MUL dst tmp x >> >> Normally, constant propagation would commute the operands, giving us >> "MUL dst x 4" like we want. But it sees integer multiplication and >> chickens out, due to the asymmetry on some platforms. > > Right. I just sent out a patch that puts immediates in src1 for > commutative ALU ops at the emit stage. We probably still want to do > something in constant propagation in case we can fold something, but > it fixes the problem for now. It also helps even more than the > shifting patch. > > I don't know. Maybe we just want to make constant propagation do the > right thing on BDW+. Matt? Yes. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as needed > +* for most instructions. However, it doesn't hurt anything to just do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); > + } > + The real problem is that we haven't lifted the restriction about non-commutative integer multiply on Broadwell: brw_fs_copy_propagation.cpp: /* Fit this constant in by commuting the operands. * Exception: we can't do this for 32-bit integer MUL/MACH * because it's asymmetric. */ if ((inst->opcode == BRW_OPCODE_MUL || inst->opcode == BRW_OPCODE_MACH) && (inst->src[1].type == BRW_REGISTER_TYPE_D || inst->src[1].type == BRW_REGISTER_TYPE_UD)) break; inst->src[0] = inst->src[1]; inst->src[1] = val; progress = true; ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand wrote: > Shader-db results for fragment shaders on Broadwell: > >total instructions in shared programs: 4310987 -> 4310663 (-0.01%) >instructions in affected programs: 43242 -> 42918 (-0.75%) >helped:142 >HURT: 0 > > Shader-db results for vertex shaders on Broadwell: > >total instructions in shared programs: 2889581 -> 2844309 (-1.57%) >instructions in affected programs: 1418720 -> 1373448 (-3.19%) >helped:6139 >HURT: 0 > --- > src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 > 1 file changed, 12 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > index 555987d..161a262 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp > @@ -21,6 +21,8 @@ > * IN THE SOFTWARE. > */ > > +#include > + > #include "glsl/ir.h" > #include "glsl/ir_optimization.h" > #include "glsl/nir/glsl_to_nir.h" > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) >op[i] = offset(op[i], instr->src[i].swizzle[channel]); > } > > + /* Immediates can only be used as the second source of two-source > +* instructions. We have code in opt_algebraic to flip them as needed > +* for most instructions. However, it doesn't hurt anything to just do > +* the right thing if we can detect it at the NIR level. > +*/ > + if ((nir_op_infos[instr->op].algebraic_properties & > NIR_OP_IS_COMMUTATIVE) && > + nir_src_as_const_value(instr->src[0].src)) { > + std::swap(op[0], op[1]); We don't use STL. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather
The opt_sampler_eot optimisation seems to break when the last instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing this so it causes a lot of regressions. I can't find any documentation or known workarounds to indicate that this is expected behaviour, but considering that this is probably a pretty unlikely situation in a real use case we might as well disable it in order to avoid the regressions. In total this fixes 451 tests. Reviewed-by: Ben Widawsky --- See here for some more discussion of this: http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html As far as I can tell the Jenkins run mentioned in that email doesn't seem to have any tests on Cherryview or Skylake so that probably explains why it didn't pick up the regression. src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++ 1 file changed, 10 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 8dd680e..e9528e0 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot() if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex()) return false; + /* This optimisation doesn't seem to work for textureGather for some +* reason. I can't find any documentation or known workarounds to indicate +* that this is expected, but considering that it is probably pretty +* unlikely that a shader would directly write out the results from +* textureGather we might as well just disable it. +*/ + if (tex_inst->opcode == SHADER_OPCODE_TG4 || + tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET) + return false; + /* If there's no header present, we need to munge the LOAD_PAYLOAD as well. * It's very likely to be the previous instruction. */ -- 1.9.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+
On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke wrote: > On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote: >> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand wrote: >> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: >> > >> >total instructions in shared programs: 2724483 -> 2711790 (-0.47%) >> >instructions in affected programs: 1860859 -> 1848166 (-0.68%) >> >helped:4387 >> >HURT: 4758 >> >GAINED:1499 >> > >> > The gained programs are ARB vertext programs that were previously going >> > through the vec4 backend. Now that we have prog_to_nir, ARB vertex >> > programs can go through the scalar backend so they show up as "gained" in >> > the shader-db results. >> >> Again, I'm kind of confused and disappointed that we're just okay with >> hurting 4700 programs without more analysis. I guess I'll go do >> that... > > I took a stab at that tonight. The good news is, the majority of the > hurt is pretty stupid. Indirect uniform address calculations involve > a lot of integer multiplication by 4. > > For whatever reason, we're getting 4*x instead of x*4, which doesn't > support immediates. So we get: > > MOV tmp 4 > MUL dst tmp x > > Normally, constant propagation would commute the operands, giving us > "MUL dst x 4" like we want. But it sees integer multiplication and > chickens out, due to the asymmetry on some platforms. Right. I just sent out a patch that puts immediates in src1 for commutative ALU ops at the emit stage. We probably still want to do something in constant propagation in case we can fold something, but it fixes the problem for now. It also helps even more than the shifting patch. I don't know. Maybe we just want to make constant propagation do the right thing on BDW+. Matt? --jason > I think we can extend that - on Broadwell it should work fine, and > might work fine for 16-bit immediates on Gen7 and Cherryview, too. > > Alternatively, I wrote a nir_opt_algebraic_late optimization that turns > 4*x into x << 2, which works around the problem, and is also apparently > much better for R600. > > Statistics on the shift patch are: > > total instructions in shared programs: 7432587 -> 7388982 (-0.59%) > instructions in affected programs: 1360411 -> 1316806 (-3.21%) > helped:5772 > HURT: 0 > > Statistics for GLSL IR vs. NIR+(4*x => x << 2): > > total instructions in shared programs: 7232451 -> 7175983 (-0.78%) > instructions in affected programs: 1586917 -> 1530449 (-3.56%) > helped:5780 > HURT: 1654 > > which is much better. > > Looking at a couple of the shaders that are still worse off...it looks > like a ton of Source shaders used to do MUL/ADD with an attribute and > two immediates, and now are doing MOV/MOV/MAD. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
On Fri, May 8, 2015 at 3:36 AM, Kenneth Graunke wrote: > According to Glenn, shifts on R600 have 5x the throughput as multiplies. > > Intel GPUs have strange integer multiplication restrictions - on most > hardware, MUL actually only does a 32-bit x 16-bit multiply. This > means the arguments aren't commutative, which can limit our constant > propagation options. SHL has no such restrictions. > > Shifting is probably reasonable on most people's hardware, so let's just > do that. > > i965 shader-db results (using NIR for VS): > total instructions in shared programs: 7432587 -> 7388982 (-0.59%) > instructions in affected programs: 1360411 -> 1316806 (-3.21%) > helped:5772 > HURT: 0 > > Signed-off-by: Kenneth Graunke > Cc: matts...@gmail.com > Cc: ja...@jlekstrand.net > --- > src/glsl/nir/nir_opt_algebraic.py | 5 + > 1 file changed, 5 insertions(+) > > So...I found a bizarre issue with this patch. > >(('imul', 4, a), ('ishl', a, 2)), > > totally optimizes things. However... > >(('imul', a, 4), ('ishl', a, 2)), > > doesn't seem to do anything, even though imul is commutative, and nir_search > should totally handle that... > > ▄▄ ▄▄▄▄ ▄ ▄▄ > ██ ██ ▀▀▀██▀▀▀ ███ ██ > ▀█▄ ██ ▄█▀ ██ ▄█▀ ██ > ██ ██ ██ ██ ██ ██ ▄██▀ ██ > ███▀▀███ ██ ██ ██ ▀▀ > ███ ███ ▄██ ██▄ ██ ▄▄ ▄▄ > ▀▀▀ ▀▀▀ ▀▀▀▀ ▀▀ ▀▀ ▀▀ > > If you know why, let me know, otherwise I may have to look into it when more > awake. I figured it out and I have a patch. Unfortunately, it regresses a few programs and looses 8 SIMD8 programs so I'm doing some more investigation. I'll send it out soon. > diff --git a/src/glsl/nir/nir_opt_algebraic.py > b/src/glsl/nir/nir_opt_algebraic.py > index 400d60e..350471f 100644 > --- a/src/glsl/nir/nir_opt_algebraic.py > +++ b/src/glsl/nir/nir_opt_algebraic.py > @@ -247,6 +247,11 @@ late_optimizations = [ > (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))), > (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), > (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))), > + > + # Multiplication by 4 comes up fairly often in indirect offset > calculations. > + # Some GPUs have weird integer multiplication limitations, but shifts > should work > + # equally well everywhere. > + (('imul', 4, a), ('ishl', a, 2)), > ] > > print nir_algebraic.AlgebraicPass("nir_opt_algebraic", > optimizations).render() > -- > 2.4.0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops
Shader-db results for fragment shaders on Broadwell: total instructions in shared programs: 4310987 -> 4310663 (-0.01%) instructions in affected programs: 43242 -> 42918 (-0.75%) helped:142 HURT: 0 Shader-db results for vertex shaders on Broadwell: total instructions in shared programs: 2889581 -> 2844309 (-1.57%) instructions in affected programs: 1418720 -> 1373448 (-3.19%) helped:6139 HURT: 0 --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 1 file changed, 12 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index 555987d..161a262 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -21,6 +21,8 @@ * IN THE SOFTWARE. */ +#include + #include "glsl/ir.h" #include "glsl/ir_optimization.h" #include "glsl/nir/glsl_to_nir.h" @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr) op[i] = offset(op[i], instr->src[i].swizzle[channel]); } + /* Immediates can only be used as the second source of two-source +* instructions. We have code in opt_algebraic to flip them as needed +* for most instructions. However, it doesn't hurt anything to just do +* the right thing if we can detect it at the NIR level. +*/ + if ((nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) && + nir_src_as_const_value(instr->src[0].src)) { + std::swap(op[0], op[1]); + } + switch (instr->op) { case nir_op_i2f: case nir_op_u2f: -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support
On 2 May 2015 at 11:15, Axel Davy wrote: > When the server gpu and requested gpu are different: > . They likely don't support the same tiling modes > . They likely do not have fast access to the same locations > > Thus we do: > . render to a tiled buffer we do not share with the server > . Copy the content at every swap to a buffer with no tiling > that we share with the server. > > This is similar to the glx dri3 DRI_PRIME implementation. > > Signed-off-by: Axel Davy > --- > --- a/src/egl/drivers/dri2/platform_wayland.c > +++ b/src/egl/drivers/dri2/platform_wayland.c > @@ -632,6 +658,8 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv, > { > struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp); > struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw); > + struct dri2_egl_context *dri2_ctx; > + _EGLContext *ctx; Nit: Move the variable declarations where they're used. Otherwise static analysis tools flag lovely message(s). > @@ -1084,6 +1123,24 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay > *disp) > if (roundtrip(dri2_dpy) < 0 || !dri2_dpy->authenticated) >goto cleanup_fd; > > + dri2_dpy->fd = loader_get_user_preferred_fd(dri2_dpy->fd, > + &dri2_dpy->is_different_gpu); > + if (dri2_dpy->is_different_gpu) { > + free(dri2_dpy->device_name); > + dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd); > + if (!dri2_dpy->device_name) { > + _eglError(EGL_BAD_ALLOC, "wayland-egl: failed to get device name " > + "for requested GPU"); > + goto cleanup_fd; > + } > + } > + Shouldn't we authenticate with the correct gpu or master/render node ? This implementation will auth with GPU1, and then use GPU2 which seems a bit odd. I might be missing something ? > @@ -1127,6 +1184,15 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay > *disp) >goto cleanup_screen; > } > > + if (dri2_dpy->is_different_gpu && > + (dri2_dpy->image->base.version < 9 || > +dri2_dpy->image->blitImage == NULL)) { > + _eglLog(_EGL_WARNING, "wayland-egl: Different GPU, but image extension > " > +"version 9 or later not found, or blitImage not " > +"implemented for this driver"); Nit: Alternative message: "Different GPU selected, but the Image extension in the driver is not compatible. Version 9 or later and blitImage() are required." Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.
Update the comment in patch 1, and the entire thing is Reviewed-by: Connor Abbott On Fri, May 8, 2015 at 12:40 PM, Francisco Jerez wrote: > v2: Undefine coordinate components not applicable to the target. > > --- > src/glsl/nir/glsl_to_nir.cpp | 126 > +++ > 1 file changed, 115 insertions(+), 11 deletions(-) > > diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp > index f6b8331..fab4508 100644 > --- a/src/glsl/nir/glsl_to_nir.cpp > +++ b/src/glsl/nir/glsl_to_nir.cpp > @@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir) > op = nir_intrinsic_atomic_counter_inc_var; >} else if (strcmp(ir->callee_name(), > "__intrinsic_atomic_predecrement") == 0) { > op = nir_intrinsic_atomic_counter_dec_var; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) { > + op = nir_intrinsic_image_load; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) { > + op = nir_intrinsic_image_store; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") > == 0) { > + op = nir_intrinsic_image_atomic_add; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") > == 0) { > + op = nir_intrinsic_image_atomic_min; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") > == 0) { > + op = nir_intrinsic_image_atomic_max; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") > == 0) { > + op = nir_intrinsic_image_atomic_and; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") == > 0) { > + op = nir_intrinsic_image_atomic_or; > + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") > == 0) { > + op = nir_intrinsic_image_atomic_xor; > + } else if (strcmp(ir->callee_name(), > "__intrinsic_image_atomic_exchange") == 0) { > + op = nir_intrinsic_image_atomic_exchange; > + } else if (strcmp(ir->callee_name(), > "__intrinsic_image_atomic_comp_swap") == 0) { > + op = nir_intrinsic_image_atomic_comp_swap; >} else { > unreachable("not reached"); >} > >nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op); > - ir_dereference *param = > - (ir_dereference *) ir->actual_parameters.get_head(); > - instr->variables[0] = evaluate_deref(&instr->instr, param); > - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); > + > + switch (op) { > + case nir_intrinsic_atomic_counter_read_var: > + case nir_intrinsic_atomic_counter_inc_var: > + case nir_intrinsic_atomic_counter_dec_var: { > + ir_dereference *param = > +(ir_dereference *) ir->actual_parameters.get_head(); > + instr->variables[0] = evaluate_deref(&instr->instr, param); > + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); > + break; > + } > + case nir_intrinsic_image_load: > + case nir_intrinsic_image_store: > + case nir_intrinsic_image_atomic_add: > + case nir_intrinsic_image_atomic_min: > + case nir_intrinsic_image_atomic_max: > + case nir_intrinsic_image_atomic_and: > + case nir_intrinsic_image_atomic_or: > + case nir_intrinsic_image_atomic_xor: > + case nir_intrinsic_image_atomic_exchange: > + case nir_intrinsic_image_atomic_comp_swap: { > + nir_ssa_undef_instr *instr_undef = > +nir_ssa_undef_instr_create(shader, 1); > + nir_instr_insert_after_cf_list(this->cf_node_list, > +&instr_undef->instr); > + > + /* Set the image variable dereference. */ > + exec_node *param = ir->actual_parameters.get_head(); > + ir_dereference *image = (ir_dereference *)param; > + const glsl_type *type = > +image->variable_referenced()->type->without_array(); > + > + instr->variables[0] = evaluate_deref(&instr->instr, image); > + param = param->get_next(); > + > + /* Set the address argument, extending the coordinate vector to four > + * components. > + */ > + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param); > + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, > nir_op_vec4); > + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, > NULL); > + > + for (int i = 0; i < 4; i++) { > +if (i < type->coordinate_components()) { > + instr_addr->src[i].src = src_addr; > + instr_addr->src[i].swizzle[0] = i; > +} else { > + instr_addr->src[i].src = nir_src_for_ssa(&instr_undef->def); > +} > + } > + > + nir_instr_insert_after_cf_list(cf_node_list, &instr_addr->instr); > + instr->src[0] = nir_src_for_ssa(&instr_addr->dest.dest.ssa); > + para
Re: [Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value
Makes sense. Reviewed-by: Connor Abbott On Fri, May 8, 2015 at 11:38 AM, Jason Ekstrand wrote: > Previously, this case was being handled in match_expression prior to > calling match_value. However, there is really no good reason for this > given that match_value has all of the information it needs. Also, they > weren't being handled properly in the commutative case and putting it in > match_value gives us that for free. > > Cc: Connor Abbott > --- > src/glsl/nir/nir_search.c | 16 > 1 file changed, 8 insertions(+), 8 deletions(-) > > diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c > index 5ba0160..821b86d 100644 > --- a/src/glsl/nir/nir_search.c > +++ b/src/glsl/nir/nir_search.c > @@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr > *instr, unsigned src, > { > uint8_t new_swizzle[4]; > > + /* If the source is an explicitly sized source, then we need to reset > +* both the number of components and the swizzle. > +*/ > + if (nir_op_infos[instr->op].input_sizes[src] != 0) { > + num_components = nir_op_infos[instr->op].input_sizes[src]; > + swizzle = identity_swizzle; > + } > + > for (int i = 0; i < num_components; ++i) >new_swizzle[i] = instr->src[src].swizzle[swizzle[i]]; > > @@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, > nir_alu_instr *instr, > > bool matched = true; > for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { > - /* If the source is an explicitly sized source, then we need to reset > - * both the number of components and the swizzle. > - */ > - if (nir_op_infos[instr->op].input_sizes[i] != 0) { > - num_components = nir_op_infos[instr->op].input_sizes[i]; > - swizzle = identity_swizzle; > - } > - >if (!match_value(expr->srcs[i], instr, i, num_components, > swizzle, state)) { > matched = false; > -- > 2.4.0 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.
v2: Undefine coordinate components not applicable to the target. --- src/glsl/nir/glsl_to_nir.cpp | 126 +++ 1 file changed, 115 insertions(+), 11 deletions(-) diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp index f6b8331..fab4508 100644 --- a/src/glsl/nir/glsl_to_nir.cpp +++ b/src/glsl/nir/glsl_to_nir.cpp @@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir) op = nir_intrinsic_atomic_counter_inc_var; } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_predecrement") == 0) { op = nir_intrinsic_atomic_counter_dec_var; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) { + op = nir_intrinsic_image_load; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) { + op = nir_intrinsic_image_store; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") == 0) { + op = nir_intrinsic_image_atomic_add; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") == 0) { + op = nir_intrinsic_image_atomic_min; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") == 0) { + op = nir_intrinsic_image_atomic_max; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") == 0) { + op = nir_intrinsic_image_atomic_and; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") == 0) { + op = nir_intrinsic_image_atomic_or; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") == 0) { + op = nir_intrinsic_image_atomic_xor; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_exchange") == 0) { + op = nir_intrinsic_image_atomic_exchange; + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_comp_swap") == 0) { + op = nir_intrinsic_image_atomic_comp_swap; } else { unreachable("not reached"); } nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op); - ir_dereference *param = - (ir_dereference *) ir->actual_parameters.get_head(); - instr->variables[0] = evaluate_deref(&instr->instr, param); - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); + + switch (op) { + case nir_intrinsic_atomic_counter_read_var: + case nir_intrinsic_atomic_counter_inc_var: + case nir_intrinsic_atomic_counter_dec_var: { + ir_dereference *param = +(ir_dereference *) ir->actual_parameters.get_head(); + instr->variables[0] = evaluate_deref(&instr->instr, param); + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); + break; + } + case nir_intrinsic_image_load: + case nir_intrinsic_image_store: + case nir_intrinsic_image_atomic_add: + case nir_intrinsic_image_atomic_min: + case nir_intrinsic_image_atomic_max: + case nir_intrinsic_image_atomic_and: + case nir_intrinsic_image_atomic_or: + case nir_intrinsic_image_atomic_xor: + case nir_intrinsic_image_atomic_exchange: + case nir_intrinsic_image_atomic_comp_swap: { + nir_ssa_undef_instr *instr_undef = +nir_ssa_undef_instr_create(shader, 1); + nir_instr_insert_after_cf_list(this->cf_node_list, +&instr_undef->instr); + + /* Set the image variable dereference. */ + exec_node *param = ir->actual_parameters.get_head(); + ir_dereference *image = (ir_dereference *)param; + const glsl_type *type = +image->variable_referenced()->type->without_array(); + + instr->variables[0] = evaluate_deref(&instr->instr, image); + param = param->get_next(); + + /* Set the address argument, extending the coordinate vector to four + * components. + */ + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param); + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, nir_op_vec4); + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, NULL); + + for (int i = 0; i < 4; i++) { +if (i < type->coordinate_components()) { + instr_addr->src[i].src = src_addr; + instr_addr->src[i].swizzle[0] = i; +} else { + instr_addr->src[i].src = nir_src_for_ssa(&instr_undef->def); +} + } + + nir_instr_insert_after_cf_list(cf_node_list, &instr_addr->instr); + instr->src[0] = nir_src_for_ssa(&instr_addr->dest.dest.ssa); + param = param->get_next(); + + /* Set the sample argument, which is undefined for single-sample + * images. + */ + if (type->sampler_dimensionality == GLSL_SAMPLER_DIM_MS) { +instr->src[1] = evaluate_rvalue((ir_dereference *)param); +param = param->get_next(); + } else { +instr->src[
Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.
On Fri, May 8, 2015 at 10:30 AM, Francisco Jerez wrote: > Connor Abbott writes: > >> On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez >> wrote: >>> --- >>> src/glsl/nir/glsl_to_nir.cpp | 125 >>> +++ >>> 1 file changed, 114 insertions(+), 11 deletions(-) >>> >>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp >>> index f6b8331..a01ab3b 100644 >>> --- a/src/glsl/nir/glsl_to_nir.cpp >>> +++ b/src/glsl/nir/glsl_to_nir.cpp >>> @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir) >>> op = nir_intrinsic_atomic_counter_inc_var; >>>} else if (strcmp(ir->callee_name(), >>> "__intrinsic_atomic_predecrement") == 0) { >>> op = nir_intrinsic_atomic_counter_dec_var; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) >>> { >>> + op = nir_intrinsic_image_load; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == >>> 0) { >>> + op = nir_intrinsic_image_store; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_add; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_min; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_max; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_and; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_or; >>> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") >>> == 0) { >>> + op = nir_intrinsic_image_atomic_xor; >>> + } else if (strcmp(ir->callee_name(), >>> "__intrinsic_image_atomic_exchange") == 0) { >>> + op = nir_intrinsic_image_atomic_exchange; >>> + } else if (strcmp(ir->callee_name(), >>> "__intrinsic_image_atomic_comp_swap") == 0) { >>> + op = nir_intrinsic_image_atomic_comp_swap; >>>} else { >>> unreachable("not reached"); >>>} >>> >>>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op); >>> - ir_dereference *param = >>> - (ir_dereference *) ir->actual_parameters.get_head(); >>> - instr->variables[0] = evaluate_deref(&instr->instr, param); >>> - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); >>> + >>> + switch (op) { >>> + case nir_intrinsic_atomic_counter_read_var: >>> + case nir_intrinsic_atomic_counter_inc_var: >>> + case nir_intrinsic_atomic_counter_dec_var: { >>> + ir_dereference *param = >>> +(ir_dereference *) ir->actual_parameters.get_head(); >>> + instr->variables[0] = evaluate_deref(&instr->instr, param); >>> + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); >>> + break; >>> + } >>> + case nir_intrinsic_image_load: >>> + case nir_intrinsic_image_store: >>> + case nir_intrinsic_image_atomic_add: >>> + case nir_intrinsic_image_atomic_min: >>> + case nir_intrinsic_image_atomic_max: >>> + case nir_intrinsic_image_atomic_and: >>> + case nir_intrinsic_image_atomic_or: >>> + case nir_intrinsic_image_atomic_xor: >>> + case nir_intrinsic_image_atomic_exchange: >>> + case nir_intrinsic_image_atomic_comp_swap: { >>> + nir_load_const_instr *instr_zero = >>> nir_load_const_instr_create(shader, 1); >>> + instr_zero->value.u[0] = 0; >>> + nir_instr_insert_after_cf_list(this->cf_node_list, >>> &instr_zero->instr); >>> + >>> + /* Set the image variable dereference. */ >>> + exec_node *param = ir->actual_parameters.get_head(); >>> + ir_dereference *image = (ir_dereference *)param; >>> + const glsl_type *type = >>> +image->variable_referenced()->type->without_array(); >>> + >>> + instr->variables[0] = evaluate_deref(&instr->instr, image); >>> + param = param->get_next(); >>> + >>> + /* Set the address argument, extending the coordinate vector to >>> four >>> + * components. >>> + */ >>> + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param); >>> + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, >>> nir_op_vec4); >>> + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, >>> NULL); >>> + >>> + for (int i = 0; i < 4; i++) { >>> +if (i < type->coordinate_components()) { >>> + instr_addr->src[i].src = src_addr; >>> + instr_addr->src[i].swizzle[0] = i; >>> +} else { >>> + instr_addr->src[i].src = nir_src_for_ssa(&instr_zero->def); >> >> I think it would better convey the intent to create an
Re: [Mesa-dev] [PATCH 4/9] egl/wayland: Add support for render-nodes
On 2 May 2015 at 11:15, Axel Davy wrote: > It is possible the server advertises a render-node. > In that case no authentication is needed, > and Gem names are forbidden. > > Signed-off-by: Axel Davy > --- > diff --git a/src/egl/drivers/dri2/platform_wayland.c > b/src/egl/drivers/dri2/platform_wayland.c > index a5bcf25..79989cb 100644 > --- a/src/egl/drivers/dri2/platform_wayland.c > +++ b/src/egl/drivers/dri2/platform_wayland.c > @@ -800,12 +800,33 @@ bad_format: > return NULL; > } > > +static char > +is_fd_render_node(int fd) > +{ > + struct stat render; > + > + if (fstat(fd, &render)) > + return 0; > + > + if (!S_ISCHR(render.st_mode)) > + return 0; > + > + if (render.st_rdev & 0x80) > + return 1; > + return 0; > +} > + Let's use drmGetNodeTypeFromFd(), rather than hard-coding this ? > +static EGLBoolean > +is_render_node_capable(struct dri2_egl_display *dri2_dpy) > +{ > + const __DRIextension **extensions; > + int i; > + > + /* We cannot use Gem names with render-nodes, only prime fds (dma-buf). > +* The server needs to accept them */ > + if (!(dri2_dpy->capabilities & WL_DRM_CAPABILITY_PRIME)) > + return EGL_FALSE; > + > + /* Check the __DRIimage api is supported (this is required by our > +* codepath without Gem names) */ > + extensions = dri2_dpy->driver_extensions; > + for (i = 0; extensions[i]; i++) { > + if (strcmp(extensions[i]->name, __DRI_IMAGE_DRIVER) == 0) > + return EGL_TRUE; If memory serves me right__DRI_IMAGE_DRIVER is a cleaned up version of __DRI_DRI2. Afaics the former is never looked up (or used) in the EGL code, unlike the latter. So I'm suspecting that this check is a bit off ? Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value
Previously, this case was being handled in match_expression prior to calling match_value. However, there is really no good reason for this given that match_value has all of the information it needs. Also, they weren't being handled properly in the commutative case and putting it in match_value gives us that for free. Cc: Connor Abbott --- src/glsl/nir/nir_search.c | 16 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c index 5ba0160..821b86d 100644 --- a/src/glsl/nir/nir_search.c +++ b/src/glsl/nir/nir_search.c @@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr *instr, unsigned src, { uint8_t new_swizzle[4]; + /* If the source is an explicitly sized source, then we need to reset +* both the number of components and the swizzle. +*/ + if (nir_op_infos[instr->op].input_sizes[src] != 0) { + num_components = nir_op_infos[instr->op].input_sizes[src]; + swizzle = identity_swizzle; + } + for (int i = 0; i < num_components; ++i) new_swizzle[i] = instr->src[src].swizzle[swizzle[i]]; @@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, nir_alu_instr *instr, bool matched = true; for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) { - /* If the source is an explicitly sized source, then we need to reset - * both the number of components and the swizzle. - */ - if (nir_op_infos[instr->op].input_sizes[i] != 0) { - num_components = nir_op_infos[instr->op].input_sizes[i]; - swizzle = identity_swizzle; - } - if (!match_value(expr->srcs[i], instr, i, num_components, swizzle, state)) { matched = false; -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 3/9] glx/dri3: Add additional check for gpu offloading case
On 2 May 2015 at 11:15, Axel Davy wrote: > Checks blitImage is implemented. > Initially having the __DRIimageExtension extension > at version 9 at least meant blitImage was supported. > However some implementations do advertise version >= 9 > without implementing it. > Afaict in the above mentioned case we'll just segfault. A worthy mesa-stable material imho. Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/9] egl/wayland: properly destroy wayland objects
On 2 May 2015 at 11:15, Axel Davy wrote: > the wl_registry and the wl_queue allocated weren't destroyed. > Would this have any affect other than leaking memory ? If so can you please add the mesa-stable tag (prior to pushing). Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] Mesa 10.6 release plan
Hello all, On 21 April 2015 at 13:40, Emil Velikov wrote: > Hi all, > > Here is the preliminary release schedule for Mesa 10.6. Personally I > would love if we can go up to Mesa 11.0 (Spinal Tap anyone ?) although > it might happen with the September release. > > May 15th 2015 - Feature freeze/Release candidate 1 > May 22st 2015 - Release candidate 2 > May 29th 2015 - Release candidate 3 > June 5th 2015 - Release candidate 4/Mesa 10.6.0 > > This means that we have roughly four weeks to get new features in, and > another four weeks to get all the serious bugs sorted out. > > Does this sound reasonable with the different teams ? If anyone has > something special in mind please speak up. > This is a gentle reminder that we have one week until the 10.6 branch point. As far as I can see we have a few extensions which may or may not make it in time. I would be fine if we need an extra week to finish up any of the following extensions considering their importance and/or how long people have been working on them. If you feel the same way please speak up, otherwise the schedule will stay as outlined previously. GL_ARB_shader_subroutine GL_ARB_tessellation_shader GL_ARB_shader_image_load_store GL_ARB_direct_state_access Thanks Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.
Connor Abbott writes: > On IRC, Ken and I were discussing using a scheme inspired by SPIR-V, > which has an OpImagePointer instruction that forms a pointer to the > particular texel of the image as well as > OpAtomic{Load,Store,Exchange,etc.} that operate on an image or shared > buffer pointer. The advantages would be: > > * Makes translating from SPIR-V easier. > * Reduces the number of intrinsics we need to add for SSBO support. > * Reduces the combinatorial explosion enough that we can have separate > versions for 2, 3, and 4 components and MS vs. non-MS without it being > unbearable. I'm not sure how much of a benefit that would be though. > > The disadvantages I can think of are: > > * Doesn't actually save any code in the i965 backend, since we need to > do different things depending on if the pointer is to an image or a > shared buffer anyways. > * We'd have to special case nir_convert_from_ssa to ignore the SSA > value that's really a pointer since we don't have any real type-level > support for pointers. > * Since we lower to SSA before converting to i965, there are some ugly > edge cases when the coordinate argument becomes part of a phi web and > gets potentially overwritten before the instruction that uses the > pointer. > Yeah, I actually played around with a SPIR-V-like approach, and decided to give up the idea in the end mainly because of the disadvantages you mention, because it's nothing close to what the back-ends will want. Two other ideas occurred to me that could have made the back-end's life easier, but it didn't seem like they were worth doing: - Write a driver-independent lowering pass to convert SPIR-V-style intrinsics into coordinate-based intrinsics while still in SSA form. In that case there's likely no point in having the SPIR-V-style intrinsics in the first place, no back-end I know of will prefer image intrinsics in that form at this point, and we can just let the SPIR-V front-end deal with this problem alone. - Actually agree on a representation for a texel pointer. A texel pointer would be a triple of pointer-to-object, vec4 of coordinates and sample index, together with some static metadata determining the memory access properties. Something more back-end-specific would likely work too. In any case we would also have to agree on a representation for a pointer to an image/SSB object. And we would need to type-check pointers correctly to prevent the optimizer from doing illegal transformations (e.g. a single store intrinsic that writes to either coherent or non-coherent memory depending on the parent block, or, if we adhere to the limitations of GLSL strictly, a single store intrinsic that might access images based on different array uniforms). It gets even more "interesting" if you consider the limitations some back-ends have about accessing images non-uniformly -- We would have to guarantee that the pointer-to-object inside the texel pointer has the same value for all thread invocations executing a given load, store or atomic intrinsic, what implies that we would have to forbid texel pointers in phi instructions unless it can be proven that either the control flow incident into the node is uniform *or* the pointer-to-object coming from all parent branches is the same. Sounds like a lot of work for little benefit at this point. > I don't have a preference one way or the other, and I guess we could > always refactor it later if we wanted to, so assuming Ken is OK with > this, then besides one minor comment on patch 4 the series is > > Reviewed-by: Connor Abbott > Thanks. > On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez wrote: >> --- >> src/glsl/nir/nir_intrinsics.h | 27 +++ >> 1 file changed, 27 insertions(+) >> >> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h >> index 8e28765..4b13c75 100644 >> --- a/src/glsl/nir/nir_intrinsics.h >> +++ b/src/glsl/nir/nir_intrinsics.h >> @@ -89,6 +89,33 @@ ATOMIC(inc, 0) >> ATOMIC(dec, 0) >> ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE) >> >> +/* >> + * Image load, store and atomic intrinsics. >> + * >> + * All image intrinsics take an image target passed as a nir_variable. >> Image >> + * variables contain a number of memory and layout qualifiers that influence >> + * the semantics of the intrinsic. >> + * >> + * All image intrinsics take a four-coordinate vector and a sample index as >> + * first two sources, determining the location within the image that will be >> + * accessed by the intrinsic. Components not applicable to the image target >> + * in use are equal to zero by convention. Image store takes an additional >> + * four-component argument with the value to be written, and image atomic >> + * operations take either one or two additional scalar arguments with the >> same >> + * meaning as in the ARB_shader_image_load_store specification. >> + */ >> +INTRINSIC(image_load, 2, ARR(4, 1),
Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.
Connor Abbott writes: > On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez wrote: >> --- >> src/glsl/nir/glsl_to_nir.cpp | 125 >> +++ >> 1 file changed, 114 insertions(+), 11 deletions(-) >> >> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp >> index f6b8331..a01ab3b 100644 >> --- a/src/glsl/nir/glsl_to_nir.cpp >> +++ b/src/glsl/nir/glsl_to_nir.cpp >> @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir) >> op = nir_intrinsic_atomic_counter_inc_var; >>} else if (strcmp(ir->callee_name(), >> "__intrinsic_atomic_predecrement") == 0) { >> op = nir_intrinsic_atomic_counter_dec_var; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) { >> + op = nir_intrinsic_image_load; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) >> { >> + op = nir_intrinsic_image_store; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") >> == 0) { >> + op = nir_intrinsic_image_atomic_add; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") >> == 0) { >> + op = nir_intrinsic_image_atomic_min; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") >> == 0) { >> + op = nir_intrinsic_image_atomic_max; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") >> == 0) { >> + op = nir_intrinsic_image_atomic_and; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") >> == 0) { >> + op = nir_intrinsic_image_atomic_or; >> + } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") >> == 0) { >> + op = nir_intrinsic_image_atomic_xor; >> + } else if (strcmp(ir->callee_name(), >> "__intrinsic_image_atomic_exchange") == 0) { >> + op = nir_intrinsic_image_atomic_exchange; >> + } else if (strcmp(ir->callee_name(), >> "__intrinsic_image_atomic_comp_swap") == 0) { >> + op = nir_intrinsic_image_atomic_comp_swap; >>} else { >> unreachable("not reached"); >>} >> >>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op); >> - ir_dereference *param = >> - (ir_dereference *) ir->actual_parameters.get_head(); >> - instr->variables[0] = evaluate_deref(&instr->instr, param); >> - nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); >> + >> + switch (op) { >> + case nir_intrinsic_atomic_counter_read_var: >> + case nir_intrinsic_atomic_counter_inc_var: >> + case nir_intrinsic_atomic_counter_dec_var: { >> + ir_dereference *param = >> +(ir_dereference *) ir->actual_parameters.get_head(); >> + instr->variables[0] = evaluate_deref(&instr->instr, param); >> + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL); >> + break; >> + } >> + case nir_intrinsic_image_load: >> + case nir_intrinsic_image_store: >> + case nir_intrinsic_image_atomic_add: >> + case nir_intrinsic_image_atomic_min: >> + case nir_intrinsic_image_atomic_max: >> + case nir_intrinsic_image_atomic_and: >> + case nir_intrinsic_image_atomic_or: >> + case nir_intrinsic_image_atomic_xor: >> + case nir_intrinsic_image_atomic_exchange: >> + case nir_intrinsic_image_atomic_comp_swap: { >> + nir_load_const_instr *instr_zero = >> nir_load_const_instr_create(shader, 1); >> + instr_zero->value.u[0] = 0; >> + nir_instr_insert_after_cf_list(this->cf_node_list, >> &instr_zero->instr); >> + >> + /* Set the image variable dereference. */ >> + exec_node *param = ir->actual_parameters.get_head(); >> + ir_dereference *image = (ir_dereference *)param; >> + const glsl_type *type = >> +image->variable_referenced()->type->without_array(); >> + >> + instr->variables[0] = evaluate_deref(&instr->instr, image); >> + param = param->get_next(); >> + >> + /* Set the address argument, extending the coordinate vector to >> four >> + * components. >> + */ >> + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param); >> + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, >> nir_op_vec4); >> + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, >> NULL); >> + >> + for (int i = 0; i < 4; i++) { >> +if (i < type->coordinate_components()) { >> + instr_addr->src[i].src = src_addr; >> + instr_addr->src[i].swizzle[0] = i; >> +} else { >> + instr_addr->src[i].src = nir_src_for_ssa(&instr_zero->def); > > I think it would better convey the intent to create an ssa_undef_instr > and use that here instead of zero. Unless something else relies on the > extra coordinates being zeroed? > Yeah, that would work too. Zero makes some sense
Re: [Mesa-dev] [PATCH 7/9] egl/wayland: assume EGL_WINDOW_BIT
Le 06/05/2015 03:00, Dave Airlie a écrit : On 2 May 2015 at 20:15, Axel Davy wrote: Only EGL_WINDOW_BIT is supported. Remove tests related. Is this there no plans to support pixmap/pbuffer/ or any of the other bits? Seems like a step in the wrong direction if we really should be supporting other things than WINDOW_BIT in the future. Dave. I double checked, and it really seems pbuffers are not supported by the current mesa implementation for wayland. However the spec doesn't tell they shouldn't be supported. What about changing the commit message to: egl/wayland: simplify dri2_wl_create_surface This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden for Wayland, and PBuffers are unimplemented. I'm also fine with dropping the patch. Yours, Axel Davy ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress
On Fri, May 08, 2015 at 03:23:52PM +0300, Juha-Pekka Heikkil? wrote: >perjantai 8. toukokuuta 2015 Ian Romanick kirjoitti: > > On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote: > > On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote: > >> Stop context creation if something failed. If something errored > >> during context creation we'd segfault. Now will clean up and > >> return error. > >> > >> Signed-off-by: Juha-Pekka Heikkila > >> --- > >> src/mesa/main/shared.c | 66 > +++--- > >> 1 file changed, 62 insertions(+), 4 deletions(-) > >> > >> diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c > >> index 0b76cc0..cc05b05 100644 > >> --- a/src/mesa/main/shared.c > >> +++ b/src/mesa/main/shared.c > >> @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx) > >> > >> mtx_init(&shared->Mutex, mtx_plain); > >> > >> + /* Mutex and timestamp for texobj state validation */ > >> + mtx_init(&shared->TexMutex, mtx_recursive); > >> + shared->TextureStateStamp = 0; > > > > Do you really need to move this here? > > I was going to ask the same thing. I think moving it here means that it > can be unconditionally mtx_destroy'ed in the error path below. > >Yes, Ian got it correct. When moving mutex creation here there is no need >to go checking about it if need to clean up. I though this makes things >more clean and simple. As it can't fail, I would have moved it right before the return in the success path. Saves you from adding the mutex tear-down in the failure path. > > > > > >> + > >> shared->DisplayList = _mesa_NewHashTable(); > >> + if (!shared->DisplayList) > >> + goto error_out; > >> + > >> shared->TexObjects = _mesa_NewHashTable(); > >> + if (!shared->TexObjects) > >> + goto error_out; > >> + > >> shared->Programs = _mesa_NewHashTable(); > >> + if (!shared->Programs) > >> + goto error_out; > >> > >> shared->DefaultVertexProgram = > >>gl_vertex_program(ctx->Driver.NewProgram(ctx, > >> @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx) > >> > GL_FRAGMENT_PROGRAM_ARB, 0)); > >> > >> shared->ATIShaders = _mesa_NewHashTable(); > >> + if (!shared->ATIShaders) > >> + goto error_out; > >> + > >> shared->DefaultFragmentShader = > _mesa_new_ati_fragment_shader(ctx, 0); > >> > >> shared->ShaderObjects = _mesa_NewHashTable(); > >> + if (!shared->ShaderObjects) > >> + goto error_out; > >> > >> shared->BufferObjects = _mesa_NewHashTable(); > >> + if (!shared->BufferObjects) > >> + goto error_out; > >> > >> /* GL_ARB_sampler_objects */ > >> shared->SamplerObjects = _mesa_NewHashTable(); > >> + if (!shared->SamplerObjects) > >> + goto error_out; > >> > >> /* Allocate the default buffer object */ > >> shared->NullBufferObj = ctx->Driver.NewBufferObject(ctx, 0); > >> + if (!shared->NullBufferObj) > >> + goto error_out; > >> > >> /* Create default texture objects */ > >> for (i = 0; i < NUM_TEXTURE_TARGETS; i++) { > >> @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context > *ctx) > >>}; > >>STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS); > >>shared->DefaultTex[i] = ctx->Driver.NewTextureObject(ctx, 0, > targets[i]); > >> + > >> + if (!shared->DefaultTex[i]) > >> + goto error_out; > >> } > >> > >> /* sanity check */ > >> assert(shared->DefaultTex[TEXTURE_1D_INDEX]->RefCount == 1); > >> > >> - /* Mutex and timestamp for texobj state validation */ > >> - mtx_init(&shared->TexMutex, mtx_recursive); > >> - shared->TextureStateStamp = 0; > >> - > >> shared->FrameBuffers = _mesa_NewHashTable(); > >> + if (!shared->FrameBuffers) > >> + goto error_out; > >> + > >> shared->RenderBuffers = _mesa_NewHashTable(); > >> + if (!shared->RenderBuffers) > >> + goto error_out; > >> > >> shared->SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer, > >>_mesa_key_pointer_equal); > >> + if (!shared->SyncObjects) > >> + goto error_out; > >> > >> return shared; > >> + > >> +error_out: > >> + for (i = 0; i < NUM_TEXTURE_TARGETS; i++) { > >> + if (shared->DefaultTex[i]) { > >> + ctx->Driver.DeleteTexture(ctx, shared->DefaultTex[i]); >
Re: [Mesa-dev] [PATCH 11/27] i965: Store gather table information in the program data
On 05/07/2015 06:17 PM, Pohjolainen, Topi wrote: > On Tue, Apr 28, 2015 at 11:08:08PM +0300, Abdiel Janulgue wrote: >> The resource streamer is able to gather and pack sparsely-located >> constant data from any buffer object by a referring to a gather table >> This patch adds support for keeping track of these constant data >> fetches into a gather table. >> >> The gather table is generated from two sources. Ordinary uniform fetches >> are stored first. These are then combined with a separate table containing >> UBO entries. The separate entry for UBOs is needed to make it easier to >> generate the gather mask when combining and packing the constant data. >> >> Signed-off-by: Abdiel Janulgue >> --- >> src/mesa/drivers/dri/i965/brw_context.h | 9 + >> src/mesa/drivers/dri/i965/brw_gs.c | 4 >> src/mesa/drivers/dri/i965/brw_program.c | 5 + >> src/mesa/drivers/dri/i965/brw_shader.cpp | 4 +++- >> src/mesa/drivers/dri/i965/brw_shader.h | 11 +++ >> src/mesa/drivers/dri/i965/brw_vs.c | 5 + >> src/mesa/drivers/dri/i965/brw_wm.c | 5 + >> 7 files changed, 42 insertions(+), 1 deletion(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_context.h >> b/src/mesa/drivers/dri/i965/brw_context.h >> index 7fd49e9..e25c64d 100644 >> --- a/src/mesa/drivers/dri/i965/brw_context.h >> +++ b/src/mesa/drivers/dri/i965/brw_context.h >> @@ -355,9 +355,12 @@ struct brw_stage_prog_data { >> >> GLuint nr_params; /**< number of float params/constants */ >> GLuint nr_pull_params; >> + GLuint nr_ubo_params; >> + GLuint nr_gather_table; > > I would introduce these as non gl-types - we are trying to move away from > them. Perhaps change "nr_params" and "nr_pull_params" while you are at it. > >> >> unsigned curb_read_length; >> unsigned total_scratch; >> + unsigned max_ubo_const_block; >> >> /** >> * Register where the thread expects to find input data from the URB >> @@ -375,6 +378,12 @@ struct brw_stage_prog_data { >> */ >> const gl_constant_value **param; >> const gl_constant_value **pull_param; >> + struct { >> + int reg; >> + unsigned channel_mask; >> + unsigned const_block; >> + unsigned const_offset; >> + } *gather_table; >> }; > > Below in brw_shader.h you do: > >struct gather_table { > int reg; > unsigned channel_mask; > unsigned const_block; > unsigned const_offset; >}; >gather_table *ubo_gather_table; > > Why not here? I guess you're right. Yep, I can probably re-use the same definition in the next version. Thanks! > >> >> /* Data about a particular attempt to compile a program. Note that >> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c >> b/src/mesa/drivers/dri/i965/brw_gs.c >> index bea90d8..97658d5 100644 >> --- a/src/mesa/drivers/dri/i965/brw_gs.c >> +++ b/src/mesa/drivers/dri/i965/brw_gs.c >> @@ -70,6 +70,10 @@ brw_compile_gs_prog(struct brw_context *brw, >> c.prog_data.base.base.pull_param = >>rzalloc_array(NULL, const gl_constant_value *, param_count); >> c.prog_data.base.base.nr_params = param_count; >> + c.prog_data.base.base.nr_gather_table = 0; >> + c.prog_data.base.base.gather_table = >> + rzalloc_size(NULL, sizeof(*c.prog_data.base.base.gather_table) * >> + (c.prog_data.base.base.nr_params + >> c.prog_data.base.base.nr_ubo_params)); > > Wrap this line. > >> >> if (brw->gen >= 7) { >>if (gp->program.OutputType == GL_POINTS) { >> diff --git a/src/mesa/drivers/dri/i965/brw_program.c >> b/src/mesa/drivers/dri/i965/brw_program.c >> index 81a0c19..f27c799 100644 >> --- a/src/mesa/drivers/dri/i965/brw_program.c >> +++ b/src/mesa/drivers/dri/i965/brw_program.c >> @@ -573,6 +573,10 @@ brw_stage_prog_data_compare(const struct >> brw_stage_prog_data *a, >> if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void >> *))) >>return false; >> >> + if (memcmp(a->gather_table, b->gather_table, >> + a->nr_gather_table * sizeof(*a->gather_table))) >> + return false; >> + >> return true; >> } >> >> @@ -583,6 +587,7 @@ brw_stage_prog_data_free(const void *p) >> >> ralloc_free(prog_data->param); >> ralloc_free(prog_data->pull_param); >> + ralloc_free(prog_data->gather_table); >> } >> >> void >> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp >> b/src/mesa/drivers/dri/i965/brw_shader.cpp >> index 0d6ac0c..8769f67 100644 >> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp >> @@ -739,11 +739,13 @@ backend_visitor::backend_visitor(struct brw_context >> *brw, >> prog(prog), >> stage_prog_data(stage_prog_data), >> cfg(NULL), >> - stage(stage) >> + stage(stage), >> + ubo_gather_table(NULL) >> { >> debug_enabled = INTEL_DEBUG & intel_debug_flag_for_shader_stage(stage); >> stage_name = _mesa_shader_stage_to_string(stage);
Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress
perjantai 8. toukokuuta 2015 Ian Romanick kirjoitti: > On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote: > > On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote: > >> Stop context creation if something failed. If something errored > >> during context creation we'd segfault. Now will clean up and > >> return error. > >> > >> Signed-off-by: Juha-Pekka Heikkila > > >> --- > >> src/mesa/main/shared.c | 66 > +++--- > >> 1 file changed, 62 insertions(+), 4 deletions(-) > >> > >> diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c > >> index 0b76cc0..cc05b05 100644 > >> --- a/src/mesa/main/shared.c > >> +++ b/src/mesa/main/shared.c > >> @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx) > >> > >> mtx_init(&shared->Mutex, mtx_plain); > >> > >> + /* Mutex and timestamp for texobj state validation */ > >> + mtx_init(&shared->TexMutex, mtx_recursive); > >> + shared->TextureStateStamp = 0; > > > > Do you really need to move this here? > > I was going to ask the same thing. I think moving it here means that it > can be unconditionally mtx_destroy'ed in the error path below. Yes, Ian got it correct. When moving mutex creation here there is no need to go checking about it if need to clean up. I though this makes things more clean and simple. > > >> + > >> shared->DisplayList = _mesa_NewHashTable(); > >> + if (!shared->DisplayList) > >> + goto error_out; > >> + > >> shared->TexObjects = _mesa_NewHashTable(); > >> + if (!shared->TexObjects) > >> + goto error_out; > >> + > >> shared->Programs = _mesa_NewHashTable(); > >> + if (!shared->Programs) > >> + goto error_out; > >> > >> shared->DefaultVertexProgram = > >>gl_vertex_program(ctx->Driver.NewProgram(ctx, > >> @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx) > >> > GL_FRAGMENT_PROGRAM_ARB, 0)); > >> > >> shared->ATIShaders = _mesa_NewHashTable(); > >> + if (!shared->ATIShaders) > >> + goto error_out; > >> + > >> shared->DefaultFragmentShader = _mesa_new_ati_fragment_shader(ctx, > 0); > >> > >> shared->ShaderObjects = _mesa_NewHashTable(); > >> + if (!shared->ShaderObjects) > >> + goto error_out; > >> > >> shared->BufferObjects = _mesa_NewHashTable(); > >> + if (!shared->BufferObjects) > >> + goto error_out; > >> > >> /* GL_ARB_sampler_objects */ > >> shared->SamplerObjects = _mesa_NewHashTable(); > >> + if (!shared->SamplerObjects) > >> + goto error_out; > >> > >> /* Allocate the default buffer object */ > >> shared->NullBufferObj = ctx->Driver.NewBufferObject(ctx, 0); > >> + if (!shared->NullBufferObj) > >> + goto error_out; > >> > >> /* Create default texture objects */ > >> for (i = 0; i < NUM_TEXTURE_TARGETS; i++) { > >> @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context *ctx) > >>}; > >>STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS); > >>shared->DefaultTex[i] = ctx->Driver.NewTextureObject(ctx, 0, > targets[i]); > >> + > >> + if (!shared->DefaultTex[i]) > >> + goto error_out; > >> } > >> > >> /* sanity check */ > >> assert(shared->DefaultTex[TEXTURE_1D_INDEX]->RefCount == 1); > >> > >> - /* Mutex and timestamp for texobj state validation */ > >> - mtx_init(&shared->TexMutex, mtx_recursive); > >> - shared->TextureStateStamp = 0; > >> - > >> shared->FrameBuffers = _mesa_NewHashTable(); > >> + if (!shared->FrameBuffers) > >> + goto error_out; > >> + > >> shared->RenderBuffers = _mesa_NewHashTable(); > >> + if (!shared->RenderBuffers) > >> + goto error_out; > >> > >> shared->SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer, > >>_mesa_key_pointer_equal); > >> + if (!shared->SyncObjects) > >> + goto error_out; > >> > >> return shared; > >> + > >> +error_out: > >> + for (i = 0; i < NUM_TEXTURE_TARGETS; i++) { > >> + if (shared->DefaultTex[i]) { > >> + ctx->Driver.DeleteTexture(ctx, shared->DefaultTex[i]); > >> + } > >> + } > >> + > >> + _mesa_reference_buffer_object(ctx, &shared->NullBufferObj, NULL); > >> + > >> + _mesa_DeleteHashTable(shared->RenderBuffers); > >> + _mesa_DeleteHashTable(shared->FrameBuffers); > >> + _mesa_DeleteHashTable(shared->SamplerObjects); > >> + _mesa_DeleteHashTable(shared->BufferObjects); > >> + _mesa_DeleteHashTable(shared->ShaderObjects); > >> + _mesa_DeleteHashTable(shared->ATIShaders); > >> + _mesa_DeleteHashTable(shared->Programs); > >> + _mesa_DeleteHashTable(shared->TexObjects); > >> + _mesa_DeleteHashTable(shared->DisplayList); > >> + > >> + _mesa_reference_vertprog(ctx, &shared->DefaultVertexProgram, NULL); > >> + _mesa_reference_geomprog(ctx, &shared->DefaultGeometryProgram, > NULL); > >> + _mesa_reference_fragprog(ctx, &shared->DefaultFragmentProgram, > NULL); > >> + > >> + mtx_
Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.
On 05/07/2015 04:43 PM, Pohjolainen, Topi wrote: > On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote: >> This patch implements the binding table enable command which is also >> used to allocate a binding table pool where hardware-generated >> binding table entries are flushed into. Each binding table offset in >> the binding table pool is unique per each shader stage that are >> enabled within a batch. >> >> Also insert the required brw_tracked_state objects to enable >> hw-generated binding tables in normal render path. >> >> Signed-off-by: Abdiel Janulgue >> --- >> src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 >> ++ >> src/mesa/drivers/dri/i965/brw_context.c| 4 ++ >> src/mesa/drivers/dri/i965/brw_context.h| 5 ++ >> src/mesa/drivers/dri/i965/brw_state.h | 7 +++ >> src/mesa/drivers/dri/i965/brw_state_upload.c | 2 + >> src/mesa/drivers/dri/i965/intel_batchbuffer.c | 4 ++ >> 6 files changed, 92 insertions(+) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c >> b/src/mesa/drivers/dri/i965/brw_binding_tables.c >> index 459165a..a58e32e 100644 >> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c >> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c >> @@ -44,6 +44,11 @@ >> #include "brw_state.h" >> #include "intel_batchbuffer.h" >> >> +/* Somehow the hw-binding table pool offset must start here, otherwise >> + * the GPU will hang >> + */ >> +#define HW_BT_START_OFFSET 256; > > I think we want to understand this a little better before enabling... > >> + >> /** >> * Upload a shader stage's binding table as indirect state. >> * >> @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = { >> .emit = brw_gs_upload_binding_table, >> }; >> >> +/** >> + * Hardware-generated binding tables for the resource streamer >> + */ >> +void >> +gen7_disable_hw_binding_tables(struct brw_context *brw) >> +{ >> + BEGIN_BATCH(3); >> + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2)); >> + OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, >> BRW_HW_BINDING_TABLE_ENABLE) | >> + brw->is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0); >> + OUT_BATCH(0); >> + ADVANCE_BATCH(); >> + >> + /* Pipe control workaround */ >> + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); >> +} >> + >> +void >> +gen7_enable_hw_binding_tables(struct brw_context *brw) >> +{ >> + if (!brw->has_resource_streamer) { >> + gen7_disable_hw_binding_tables(brw); > > I started wondering why we really need this - RS is disabled by default and > we haven't needed to do anything to disable it before. > >> + return; >> + } >> + >> + if (!brw->hw_bt_pool.bo) { >> + /* From the BSpec, 3D Pipeline > Resource Streamer > Hardware Binding >> Tables: >> + * >> + * "A maximum of 16,383 Binding tables are allowed in any batch >> buffer." >> + */ >> + int max_size = 16383 * 4; > > But does it really need this much all the time? I guess I need to go and > read the spec. This is actually just one re-usable buffer object sticking around for the lifetime of the context. Compare this with creating lots of bo every-time we enable the resource streamer. I think it helps with reducing the amount of relocations we have. > >> + brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt", >> + max_size, 64); >> + brw->hw_bt_pool.next_offset = HW_BT_START_OFFSET; >> + } >> + >> + uint32_t dw1 = SET_FIELD(BRW_HW_BINDING_TABLE_ON, >> BRW_HW_BINDING_TABLE_ENABLE); >> + if (brw->is_haswell) >> + dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_MOCS) | >> HSW_HW_BINDING_TABLE_RESERVED; > > These are overflowing 80 columns. > >> + >> + BEGIN_BATCH(3); >> + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2)); >> + OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1); >> + OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, >> + brw->hw_bt_pool.bo->size); >> + ADVANCE_BATCH(); >> + >> + /* Pipe control workaround */ >> + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); > > Would you have a spec reference for this? 3D-Media-GPGPU Engine > Resource Streamer [HSW+] > Hardware Binding Tables [HSW+] > Programming note "When switching between HW and SW binding table generation, SW must issue a state cache invalidate." > >> +} >> + >> +void >> +gen7_reset_rs_pool_offsets(struct brw_context *brw) >> +{ >> + brw->hw_bt_pool.next_offset = HW_BT_START_OFFSET; >> +} >> + >> +const struct brw_tracked_state gen7_hw_binding_tables = { >> + .dirty = { >> + .mesa = 0, >> + .brw = BRW_NEW_BATCH, >> + }, >> + .emit = gen7_enable_hw_binding_tables >> +}; >> + >> /** @} */ >> >> /** >> diff --git a/src/mesa/drivers/dri/i965/brw_context.c >> b/src/mesa/drivers/dri/i965/brw_context.c >> index c7e1e81..9c7ccae 100
Re: [Mesa-dev] [PATCH 26/27] i965: Disable gather push constants for null constants
On 05/07/2015 05:48 PM, Pohjolainen, Topi wrote: > On Tue, Apr 28, 2015 at 11:08:23PM +0300, Abdiel Janulgue wrote: >> Programming null constants with gather constant tables seems to >> be unsupported and results in a GPU lockup even with the prescribed >> GPU workarounds in the bspec. Found out by trial and error that >> disabling HW gather constant when the constant state for a stage >> needs to be nullified is the only way to go around the issue. > > Just a general question. We keep resource streamer itself always enabled > (except for blorp of course). Does it still do something meaningful without > gather constants or should we disable them both? > The resource streamer is just an infrastructure containing the specific optimizations hw-generated binding tables and gather constants. The dependency below flows from left to right. RS_on ringbuffer --> hw-binding tables --> gather constants So switching on gather constants require HW-gen binding tables be enabled as well. But hw-generated binding table is not required to be switched off for gather constants to be turned off. The only problem with disabling both is the additional required state setup packets to tear-down and then re-enable everything again which increases the size of the command buffers for every 3D primitive -Abdiel ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.
On 05/07/2015 05:46 PM, Pohjolainen, Topi wrote: > On Thu, May 07, 2015 at 04:43:21PM +0300, Pohjolainen, Topi wrote: >> On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote: >>> This patch implements the binding table enable command which is also >>> used to allocate a binding table pool where hardware-generated >>> binding table entries are flushed into. Each binding table offset in >>> the binding table pool is unique per each shader stage that are >>> enabled within a batch. >>> >>> Also insert the required brw_tracked_state objects to enable >>> hw-generated binding tables in normal render path. >>> >>> Signed-off-by: Abdiel Janulgue >>> --- >>> src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 >>> ++ >>> src/mesa/drivers/dri/i965/brw_context.c| 4 ++ >>> src/mesa/drivers/dri/i965/brw_context.h| 5 ++ >>> src/mesa/drivers/dri/i965/brw_state.h | 7 +++ >>> src/mesa/drivers/dri/i965/brw_state_upload.c | 2 + >>> src/mesa/drivers/dri/i965/intel_batchbuffer.c | 4 ++ >>> 6 files changed, 92 insertions(+) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c >>> b/src/mesa/drivers/dri/i965/brw_binding_tables.c >>> index 459165a..a58e32e 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c >>> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c >>> @@ -44,6 +44,11 @@ >>> #include "brw_state.h" >>> #include "intel_batchbuffer.h" >>> >>> +/* Somehow the hw-binding table pool offset must start here, otherwise >>> + * the GPU will hang >>> + */ >>> +#define HW_BT_START_OFFSET 256; >> >> I think we want to understand this a little better before enabling... Actually, now that I remember I had my notes somewhere when I first enabled this hw-binding table over a year ago. I dug it up and 256 is the actually the size of a single stage's "hw-binding table state" expressed in hw-binding table format. Details: From the Bspec 3DSTATE_BINDING_TABLE_POINTERS_x > Pointer to PS Binding Table section lists the format as: "SurfaceStateOffset[16:6]BINDING_TABLE_STATE*256 When HW-generated binding table is enabled" So this is 16-bits[1] x 256 = 512 bytes. Now this offset must be expressed in "Pointer to PS Binding Table" using the hw-generated binding table format which must be aligned to 16:6. However the bit entry field in dw1 of 3DSTATE_BINDING_TABLE_POINTERS_x must be set within 15:5 so this value should be >> 1. Hence, the 256 (similar case is evident on function gen7_update_binding_table() in patch 4 of this series). Seems the RS hardware is extremely intolerant of even slight variations hence the hungs when this is not followed closely. In the next version, I can make the magic numbers a bit more clearer. [1] 3D-Media-GPGPU Engine > Shared Functions > 3D Sampler > State > HW Generated BINDING_TABLE_STATE >> >>> + >>> /** >>> * Upload a shader stage's binding table as indirect state. >>> * >>> @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = { >>> .emit = brw_gs_upload_binding_table, >>> }; >>> >>> +/** >>> + * Hardware-generated binding tables for the resource streamer >>> + */ >>> +void >>> +gen7_disable_hw_binding_tables(struct brw_context *brw) >>> +{ >>> + BEGIN_BATCH(3); >>> + OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2)); >>> + OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, >>> BRW_HW_BINDING_TABLE_ENABLE) | >>> + brw->is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0); >>> + OUT_BATCH(0); >>> + ADVANCE_BATCH(); >>> + >>> + /* Pipe control workaround */ >>> + brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE); >>> +} >>> + >>> +void >>> +gen7_enable_hw_binding_tables(struct brw_context *brw) >>> +{ >>> + if (!brw->has_resource_streamer) { >>> + gen7_disable_hw_binding_tables(brw); >> >> I started wondering why we really need this - RS is disabled by default and >> we haven't needed to do anything to disable it before. > > Right, patch number eight gave me the answer, we want to disable it for blorp. > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.
According to Glenn, shifts on R600 have 5x the throughput as multiplies. Intel GPUs have strange integer multiplication restrictions - on most hardware, MUL actually only does a 32-bit x 16-bit multiply. This means the arguments aren't commutative, which can limit our constant propagation options. SHL has no such restrictions. Shifting is probably reasonable on most people's hardware, so let's just do that. i965 shader-db results (using NIR for VS): total instructions in shared programs: 7432587 -> 7388982 (-0.59%) instructions in affected programs: 1360411 -> 1316806 (-3.21%) helped:5772 HURT: 0 Signed-off-by: Kenneth Graunke Cc: matts...@gmail.com Cc: ja...@jlekstrand.net --- src/glsl/nir/nir_opt_algebraic.py | 5 + 1 file changed, 5 insertions(+) So...I found a bizarre issue with this patch. (('imul', 4, a), ('ishl', a, 2)), totally optimizes things. However... (('imul', a, 4), ('ishl', a, 2)), doesn't seem to do anything, even though imul is commutative, and nir_search should totally handle that... ▄▄ ▄▄▄▄ ▄ ▄▄ ██ ██ ▀▀▀██▀▀▀ ███ ██ ▀█▄ ██ ▄█▀ ██ ▄█▀ ██ ██ ██ ██ ██ ██ ██ ▄██▀ ██ ███▀▀███ ██ ██ ██ ▀▀ ███ ███ ▄██ ██▄ ██ ▄▄ ▄▄ ▀▀▀ ▀▀▀ ▀▀▀▀ ▀▀ ▀▀ ▀▀ If you know why, let me know, otherwise I may have to look into it when more awake. diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 400d60e..350471f 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -247,6 +247,11 @@ late_optimizations = [ (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))), (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))), (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))), + + # Multiplication by 4 comes up fairly often in indirect offset calculations. + # Some GPUs have weird integer multiplication limitations, but shifts should work + # equally well everywhere. + (('imul', 4, a), ('ishl', a, 2)), ] print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render() -- 2.4.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE
That is strange, it was introduced in fb370f89d but then has gone missing .. Reviewed-by: Tapani Pälli On 05/07/2015 06:13 PM, Marta Lofstedt wrote: The return type for GL_VERTEX_BINDING_STRIDE is missing, this cause glGetIntegeri_v to fail. Signed-off-by: Marta Lofstedt --- src/mesa/main/get.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c index 6fc0f3f..9fb8fba 100644 --- a/src/mesa/main/get.c +++ b/src/mesa/main/get.c @@ -1959,6 +1959,7 @@ find_value_indexed(const char *func, GLenum pname, GLuint index, union value *v) if (index >= ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs) goto invalid_value; v->value_int = ctx->Array.VAO->VertexBinding[VERT_ATTRIB_GENERIC(index)].Stride; + return TYPE_INT; /* ARB_shader_image_load_store */ case GL_IMAGE_BINDING_NAME: { ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+
On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote: > On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand wrote: > > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell: > > > >total instructions in shared programs: 2724483 -> 2711790 (-0.47%) > >instructions in affected programs: 1860859 -> 1848166 (-0.68%) > >helped:4387 > >HURT: 4758 > >GAINED:1499 > > > > The gained programs are ARB vertext programs that were previously going > > through the vec4 backend. Now that we have prog_to_nir, ARB vertex > > programs can go through the scalar backend so they show up as "gained" in > > the shader-db results. > > Again, I'm kind of confused and disappointed that we're just okay with > hurting 4700 programs without more analysis. I guess I'll go do > that... I took a stab at that tonight. The good news is, the majority of the hurt is pretty stupid. Indirect uniform address calculations involve a lot of integer multiplication by 4. For whatever reason, we're getting 4*x instead of x*4, which doesn't support immediates. So we get: MOV tmp 4 MUL dst tmp x Normally, constant propagation would commute the operands, giving us "MUL dst x 4" like we want. But it sees integer multiplication and chickens out, due to the asymmetry on some platforms. I think we can extend that - on Broadwell it should work fine, and might work fine for 16-bit immediates on Gen7 and Cherryview, too. Alternatively, I wrote a nir_opt_algebraic_late optimization that turns 4*x into x << 2, which works around the problem, and is also apparently much better for R600. Statistics on the shift patch are: total instructions in shared programs: 7432587 -> 7388982 (-0.59%) instructions in affected programs: 1360411 -> 1316806 (-3.21%) helped:5772 HURT: 0 Statistics for GLSL IR vs. NIR+(4*x => x << 2): total instructions in shared programs: 7232451 -> 7175983 (-0.78%) instructions in affected programs: 1586917 -> 1530449 (-3.56%) helped:5780 HURT: 1654 which is much better. Looking at a couple of the shaders that are still worse off...it looks like a ton of Source shaders used to do MUL/ADD with an attribute and two immediates, and now are doing MOV/MOV/MAD. signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v2 1/6] mesa/es3.1: enable GL_ARB_shader_image_load_store for gles3.1
> > > On 05/08/2015 12:13 AM, Ian Romanick wrote: >> On 05/07/2015 12:57 AM, Marta Lofstedt wrote: >>> From: Marta Lofstedt >>> >>> v2: only expose enums from GL_ARB_shader_image_load_store >>> for gles 3.1 and GL core >>> >>> Signed-off-by: Marta Lofstedt >>> --- >>> src/mesa/main/get.c | 6 ++ >>> src/mesa/main/get_hash_params.py | 17 - >>> 2 files changed, 14 insertions(+), 9 deletions(-) >>> >>> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c >>> index 9898197..73739b6 100644 >>> --- a/src/mesa/main/get.c >>> +++ b/src/mesa/main/get.c >>> @@ -355,6 +355,12 @@ static const int extra_ARB_draw_indirect_es31[] = >>> { >>> EXTRA_END >>> }; >>> >>> +static const int extra_ARB_shader_image_load_store_es31[] = { >>> + EXT(ARB_shader_image_load_store), >>> + EXTRA_API_ES31, >> >> I think you're missing the patch that adds EXTRA_API_ES31. Did you >> forget to send that one out? > > Marta's series builds on top of my patch here that adds EXTRA_API_ES31: > > http://lists.freedesktop.org/archives/mesa-dev/2015-May/083593.html > >> Also, on a few of these patches, I think the old, non-_es31 set of >> requirements can be removed due to no longer being used. >> Ian, are you referring to the stuff that is left with the extra_ARB_shader_atomic_counters_and_geometry_shader: do you want me to remove those and it's companion struct in get.c? >>> + EXTRA_END >>> +}; >>> + >>> EXTRA_EXT(ARB_texture_cube_map); >>> EXTRA_EXT(EXT_texture_array); >>> EXTRA_EXT(NV_fog_distance); >>> diff --git a/src/mesa/main/get_hash_params.py >>> b/src/mesa/main/get_hash_params.py >>> index 513d5d2..85c2494 100644 >>> --- a/src/mesa/main/get_hash_params.py >>> +++ b/src/mesa/main/get_hash_params.py >>> @@ -413,6 +413,14 @@ descriptor=[ >>> { "apis": ["GL_CORE", "GLES3"], "params": [ >>> # GL_ARB_draw_indirect / GLES 3.1 >>> [ "DRAW_INDIRECT_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0, >>> extra_ARB_draw_indirect_es31" ], >>> +# GL_ARB_shader_image_load_store / GLES 3.1 >>> + [ "MAX_IMAGE_UNITS", "CONTEXT_INT(Const.MaxImageUnits), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS", >>> "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_IMAGE_SAMPLES", "CONTEXT_INT(Const.MaxImageSamples), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_VERTEX_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_GEOMETRY_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_FRAGMENT_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms), >>> extra_ARB_shader_image_load_store_es31"], >>> + [ "MAX_COMBINED_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.MaxCombinedImageUniforms), >>> extra_ARB_shader_image_load_store_es31"], >>> ]}, >>> >>> # Remaining enums are only in OpenGL >>> @@ -780,15 +788,6 @@ descriptor=[ >>> [ "MAX_VERTEX_ATTRIB_RELATIVE_OFFSET", >>> "CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA" ], >>> [ "MAX_VERTEX_ATTRIB_BINDINGS", >>> "CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA" ], >>> >>> -# GL_ARB_shader_image_load_store >>> - [ "MAX_IMAGE_UNITS", "CONTEXT_INT(Const.MaxImageUnits), >>> extra_ARB_shader_image_load_store"], >>> - [ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS", >>> "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs), >>> extra_ARB_shader_image_load_store"], >>> - [ "MAX_IMAGE_SAMPLES", "CONTEXT_INT(Const.MaxImageSamples), >>> extra_ARB_shader_image_load_store"], >>> - [ "MAX_VERTEX_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms), >>> extra_ARB_shader_image_load_store"], >>> - [ "MAX_GEOMETRY_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms), >>> extra_ARB_shader_image_load_store_and_geometry_shader"], >>> - [ "MAX_FRAGMENT_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms), >>> extra_ARB_shader_image_load_store"], >>> - [ "MAX_COMBINED_IMAGE_UNIFORMS", >>> "CONTEXT_INT(Const.MaxCombinedImageUniforms), >>> extra_ARB_shader_image_load_store"], >>> - >>> # GL_ARB_compute_shader >>> [ "MAX_COMPUTE_WORK_GROUP_INVOCATIONS", >>> "CONTEXT_INT(Const.MaxComputeWorkGroupInvocations), >>> extra_ARB_compute_shader" ], >>> [ "MAX_COMPUTE_UNIFORM_BLOCKS", "CONST(MAX_COMPUTE_UNIFORM_BLOCKS), >>> extra_ARB_compute_shader" ], >>> >> >> ___ >> mesa-dev mailing list >> mesa-dev@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/mesa-dev >> > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays
On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote: On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote: This fixes bugs with special cases where we have arrays of structures containing samplers or arrays of samplers. I've verified that patch results in calculating same index value as returned by _mesa_get_sampler_uniform_value for IR. Patch makes following ES3 conformance test pass: ES3-CTS.shaders.struct.uniform.sampler_array_fragment Signed-off-by: Tapani Pälli Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114 --- src/glsl/nir/nir_lower_samplers.cpp | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_samplers.cpp b/src/glsl/nir/nir_lower_samplers.cpp index cf8ab83..9859cc0 100644 --- a/src/glsl/nir/nir_lower_samplers.cpp +++ b/src/glsl/nir/nir_lower_samplers.cpp @@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct gl_shader_program *shader_progr instr->sampler_index *= glsl_get_length(deref->type); switch (deref_array->deref_array_type) { case nir_deref_array_type_direct: -instr->sampler_index += deref_array->base_offset; + +/* If this is an array of samplers. */ Above the case is for arrays and below you check for the sampler. This comment does not tell much extra :) Yeah, not sure how to change it. What I want to state here is that only for arrays of samplers we need to do this, otherwise we don't. The only other case really is array of structs that contain sampler so maybe I should state that instead? +if (deref->child->type->base_type == GLSL_TYPE_SAMPLER) + instr->sampler_index += deref_array->base_offset; + if (deref_array->deref.child) ralloc_asprintf_append(&name, "[%u]", deref_array->base_offset); break; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev