Re: [Mesa-dev] [PATCH 0/9] NIR dead control-flow removal

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 10:03 PM, Connor Abbott  wrote:
> This series implements a bunch of related optimizations that run at
> once as part of the same pass to eliminate control flow that is
> useless. This includes both unreachable code and useless loops, i.e.
> loops that don't compute a result used by the rest of the code. To do
> this, I needed to change some low-level things that change the behavior
> of nir_instr_remove() (see patch 2 for details), but I've tested the
> series on piglit and there are no regressions, and there are no
> shader-db changes either.

I tried to get you shader-db results, but it crashes part-way through.
In particular, it looks like unigen heaven on high will crash it.
Either that or a couple of synmark shaders but I doubt those are the
culprets.

I think it's also worth noting that piglit probably isn't a good test
of whether or not this works.   I doubt there are many piglit shaders
with dead control-flow and I think GLSL IR can get rid of it in some
of the common cases.  If you wanted to "generate NIR" to test with,
you could always hack up my SPIRV -> NIR branch to run an optimization
loop after converting.  Then you can see what happens without GLSL IR
in the way.
--Jason

> This is rebased on Jason's series to use lists for use-def sets; it
> turns out that the changes required were pretty minimal.
>
> The series is also available at:
>
> git://people.freedesktop.org/~cwabbott0/mesa nir-dead-cf-v3
>
> Connor Abbott (9):
>   nir/vars_to_ssa: don't rewrite removed instructions
>   nir: insert ssa_undef instructions when cleanup up defs/uses
>   nir: add an optimization for removing dead control flow
>   nir/validate: validate successors at the end of a loop
>   nir/dead_cf: delete code that's unreachable due to jumps
>   nir: add nir_block_get_following_loop() helper
>   nir: add a helper for iterating over blocks in a cf node
>   nir/dead_cf: add support for removing useless loops
>   i965/fs/nir: enable the dead control flow optimization
>
>  src/glsl/Makefile.sources|   1 +
>  src/glsl/nir/nir.c   |  94 --
>  src/glsl/nir/nir.h   |  12 ++
>  src/glsl/nir/nir_lower_vars_to_ssa.c |   3 +-
>  src/glsl/nir/nir_opt_dead_cf.c   | 332 
> +++
>  src/glsl/nir/nir_validate.c  |  21 +++
>  src/mesa/drivers/dri/i965/brw_nir.c  |   2 +
>  7 files changed, 452 insertions(+), 13 deletions(-)
>  create mode 100644 src/glsl/nir/nir_opt_dead_cf.c
>
> --
> 2.1.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses

2015-05-08 Thread Connor Abbott
On Sat, May 9, 2015 at 1:52 AM, Jason Ekstrand  wrote:
> On Fri, May 8, 2015 at 10:03 PM, Connor Abbott  wrote:
>> The point of cleanup_defs_uses() is to make an instruction safe to
>> remove by removing any references that the rest of the shader may have
>> to it. Previously, it was handling register use/def sets and removing
>> the instruction from the use sets of any SSA sources it had, but if the
>> instruction defined an SSA value that was used by other instructions it
>> wasn't doing anything. This was ok, since we were always careful to make
>> sure that no removed instruction ever had any uses, but now we want to
>> start removing unreachable instruction which might still be used in
>> reachable parts of the code. In that case, the value that any uses get
>> is undefined since the instruction never actually executes, so we can
>> just replace the instruction with an ssa_undef_instr.
>>
>> Signed-off-by: Connor Abbott 
>> ---
>>  src/glsl/nir/nir.c | 47 ++-
>>  1 file changed, 34 insertions(+), 13 deletions(-)
>>
>> diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
>> index f03e80a..362ac30 100644
>> --- a/src/glsl/nir/nir.c
>> +++ b/src/glsl/nir/nir.c
>> @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after)
>>  }
>>
>>  static void
>> -remove_defs_uses(nir_instr *instr);
>> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl);
>>
>>  static void
>> -cleanup_cf_node(nir_cf_node *node)
>> +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
>>  {
>> switch (node->type) {
>> case nir_cf_node_block: {
>>nir_block *block = nir_cf_node_as_block(node);
>>/* We need to walk the instructions and clean up defs/uses */
>> -  nir_foreach_instr(block, instr)
>> - remove_defs_uses(instr);
>> +  nir_foreach_instr(block, instr) {
>> + if (instr->type == nir_instr_type_jump)
>> +handle_remove_jump(block, nir_instr_as_jump(instr)->type);
>> + remove_defs_uses(instr, impl);
>> +  }
>
> This looks like an unrelated change.  Maybe split that out?  The rest
> of the patch looks plausible.
> --Jason

Yeah, right... part of this change is just passing impl through to
remove_defs_uses() so it doesn't have to recompute it, but the other
part is fixing another bug where we would forget to fixup the
successors/predecessors when removing jumps. I'll split out the latter
part.

>
>>break;
>> }
>>
>> case nir_cf_node_if: {
>>nir_if *if_stmt = nir_cf_node_as_if(node);
>>foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list)
>> - cleanup_cf_node(child);
>> + cleanup_cf_node(child, impl);
>>foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list)
>> - cleanup_cf_node(child);
>> + cleanup_cf_node(child, impl);
>>
>>list_del(&if_stmt->condition.use_link);
>>break;
>> @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node)
>> case nir_cf_node_loop: {
>>nir_loop *loop = nir_cf_node_as_loop(node);
>>foreach_list_typed(nir_cf_node, child, node, &loop->body)
>> - cleanup_cf_node(child);
>> + cleanup_cf_node(child, impl);
>>break;
>> }
>> case nir_cf_node_function: {
>> -  nir_function_impl *impl = nir_cf_node_as_function(node);
>>foreach_list_typed(nir_cf_node, child, node, &impl->body)
>> - cleanup_cf_node(child);
>> + cleanup_cf_node(child, impl);
>>break;
>> }
>> default:
>> @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node)
>> nir_function_impl *impl = nir_cf_node_get_function(node);
>> nir_metadata_preserve(impl, nir_metadata_none);
>>
>> +   cleanup_cf_node(node, impl);
>> +
>> if (node->type == nir_cf_node_block) {
>>/*
>> * Basic blocks can't really be removed by themselves, since they act 
>> as
>> @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node)
>>exec_node_remove(&node->node);
>>stitch_blocks(before_block, after_block);
>> }
>> -
>> -   cleanup_cf_node(node);
>>  }
>>
>>  static bool
>> @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state)
>> return true;
>>  }
>>
>> +static bool
>> +remove_ssa_def_cb(nir_ssa_def *def, void *state)
>> +{
>> +   nir_function_impl *impl = state;
>> +   nir_shader *shader = impl->overload->function->shader;
>> +
>> +   if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) {
>> +  nir_ssa_undef_instr *undef =
>> + nir_ssa_undef_instr_create(shader, def->num_components);
>> +  nir_instr_insert_before_cf_list(&impl->body, &undef->instr);
>> +  nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader);
>> +   }
>> +
>> +   return true;
>> +}
>> +
>> +
>>  static void
>> -remove_defs_uses(nir_instr *instr)
>> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl)
>>  {
>> nir_foreach_dest(instr, remove_def_cb, instr);

Re: [Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 10:03 PM, Connor Abbott  wrote:
> The point of cleanup_defs_uses() is to make an instruction safe to
> remove by removing any references that the rest of the shader may have
> to it. Previously, it was handling register use/def sets and removing
> the instruction from the use sets of any SSA sources it had, but if the
> instruction defined an SSA value that was used by other instructions it
> wasn't doing anything. This was ok, since we were always careful to make
> sure that no removed instruction ever had any uses, but now we want to
> start removing unreachable instruction which might still be used in
> reachable parts of the code. In that case, the value that any uses get
> is undefined since the instruction never actually executes, so we can
> just replace the instruction with an ssa_undef_instr.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/glsl/nir/nir.c | 47 ++-
>  1 file changed, 34 insertions(+), 13 deletions(-)
>
> diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
> index f03e80a..362ac30 100644
> --- a/src/glsl/nir/nir.c
> +++ b/src/glsl/nir/nir.c
> @@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after)
>  }
>
>  static void
> -remove_defs_uses(nir_instr *instr);
> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl);
>
>  static void
> -cleanup_cf_node(nir_cf_node *node)
> +cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
>  {
> switch (node->type) {
> case nir_cf_node_block: {
>nir_block *block = nir_cf_node_as_block(node);
>/* We need to walk the instructions and clean up defs/uses */
> -  nir_foreach_instr(block, instr)
> - remove_defs_uses(instr);
> +  nir_foreach_instr(block, instr) {
> + if (instr->type == nir_instr_type_jump)
> +handle_remove_jump(block, nir_instr_as_jump(instr)->type);
> + remove_defs_uses(instr, impl);
> +  }

This looks like an unrelated change.  Maybe split that out?  The rest
of the patch looks plausible.
--Jason

>break;
> }
>
> case nir_cf_node_if: {
>nir_if *if_stmt = nir_cf_node_as_if(node);
>foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list)
> - cleanup_cf_node(child);
> + cleanup_cf_node(child, impl);
>foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list)
> - cleanup_cf_node(child);
> + cleanup_cf_node(child, impl);
>
>list_del(&if_stmt->condition.use_link);
>break;
> @@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node)
> case nir_cf_node_loop: {
>nir_loop *loop = nir_cf_node_as_loop(node);
>foreach_list_typed(nir_cf_node, child, node, &loop->body)
> - cleanup_cf_node(child);
> + cleanup_cf_node(child, impl);
>break;
> }
> case nir_cf_node_function: {
> -  nir_function_impl *impl = nir_cf_node_as_function(node);
>foreach_list_typed(nir_cf_node, child, node, &impl->body)
> - cleanup_cf_node(child);
> + cleanup_cf_node(child, impl);
>break;
> }
> default:
> @@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node)
> nir_function_impl *impl = nir_cf_node_get_function(node);
> nir_metadata_preserve(impl, nir_metadata_none);
>
> +   cleanup_cf_node(node, impl);
> +
> if (node->type == nir_cf_node_block) {
>/*
> * Basic blocks can't really be removed by themselves, since they act 
> as
> @@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node)
>exec_node_remove(&node->node);
>stitch_blocks(before_block, after_block);
> }
> -
> -   cleanup_cf_node(node);
>  }
>
>  static bool
> @@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state)
> return true;
>  }
>
> +static bool
> +remove_ssa_def_cb(nir_ssa_def *def, void *state)
> +{
> +   nir_function_impl *impl = state;
> +   nir_shader *shader = impl->overload->function->shader;
> +
> +   if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) {
> +  nir_ssa_undef_instr *undef =
> + nir_ssa_undef_instr_create(shader, def->num_components);
> +  nir_instr_insert_before_cf_list(&impl->body, &undef->instr);
> +  nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader);
> +   }
> +
> +   return true;
> +}
> +
> +
>  static void
> -remove_defs_uses(nir_instr *instr)
> +remove_defs_uses(nir_instr *instr, nir_function_impl *impl)
>  {
> nir_foreach_dest(instr, remove_def_cb, instr);
> nir_foreach_src(instr, remove_use_cb, instr);
> +   nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl);
>  }
>
>  void nir_instr_remove(nir_instr *instr)
>  {
> -   remove_defs_uses(instr);
> +   nir_function_impl *impl = 
> nir_cf_node_get_function(&instr->block->cf_node);
> +   remove_defs_uses(instr, impl);
> exec_node_remove(&instr->node);
>
> if (instr->type == nir_instr_type_jump) {
> --
> 2.1.0
>
> 

Re: [Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions

2015-05-08 Thread Jason Ekstrand
Reviewed-by: Jason Ekstrand  wrote:
> We were rewriting the uses of the intrinsic instruction to point to
> something else after removing it, which only happened to work since we
> were lax about having dangling uses when removing instructions. Fix that
> up.
>
> Signed-off-by: Connor Abbott 
> ---
>  src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c 
> b/src/glsl/nir/nir_lower_vars_to_ssa.c
> index ccb8f99..8d0ae1b 100644
> --- a/src/glsl/nir/nir_lower_vars_to_ssa.c
> +++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
> @@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct 
> lower_variables_state *state)
>intrin->num_components, NULL);
>
>  nir_instr_insert_before(&intrin->instr, &mov->instr);
> -nir_instr_remove(&intrin->instr);
>
>  nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
>   nir_src_for_ssa(&mov->dest.dest.ssa),
>   state->shader);
> +
> +nir_instr_remove(&intrin->instr);
>  break;
>   }
>
> --
> 2.1.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] nvc0/ir: avoid jumping to a sched instruction

2015-05-08 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
Pretty sure there's nothing wrong with it, but it looks odd in the code.

 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 2 ++
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp | 7 +--
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  | 2 ++
 3 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 6bb9620..28081fa 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -1316,6 +1316,8 @@ CodeEmitterGK110::emitFlow(const Instruction *i)
} else
if (mask & 2) {
   int32_t pcRel = f->target.bb->binPos - (codeSize + 8);
+  if (writeIssueDelays && !(f->target.bb->binPos & 0x3f))
+ pcRel += 8;
   // currently we don't want absolute branches
   assert(!f->absolute);
   code[0] |= (pcRel & 0x1ff) << 23;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 22db368..442cedf 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -509,10 +509,13 @@ CodeEmitterGM107::emitBRA()
emitCond5(0x00, CC_TR);
 
if (!insn->srcExists(0) || insn->src(0).getFile() != FILE_MEMORY_CONST) {
+  int32_t pos = insn->target.bb->binPos;
+  if (writeIssueDelays && !(pos & 0x1f))
+ pos += 8;
   if (!insn->absolute)
- emitField(0x14, 24, insn->target.bb->binPos - (codeSize + 8));
+ emitField(0x14, 24, pos - (codeSize + 8));
   else
- emitField(0x14, 32, insn->target.bb->binPos);
+ emitField(0x14, 32, pos);
} else {
   emitCBUF (0x24, gpr, 20, 16, 0, insn->src(0));
   emitField(0x05, 1, 1);
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index d9aed34..c241973 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1406,6 +1406,8 @@ CodeEmitterNVC0::emitFlow(const Instruction *i)
} else
if (mask & 2) {
   int32_t pcRel = f->target.bb->binPos - (codeSize + 8);
+  if (writeIssueDelays && !(f->target.bb->binPos & 0x3f))
+ pcRel += 8;
   // currently we don't want absolute branches
   assert(!f->absolute);
   code[0] |= (pcRel & 0x3f) << 26;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] nv50/ir: allow OP_SET to merge with OP_SET_AND/etc as well as a neg

2015-05-08 Thread Ilia Mirkin
This covers the pattern where a KILL_IF is used, which triggers a
comparison of -x to 0. This can usually be folded into the comparison whose
result is being compared to 0, however it may, itself, have already been
combined with another comparison. That shouldn't impact the logic of
this pass however. With this and the & 1.0 change, code like

0020: 001c0001 80081df4 set b32 $r0 lt f32 $r0 0x3e80
0028: 001c 201fc000 and b32 $r0 $r0 0x3f80
0030: 7f9c001e dd885c00 set $p0 0x1 lt f32 neg $r0 0x0
0038: 003c 1980 $p0 discard

becomes

0020: 001c001d b5881df4 set $p0 0x1 lt f32 $r0 0x3e80
0028: 003c 1980 $p0 discard

Signed-off-by: Ilia Mirkin 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 51 ++
 1 file changed, 33 insertions(+), 18 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index d8af19a..43a2fe9 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -278,7 +278,6 @@ private:
 
void tryCollapseChainedMULs(Instruction *, const int s, ImmediateValue&);
 
-   // TGSI 'true' is converted to -1 by F2I(NEG(SET)), track back to SET
CmpInstruction *findOriginForTestWithZero(Value *);
 
unsigned int foldCount;
@@ -337,16 +336,10 @@ ConstantFolding::findOriginForTestWithZero(Value *value)
   return NULL;
Instruction *insn = value->getInsn();
 
-   while (insn && insn->op != OP_SET) {
+   while (insn && insn->op != OP_SET && insn->op != OP_SET_AND &&
+  insn->op != OP_SET_OR && insn->op != OP_SET_XOR) {
   Instruction *next = NULL;
   switch (insn->op) {
-  case OP_NEG:
-  case OP_ABS:
-  case OP_CVT:
- next = insn->getSrc(0)->getInsn();
- if (insn->sType != next->dType)
-return NULL;
- break;
   case OP_MOV:
  next = insn->getSrc(0)->getInsn();
  break;
@@ -946,29 +939,51 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
&imm0, int s)
 
case OP_SET: // TODO: SET_AND,OR,XOR
{
+  /* This optimizes the case where the output of a set is being compared
+   * to zero. Since the set can only produce 0/-1 (int) or 0/1 (float), we
+   * can be a lot cleverer in our comparison.
+   */
   CmpInstruction *si = findOriginForTestWithZero(i->getSrc(t));
   CondCode cc, ccZ;
-  if (i->src(t).mod != Modifier(0))
- return;
-  if (imm0.reg.data.u32 != 0 || !si || si->op != OP_SET)
+  if (imm0.reg.data.u32 != 0 || !si)
  return;
   cc = si->setCond;
   ccZ = (CondCode)((unsigned int)i->asCmp()->setCond & ~CC_U);
+  // We do everything assuming var (cmp) 0, reverse the condition if 0 is
+  // first.
   if (s == 0)
  ccZ = reverseCondCode(ccZ);
+  // If there is a negative modifier, we need to undo that, by flipping
+  // the comparison to zero.
+  if (i->src(t).mod.neg())
+ ccZ = reverseCondCode(ccZ);
+  // If this is a signed comparison, we expect the input to be a regular
+  // boolean, i.e. 0/-1. However the rest of the logic assumes that true
+  // is positive, so just flip the sign.
+  if (i->sType == TYPE_S32) {
+ assert(!isFloatType(si->dType));
+ ccZ = reverseCondCode(ccZ);
+  }
   switch (ccZ) {
-  case CC_LT: cc = CC_FL; break;
-  case CC_GE: cc = CC_TR; break;
-  case CC_EQ: cc = inverseCondCode(cc); break;
-  case CC_LE: cc = inverseCondCode(cc); break;
-  case CC_GT: break;
-  case CC_NE: break;
+  case CC_LT: cc = CC_FL; break; // bool < 0 -- this is never true
+  case CC_GE: cc = CC_TR; break; // bool >= 0 -- this is always true
+  case CC_EQ: cc = inverseCondCode(cc); break; // bool == 0 -- !bool
+  case CC_LE: cc = inverseCondCode(cc); break; // bool <= 0 -- !bool
+  case CC_GT: break; // bool > 0 -- bool
+  case CC_NE: break; // bool != 0 -- bool
   default:
  return;
   }
+
+  // Update the condition of this SET to be identical to the origin set,
+  // but with the updated condition code. The original SET should get
+  // DCE'd, ideally.
+  i->op = si->op;
   i->asCmp()->setCond = cc;
   i->setSrc(0, si->src(0));
   i->setSrc(1, si->src(1));
+  if (si->srcExists(2))
+ i->setSrc(2, si->src(2));
   i->sType = si->sType;
}
   break;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] nvc0/ir: allow iset to produce a boolean float

2015-05-08 Thread Ilia Mirkin
Signed-off-by: Ilia Mirkin 
---
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp | 12 
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp |  1 +
 src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp  |  8 +++-
 3 files changed, 16 insertions(+), 5 deletions(-)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
index 28081fa..ab8bf2e 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gk110.cpp
@@ -967,8 +967,8 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i)
   code[0] = (code[0] & ~0xfc) | ((code[0] << 3) & 0xe0);
   if (i->defExists(1))
  defId(i->def(1), 2);
-   else
-  code[0] |= 0x1c;
+  else
+ code[0] |= 0x1c;
} else {
   switch (i->sType) {
   case TYPE_F32: op2 = 0x000; op1 = 0x800; break;
@@ -990,8 +990,12 @@ CodeEmitterGK110::emitSET(const CmpInstruction *i)
   }
   FTZ_(3a);
 
-  if (i->dType == TYPE_F32)
- code[1] |= 1 << 23;
+  if (i->dType == TYPE_F32) {
+ if (isFloatType(i->sType))
+code[1] |= 1 << 23;
+ else
+code[1] |= 1 << 15;
+  }
}
if (i->sType == TYPE_S32)
   code[1] |= 1 << 19;
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
index 442cedf..399a6f1 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_gm107.cpp
@@ -1830,6 +1830,7 @@ CodeEmitterGM107::emitISET()
emitCond3(0x31, insn->setCond);
emitField(0x30, 1, isSignedType(insn->sType));
emitCC   (0x2f);
+   emitField(0x2c, 1, insn->dType == TYPE_F32);
emitX(0x2b);
emitGPR  (0x08, insn->src(0));
emitGPR  (0x00, insn->def(0));
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
index c241973..f5992bc 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_emit_nvc0.cpp
@@ -1078,8 +1078,14 @@ CodeEmitterNVC0::emitSET(const CmpInstruction *i)
if (!isFloatType(i->sType))
   lo = 0x3;
 
-   if (isFloatType(i->dType) || isSignedIntType(i->sType))
+   if (isSignedIntType(i->sType))
   lo |= 0x20;
+   if (isFloatType(i->dType)) {
+  if (isFloatType(i->sType))
+ lo |= 0x20;
+  else
+ lo |= 0x80;
+   }
 
switch (i->op) {
case OP_SET_AND: hi = 0x1000; break;
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] nvc0/ir: optimize set & 1.0 to produce boolean-float sets

2015-05-08 Thread Ilia Mirkin
This has started to happen more now that the backend is producing
KILL_IF more often.

Signed-off-by: Ilia Mirkin 
---
 .../drivers/nouveau/codegen/nv50_ir_peephole.cpp   | 29 ++
 .../nouveau/codegen/nv50_ir_target_nv50.cpp|  2 ++
 2 files changed, 31 insertions(+)

diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
index 14446b6..d8af19a 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_peephole.cpp
@@ -973,6 +973,35 @@ ConstantFolding::opnd(Instruction *i, ImmediateValue 
&imm0, int s)
}
   break;
 
+   case OP_AND:
+   {
+  CmpInstruction *cmp = i->getSrc(t)->getInsn()->asCmp();
+  if (!cmp || cmp->op == OP_SLCT)
+ return;
+  if (!prog->getTarget()->isOpSupported(cmp->op, TYPE_F32))
+ return;
+  if (imm0.reg.data.f32 != 1.0)
+ return;
+  if (cmp == NULL)
+ return;
+  if (i->getSrc(t)->getInsn()->dType != TYPE_U32)
+ return;
+
+  i->getSrc(t)->getInsn()->dType = TYPE_F32;
+  if (i->src(t).mod != Modifier(0)) {
+ assert(i->src(t).mod == Modifier(NV50_IR_MOD_NOT));
+ i->src(t).mod = Modifier(0);
+ cmp->setCond = reverseCondCode(cmp->setCond);
+  }
+  i->op = OP_MOV;
+  i->setSrc(s, NULL);
+  if (t) {
+ i->setSrc(0, i->getSrc(t));
+ i->setSrc(t, NULL);
+  }
+   }
+  break;
+
case OP_SHL:
{
   if (s != 1 || i->src(0).mod != Modifier(0))
diff --git a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp 
b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
index 178a167..70180eb 100644
--- a/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
+++ b/src/gallium/drivers/nouveau/codegen/nv50_ir_target_nv50.cpp
@@ -413,6 +413,8 @@ TargetNV50::isOpSupported(operation op, DataType ty) const
   return false;
case OP_SAD:
   return ty == TYPE_S32;
+   case OP_SET:
+  return !isFloatType(ty);
default:
   return true;
}
-- 
2.3.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 9/9] i965/fs/nir: enable the dead control flow optimization

2015-05-08 Thread Connor Abbott
Doesn't do anything on the public shader-db.

Signed-off-by: Connor Abbott 
---
 src/glsl/Makefile.sources   | 1 +
 src/mesa/drivers/dri/i965/brw_nir.c | 2 ++
 2 files changed, 3 insertions(+)

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index d784a81..7bdd895 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -50,6 +50,7 @@ NIR_FILES = \
nir/nir_opt_copy_propagate.c \
nir/nir_opt_cse.c \
nir/nir_opt_dce.c \
+   nir/nir_opt_dead_cf.c \
nir/nir_opt_gcm.c \
nir/nir_opt_global_to_local.c \
nir/nir_opt_peephole_ffma.c \
diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index de4d7aa..4001190 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -52,6 +52,8 @@ nir_optimize(nir_shader *nir)
   nir_validate_shader(nir);
   progress |= nir_opt_constant_folding(nir);
   nir_validate_shader(nir);
+  progress |= nir_opt_dead_cf(nir);
+  nir_validate_shader(nir);
   progress |= nir_opt_remove_phis(nir);
   nir_validate_shader(nir);
} while (progress);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/9] nir: add nir_block_get_following_loop() helper

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir.c | 16 
 src/glsl/nir/nir.h |  2 ++
 2 files changed, 18 insertions(+)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index efc3f01..e3c37dd 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -2097,6 +2097,22 @@ nir_block_get_following_if(nir_block *block)
return nir_cf_node_as_if(next_node);
 }
 
+nir_loop *
+nir_block_get_following_loop(nir_block *block)
+{
+   if (exec_node_is_tail_sentinel(&block->cf_node.node))
+  return NULL;
+
+   if (nir_cf_node_is_last(&block->cf_node))
+  return NULL;
+
+   nir_cf_node *next_node = nir_cf_node_next(&block->cf_node);
+
+   if (next_node->type != nir_cf_node_loop)
+  return NULL;
+
+   return nir_cf_node_as_loop(next_node);
+}
 static bool
 index_block(nir_block *block, void *state)
 {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index d2bf3d7..69daa14 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1607,6 +1607,8 @@ bool nir_foreach_block_reverse(nir_function_impl *impl, 
nir_foreach_block_cb cb,
  */
 nir_if *nir_block_get_following_if(nir_block *block);
 
+nir_loop *nir_block_get_following_loop(nir_block *block);
+
 void nir_index_local_regs(nir_function_impl *impl);
 void nir_index_global_regs(nir_shader *shader);
 void nir_index_ssa_defs(nir_function_impl *impl);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 8/9] nir/dead_cf: add support for removing useless loops

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir_opt_dead_cf.c | 96 --
 1 file changed, 84 insertions(+), 12 deletions(-)

diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
index 17eadbd..f96bf63 100644
--- a/src/glsl/nir/nir_opt_dead_cf.c
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -28,9 +28,9 @@
 #include "nir.h"
 
 /*
- * This file implements an optimization that deletes statically unreachable
- * code. In NIR, one way this can happen if if an if statement has a constant
- * condition:
+ * This file implements an optimization that deletes statically
+ * unreachable/dead code. In NIR, one way this can happen if if an if
+ * statement has a constant condition:
  *
  * if (true) {
  *...
@@ -40,7 +40,7 @@
  * branch into the surrounding control flow, possibly removing more code if
  * the branch had a jump at the end.
  *
- * The other way is that control flow can end in a jump so that code after it
+ * Another way is that control flow can end in a jump so that code after it
  * never gets executed. In particular, this can happen after optimizing
  * something like:
  *
@@ -59,6 +59,12 @@
  * }
  * ...
  *
+ * Finally, we also handle removing useless loops, i.e. loops with no side
+ * effects and without any definitions that are used elsewhere. This case is a
+ * little different from the first two in that the code is actually run (it
+ * just never does anything), but there are similar issues with needing to
+ * be careful with restarting after deleting the cf_node (see dead_cf_list())
+ * so this is a convenient place to remove them.
  */
 
 /* deletes all the control flow nodes after 'start' in the control flow list.
@@ -126,19 +132,82 @@ opt_constant_if(nir_if *if_stmt, bool condition)
 }
 
 static bool
+block_has_no_side_effects(nir_block *block, void *state)
+{
+   (void) state;
+
+   nir_foreach_instr(block, instr) {
+  if (instr->type == nir_instr_type_call)
+ return false;
+
+  if (instr->type != nir_instr_type_intrinsic)
+ continue;
+
+  nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
+  if (!nir_intrinsic_infos[intrin->intrinsic].flags &
+  NIR_INTRINSIC_CAN_ELIMINATE)
+ return false;
+   }
+
+   return true;
+}
+
+static bool
+dest_not_live_out(nir_dest *dest, void *state)
+{
+   nir_block *after = state;
+
+   assert(dest->is_ssa);
+   return !BITSET_TEST(after->live_in, dest->ssa.live_index);
+}
+
+static bool
+loop_is_dead(nir_loop *loop)
+{
+   nir_block *before = nir_cf_node_as_block(nir_cf_node_prev(&loop->cf_node));
+   nir_block *after = nir_cf_node_as_block(nir_cf_node_next(&loop->cf_node));
+
+   if (!exec_list_is_empty(&after->instr_list) &&
+   nir_block_last_instr(after)->type == nir_instr_type_phi)
+  return false;
+
+   if (!nir_foreach_block_in_cf_node(&loop->cf_node, block_has_no_side_effects,
+ NULL))
+  return false;
+
+   for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) {
+  nir_foreach_instr(cur, instr) {
+ if (!nir_foreach_dest(instr, dest_not_live_out, after))
+return false;
+  }
+   }
+
+   return true;
+}
+
+static bool
 dead_cf_block(nir_block *block)
 {
nir_if *following_if = nir_block_get_following_if(block);
-   if (!following_if)
-  return false;
+   if (following_if) {
+ nir_const_value *const_value =
+nir_src_as_const_value(following_if->condition);
+
+ if (!const_value)
+return false;
+
+  opt_constant_if(following_if, const_value->u[0] != 0);
+  return true;
+   }
 
-  nir_const_value *const_value =
- nir_src_as_const_value(following_if->condition);
+   nir_loop *following_loop = nir_block_get_following_loop(block);
+   if (!following_loop)
+  return false;
 
-  if (!const_value)
- return false;
+   if (!loop_is_dead(following_loop))
+  return false;
 
-   opt_constant_if(following_if, const_value->u[0] != 0);
+   nir_cf_node_remove(&following_loop->cf_node);
return true;
 }
 
@@ -167,7 +236,7 @@ dead_cf_list(struct exec_list *list, bool 
*list_ends_in_jump)
   case nir_cf_node_block: {
  nir_block *block = nir_cf_node_as_block(cur);
  if (dead_cf_block(block)) {
-/* We just deleted the if after this block, so we may have
+/* We just deleted the if or loop after this block, so we may have
  * deleted the block before or after it -- which one is an
  * implementation detail. Therefore, to recover the place we were
  * at, we have to use the previous cf_node.
@@ -238,6 +307,9 @@ dead_cf_list(struct exec_list *list, bool 
*list_ends_in_jump)
 static bool
 opt_dead_cf_impl(nir_function_impl *impl)
 {
+   nir_metadata_require(impl, nir_metadata_live_variables |
+  nir_metadata_dominance);
+
bool dummy;
bool progress = dead_cf_list(&impl->body, &dummy);
 
-- 
2.1.0

_

[Mesa-dev] [PATCH 3/9] nir: add an optimization for removing dead control flow

2015-05-08 Thread Connor Abbott
I'm not so sure about where to put the helper currently in nir.c... on
the one hand, it's pretty specific to this pass, but on the other hand
it needs to do some very fiddly low-level things to the control flow
which is why it needs access to a static function in nir.c
(stitch_blocks()) that I'd rather not expose publically.

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir.c |  26 +++
 src/glsl/nir/nir.h |   8 +++
 src/glsl/nir/nir_opt_dead_cf.c | 150 +
 3 files changed, 184 insertions(+)
 create mode 100644 src/glsl/nir/nir_opt_dead_cf.c

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index 362ac30..efc3f01 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1281,6 +1281,32 @@ nir_cf_node_remove(nir_cf_node *node)
}
 }
 
+/* Takes a control flow list 'cf_list,' presumed to be a child of the control
+ * flow node 'node,' pastes cf_list after node, and then deletes node.
+ */
+
+void
+nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list)
+{
+   nir_cf_node *after = nir_cf_node_next(node);
+   assert(after->type == nir_cf_node_block);
+   nir_block *after_block = nir_cf_node_as_block(after);
+
+   foreach_list_typed(nir_cf_node, child, node, cf_list) {
+  child->parent = node->parent;
+   }
+
+   nir_cf_node *last = exec_node_data(nir_cf_node, exec_list_get_tail(cf_list),
+  node);
+   assert(last->type == nir_cf_node_block);
+   nir_block *last_block = nir_cf_node_as_block(last);
+
+   exec_node_insert_list_before(&after->node, cf_list);
+   stitch_blocks(last_block, after_block);
+
+   nir_cf_node_remove(node);
+}
+
 static bool
 add_use_cb(nir_src *src, void *state)
 {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 697d37e..d2bf3d7 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1518,6 +1518,12 @@ void nir_cf_node_insert_end(struct exec_list *list, 
nir_cf_node *node);
 /** removes a control flow node, doing any cleanup necessary */
 void nir_cf_node_remove(nir_cf_node *node);
 
+
+/** Takes a control flow list 'cf_list,' presumed to be a child of the control
+ *  flow node 'node,' pastes cf_list after node, and then deletes node.
+ */
+void nir_cf_list_move_after_node(nir_cf_node *node, struct exec_list *cf_list);
+
 /** requests that the given pieces of metadata be generated */
 void nir_metadata_require(nir_function_impl *impl, nir_metadata required);
 /** dirties all but the preserved metadata */
@@ -1692,6 +1698,8 @@ bool nir_opt_cse(nir_shader *shader);
 bool nir_opt_dce_impl(nir_function_impl *impl);
 bool nir_opt_dce(nir_shader *shader);
 
+bool nir_opt_dead_cf(nir_shader *shader);
+
 void nir_opt_gcm(nir_shader *shader);
 
 bool nir_opt_peephole_select(nir_shader *shader);
diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
new file mode 100644
index 000..3fbb794
--- /dev/null
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -0,0 +1,150 @@
+/*
+ * Copyright © 2014 Connor Abbott
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Connor Abbott (cwabbo...@gmail.com)
+ *
+ */
+
+#include "nir.h"
+
+/*
+ * This file implements an optimization that deletes statically unreachable
+ * code. In NIR, one way this can happen if if an if statement has a constant
+ * condition:
+ *
+ * if (true) {
+ *...
+ * }
+ *
+ * We delete the if statement and paste the contents of the always-executed
+ * branch into the surrounding control flow, possibly removing more code if
+ * the branch had a jump at the end.
+ */
+
+/* deletes all the control flow nodes after 'start' in the control flow list.
+ */
+static void
+delete_unreachable_after(nir_cf_node *start)
+{
+   nir_cf_node *current = start;
+   while (!nir_cf_node_is_last(current)) {
+  current = nir_cf_node_next(current);
+  nir_cf_node_remove(current);
+   }
+}
+
+static void
+

[Mesa-dev] [PATCH 5/9] nir/dead_cf: delete code that's unreachable due to jumps

2015-05-08 Thread Connor Abbott
Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir_opt_dead_cf.c | 126 ++---
 1 file changed, 118 insertions(+), 8 deletions(-)

diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
index 3fbb794..17eadbd 100644
--- a/src/glsl/nir/nir_opt_dead_cf.c
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -39,6 +39,26 @@
  * We delete the if statement and paste the contents of the always-executed
  * branch into the surrounding control flow, possibly removing more code if
  * the branch had a jump at the end.
+ *
+ * The other way is that control flow can end in a jump so that code after it
+ * never gets executed. In particular, this can happen after optimizing
+ * something like:
+ *
+ * if (true) {
+ *...
+ *break;
+ * }
+ * ...
+ *
+ * We also consider the case where both branches of an if end in a jump, e.g.:
+ *
+ * if (...) {
+ *break;
+ * } else {
+ *continue;
+ * }
+ * ...
+ *
  */
 
 /* deletes all the control flow nodes after 'start' in the control flow list.
@@ -106,30 +126,120 @@ opt_constant_if(nir_if *if_stmt, bool condition)
 }
 
 static bool
-dead_cf_cb(nir_block *block, void *state)
+dead_cf_block(nir_block *block)
 {
-   bool *progress = state;
-
nir_if *following_if = nir_block_get_following_if(block);
if (!following_if)
-  return true;
+  return false;
 
   nir_const_value *const_value =
  nir_src_as_const_value(following_if->condition);
 
   if (!const_value)
- return true;
+ return false;
 
opt_constant_if(following_if, const_value->u[0] != 0);
-   *progress = true;
return true;
 }
 
 static bool
-opt_dead_cf_impl(nir_function_impl *impl)
+ends_in_jump(nir_block *block)
+{
+   if (exec_list_is_empty(&block->instr_list))
+  return false;
+
+   nir_instr *instr = nir_block_last_instr(block);
+   return instr->type == nir_instr_type_jump;
+}
+
+static bool
+dead_cf_list(struct exec_list *list, bool *list_ends_in_jump)
 {
bool progress = false;
-   nir_foreach_block(impl, dead_cf_cb, &progress);
+   *list_ends_in_jump = false;
+
+   nir_cf_node *cur = exec_node_data(nir_cf_node, exec_list_get_head(list),
+ node);
+   nir_cf_node *prev = NULL;
+
+   while (cur != NULL) {
+  switch (cur->type) {
+  case nir_cf_node_block: {
+ nir_block *block = nir_cf_node_as_block(cur);
+ if (dead_cf_block(block)) {
+/* We just deleted the if after this block, so we may have
+ * deleted the block before or after it -- which one is an
+ * implementation detail. Therefore, to recover the place we were
+ * at, we have to use the previous cf_node.
+ */
+
+if (prev) {
+   cur = nir_cf_node_next(prev);
+} else {
+   cur = exec_node_data(nir_cf_node, exec_list_get_head(list),
+node);
+}
+
+block = nir_cf_node_as_block(cur);
+
+progress = true;
+ }
+
+ if (ends_in_jump(block)) {
+*list_ends_in_jump = true;
+
+if (!exec_node_is_tail_sentinel(cur->node.next)) {
+   delete_unreachable_after(cur);
+   return true;
+}
+ }
+
+ break;
+  }
+
+  case nir_cf_node_if: {
+ nir_if *if_stmt = nir_cf_node_as_if(cur);
+ bool then_ends_in_jump, else_ends_in_jump;
+ progress |= dead_cf_list(&if_stmt->then_list, &then_ends_in_jump);
+ progress |= dead_cf_list(&if_stmt->else_list, &else_ends_in_jump);
+
+ if (then_ends_in_jump && else_ends_in_jump) {
+*list_ends_in_jump = true;
+nir_block *next = nir_cf_node_as_block(nir_cf_node_next(cur));
+if (!exec_list_is_empty(&next->instr_list) ||
+!exec_node_is_tail_sentinel(next->cf_node.node.next)) {
+   delete_unreachable_after(cur);
+   return true;
+}
+ }
+
+ break;
+  }
+
+  case nir_cf_node_loop: {
+ nir_loop *loop = nir_cf_node_as_loop(cur);
+ bool dummy;
+ progress |= dead_cf_list(&loop->body, &dummy);
+
+ break;
+  }
+
+  default:
+ unreachable("unknown cf node type");
+  }
+
+  prev = cur;
+  cur = nir_cf_node_next(cur);
+   }
+
+   return progress;
+}
+
+static bool
+opt_dead_cf_impl(nir_function_impl *impl)
+{
+   bool dummy;
+   bool progress = dead_cf_list(&impl->body, &dummy);
 
if (progress)
   nir_metadata_preserve(impl, nir_metadata_none);
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/9] nir/vars_to_ssa: don't rewrite removed instructions

2015-05-08 Thread Connor Abbott
We were rewriting the uses of the intrinsic instruction to point to
something else after removing it, which only happened to work since we
were lax about having dangling uses when removing instructions. Fix that
up.

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir_lower_vars_to_ssa.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_vars_to_ssa.c 
b/src/glsl/nir/nir_lower_vars_to_ssa.c
index ccb8f99..8d0ae1b 100644
--- a/src/glsl/nir/nir_lower_vars_to_ssa.c
+++ b/src/glsl/nir/nir_lower_vars_to_ssa.c
@@ -647,11 +647,12 @@ rename_variables_block(nir_block *block, struct 
lower_variables_state *state)
   intrin->num_components, NULL);
 
 nir_instr_insert_before(&intrin->instr, &mov->instr);
-nir_instr_remove(&intrin->instr);
 
 nir_ssa_def_rewrite_uses(&intrin->dest.ssa,
  nir_src_for_ssa(&mov->dest.dest.ssa),
  state->shader);
+
+nir_instr_remove(&intrin->instr);
 break;
  }
 
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/9] nir: insert ssa_undef instructions when cleanup up defs/uses

2015-05-08 Thread Connor Abbott
The point of cleanup_defs_uses() is to make an instruction safe to
remove by removing any references that the rest of the shader may have
to it. Previously, it was handling register use/def sets and removing
the instruction from the use sets of any SSA sources it had, but if the
instruction defined an SSA value that was used by other instructions it
wasn't doing anything. This was ok, since we were always careful to make
sure that no removed instruction ever had any uses, but now we want to
start removing unreachable instruction which might still be used in
reachable parts of the code. In that case, the value that any uses get
is undefined since the instruction never actually executes, so we can
just replace the instruction with an ssa_undef_instr.

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir.c | 47 ++-
 1 file changed, 34 insertions(+), 13 deletions(-)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index f03e80a..362ac30 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -1206,26 +1206,29 @@ stitch_blocks(nir_block *before, nir_block *after)
 }
 
 static void
-remove_defs_uses(nir_instr *instr);
+remove_defs_uses(nir_instr *instr, nir_function_impl *impl);
 
 static void
-cleanup_cf_node(nir_cf_node *node)
+cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
 {
switch (node->type) {
case nir_cf_node_block: {
   nir_block *block = nir_cf_node_as_block(node);
   /* We need to walk the instructions and clean up defs/uses */
-  nir_foreach_instr(block, instr)
- remove_defs_uses(instr);
+  nir_foreach_instr(block, instr) {
+ if (instr->type == nir_instr_type_jump)
+handle_remove_jump(block, nir_instr_as_jump(instr)->type);
+ remove_defs_uses(instr, impl);
+  }
   break;
}
 
case nir_cf_node_if: {
   nir_if *if_stmt = nir_cf_node_as_if(node);
   foreach_list_typed(nir_cf_node, child, node, &if_stmt->then_list)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   foreach_list_typed(nir_cf_node, child, node, &if_stmt->else_list)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
 
   list_del(&if_stmt->condition.use_link);
   break;
@@ -1234,13 +1237,12 @@ cleanup_cf_node(nir_cf_node *node)
case nir_cf_node_loop: {
   nir_loop *loop = nir_cf_node_as_loop(node);
   foreach_list_typed(nir_cf_node, child, node, &loop->body)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   break;
}
case nir_cf_node_function: {
-  nir_function_impl *impl = nir_cf_node_as_function(node);
   foreach_list_typed(nir_cf_node, child, node, &impl->body)
- cleanup_cf_node(child);
+ cleanup_cf_node(child, impl);
   break;
}
default:
@@ -1254,6 +1256,8 @@ nir_cf_node_remove(nir_cf_node *node)
nir_function_impl *impl = nir_cf_node_get_function(node);
nir_metadata_preserve(impl, nir_metadata_none);
 
+   cleanup_cf_node(node, impl);
+
if (node->type == nir_cf_node_block) {
   /*
* Basic blocks can't really be removed by themselves, since they act as
@@ -1275,8 +1279,6 @@ nir_cf_node_remove(nir_cf_node *node)
   exec_node_remove(&node->node);
   stitch_blocks(before_block, after_block);
}
-
-   cleanup_cf_node(node);
 }
 
 static bool
@@ -1443,16 +1445,35 @@ remove_def_cb(nir_dest *dest, void *state)
return true;
 }
 
+static bool
+remove_ssa_def_cb(nir_ssa_def *def, void *state)
+{
+   nir_function_impl *impl = state;
+   nir_shader *shader = impl->overload->function->shader;
+
+   if (!list_empty(&def->uses) || !list_empty(&def->if_uses)) {
+  nir_ssa_undef_instr *undef =
+ nir_ssa_undef_instr_create(shader, def->num_components);
+  nir_instr_insert_before_cf_list(&impl->body, &undef->instr);
+  nir_ssa_def_rewrite_uses(def, nir_src_for_ssa(&undef->def), shader);
+   }
+
+   return true;
+}
+
+
 static void
-remove_defs_uses(nir_instr *instr)
+remove_defs_uses(nir_instr *instr, nir_function_impl *impl)
 {
nir_foreach_dest(instr, remove_def_cb, instr);
nir_foreach_src(instr, remove_use_cb, instr);
+   nir_foreach_ssa_def(instr, remove_ssa_def_cb, impl);
 }
 
 void nir_instr_remove(nir_instr *instr)
 {
-   remove_defs_uses(instr);
+   nir_function_impl *impl = nir_cf_node_get_function(&instr->block->cf_node);
+   remove_defs_uses(instr, impl);
exec_node_remove(&instr->node);
 
if (instr->type == nir_instr_type_jump) {
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/9] nir/validate: validate successors at the end of a loop

2015-05-08 Thread Connor Abbott
I found this useful while debugging some control flow bugs while working
on the dead control flow pass.

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir_validate.c | 21 +
 1 file changed, 21 insertions(+)

diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
index da92ed9..0332c7d 100644
--- a/src/glsl/nir/nir_validate.c
+++ b/src/glsl/nir/nir_validate.c
@@ -78,6 +78,9 @@ typedef struct {
/* the parent of the current cf node being visited */
nir_cf_node *parent_node;
 
+   /* whether the current loop we're visiting has a break statement */
+   bool has_break;
+
/* the current function implementation being validated */
nir_function_impl *impl;
 
@@ -513,6 +516,8 @@ validate_instr(nir_instr *instr, validate_state *state)
   break;
 
case nir_instr_type_jump:
+  if (nir_instr_as_jump(instr)->type == nir_jump_break)
+ state->has_break = true;
   break;
 
default:
@@ -641,6 +646,9 @@ validate_if(nir_if *if_stmt, validate_state *state)
 static void
 validate_loop(nir_loop *loop, validate_state *state)
 {
+   bool old_has_break = state->has_break;
+   state->has_break = false;
+
assert(!exec_node_is_head_sentinel(loop->cf_node.node.prev));
nir_cf_node *prev_node = nir_cf_node_prev(&loop->cf_node);
assert(prev_node->type == nir_cf_node_block);
@@ -663,7 +671,20 @@ validate_loop(nir_loop *loop, validate_state *state)
   validate_cf_node(cf_node, state);
}
 
+   nir_block *last_block = nir_cf_node_as_block(nir_loop_last_cf_node(loop));
+   if (exec_list_is_empty(&last_block->instr_list) ||
+   nir_block_last_instr(last_block)->type != nir_instr_type_jump) {
+  assert(&last_block->successors[0]->cf_node == 
nir_loop_first_cf_node(loop));
+   }
+   if (state->has_break) {
+  assert(last_block->successors[1] == NULL);
+   } else {
+  nir_block *next = nir_cf_node_as_block(nir_cf_node_next(&loop->cf_node));
+  assert(last_block->successors[1] == next);
+   }
+
state->parent_node = old_parent;
+   state->has_break = old_has_break;
 }
 
 static void
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 7/9] nir: add a helper for iterating over blocks in a cf node

2015-05-08 Thread Connor Abbott
We were already doing this internally for iterating over a function
implementation, so just expose it directly.

Signed-off-by: Connor Abbott 
---
 src/glsl/nir/nir.c | 7 +++
 src/glsl/nir/nir.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/src/glsl/nir/nir.c b/src/glsl/nir/nir.c
index e3c37dd..3cabdf7 100644
--- a/src/glsl/nir/nir.c
+++ b/src/glsl/nir/nir.c
@@ -2055,6 +2055,13 @@ foreach_cf_node(nir_cf_node *node, nir_foreach_block_cb 
cb,
 }
 
 bool
+nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb,
+ void *state)
+{
+   return foreach_cf_node(node, cb, false, state);
+}
+
+bool
 nir_foreach_block(nir_function_impl *impl, nir_foreach_block_cb cb, void 
*state)
 {
foreach_list_typed_safe(nir_cf_node, node, node, &impl->body) {
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 69daa14..4284374 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1601,6 +1601,8 @@ bool nir_foreach_block(nir_function_impl *impl, 
nir_foreach_block_cb cb,
void *state);
 bool nir_foreach_block_reverse(nir_function_impl *impl, nir_foreach_block_cb 
cb,
void *state);
+bool nir_foreach_block_in_cf_node(nir_cf_node *node, nir_foreach_block_cb cb,
+  void *state);
 
 /* If the following CF node is an if, this function returns that if.
  * Otherwise, it returns NULL.
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/9] NIR dead control-flow removal

2015-05-08 Thread Connor Abbott
This series implements a bunch of related optimizations that run at
once as part of the same pass to eliminate control flow that is
useless. This includes both unreachable code and useless loops, i.e.
loops that don't compute a result used by the rest of the code. To do
this, I needed to change some low-level things that change the behavior
of nir_instr_remove() (see patch 2 for details), but I've tested the
series on piglit and there are no regressions, and there are no
shader-db changes either.

This is rebased on Jason's series to use lists for use-def sets; it
turns out that the changes required were pretty minimal.

The series is also available at:

git://people.freedesktop.org/~cwabbott0/mesa nir-dead-cf-v3

Connor Abbott (9):
  nir/vars_to_ssa: don't rewrite removed instructions
  nir: insert ssa_undef instructions when cleanup up defs/uses
  nir: add an optimization for removing dead control flow
  nir/validate: validate successors at the end of a loop
  nir/dead_cf: delete code that's unreachable due to jumps
  nir: add nir_block_get_following_loop() helper
  nir: add a helper for iterating over blocks in a cf node
  nir/dead_cf: add support for removing useless loops
  i965/fs/nir: enable the dead control flow optimization

 src/glsl/Makefile.sources|   1 +
 src/glsl/nir/nir.c   |  94 --
 src/glsl/nir/nir.h   |  12 ++
 src/glsl/nir/nir_lower_vars_to_ssa.c |   3 +-
 src/glsl/nir/nir_opt_dead_cf.c   | 332 +++
 src/glsl/nir/nir_validate.c  |  21 +++
 src/mesa/drivers/dri/i965/brw_nir.c  |   2 +
 7 files changed, 452 insertions(+), 13 deletions(-)
 create mode 100644 src/glsl/nir/nir_opt_dead_cf.c

-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [xf86-video-nouveau] dri2: Enable BufferAge support

2015-05-08 Thread Mario Kleiner



On 01/19/2015 12:00 PM, Chris Wilson wrote:

For enable BufferAge support, we just have to be not using the
DRI2Buffer->flags field for any purpose (i.e. it is always expected to
be 0, as it is now) and to be sure to swap the flags field whenever we
exchange buffers. As nouveau does not exactly support TripleBuffer, we
don't have to worry about setting the copying the flags field when
injecting the third buffer.

Signed-off-by: Chris Wilson 
Cc: Maarten Lankhorst 
Cc: Mario Kleiner 
---
  src/nouveau_dri2.c | 7 +++
  1 file changed, 7 insertions(+)

diff --git a/src/nouveau_dri2.c b/src/nouveau_dri2.c
index e3445b2..428ef92 100644
--- a/src/nouveau_dri2.c
+++ b/src/nouveau_dri2.c
@@ -711,6 +711,7 @@ nouveau_dri2_finish_swap(DrawablePtr draw, unsigned int 
frame,
}

SWAP(s->dst->name, s->src->name);
+   SWAP(s->dst->flags, s->src->flags);
SWAP(nouveau_pixmap(dst_pix)->bo, nouveau_pixmap(src_pix)->bo);

DamageRegionProcessPending(draw);
@@ -1003,6 +1004,12 @@ nouveau_dri2_init(ScreenPtr pScreen)
dri2.DestroyBuffer2 = nouveau_dri2_destroy_buffer2;
dri2.CopyRegion2 = nouveau_dri2_copy_region2;
  #endif
+
+#if DRI2INFOREC_VERSION >= 10
+   dri2.version = 10;
+   dri2.bufferAge = 1;
+#endif
+
return DRI2ScreenInit(pScreen, &dri2);
  }




Seems ok to me.

Reviewed-by: Mario Kleiner 

-mario
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.

2015-05-08 Thread Ian Romanick
This was not fully baked.  I'll send out a fixed version later.

On 05/08/2015 07:05 PM, Ian Romanick wrote:
> From: Ian Romanick 
> 
> Shader-db results:
> 
> GM45 and Iron Lake:
> total instructions in shared programs: 7888585 -> 7888585 (0.00%)
> instructions in affected programs: 0 -> 0
> 
> Sandy Bridge, Ivy Bridge, Haswell, and Broadwell:
> total instructions in shared programs: 9598608 -> 9598572 (-0.00%)
> instructions in affected programs: 6506 -> 6470 (-0.55%)
> helped:36
> 
> Signed-off-by: Ian Romanick 
> ---
>  src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> index 469f2ea..d72a83a 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
> @@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block)
>  
>if ((inst->opcode != BRW_OPCODE_AND &&
> inst->opcode != BRW_OPCODE_CMP &&
> -   inst->opcode != BRW_OPCODE_MOV) ||
> +   inst->opcode != BRW_OPCODE_MOV &&
> +   inst->opcode != BRW_OPCODE_NOT) ||
>inst->predicate != BRW_PREDICATE_NONE ||
>!inst->dst.is_null() ||
>inst->src[0].file != GRF ||
> @@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block)
>inst->conditional_mod != BRW_CONDITIONAL_NZ)
>   continue;
>  
> +  if (inst->opcode == BRW_OPCODE_NOT &&
> +  inst->conditional_mod != BRW_CONDITIONAL_Z &&
> +  inst->conditional_mod != BRW_CONDITIONAL_NZ)
> + continue;
> +
>bool read_flag = false;
>foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
>block) {
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Add support for removing NOT.NZ and NOT.Z instructions.

2015-05-08 Thread Ian Romanick
From: Ian Romanick 

Shader-db results:

GM45 and Iron Lake:
total instructions in shared programs: 7888585 -> 7888585 (0.00%)
instructions in affected programs: 0 -> 0

Sandy Bridge, Ivy Bridge, Haswell, and Broadwell:
total instructions in shared programs: 9598608 -> 9598572 (-0.00%)
instructions in affected programs: 6506 -> 6470 (-0.55%)
helped:36

Signed-off-by: Ian Romanick 
---
 src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
index 469f2ea..d72a83a 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_cmod_propagation.cpp
@@ -59,7 +59,8 @@ opt_cmod_propagation_local(bblock_t *block)
 
   if ((inst->opcode != BRW_OPCODE_AND &&
inst->opcode != BRW_OPCODE_CMP &&
-   inst->opcode != BRW_OPCODE_MOV) ||
+   inst->opcode != BRW_OPCODE_MOV &&
+   inst->opcode != BRW_OPCODE_NOT) ||
   inst->predicate != BRW_PREDICATE_NONE ||
   !inst->dst.is_null() ||
   inst->src[0].file != GRF ||
@@ -86,6 +87,11 @@ opt_cmod_propagation_local(bblock_t *block)
   inst->conditional_mod != BRW_CONDITIONAL_NZ)
  continue;
 
+  if (inst->opcode == BRW_OPCODE_NOT &&
+  inst->conditional_mod != BRW_CONDITIONAL_Z &&
+  inst->conditional_mod != BRW_CONDITIONAL_NZ)
+ continue;
+
   bool read_flag = false;
   foreach_inst_in_block_reverse_starting_from(fs_inst, scan_inst, inst,
   block) {
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:32 PM, Ian Romanick  wrote:
> On 05/08/2015 03:31 PM, Ian Romanick wrote:
>> On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
>>> On Fri, May 8, 2015 at 3:20 PM, Ian Romanick  wrote:
 On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  
> wrote:
>> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
>> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
>> helped:5797
>> HURT:  76
>
> I'm doing some looking into the hurt programs.  It seems as if there
> are some very strange interatctions between flrp and ffma.  I'm still
> working out exactly how to fix it up.

 Yes, I noticed this too.  Did you see my reply to Ken earlier today?
 The problem I noticed /seems/ restricted to cases where the would-be
 interpolant is scalar but the other values are vector.
>>>
>>> It would surprise me a lot of that mattered.  At the point where we do
>>> opt_algebraic in NIR, we've already scalarized things.  That said,
>>> there is a *lot* going on in an optimization loop so anything's
>>> possible.
>>
>> If I take the shader_runner test that I included in the e-mail to Ken
>> and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
>> opt_algebraic to handle the mix of scalar and vector, and something like
>> 150 shaders were helped.

Ok, that's weird.  I really have no idea why that would make a
difference.  Unless, of course, it affects the optimizations that GLSL
IR is able to do which then, in turn, affects NIR.  That would make
some sense.

>> I spent about an hour digging into it, and I came up dry.  I have tried
>> adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
>> flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
>> I was going to ask about it at the office on Monday, but it came up on
>> the list first.
>
> FWIW, those rules were:
>
>(('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, 
> c), '!options->lower_flrp'),
>(('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, 
> c), '!options->lower_flrp'),

Did you add them to optimizations or late_optimizations?  The late
ones get run after opt_ffma and the others don't.

I poked around for an hour or two today with similar optimizations (I
actually had 6) but I wasn't able to get it to do any better.
--Jason
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:31 PM, Ian Romanick wrote:
> On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
>> On Fri, May 8, 2015 at 3:20 PM, Ian Romanick  wrote:
>>> On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
 On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  
 wrote:
> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
> helped:5797
> HURT:  76

 I'm doing some looking into the hurt programs.  It seems as if there
 are some very strange interatctions between flrp and ffma.  I'm still
 working out exactly how to fix it up.
>>>
>>> Yes, I noticed this too.  Did you see my reply to Ken earlier today?
>>> The problem I noticed /seems/ restricted to cases where the would-be
>>> interpolant is scalar but the other values are vector.
>>
>> It would surprise me a lot of that mattered.  At the point where we do
>> opt_algebraic in NIR, we've already scalarized things.  That said,
>> there is a *lot* going on in an optimization loop so anything's
>> possible.
> 
> If I take the shader_runner test that I included in the e-mail to Ken
> and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
> opt_algebraic to handle the mix of scalar and vector, and something like
> 150 shaders were helped.
> 
> I spent about an hour digging into it, and I came up dry.  I have tried
> adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
> flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
> I was going to ask about it at the office on Monday, but it came up on
> the list first.

FWIW, those rules were:

   (('ffma', b, c, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), 
'!options->lower_flrp'),
   (('ffma', c, b, ('fmul', a, ('fadd', 1.0, ('fneg', c, ('flrp', a, b, c), 
'!options->lower_flrp'),

> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:25 PM, Jason Ekstrand wrote:
> On Fri, May 8, 2015 at 3:20 PM, Ian Romanick  wrote:
>> On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
>>> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  
>>> wrote:
 total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
 instructions in affected programs: 1330548 -> 1315224 (-1.15%)
 helped:5797
 HURT:  76
>>>
>>> I'm doing some looking into the hurt programs.  It seems as if there
>>> are some very strange interatctions between flrp and ffma.  I'm still
>>> working out exactly how to fix it up.
>>
>> Yes, I noticed this too.  Did you see my reply to Ken earlier today?
>> The problem I noticed /seems/ restricted to cases where the would-be
>> interpolant is scalar but the other values are vector.
> 
> It would surprise me a lot of that mattered.  At the point where we do
> opt_algebraic in NIR, we've already scalarized things.  That said,
> there is a *lot* going on in an optimization loop so anything's
> possible.

If I take the shader_runner test that I included in the e-mail to Ken
and s/float alpha/vec3 alpha/ I get LRPs.  I made some obvious tweaks to
opt_algebraic to handle the mix of scalar and vector, and something like
150 shaders were helped.

I spent about an hour digging into it, and I came up dry.  I have tried
adding some rules to nir_opt_algebraic.py to convert the fmul/ffma to
flrp, and I couldn't get a break point at the nir_opt_ffma to trigger.
I was going to ask about it at the office on Monday, but it came up on
the list first.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:20 PM, Ian Romanick  wrote:
> On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
>> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  wrote:
>>> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
>>> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
>>> helped:5797
>>> HURT:  76
>>
>> I'm doing some looking into the hurt programs.  It seems as if there
>> are some very strange interatctions between flrp and ffma.  I'm still
>> working out exactly how to fix it up.
>
> Yes, I noticed this too.  Did you see my reply to Ken earlier today?
> The problem I noticed /seems/ restricted to cases where the would-be
> interpolant is scalar but the other values are vector.

It would surprise me a lot of that mattered.  At the point where we do
opt_algebraic in NIR, we've already scalarized things.  That said,
there is a *lot* going on in an optimization loop so anything's
possible.
--Jason

>> --Jason
>>
>>> GAINED:0
>>> LOST:  8
>>> ---
>>>  src/glsl/nir/nir_search.c | 11 +++
>>>  1 file changed, 11 insertions(+)
>>>
>>> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
>>> index d86655b..4c83349 100644
>>> --- a/src/glsl/nir/nir_search.c
>>> +++ b/src/glsl/nir/nir_search.c
>>> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
>>> nir_alu_instr *instr,
>>>}
>>> }
>>>
>>> +   /* Stash off the current variables_seen bitmask.  This way we can
>>> +* restore it prior to matching in the commutative case below. */
>>> +   unsigned variables_seen_stash = state->variables_seen;
>>> +
>>> bool matched = true;
>>> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
>>>/* If the source is an explicitly sized source, then we need to reset
>>> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
>>> nir_alu_instr *instr,
>>>
>>> if (nir_op_infos[instr->op].algebraic_properties & 
>>> NIR_OP_IS_COMMUTATIVE) {
>>>assert(nir_op_infos[instr->op].num_inputs == 2);
>>> +
>>> +  /* Restore the variables_seen bitmask.  If we don't do this, then we
>>> +   * could end up with an erroneous failure due to variables found in 
>>> the
>>> +   * first match attempt above not matching those in the second.
>>> +   */
>>> +  state->variables_seen = variables_seen_stash;
>>> +
>>>if (!match_value(expr->srcs[0], instr, 1, num_components,
>>> swizzle, state))
>>>   return false;
>>> --
>>> 2.4.0
>>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Ian Romanick
On 05/08/2015 11:55 AM, Jason Ekstrand wrote:
> On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  wrote:
>> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
>> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
>> helped:5797
>> HURT:  76
> 
> I'm doing some looking into the hurt programs.  It seems as if there
> are some very strange interatctions between flrp and ffma.  I'm still
> working out exactly how to fix it up.

Yes, I noticed this too.  Did you see my reply to Ken earlier today?
The problem I noticed /seems/ restricted to cases where the would-be
interpolant is scalar but the other values are vector.

> --Jason
> 
>> GAINED:0
>> LOST:  8
>> ---
>>  src/glsl/nir/nir_search.c | 11 +++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
>> index d86655b..4c83349 100644
>> --- a/src/glsl/nir/nir_search.c
>> +++ b/src/glsl/nir/nir_search.c
>> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
>> nir_alu_instr *instr,
>>}
>> }
>>
>> +   /* Stash off the current variables_seen bitmask.  This way we can
>> +* restore it prior to matching in the commutative case below. */
>> +   unsigned variables_seen_stash = state->variables_seen;
>> +
>> bool matched = true;
>> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
>>/* If the source is an explicitly sized source, then we need to reset
>> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
>> nir_alu_instr *instr,
>>
>> if (nir_op_infos[instr->op].algebraic_properties & 
>> NIR_OP_IS_COMMUTATIVE) {
>>assert(nir_op_infos[instr->op].num_inputs == 2);
>> +
>> +  /* Restore the variables_seen bitmask.  If we don't do this, then we
>> +   * could end up with an erroneous failure due to variables found in 
>> the
>> +   * first match attempt above not matching those in the second.
>> +   */
>> +  state->variables_seen = variables_seen_stash;
>> +
>>if (!match_value(expr->srcs[0], instr, 1, num_components,
>> swizzle, state))
>>   return false;
>> --
>> 2.4.0
>>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/5] clover: Add a mutex to guard queue::queued_events

2015-05-08 Thread Francisco Jerez
Tom Stellard  writes:

> This fixes a potential crash where on a sequence like this:
>
> Thread 0: Check if queue is not empty.
> Thread 1: Remove item from queue, making it empty.
> Thread 0: Do something assuming queue is not empty.
> ---
>  src/gallium/state_trackers/clover/core/queue.cpp | 2 ++
>  src/gallium/state_trackers/clover/core/queue.hpp | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/src/gallium/state_trackers/clover/core/queue.cpp 
> b/src/gallium/state_trackers/clover/core/queue.cpp
> index 24f9326..87f9dcc 100644
> --- a/src/gallium/state_trackers/clover/core/queue.cpp
> +++ b/src/gallium/state_trackers/clover/core/queue.cpp
> @@ -44,6 +44,7 @@ command_queue::flush() {
> pipe_screen *screen = device().pipe;
> pipe_fence_handle *fence = NULL;
>  
> +   std::lock_guard lock(queued_events_mutex);
> if (!queued_events.empty()) {
>pipe->flush(pipe, &fence, 0);
>  
> @@ -69,6 +70,7 @@ command_queue::profiling_enabled() const {
>  
>  void
>  command_queue::sequence(hard_event &ev) {
> +   std::lock_guard lock(queued_events_mutex);
> if (!queued_events.empty())
>queued_events.back()().chain(ev);
>  
> diff --git a/src/gallium/state_trackers/clover/core/queue.hpp 
> b/src/gallium/state_trackers/clover/core/queue.hpp
> index b7166e6..bddb86c 100644
> --- a/src/gallium/state_trackers/clover/core/queue.hpp
> +++ b/src/gallium/state_trackers/clover/core/queue.hpp
> @@ -24,6 +24,7 @@
>  #define CLOVER_CORE_QUEUE_HPP
>  
>  #include 
> +#include 
>  
>  #include "core/object.hpp"
>  #include "core/context.hpp"
> @@ -69,6 +70,7 @@ namespace clover {
>  
>cl_command_queue_properties props;
>pipe_context *pipe;
> +  std::mutex queued_events_mutex;
>std::deque> queued_events;
> };
>  }
> -- 
> 2.0.4

Thanks, this patch is:
Reviewed-by: Francisco Jerez 


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range

2015-05-08 Thread Connor Abbott
Reviewed-by: Connor Abbott 

On Fri, May 8, 2015 at 3:13 PM, Jason Ekstrand  wrote:
> ---
>  src/glsl/nir/nir_search.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
> index 5ba0160..d86655b 100644
> --- a/src/glsl/nir/nir_search.c
> +++ b/src/glsl/nir/nir_search.c
> @@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr 
> *instr, unsigned src,
>
> case nir_search_value_variable: {
>nir_search_variable *var = nir_search_value_as_variable(value);
> +  assert(var->variable < NIR_SEARCH_MAX_VARIABLES);
>
>if (state->variables_seen & (1 << var->variable)) {
>   if (!nir_srcs_equal(state->variables[var->variable].src,
> --
> 2.4.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Connor Abbott
On Fri, May 8, 2015 at 2:53 PM, Jason Ekstrand  wrote:
> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
> helped:5797
> HURT:  76
> GAINED:0
> LOST:  8
> ---
>  src/glsl/nir/nir_search.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
> index d86655b..4c83349 100644
> --- a/src/glsl/nir/nir_search.c
> +++ b/src/glsl/nir/nir_search.c
> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
> nir_alu_instr *instr,
>}
> }
>
> +   /* Stash off the current variables_seen bitmask.  This way we can
> +* restore it prior to matching in the commutative case below. */

Puth the */ on the next line, and this is

Reviewed-by: Connor Abbott 

> +   unsigned variables_seen_stash = state->variables_seen;
> +
> bool matched = true;
> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
>/* If the source is an explicitly sized source, then we need to reset
> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
> nir_alu_instr *instr,
>
> if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) 
> {
>assert(nir_op_infos[instr->op].num_inputs == 2);
> +
> +  /* Restore the variables_seen bitmask.  If we don't do this, then we
> +   * could end up with an erroneous failure due to variables found in the
> +   * first match attempt above not matching those in the second.
> +   */
> +  state->variables_seen = variables_seen_stash;
> +
>if (!match_value(expr->srcs[0], instr, 1, num_components,
> swizzle, state))
>   return false;
> --
> 2.4.0
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Stable libEGL ABI ?

2015-05-08 Thread Emil Velikov
Hi all,

Just had a quick look at Debian's repo and noticed something rather
worrying - the ABI of our libEGL is not stable.
Stable in the sense of - we provide a growing list of static symbols
for each new function an EGL extension adds.

This comes as we set EGL_EGLEXT_PROTOTYPES, prior to including
eglext.h. Which in turn exposes the the function prototypes which are
explicitly annotated with default visibility. This change was
introduced with

commit 1ed1027e886980b9b0f48fa6bfcf3d6e209c7787
Author: Brian Paul 
Date:   Tue May 27 13:45:41 2008 -0600

assorted changes to compile with new EGL 1.4 headers (untested)

From going through the EGL 1.3 to 1.5 spec, I cannot see any mention
that one should export EGL extension functions from their
implementation. On the contrary, it is explicitly mentioned that some
implementations may do so, but relying on it in your program is not
portable. Quick look at the Nvidia official driver shows only core EGL
functions, which aligns with various "undefined symbol
eglCreateImageKHR" issues that I've seen not too long ago - mir,
gstreamer, piglit.


So the question here is:

Can we "break" the already non-portable ABI, and hide everything that
is not core EGL, or alternatively how can we rework things so that as
we add new entry points, required by extensions, they are available
only dynamically ?

Personally I would opt for the former and address any issues in
piglit/waffle/foo accordingly. I would suspect that libepoxy is OK,
although I've never looked too closely at the code.

I would love to get this in some shape or form for 10.6, and avoid
exporting the two new functions from EGL_MESA_image_dma_buf_export :-\

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/23] glsl/types: add new subroutine type

2015-05-08 Thread Chris Forbes
Patches 11-13 are:

Reviewed-by: Chris Forbes 

On Fri, Apr 24, 2015 at 1:42 PM, Dave Airlie  wrote:
> From: Dave Airlie 
>
> This type will be used to store the name of subroutine types
>
> as in subroutine void myfunc(void);
> will store myfunc into a subroutine type.
>
> This is required to the parser can identify a subroutine
> type in a uniform decleration as a valid type, and also for
> looking up the type later.
>
> Also add contains_subroutine method.
>
> Signed-off-by: Dave Airlie 
> ---
>  src/glsl/glsl_types.cpp| 63 
> ++
>  src/glsl/glsl_types.h  | 19 ++
>  src/glsl/ir_clone.cpp  |  1 +
>  src/glsl/link_uniform_initializers.cpp |  1 +
>  4 files changed, 84 insertions(+)
>
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index 9c9b7ef..37b5c62 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -32,6 +32,7 @@ mtx_t glsl_type::mutex = _MTX_INITIALIZER_NP;
>  hash_table *glsl_type::array_types = NULL;
>  hash_table *glsl_type::record_types = NULL;
>  hash_table *glsl_type::interface_types = NULL;
> +hash_table *glsl_type::subroutine_types = NULL;
>  void *glsl_type::mem_ctx = NULL;
>
>  void
> @@ -159,6 +160,22 @@ glsl_type::glsl_type(const glsl_struct_field *fields, 
> unsigned num_fields,
> mtx_unlock(&glsl_type::mutex);
>  }
>
> +glsl_type::glsl_type(const char *subroutine_name) :
> +   gl_type(0),
> +   base_type(GLSL_TYPE_SUBROUTINE),
> +   sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
> +   sampler_type(0), interface_packing(0),
> +   vector_elements(0), matrix_columns(0),
> +   length(0)
> +{
> +   mtx_lock(&glsl_type::mutex);
> +
> +   init_ralloc_type_ctx();
> +   assert(subroutine_name != NULL);
> +   this->name = ralloc_strdup(this->mem_ctx, subroutine_name);
> +   this->vector_elements = 1;
> +   mtx_unlock(&glsl_type::mutex);
> +}
>
>  bool
>  glsl_type::contains_sampler() const
> @@ -229,6 +246,22 @@ glsl_type::contains_opaque() const {
> }
>  }
>
> +bool
> +glsl_type::contains_subroutine() const
> +{
> +   if (this->is_array()) {
> +  return this->fields.array->contains_subroutine();
> +   } else if (this->is_record()) {
> +  for (unsigned int i = 0; i < this->length; i++) {
> +if (this->fields.structure[i].type->contains_subroutine())
> +   return true;
> +  }
> +  return false;
> +   } else {
> +  return this->is_subroutine();
> +   }
> +}
> +
>  gl_texture_index
>  glsl_type::sampler_index() const
>  {
> @@ -826,6 +859,34 @@ glsl_type::get_interface_instance(const 
> glsl_struct_field *fields,
> return t;
>  }
>
> +const glsl_type *
> +glsl_type::get_subroutine_instance(const char *subroutine_name)
> +{
> +   const glsl_type key(subroutine_name);
> +
> +   mtx_lock(&glsl_type::mutex);
> +
> +   if (subroutine_types == NULL) {
> +  subroutine_types = hash_table_ctor(64, record_key_hash, 
> record_key_compare);
> +   }
> +
> +   const glsl_type *t = (glsl_type *) hash_table_find(subroutine_types, & 
> key);
> +   if (t == NULL) {
> +  mtx_unlock(&glsl_type::mutex);
> +  t = new glsl_type(subroutine_name);
> +  mtx_lock(&glsl_type::mutex);
> +
> +  hash_table_insert(subroutine_types, (void *) t, t);
> +   }
> +
> +   assert(t->base_type == GLSL_TYPE_SUBROUTINE);
> +   assert(strcmp(t->name, subroutine_name) == 0);
> +
> +   mtx_unlock(&glsl_type::mutex);
> +
> +   return t;
> +}
> +
>
>  const glsl_type *
>  glsl_type::get_mul_type(const glsl_type *type_a, const glsl_type *type_b)
> @@ -958,6 +1019,7 @@ glsl_type::component_slots() const
> case GLSL_TYPE_SAMPLER:
> case GLSL_TYPE_ATOMIC_UINT:
> case GLSL_TYPE_VOID:
> +   case GLSL_TYPE_SUBROUTINE:
> case GLSL_TYPE_ERROR:
>break;
> }
> @@ -1330,6 +1392,7 @@ glsl_type::count_attribute_slots() const
> case GLSL_TYPE_IMAGE:
> case GLSL_TYPE_ATOMIC_UINT:
> case GLSL_TYPE_VOID:
> +   case GLSL_TYPE_SUBROUTINE:
> case GLSL_TYPE_ERROR:
>break;
> }
> diff --git a/src/glsl/glsl_types.h b/src/glsl/glsl_types.h
> index d383dd5..078adaf 100644
> --- a/src/glsl/glsl_types.h
> +++ b/src/glsl/glsl_types.h
> @@ -59,6 +59,7 @@ enum glsl_base_type {
> GLSL_TYPE_INTERFACE,
> GLSL_TYPE_ARRAY,
> GLSL_TYPE_VOID,
> +   GLSL_TYPE_SUBROUTINE,
> GLSL_TYPE_ERROR
>  };
>
> @@ -276,6 +277,11 @@ struct glsl_type {
>   const char *block_name);
>
> /**
> +* Get the instance of an subroutine type
> +*/
> +   static const glsl_type *get_subroutine_instance(const char 
> *subroutine_name);
> +
> +   /**
>  * Get the type resulting from a multiplication of \p type_a * \p type_b
>  */
> static const glsl_type *get_mul_type(const glsl_type *type_a,
> @@ -526,6 +532,13 @@ struct glsl_type {
> /**
>  * Query if a type is unnamed/anonymous (named by the parser)
>  */
> +
> +   bool is_subroutine() const
> + 

Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Jason Ekstrand
Over-all, I think this is on the right track, but I still don't think
it's 100% correct.

On Fri, May 8, 2015 at 12:04 AM, Tapani Pälli  wrote:
>
>
> On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote:
>>
>> On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote:
>>>
>>> This fixes bugs with special cases where we have arrays of
>>> structures containing samplers or arrays of samplers.
>>>
>>> I've verified that patch results in calculating same index value as
>>> returned by _mesa_get_sampler_uniform_value for IR. Patch makes
>>> following ES3 conformance test pass:
>>>
>>> ES3-CTS.shaders.struct.uniform.sampler_array_fragment
>>>
>>> Signed-off-by: Tapani Pälli 
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
>>> ---
>>>   src/glsl/nir/nir_lower_samplers.cpp | 6 +-
>>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/src/glsl/nir/nir_lower_samplers.cpp
>>> b/src/glsl/nir/nir_lower_samplers.cpp
>>> index cf8ab83..9859cc0 100644
>>> --- a/src/glsl/nir/nir_lower_samplers.cpp
>>> +++ b/src/glsl/nir/nir_lower_samplers.cpp
>>> @@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct
>>> gl_shader_program *shader_progr
>>>instr->sampler_index *= glsl_get_length(deref->type);

We really should get rid of the multiply since the sampler index is
zero up until the end and the multiply does nothing but confuse
people.

>>>switch (deref_array->deref_array_type) {
>>>case nir_deref_array_type_direct:
>>> -instr->sampler_index += deref_array->base_offset;
>>> +
>>> +/* If this is an array of samplers. */
>>
>>
>> Above the case is for arrays and below you check for the sampler. This
>> comment does not tell much extra :)
>
>
> Yeah, not sure how to change it. What I want to state here is that only for
> arrays of samplers we need to do this, otherwise we don't. The only other
> case really is array of structs that contain sampler so maybe I should state
> that instead?
>
>
>
>>> +if (deref->child->type->base_type == GLSL_TYPE_SAMPLER)
>>> +   instr->sampler_index += deref_array->base_offset;
>>> +
>>>   if (deref_array->deref.child)
>>>  ralloc_asprintf_append(&name, "[%u]",
>>> deref_array->base_offset);

The two conditionals above should be combined.  If the deref has a
child, it should not have type SAMPLER and vice-versa.  A better way
to do it would be

if (deref_array->deref.child) {
   ralloc_asprintf_append(&name, "[%u]", deref_array->base_offset);
} else {
   assert(deref->child->type->bbase_type == GLSL_TYPE_SAMPLER);
   instr->sampler_index = deref_array->base_offset;
}

Also, it may be better to do that outside of the switch and turn the
switch into an "if (deref_array->deref_array_type ==
deref_array_type_indirect)" to handle indirects.  Right now, I don't
think that we correctly handle an indirect with a base offset other
than 0.

Does that make sense?
--Jason

>>>   break;
>>> --
>>> 2.1.0
>>>
>>> ___
>>> mesa-dev mailing list
>>> mesa-dev@lists.freedesktop.org
>>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Eric Anholt
Ilia Mirkin  writes:

> On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke  wrote:
>> +   # Multiplication by 4 comes up fairly often in indirect offset 
>> calculations.
>> +   # Some GPUs have weird integer multiplication limitations, but shifts 
>> should work
>> +   # equally well everywhere.
>> +   (('imul', 4, a), ('ishl', a, 2)),
>
> Not sure what the cost of doing it this way, but really you want all
> powers of 2... and also udiv -> shr. Since this is python, should be
> easy enough to append onto that list. AFAIK all GPU's prefer a shift
> over a mul. Adreno doen't have 32-bit imul in the first place (and no
> idiv either).

I can confirm that I'd love shifts instead of imul/udiv on vc4.


signature.asc
Description: PGP signature
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: Assert that variable id's are in range

2015-05-08 Thread Jason Ekstrand
---
 src/glsl/nir/nir_search.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index 5ba0160..d86655b 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -90,6 +90,7 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
 
case nir_search_value_variable: {
   nir_search_variable *var = nir_search_value_as_variable(value);
+  assert(var->variable < NIR_SEARCH_MAX_VARIABLES);
 
   if (state->variables_seen & (1 << var->variable)) {
  if (!nir_srcs_equal(state->variables[var->variable].src,
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:11 AM, Ian Romanick  wrote:
> On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
>> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
>>
>> Intel GPUs have strange integer multiplication restrictions - on most
>> hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
>> means the arguments aren't commutative, which can limit our constant
>> propagation options.  SHL has no such restrictions.
>>
>> Shifting is probably reasonable on most people's hardware, so let's just
>> do that.
>>
>> i965 shader-db results (using NIR for VS):
>> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
>> instructions in affected programs: 1360411 -> 1316806 (-3.21%)
>> helped:5772
>> HURT:  0
>>
>> Signed-off-by: Kenneth Graunke 
>> Cc: matts...@gmail.com
>> Cc: ja...@jlekstrand.net
>> ---
>>  src/glsl/nir/nir_opt_algebraic.py | 5 +
>>  1 file changed, 5 insertions(+)
>>
>> So...I found a bizarre issue with this patch.
>>
>>(('imul', 4, a), ('ishl', a, 2)),
>>
>> totally optimizes things.  However...
>>
>>(('imul', a, 4), ('ishl', a, 2)),
>>
>> doesn't seem to do anything, even though imul is commutative, and nir_search
>> should totally handle that...
>>
>>  ▄▄  ▄▄▄▄    ▄   ▄▄
>>  ██  ██   ▀▀▀██▀▀▀  ███  ██
>>  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
>>   ██ ██ ██   ██  ██  ██   ▄██▀   ██
>>   ███▀▀███   ██  ██   ██ ▀▀
>>   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
>>   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀
>>
>> If you know why, let me know, otherwise I may have to look into it when more
>> awake.
>
> I've noticed a couple other weird things that I have been unable to
> understand.  Shaders like the one below end with fmul/ffma instaed of
> flrp, for example.  I understand why that happens from GLSL IR
> opt_algebraic, but it seems like nir_opt_algebraic should handle it.

Just a guess, but it's quite possibly due to the commutative
operations bug I just sent a patch for.
--Jason

> [require]
> GLSL >= 1.30
>
> [vertex shader]
> in vec4 v;
> in vec2 tc_in;
>
> out vec2 tc;
>
> void main() {
> gl_Position = v;
> tc = tc_in;
> }
>
> [fragment shader]
> in vec2 tc;
>
> out vec4 color;
>
> uniform sampler2D s;
> uniform float a;
> uniform vec3 base_color;
>
> void main() {
> vec3 tex_color = texture(s, tc).xyz;
>
> color.xyz = (base_color * a) + (tex_color * (1.0 - a));
> color.a = 1.0;
> }
>
>
>
>> diff --git a/src/glsl/nir/nir_opt_algebraic.py 
>> b/src/glsl/nir/nir_opt_algebraic.py
>> index 400d60e..350471f 100644
>> --- a/src/glsl/nir/nir_opt_algebraic.py
>> +++ b/src/glsl/nir/nir_opt_algebraic.py
>> @@ -247,6 +247,11 @@ late_optimizations = [
>> (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
>> (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
>> (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
>> +
>> +   # Multiplication by 4 comes up fairly often in indirect offset 
>> calculations.
>> +   # Some GPUs have weird integer multiplication limitations, but shifts 
>> should work
>> +   # equally well everywhere.
>> +   (('imul', 4, a), ('ishl', a, 2)),
>
> This should be conditionalized on whether the platform has native integers.
>
>>  ]
>>
>>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", 
>> optimizations).render()
>>
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:53 AM, Jason Ekstrand  wrote:
> total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
> instructions in affected programs: 1330548 -> 1315224 (-1.15%)
> helped:5797
> HURT:  76

I'm doing some looking into the hurt programs.  It seems as if there
are some very strange interatctions between flrp and ffma.  I'm still
working out exactly how to fix it up.
--Jason

> GAINED:0
> LOST:  8
> ---
>  src/glsl/nir/nir_search.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
> index d86655b..4c83349 100644
> --- a/src/glsl/nir/nir_search.c
> +++ b/src/glsl/nir/nir_search.c
> @@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
> nir_alu_instr *instr,
>}
> }
>
> +   /* Stash off the current variables_seen bitmask.  This way we can
> +* restore it prior to matching in the commutative case below. */
> +   unsigned variables_seen_stash = state->variables_seen;
> +
> bool matched = true;
> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
>/* If the source is an explicitly sized source, then we need to reset
> @@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
> nir_alu_instr *instr,
>
> if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) 
> {
>assert(nir_op_infos[instr->op].num_inputs == 2);
> +
> +  /* Restore the variables_seen bitmask.  If we don't do this, then we
> +   * could end up with an erroneous failure due to variables found in the
> +   * first match attempt above not matching those in the second.
> +   */
> +  state->variables_seen = variables_seen_stash;
> +
>if (!match_value(expr->srcs[0], instr, 1, num_components,
> swizzle, state))
>   return false;
> --
> 2.4.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: Save/restore the variables_seen bitmask when matching

2015-05-08 Thread Jason Ekstrand
total instructions in shared programs: 7152330 -> 7137006 (-0.21%)
instructions in affected programs: 1330548 -> 1315224 (-1.15%)
helped:5797
HURT:  76
GAINED:0
LOST:  8
---
 src/glsl/nir/nir_search.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index d86655b..4c83349 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -199,6 +199,10 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
   }
}
 
+   /* Stash off the current variables_seen bitmask.  This way we can
+* restore it prior to matching in the commutative case below. */
+   unsigned variables_seen_stash = state->variables_seen;
+
bool matched = true;
for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
   /* If the source is an explicitly sized source, then we need to reset
@@ -221,6 +225,13 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
 
if (nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) {
   assert(nir_op_infos[instr->op].num_inputs == 2);
+
+  /* Restore the variables_seen bitmask.  If we don't do this, then we
+   * could end up with an erroneous failure due to variables found in the
+   * first match attempt above not matching those in the second.
+   */
+  state->variables_seen = variables_seen_stash;
+
   if (!match_value(expr->srcs[0], instr, 1, num_components,
swizzle, state))
  return false;
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather

2015-05-08 Thread Chris Forbes
I don't have CHV or SKL hw or docs to try and confirm this, but this
does what it claims to.

Reviewed-by: Chris Forbes 

On Sat, May 9, 2015 at 5:10 AM, Neil Roberts  wrote:
> The opt_sampler_eot optimisation seems to break when the last
> instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
> this so it causes a lot of regressions. I can't find any documentation
> or known workarounds to indicate that this is expected behaviour, but
> considering that this is probably a pretty unlikely situation in a
> real use case we might as well disable it in order to avoid the
> regressions. In total this fixes 451 tests.
>
> Reviewed-by: Ben Widawsky 
> ---
>
> See here for some more discussion of this:
>
> http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html
>
> As far as I can tell the Jenkins run mentioned in that email doesn't
> seem to have any tests on Cherryview or Skylake so that probably
> explains why it didn't pick up the regression.
>
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++
>  1 file changed, 10 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 8dd680e..e9528e0 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot()
> if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
>return false;
>
> +   /* This optimisation doesn't seem to work for textureGather for some
> +* reason. I can't find any documentation or known workarounds to indicate
> +* that this is expected, but considering that it is probably pretty
> +* unlikely that a shader would directly write out the results from
> +* textureGather we might as well just disable it.
> +*/
> +   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
> +   tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
> +  return false;
> +
> /* If there's no header present, we need to munge the LOAD_PAYLOAD as 
> well.
>  * It's very likely to be the previous instruction.
>  */
> --
> 1.9.3
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling

2015-05-08 Thread Kenneth Graunke
On Friday, May 08, 2015 07:04:31 PM Neil Roberts wrote:
> Previously whenever a primitive is drawn the driver would call
> _mesa_check_conditional_render which blocks waiting for the result of
> the query to determine whether to render. On Gen7+ there is a bit in
> the 3DPRIMITIVE command which can be used to disable the primitive
> based on the value of a state bit. This state bit can be set based on
> whether two registers have different values using the MI_PREDICATE
> command. We can load these two registers with the pixel count values
> stored in the query begin and end to implement conditional rendering
> without stalling.
> 
> Unfortunately these two source registers were not in the whitelist of
> available registers in the kernel driver until v3.19. This patch uses
> the command parser version from intel_screen to detect whether to
> attempt to set the predicate data registers.
> 
> The predicate enable bit is currently only used for drawing 3D
> primitives. For blits, clears, bitmaps, copypixels and drawpixels it
> still causes a stall. For most of these it would probably just work to
> call the new brw_check_conditional_render function instead of
> _mesa_check_conditional_render because they already work in terms of
> rendering primitives. However it's a bit trickier for blits because it
> can use the BLT ring or the blorp codepath. I think these operations
> are less useful for conditional rendering than rendering primitives so
> it might be best to leave it for a later patch.
> 
> v2: Use the command parser version to detect whether we can write to
> the predicate data registers instead of trying to execute a
> register load command.
> v3: Simple rebase
> ---
>  src/mesa/drivers/dri/i965/Makefile.sources |   1 +
>  src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 
> +
>  src/mesa/drivers/dri/i965/brw_context.c|   4 +
>  src/mesa/drivers/dri/i965/brw_context.h|  21 +++
>  src/mesa/drivers/dri/i965/brw_defines.h|   1 +
>  src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
>  src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
>  src/mesa/drivers/dri/i965/intel_extensions.c   |   5 +
>  src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
>  9 files changed, 243 insertions(+), 12 deletions(-)
>  create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c
> 
> diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
> b/src/mesa/drivers/dri/i965/Makefile.sources
> index 1ae93e1..a24c20a 100644
> --- a/src/mesa/drivers/dri/i965/Makefile.sources
> +++ b/src/mesa/drivers/dri/i965/Makefile.sources
> @@ -18,6 +18,7 @@ i965_FILES = \
>   brw_clip_unfilled.c \
>   brw_clip_util.c \
>   brw_compute.c \
> + brw_conditional_render.c \
>   brw_context.c \
>   brw_context.h \
>   brw_cs.cpp \
> diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
> b/src/mesa/drivers/dri/i965/brw_conditional_render.c
> new file mode 100644
> index 000..e676aa0
> --- /dev/null
> +++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
> @@ -0,0 +1,167 @@
> +/*
> + * Copyright © 2014 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER 
> DEALINGS
> + * IN THE SOFTWARE.
> + *
> + * Authors:
> + *Neil Roberts 
> + */
> +
> +/** @file brw_conditional_render.c
> + *
> + * Support for conditional rendering based on query objects
> + * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
> + */
> +
> +#include "main/imports.h"
> +#include "main/condrender.h"
> +
> +#include "brw_context.h"
> +#include "brw_defines.h"
> +#include "intel_batchbuffer.h"
> +
> +static void
> +set_predicate_enable(struct brw_context *brw,
> + bool value)
> +{
> +   if (value)
> +  brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
> +   else
> +  brw->predicate.stat

Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context

2015-05-08 Thread Emil Velikov
Hi Tom,

On 8 May 2015 at 00:36, Tom Stellard  wrote:
> Events can be added to an OpenCL command queue concurrently from
> multiple threads, but pipe_context and pipe_screen objects
> are not threadsafe.  The threadsafe wrappers protect all pipe_screen
> and pipe_context function calls with a mutex, so we can safely use
> them with multiple threads.
> ---
>  src/gallium/state_trackers/clover/Makefile.am  |   6 +-
>  src/gallium/state_trackers/clover/Makefile.sources |   4 +
>  src/gallium/state_trackers/clover/core/device.cpp  |   2 +
>  .../clover/core/pipe_threadsafe_context.c  | 272 
> +
>  .../clover/core/pipe_threadsafe_screen.c   | 184 ++
>  .../state_trackers/clover/core/threadsafe.h|  39 +++
>  src/gallium/targets/opencl/Makefile.am |   3 +-
>  7 files changed, 508 insertions(+), 2 deletions(-)
>  create mode 100644 
> src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
>  create mode 100644 
> src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c
>  create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h
>

> --- a/src/gallium/state_trackers/clover/Makefile.sources
> +++ b/src/gallium/state_trackers/clover/Makefile.sources
> @@ -53,6 +53,10 @@ CPP_SOURCES := \
> util/range.hpp \
> util/tuple.hpp
>
> +C_SOURCES := \
> +   core/pipe_threadsafe_context.c \
> +   core/pipe_threadsafe_screen.c
> +
Regardless of the approach (re Francisco's comment), please keep the
header (core/threadsafe.h) within the sources list.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Ilia Mirkin
On Fri, May 8, 2015 at 6:36 AM, Kenneth Graunke  wrote:
> +   # Multiplication by 4 comes up fairly often in indirect offset 
> calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts 
> should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),

Not sure what the cost of doing it this way, but really you want all
powers of 2... and also udiv -> shr. Since this is python, should be
easy enough to append onto that list. AFAIK all GPU's prefer a shift
over a mul. Adreno doen't have 32-bit imul in the first place (and no
idiv either).

In nouveau/codegen we just have a single check for whether the
immediate is a power of 2, perhaps that can be encoded here in some
way.

Cheers,

  -ilia
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:15 AM, Matt Turner  wrote:
> On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand  wrote:
>> On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke  
>> wrote:
>>> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
 On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  
 wrote:
 > Shader-db results for fragment shaders on Broadwell:
 >
 >total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
 >instructions in affected programs: 43242 -> 42918 (-0.75%)
 >helped:142
 >HURT:  0
 >
 > Shader-db results for vertex shaders on Broadwell:
 >
 >total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
 >instructions in affected programs: 1418720 -> 1373448 (-3.19%)
 >helped:6139
 >HURT:  0
 > ---
 >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
 >  1 file changed, 12 insertions(+)
 >
 > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
 > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 > index 555987d..161a262 100644
 > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
 > @@ -21,6 +21,8 @@
 >   * IN THE SOFTWARE.
 >   */
 >
 > +#include 
 > +
 >  #include "glsl/ir.h"
 >  #include "glsl/ir_optimization.h"
 >  #include "glsl/nir/glsl_to_nir.h"
 > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
 >op[i] = offset(op[i], instr->src[i].swizzle[channel]);
 > }
 >
 > +   /* Immediates can only be used as the second source of two-source
 > +* instructions.  We have code in opt_algebraic to flip them as 
 > needed
 > +* for most instructions.  However, it doesn't hurt anything to just 
 > do
 > +* the right thing if we can detect it at the NIR level.
 > +*/
 > +   if ((nir_op_infos[instr->op].algebraic_properties & 
 > NIR_OP_IS_COMMUTATIVE) &&
 > +   nir_src_as_const_value(instr->src[0].src)) {
 > +  std::swap(op[0], op[1]);
 > +   }
 > +

 The real problem is that we haven't lifted the restriction about
 non-commutative integer multiply on Broadwell:

 brw_fs_copy_propagation.cpp:

/* Fit this constant in by commuting the operands.
 * Exception: we can't do this for 32-bit integer MUL/MACH
 * because it's asymmetric.
 */
if ((inst->opcode == BRW_OPCODE_MUL ||
 inst->opcode == BRW_OPCODE_MACH) &&
(inst->src[1].type == BRW_REGISTER_TYPE_D ||
 inst->src[1].type == BRW_REGISTER_TYPE_UD))
   break;
inst->src[0] = inst->src[1];
inst->src[1] = val;
progress = true;
>>>
>>> Yeah.  I also wrote a patch to do that, adding
>>>
>>>(brw->gen < 8 || brw->is_cherryview) &&
>>
>> In that case, shouldn't Cherry View take the same path as hsw when
>> emitting multiplies and look for 15-bit constants?
>
> Almost... that path needs to set one of the MUL's source types to W/UW
> and the stride to 2, like in commit 0c06d019. I'm planning to rip out
> all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp
> and move it to a common lowering pass, so I'll fix that at the same
> time.

Awesome!  Thanks for working on that!
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 11:13 AM, Jason Ekstrand  wrote:
> On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke  
> wrote:
>> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
>>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  
>>> wrote:
>>> > Shader-db results for fragment shaders on Broadwell:
>>> >
>>> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>>> >instructions in affected programs: 43242 -> 42918 (-0.75%)
>>> >helped:142
>>> >HURT:  0
>>> >
>>> > Shader-db results for vertex shaders on Broadwell:
>>> >
>>> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>>> >instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>>> >helped:6139
>>> >HURT:  0
>>> > ---
>>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
>>> >  1 file changed, 12 insertions(+)
>>> >
>>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> > index 555987d..161a262 100644
>>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>>> > @@ -21,6 +21,8 @@
>>> >   * IN THE SOFTWARE.
>>> >   */
>>> >
>>> > +#include 
>>> > +
>>> >  #include "glsl/ir.h"
>>> >  #include "glsl/ir_optimization.h"
>>> >  #include "glsl/nir/glsl_to_nir.h"
>>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>>> >op[i] = offset(op[i], instr->src[i].swizzle[channel]);
>>> > }
>>> >
>>> > +   /* Immediates can only be used as the second source of two-source
>>> > +* instructions.  We have code in opt_algebraic to flip them as needed
>>> > +* for most instructions.  However, it doesn't hurt anything to just 
>>> > do
>>> > +* the right thing if we can detect it at the NIR level.
>>> > +*/
>>> > +   if ((nir_op_infos[instr->op].algebraic_properties & 
>>> > NIR_OP_IS_COMMUTATIVE) &&
>>> > +   nir_src_as_const_value(instr->src[0].src)) {
>>> > +  std::swap(op[0], op[1]);
>>> > +   }
>>> > +
>>>
>>> The real problem is that we haven't lifted the restriction about
>>> non-commutative integer multiply on Broadwell:
>>>
>>> brw_fs_copy_propagation.cpp:
>>>
>>>/* Fit this constant in by commuting the operands.
>>> * Exception: we can't do this for 32-bit integer MUL/MACH
>>> * because it's asymmetric.
>>> */
>>>if ((inst->opcode == BRW_OPCODE_MUL ||
>>> inst->opcode == BRW_OPCODE_MACH) &&
>>>(inst->src[1].type == BRW_REGISTER_TYPE_D ||
>>> inst->src[1].type == BRW_REGISTER_TYPE_UD))
>>>   break;
>>>inst->src[0] = inst->src[1];
>>>inst->src[1] = val;
>>>progress = true;
>>
>> Yeah.  I also wrote a patch to do that, adding
>>
>>(brw->gen < 8 || brw->is_cherryview) &&
>
> In that case, shouldn't Cherry View take the same path as hsw when
> emitting multiplies and look for 15-bit constants?

Almost... that path needs to set one of the MUL's source types to W/UW
and the stride to 2, like in commit 0c06d019. I'm planning to rip out
all of the multiplication code from brw_fs_visitor.cpp/brw_fs_nir.cpp
and move it to a common lowering pass, so I'll fix that at the same
time.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.

2015-05-08 Thread Kenneth Graunke
On Tuesday, May 05, 2015 11:29:52 PM Francisco Jerez wrote:
> ---
>  src/glsl/nir/nir_intrinsics.h | 27 +++
>  1 file changed, 27 insertions(+)
> 
> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
> index 8e28765..4b13c75 100644
> --- a/src/glsl/nir/nir_intrinsics.h
> +++ b/src/glsl/nir/nir_intrinsics.h
> @@ -89,6 +89,33 @@ ATOMIC(inc, 0)
>  ATOMIC(dec, 0)
>  ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE)
>  
> +/*
> + * Image load, store and atomic intrinsics.
> + *
> + * All image intrinsics take an image target passed as a nir_variable.  Image
> + * variables contain a number of memory and layout qualifiers that influence
> + * the semantics of the intrinsic.
> + *
> + * All image intrinsics take a four-coordinate vector and a sample index as
> + * first two sources, determining the location within the image that will be
> + * accessed by the intrinsic.  Components not applicable to the image target
> + * in use are equal to zero by convention.  Image store takes an additional
> + * four-component argument with the value to be written, and image atomic
> + * operations take either one or two additional scalar arguments with the 
> same
> + * meaning as in the ARB_shader_image_load_store specification.
> + */
> +INTRINSIC(image_load, 2, ARR(4, 1), true, 4, 1, 0,
> +  NIR_INTRINSIC_CAN_ELIMINATE)
> +INTRINSIC(image_store, 3, ARR(4, 1, 4), false, 0, 1, 0, 0)
> +INTRINSIC(image_atomic_add, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_min, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_max, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_and, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_or, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_xor, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_exchange, 3, ARR(4, 1, 1), true, 1, 1, 0, 0)
> +INTRINSIC(image_atomic_comp_swap, 4, ARR(4, 1, 1, 1), true, 1, 1, 0, 0)
> +
>  #define SYSTEM_VALUE(name, components) \
> INTRINSIC(load_##name, 0, ARR(), true, components, 0, 0, \
> NIR_INTRINSIC_CAN_ELIMINATE | NIR_INTRINSIC_CAN_REORDER)
> 

Looks great, Curro.  Expanding the coordinate to a vec4 and always
having the parameters present for e.g. sample_index does reduce the
combinatorial explosion a lot.  FWIW, I also prefer undefined.

This should work out fine for SPIR-V too, it's pretty trivial to go
from their style to this (just combine the two - it's SSA after all).

These 5 are:
Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke  wrote:
> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  wrote:
>> > Shader-db results for fragment shaders on Broadwell:
>> >
>> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>> >instructions in affected programs: 43242 -> 42918 (-0.75%)
>> >helped:142
>> >HURT:  0
>> >
>> > Shader-db results for vertex shaders on Broadwell:
>> >
>> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>> >instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>> >helped:6139
>> >HURT:  0
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
>> >  1 file changed, 12 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 555987d..161a262 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -21,6 +21,8 @@
>> >   * IN THE SOFTWARE.
>> >   */
>> >
>> > +#include 
>> > +
>> >  #include "glsl/ir.h"
>> >  #include "glsl/ir_optimization.h"
>> >  #include "glsl/nir/glsl_to_nir.h"
>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>> >op[i] = offset(op[i], instr->src[i].swizzle[channel]);
>> > }
>> >
>> > +   /* Immediates can only be used as the second source of two-source
>> > +* instructions.  We have code in opt_algebraic to flip them as needed
>> > +* for most instructions.  However, it doesn't hurt anything to just do
>> > +* the right thing if we can detect it at the NIR level.
>> > +*/
>> > +   if ((nir_op_infos[instr->op].algebraic_properties & 
>> > NIR_OP_IS_COMMUTATIVE) &&
>> > +   nir_src_as_const_value(instr->src[0].src)) {
>> > +  std::swap(op[0], op[1]);
>> > +   }
>> > +
>>
>> The real problem is that we haven't lifted the restriction about
>> non-commutative integer multiply on Broadwell:
>>
>> brw_fs_copy_propagation.cpp:
>>
>>/* Fit this constant in by commuting the operands.
>> * Exception: we can't do this for 32-bit integer MUL/MACH
>> * because it's asymmetric.
>> */
>>if ((inst->opcode == BRW_OPCODE_MUL ||
>> inst->opcode == BRW_OPCODE_MACH) &&
>>(inst->src[1].type == BRW_REGISTER_TYPE_D ||
>> inst->src[1].type == BRW_REGISTER_TYPE_UD))
>>   break;
>>inst->src[0] = inst->src[1];
>>inst->src[1] = val;
>>progress = true;
>
> Yeah.  I also wrote a patch to do that, adding
>
>(brw->gen < 8 || brw->is_cherryview) &&

In that case, shouldn't Cherry View take the same path as hsw when
emitting multiplies and look for 15-bit constants?

> which also solves the problem.  But it won't help on Cherryview, which I
> believe still has the asymmetrical multiply, while switching to shifts
> will.  I suppose another alternative to NIR late optimizations is to
> have brw_fs_nir's handling of imul check for power of two sizes and emit
> a SHL.  That would be easy.

I really don't think SHL is the issue here.  It's that we're being
stupid about multiplies.  SHL is a nice hack but unless it is actually
faster, I think it's hacking around the problem.

> I do think we really need to make logical IMUL opcodes and lower them to
> MUL/MACH as needed later, so we don't need to worry about breaking MACH
> in cases like this.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 11:05 AM, Kenneth Graunke  wrote:
> On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
>> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  wrote:
>> > Shader-db results for fragment shaders on Broadwell:
>> >
>> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>> >instructions in affected programs: 43242 -> 42918 (-0.75%)
>> >helped:142
>> >HURT:  0
>> >
>> > Shader-db results for vertex shaders on Broadwell:
>> >
>> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>> >instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>> >helped:6139
>> >HURT:  0
>> > ---
>> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
>> >  1 file changed, 12 insertions(+)
>> >
>> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
>> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > index 555987d..161a262 100644
>> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
>> > @@ -21,6 +21,8 @@
>> >   * IN THE SOFTWARE.
>> >   */
>> >
>> > +#include 
>> > +
>> >  #include "glsl/ir.h"
>> >  #include "glsl/ir_optimization.h"
>> >  #include "glsl/nir/glsl_to_nir.h"
>> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>> >op[i] = offset(op[i], instr->src[i].swizzle[channel]);
>> > }
>> >
>> > +   /* Immediates can only be used as the second source of two-source
>> > +* instructions.  We have code in opt_algebraic to flip them as needed
>> > +* for most instructions.  However, it doesn't hurt anything to just do
>> > +* the right thing if we can detect it at the NIR level.
>> > +*/
>> > +   if ((nir_op_infos[instr->op].algebraic_properties & 
>> > NIR_OP_IS_COMMUTATIVE) &&
>> > +   nir_src_as_const_value(instr->src[0].src)) {
>> > +  std::swap(op[0], op[1]);
>> > +   }
>> > +
>>
>> The real problem is that we haven't lifted the restriction about
>> non-commutative integer multiply on Broadwell:
>>
>> brw_fs_copy_propagation.cpp:
>>
>>/* Fit this constant in by commuting the operands.
>> * Exception: we can't do this for 32-bit integer MUL/MACH
>> * because it's asymmetric.
>> */
>>if ((inst->opcode == BRW_OPCODE_MUL ||
>> inst->opcode == BRW_OPCODE_MACH) &&
>>(inst->src[1].type == BRW_REGISTER_TYPE_D ||
>> inst->src[1].type == BRW_REGISTER_TYPE_UD))
>>   break;
>>inst->src[0] = inst->src[1];
>>inst->src[1] = val;
>>progress = true;
>
> Yeah.  I also wrote a patch to do that, adding
>
>(brw->gen < 8 || brw->is_cherryview) &&
>
> which also solves the problem.  But it won't help on Cherryview, which I
> believe still has the asymmetrical multiply, while switching to shifts
> will.  I suppose another alternative to NIR late optimizations is to
> have brw_fs_nir's handling of imul check for power of two sizes and emit
> a SHL.  That would be easy.
>
> I do think we really need to make logical IMUL opcodes and lower them to
> MUL/MACH as needed later, so we don't need to worry about breaking MACH
> in cases like this.

Indeed. I mentioned yesterday I've been planning to do it for a while,
so I'll go ahead and take care of it.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Ian Romanick
On 05/08/2015 03:36 AM, Kenneth Graunke wrote:
> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
> 
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
> means the arguments aren't commutative, which can limit our constant
> propagation options.  SHL has no such restrictions.
> 
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
> 
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs: 1360411 -> 1316806 (-3.21%)
> helped:5772
> HURT:  0
> 
> Signed-off-by: Kenneth Graunke 
> Cc: matts...@gmail.com
> Cc: ja...@jlekstrand.net
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 5 +
>  1 file changed, 5 insertions(+)
> 
> So...I found a bizarre issue with this patch.
> 
>(('imul', 4, a), ('ishl', a, 2)),
> 
> totally optimizes things.  However...
> 
>(('imul', a, 4), ('ishl', a, 2)),
> 
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
> 
>  ▄▄  ▄▄▄▄    ▄   ▄▄
>  ██  ██   ▀▀▀██▀▀▀  ███  ██
>  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
>   ██ ██ ██   ██  ██  ██   ▄██▀   ██
>   ███▀▀███   ██  ██   ██ ▀▀
>   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
>   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀
> 
> If you know why, let me know, otherwise I may have to look into it when more
> awake.

I've noticed a couple other weird things that I have been unable to
understand.  Shaders like the one below end with fmul/ffma instaed of
flrp, for example.  I understand why that happens from GLSL IR
opt_algebraic, but it seems like nir_opt_algebraic should handle it.

[require]
GLSL >= 1.30

[vertex shader]
in vec4 v;
in vec2 tc_in;

out vec2 tc;

void main() {
gl_Position = v;
tc = tc_in;
}

[fragment shader]
in vec2 tc;

out vec4 color;

uniform sampler2D s;
uniform float a;
uniform vec3 base_color;

void main() {
vec3 tex_color = texture(s, tc).xyz;

color.xyz = (base_color * a) + (tex_color * (1.0 - a));
color.a = 1.0;
}



> diff --git a/src/glsl/nir/nir_opt_algebraic.py 
> b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
> (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
> (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
> (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> +   # Multiplication by 4 comes up fairly often in indirect offset 
> calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts 
> should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),

This should be conditionalized on whether the platform has native integers.

>  ]
>  
>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", 
> optimizations).render()
> 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] i965: Store the command parser version number in intel_screen

2015-05-08 Thread Neil Roberts
In order to detect whether the predicate source registers can be used
in a later patch we will need to know the version number for the
command parser. This patch just adds a member to intel_screen and does
an ioctl to get the version.

Reviewed-by: Kenneth Graunke 
---
 src/mesa/drivers/dri/i965/intel_screen.c | 7 +++
 src/mesa/drivers/dri/i965/intel_screen.h | 8 +++-
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
b/src/mesa/drivers/dri/i965/intel_screen.c
index dda1638..896a125 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.c
+++ b/src/mesa/drivers/dri/i965/intel_screen.c
@@ -1407,6 +1407,13 @@ __DRIconfig **intelInitScreen2(__DRIscreen *psp)
  (ret != -1 || errno != EINVAL);
}
 
+   struct drm_i915_getparam getparam;
+   getparam.param = I915_PARAM_CMD_PARSER_VERSION;
+   getparam.value = &intelScreen->cmd_parser_version;
+   const int ret = drmIoctl(psp->fd, DRM_IOCTL_I915_GETPARAM, &getparam);
+   if (ret == -1)
+  intelScreen->cmd_parser_version = 0;
+
psp->extensions = !intelScreen->has_context_reset_notification
   ? intelScreenExtensions : intelRobustScreenExtensions;
 
diff --git a/src/mesa/drivers/dri/i965/intel_screen.h 
b/src/mesa/drivers/dri/i965/intel_screen.h
index e7a1490..742b3d3 100644
--- a/src/mesa/drivers/dri/i965/intel_screen.h
+++ b/src/mesa/drivers/dri/i965/intel_screen.h
@@ -72,7 +72,13 @@ struct intel_screen
* Configuration cache with default values for all contexts
*/
driOptionCache optionCache;
-};
+
+   /**
+* Version of the command parser reported by the
+* I915_PARAM_CMD_PARSER_VERSION parameter
+*/
+   int cmd_parser_version;
+ };
 
 extern void intelDestroyContext(__DRIcontext * driContextPriv);
 
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Kenneth Graunke
On Friday, May 08, 2015 10:18:44 AM Matt Turner wrote:
> On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  wrote:
> > Shader-db results for fragment shaders on Broadwell:
> >
> >total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
> >instructions in affected programs: 43242 -> 42918 (-0.75%)
> >helped:142
> >HURT:  0
> >
> > Shader-db results for vertex shaders on Broadwell:
> >
> >total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
> >instructions in affected programs: 1418720 -> 1373448 (-3.19%)
> >helped:6139
> >HURT:  0
> > ---
> >  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
> >  1 file changed, 12 insertions(+)
> >
> > diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> > b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > index 555987d..161a262 100644
> > --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> > @@ -21,6 +21,8 @@
> >   * IN THE SOFTWARE.
> >   */
> >
> > +#include 
> > +
> >  #include "glsl/ir.h"
> >  #include "glsl/ir_optimization.h"
> >  #include "glsl/nir/glsl_to_nir.h"
> > @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
> >op[i] = offset(op[i], instr->src[i].swizzle[channel]);
> > }
> >
> > +   /* Immediates can only be used as the second source of two-source
> > +* instructions.  We have code in opt_algebraic to flip them as needed
> > +* for most instructions.  However, it doesn't hurt anything to just do
> > +* the right thing if we can detect it at the NIR level.
> > +*/
> > +   if ((nir_op_infos[instr->op].algebraic_properties & 
> > NIR_OP_IS_COMMUTATIVE) &&
> > +   nir_src_as_const_value(instr->src[0].src)) {
> > +  std::swap(op[0], op[1]);
> > +   }
> > +
> 
> The real problem is that we haven't lifted the restriction about
> non-commutative integer multiply on Broadwell:
> 
> brw_fs_copy_propagation.cpp:
> 
>/* Fit this constant in by commuting the operands.
> * Exception: we can't do this for 32-bit integer MUL/MACH
> * because it's asymmetric.
> */
>if ((inst->opcode == BRW_OPCODE_MUL ||
> inst->opcode == BRW_OPCODE_MACH) &&
>(inst->src[1].type == BRW_REGISTER_TYPE_D ||
> inst->src[1].type == BRW_REGISTER_TYPE_UD))
>   break;
>inst->src[0] = inst->src[1];
>inst->src[1] = val;
>progress = true;

Yeah.  I also wrote a patch to do that, adding

   (brw->gen < 8 || brw->is_cherryview) &&

which also solves the problem.  But it won't help on Cherryview, which I
believe still has the asymmetrical multiply, while switching to shifts
will.  I suppose another alternative to NIR late optimizations is to
have brw_fs_nir's handling of imul check for power of two sizes and emit
a SHL.  That would be easy.

I do think we really need to make logical IMUL opcodes and lower them to
MUL/MACH as needed later, so we don't need to worry about breaking MACH
in cases like this.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2 v3] i965: Use predicate enable bit for conditional rendering w/o stalling

2015-05-08 Thread Neil Roberts
Previously whenever a primitive is drawn the driver would call
_mesa_check_conditional_render which blocks waiting for the result of
the query to determine whether to render. On Gen7+ there is a bit in
the 3DPRIMITIVE command which can be used to disable the primitive
based on the value of a state bit. This state bit can be set based on
whether two registers have different values using the MI_PREDICATE
command. We can load these two registers with the pixel count values
stored in the query begin and end to implement conditional rendering
without stalling.

Unfortunately these two source registers were not in the whitelist of
available registers in the kernel driver until v3.19. This patch uses
the command parser version from intel_screen to detect whether to
attempt to set the predicate data registers.

The predicate enable bit is currently only used for drawing 3D
primitives. For blits, clears, bitmaps, copypixels and drawpixels it
still causes a stall. For most of these it would probably just work to
call the new brw_check_conditional_render function instead of
_mesa_check_conditional_render because they already work in terms of
rendering primitives. However it's a bit trickier for blits because it
can use the BLT ring or the blorp codepath. I think these operations
are less useful for conditional rendering than rendering primitives so
it might be best to leave it for a later patch.

v2: Use the command parser version to detect whether we can write to
the predicate data registers instead of trying to execute a
register load command.
v3: Simple rebase
---
 src/mesa/drivers/dri/i965/Makefile.sources |   1 +
 src/mesa/drivers/dri/i965/brw_conditional_render.c | 167 +
 src/mesa/drivers/dri/i965/brw_context.c|   4 +
 src/mesa/drivers/dri/i965/brw_context.h|  21 +++
 src/mesa/drivers/dri/i965/brw_defines.h|   1 +
 src/mesa/drivers/dri/i965/brw_draw.c   |  16 +-
 src/mesa/drivers/dri/i965/brw_queryobj.c   |  17 ++-
 src/mesa/drivers/dri/i965/intel_extensions.c   |   5 +
 src/mesa/drivers/dri/i965/intel_reg.h  |  23 +++
 9 files changed, 243 insertions(+), 12 deletions(-)
 create mode 100644 src/mesa/drivers/dri/i965/brw_conditional_render.c

diff --git a/src/mesa/drivers/dri/i965/Makefile.sources 
b/src/mesa/drivers/dri/i965/Makefile.sources
index 1ae93e1..a24c20a 100644
--- a/src/mesa/drivers/dri/i965/Makefile.sources
+++ b/src/mesa/drivers/dri/i965/Makefile.sources
@@ -18,6 +18,7 @@ i965_FILES = \
brw_clip_unfilled.c \
brw_clip_util.c \
brw_compute.c \
+   brw_conditional_render.c \
brw_context.c \
brw_context.h \
brw_cs.cpp \
diff --git a/src/mesa/drivers/dri/i965/brw_conditional_render.c 
b/src/mesa/drivers/dri/i965/brw_conditional_render.c
new file mode 100644
index 000..e676aa0
--- /dev/null
+++ b/src/mesa/drivers/dri/i965/brw_conditional_render.c
@@ -0,0 +1,167 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *Neil Roberts 
+ */
+
+/** @file brw_conditional_render.c
+ *
+ * Support for conditional rendering based on query objects
+ * (GL_NV_conditional_render, GL_ARB_conditional_render_inverted) on Gen7+.
+ */
+
+#include "main/imports.h"
+#include "main/condrender.h"
+
+#include "brw_context.h"
+#include "brw_defines.h"
+#include "intel_batchbuffer.h"
+
+static void
+set_predicate_enable(struct brw_context *brw,
+ bool value)
+{
+   if (value)
+  brw->predicate.state = BRW_PREDICATE_STATE_RENDER;
+   else
+  brw->predicate.state = BRW_PREDICATE_STATE_DONT_RENDER;
+}
+
+static void
+load_64bit_register(struct brw_context *brw,
+uint32_t reg,
+drm_intel_bo *bo,
+uint32_t offset)
+{
+   int i;
+
+   for (i = 0; i < 2; i++) {
+  

[Mesa-dev] [PATCH 0/2] i965: Do conditional rendering in hardware

2015-05-08 Thread Neil Roberts
I thought it might be a good idea to try posting these patches again
since it's been 6 months since they were originally posted. The
patches are a lot more useful now since the command parser in the
kernel is working correctly for Haswell. This means the functionality
is no longer restricted to only Ivybridge.

I haven't changed the patches apart from some minor rebasing. I've
rerun it through Piglit on Ivybridge and it doesn't cause any
regressions.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] clover: add --with-icd-file-dir option

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 03:10, Michel Dänzer  wrote:
> On 08.05.2015 03:24, Tom Stellard wrote:
>> For this particular situation, I'm happy with any solution that:
>>
>> 1. Allows a user to install the icd file to /etc if he or she wants to.
>
> --sysconfdir=/etc
>
> That covers drirc as well.
>
>
>> 2. Does not require the user to read the spec to know that /etc is the
>> correct place to install it.
>
> I think the above is pretty standard for autotools projects. I think it
> would be better to document this in the appropriate place(s) for OpenCL
> users than to add another convoluted option which doesn't really add any
> flexibility.
>
Very well said Michel. I'm suspecting that the latest approach will
lead to quite a bit of confusion. Even amongst experienced package
maintainers. If in doubt one can ping the debian, arch and/or fedora
guys (i.e the ones shipping opencl) for their 2c :-)

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 11:36, Tapani Pälli  wrote:
> That is strange, it was introduced in fb370f89d but then has gone missing ..
>
Seems like it broke shortly after it was introduced (around 10.1.0-devel)

commit 902f9df36bec7d67a2d8bc4c24d89d9d57964903
Author: Francisco Jerez 
Date:   Mon Nov 25 10:11:59 2013 -0800

mesa: Add image parameter queries for ARB_shader_image_load_store.


The fix looks great - curious that we don't have piglits for these.

Whomever pushes this, please add

Cc: "10.4 10.5" 
Signed-off-by: Emil Velikov 

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] clover: Add threadsafe wrappers for pipe_screen and pipe_context

2015-05-08 Thread Francisco Jerez
Tom Stellard  writes:

> Events can be added to an OpenCL command queue concurrently from
> multiple threads, but pipe_context and pipe_screen objects
> are not threadsafe.  The threadsafe wrappers protect all pipe_screen
> and pipe_context function calls with a mutex, so we can safely use
> them with multiple threads.
> ---
>  src/gallium/state_trackers/clover/Makefile.am  |   6 +-
>  src/gallium/state_trackers/clover/Makefile.sources |   4 +
>  src/gallium/state_trackers/clover/core/device.cpp  |   2 +
>  .../clover/core/pipe_threadsafe_context.c  | 272 
> +
>  .../clover/core/pipe_threadsafe_screen.c   | 184 ++
>  .../state_trackers/clover/core/threadsafe.h|  39 +++
>  src/gallium/targets/opencl/Makefile.am |   3 +-
>  7 files changed, 508 insertions(+), 2 deletions(-)
>  create mode 100644 
> src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
>  create mode 100644 
> src/gallium/state_trackers/clover/core/pipe_threadsafe_screen.c
>  create mode 100644 src/gallium/state_trackers/clover/core/threadsafe.h
>
> diff --git a/src/gallium/state_trackers/clover/Makefile.am 
> b/src/gallium/state_trackers/clover/Makefile.am
> index f46d9ef..8b615ae 100644
> --- a/src/gallium/state_trackers/clover/Makefile.am
> +++ b/src/gallium/state_trackers/clover/Makefile.am
> @@ -1,5 +1,6 @@
>  AUTOMAKE_OPTIONS = subdir-objects
>  
> +include $(top_srcdir)/src/gallium/Automake.inc
>  include Makefile.sources
>  
>  AM_CPPFLAGS = \
> @@ -32,6 +33,9 @@ cl_HEADERS = \
>   $(top_srcdir)/include/CL/opencl.h
>  endif
>  
> +AM_CFLAGS = \
> + $(GALLIUM_CFLAGS)
> +
>  noinst_LTLIBRARIES = libclover.la libcltgsi.la libclllvm.la
>  
>  libcltgsi_la_CXXFLAGS = \
> @@ -58,6 +62,6 @@ libclover_la_CXXFLAGS = \
>  libclover_la_LIBADD = \
>   libcltgsi.la libclllvm.la
>  
> -libclover_la_SOURCES = $(CPP_SOURCES)
> +libclover_la_SOURCES = $(CPP_SOURCES) $(C_SOURCES)
>  
>  EXTRA_DIST = Doxyfile
> diff --git a/src/gallium/state_trackers/clover/Makefile.sources 
> b/src/gallium/state_trackers/clover/Makefile.sources
> index 10bbda0..90e6b7e 100644
> --- a/src/gallium/state_trackers/clover/Makefile.sources
> +++ b/src/gallium/state_trackers/clover/Makefile.sources
> @@ -53,6 +53,10 @@ CPP_SOURCES := \
>   util/range.hpp \
>   util/tuple.hpp
>  
> +C_SOURCES := \
> + core/pipe_threadsafe_context.c \
> + core/pipe_threadsafe_screen.c
> +
>  LLVM_SOURCES := \
>   llvm/invocation.cpp
>  
> diff --git a/src/gallium/state_trackers/clover/core/device.cpp 
> b/src/gallium/state_trackers/clover/core/device.cpp
> index 42b45b7..b145027 100644
> --- a/src/gallium/state_trackers/clover/core/device.cpp
> +++ b/src/gallium/state_trackers/clover/core/device.cpp
> @@ -22,6 +22,7 @@
>  
>  #include "core/device.hpp"
>  #include "core/platform.hpp"
> +#include "core/threadsafe.h"
>  #include "pipe/p_screen.h"
>  #include "pipe/p_state.h"
>  
> @@ -47,6 +48,7 @@ device::device(clover::platform &platform, 
> pipe_loader_device *ldev) :
>   pipe->destroy(pipe);
>throw error(CL_INVALID_DEVICE);
> }
> +   pipe = pipe_threadsafe_screen(pipe);
>  }
>  
>  device::~device() {
> diff --git a/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c 
> b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
> new file mode 100644
> index 000..f08f56c
> --- /dev/null
> +++ b/src/gallium/state_trackers/clover/core/pipe_threadsafe_context.c
> @@ -0,0 +1,272 @@
> +/*
> + * Copyright 2015 Advanced Micro Devices, Inc.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN 
> THE
> + * SOFTWARE.
> + *
> + * Authors: Tom Stellard 
> + *
> + */
> +
> +#include 
> +
> +/**
> + * \file
> + *
> + * threadsafe_context is a wrapper around a pipe_context to make it thread
> + * safe.
> + */
> +
> +#include "os/os_thread.h"
> +#include "pipe/p_context.h"
> 

Re: [Mesa-dev] [PATCH 7/9] egl/wayland: assume EGL_WINDOW_BIT

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 14:34, Axel Davy  wrote:
> Le 06/05/2015 03:00, Dave Airlie a écrit :
>>
>> On 2 May 2015 at 20:15, Axel Davy  wrote:
>>>
>>> Only EGL_WINDOW_BIT is supported. Remove tests related.
>>
>> Is this there no plans to support pixmap/pbuffer/ or any of the other
>> bits?
>>
>> Seems like a step in the wrong direction if we really should be supporting
>> other things than WINDOW_BIT in the future.
>>
>> Dave.
>>
> I double checked, and it really seems pbuffers are not supported by the
> current mesa implementation for wayland.
> However the spec doesn't tell they shouldn't be supported.
>
> What about changing  the commit message to:
> egl/wayland: simplify dri2_wl_create_surface
>
> This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden
> for Wayland, and PBuffers are unimplemented.
>
Fwiw, the new commit message looks great. I'm not sure how optimistic
one can get about pbuffer support based on what Daniel said. Not to
mention the relative lack of interest.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support

2015-05-08 Thread Emil Velikov
On 8 May 2015 at 18:06, Axel Davy  wrote:
> On Fri, 8 May 2015, Emil Velikov wrote:
>
>> Shouldn't we authenticate with the correct gpu or master/render node ?
>> This implementation will auth with GPU1, and then use GPU2 which seems
>> a bit odd. I might be missing something ?
>>
>>
>
> The original patches did do differently: when GPU1 was discovered to not be
> the wanted gpu, it was not authenticating to it.
>
> However loader_get_user_preferred_fd was introduced and thus that leads to
> something different (which I think is cleaner):
>
> We authenticate to GPU1, so we have one node we can render to.
>
> loader_get_user_preferred_fd takes the GPU1 fd, and eventually replaces
> it by GPU2 fd if user wants it and it is possible.
>
> Given this way of doing, we it makes sense to auth to GPU1, even if we end
> up rendering to GPU2.
No objections against loader_get_user_preferred_fd. I'm thinking that
if we push it into drm_handle_device() we can avoid a needless (in
some cases) auth. Not sure how much it's worth, but it seems like a
good idea imho.

-Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 9/9] egl/swrast: enable config extension for swrast

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy  wrote:
> Enables to use dri config for swrast, like vblank_mode.
>
> Signed-off-by: Axel Davy 
> ---
>  src/egl/drivers/dri2/egl_dri2.c| 21 ++---
>  src/gallium/state_trackers/dri/drisw.c |  1 +
>  src/mesa/drivers/dri/swrast/swrast.c   |  1 +
>  3 files changed, 12 insertions(+), 11 deletions(-)
>
> diff --git a/src/egl/drivers/dri2/egl_dri2.c b/src/egl/drivers/dri2/egl_dri2.c
> index 689d5ec..1434580 100644
> --- a/src/egl/drivers/dri2/egl_dri2.c
> +++ b/src/egl/drivers/dri2/egl_dri2.c
> @@ -559,6 +559,7 @@ dri2_create_screen(_EGLDisplay *disp)
>  {
> const __DRIextension **extensions;
> struct dri2_egl_display *dri2_dpy;
> +   unsigned i;
>
> dri2_dpy = disp->DriverData;
>
> @@ -599,25 +600,23 @@ dri2_create_screen(_EGLDisplay *disp)
> extensions = dri2_dpy->core->getExtensions(dri2_dpy->dri_screen);
>
> if (dri2_dpy->dri2) {
> -  unsigned i;
> -
>if (!dri2_bind_extensions(dri2_dpy, dri2_core_extensions, extensions))
>   goto cleanup_dri_screen;
> -
> -  for (i = 0; extensions[i]; i++) {
> -if (strcmp(extensions[i]->name, __DRI2_ROBUSTNESS) == 0) {
> -dri2_dpy->robustness = (__DRIrobustnessExtension *) 
> extensions[i];
> -}
> -if (strcmp(extensions[i]->name, __DRI2_CONFIG_QUERY) == 0) {
> -   dri2_dpy->config = (__DRI2configQueryExtension *) extensions[i];
> -}
> -  }
> } else {
>assert(dri2_dpy->swrast);
>if (!dri2_bind_extensions(dri2_dpy, swrast_core_extensions, 
> extensions))
>   goto cleanup_dri_screen;
> }
>
> +   for (i = 0; extensions[i]; i++) {
> +  if (strcmp(extensions[i]->name, __DRI2_ROBUSTNESS) == 0) {
> + dri2_dpy->robustness = (__DRIrobustnessExtension *) extensions[i];
> +  }
> +  if (strcmp(extensions[i]->name, __DRI2_CONFIG_QUERY) == 0) {
> + dri2_dpy->config = (__DRI2configQueryExtension *) extensions[i];
> +  }
> +   }
> +
Side note: Applying this hunk might lead to conflict due to the newly
introduced __DRI2_FENCE. Afaict one can move it as well similarly to
robustness and config_query.

Cheers
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:08 AM, Jason Ekstrand  wrote:
> On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke  wrote:
>> On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
>>> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand  wrote:
>>> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
>>> >
>>> >total instructions in shared programs: 2724483 -> 2711790 (-0.47%)
>>> >instructions in affected programs: 1860859 -> 1848166 (-0.68%)
>>> >helped:4387
>>> >HURT:  4758
>>> >GAINED:1499
>>> >
>>> > The gained programs are ARB vertext programs that were previously going
>>> > through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
>>> > programs can go through the scalar backend so they show up as "gained" in
>>> > the shader-db results.
>>>
>>> Again, I'm kind of confused and disappointed that we're just okay with
>>> hurting 4700 programs without more analysis. I guess I'll go do
>>> that...
>>
>> I took a stab at that tonight.  The good news is, the majority of the
>> hurt is pretty stupid.  Indirect uniform address calculations involve
>> a lot of integer multiplication by 4.
>>
>> For whatever reason, we're getting 4*x instead of x*4, which doesn't
>> support immediates.  So we get:
>>
>> MOV tmp 4
>> MUL dst tmp x
>>
>> Normally, constant propagation would commute the operands, giving us
>> "MUL dst x 4" like we want.  But it sees integer multiplication and
>> chickens out, due to the asymmetry on some platforms.
>
> Right.  I just sent out a patch that puts immediates in src1 for
> commutative ALU ops at the emit stage.  We probably still want to do
> something in constant propagation in case we can fold something, but
> it fixes the problem for now.  It also helps even more than the
> shifting patch.
>
> I don't know.  Maybe we just want to make constant propagation do the
> right thing on BDW+.  Matt?

Yes.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  wrote:
> Shader-db results for fragment shaders on Broadwell:
>
>total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>instructions in affected programs: 43242 -> 42918 (-0.75%)
>helped:142
>HURT:  0
>
> Shader-db results for vertex shaders on Broadwell:
>
>total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>helped:6139
>HURT:  0
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 555987d..161a262 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -21,6 +21,8 @@
>   * IN THE SOFTWARE.
>   */
>
> +#include 
> +
>  #include "glsl/ir.h"
>  #include "glsl/ir_optimization.h"
>  #include "glsl/nir/glsl_to_nir.h"
> @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>op[i] = offset(op[i], instr->src[i].swizzle[channel]);
> }
>
> +   /* Immediates can only be used as the second source of two-source
> +* instructions.  We have code in opt_algebraic to flip them as needed
> +* for most instructions.  However, it doesn't hurt anything to just do
> +* the right thing if we can detect it at the NIR level.
> +*/
> +   if ((nir_op_infos[instr->op].algebraic_properties & 
> NIR_OP_IS_COMMUTATIVE) &&
> +   nir_src_as_const_value(instr->src[0].src)) {
> +  std::swap(op[0], op[1]);
> +   }
> +

The real problem is that we haven't lifted the restriction about
non-commutative integer multiply on Broadwell:

brw_fs_copy_propagation.cpp:

   /* Fit this constant in by commuting the operands.
* Exception: we can't do this for 32-bit integer MUL/MACH
* because it's asymmetric.
*/
   if ((inst->opcode == BRW_OPCODE_MUL ||
inst->opcode == BRW_OPCODE_MACH) &&
   (inst->src[1].type == BRW_REGISTER_TYPE_D ||
inst->src[1].type == BRW_REGISTER_TYPE_UD))
  break;
   inst->src[0] = inst->src[1];
   inst->src[1] = val;
   progress = true;
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Matt Turner
On Fri, May 8, 2015 at 10:04 AM, Jason Ekstrand  wrote:
> Shader-db results for fragment shaders on Broadwell:
>
>total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
>instructions in affected programs: 43242 -> 42918 (-0.75%)
>helped:142
>HURT:  0
>
> Shader-db results for vertex shaders on Broadwell:
>
>total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
>instructions in affected programs: 1418720 -> 1373448 (-3.19%)
>helped:6139
>HURT:  0
> ---
>  src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
>  1 file changed, 12 insertions(+)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> index 555987d..161a262 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
> @@ -21,6 +21,8 @@
>   * IN THE SOFTWARE.
>   */
>
> +#include 
> +
>  #include "glsl/ir.h"
>  #include "glsl/ir_optimization.h"
>  #include "glsl/nir/glsl_to_nir.h"
> @@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
>op[i] = offset(op[i], instr->src[i].swizzle[channel]);
> }
>
> +   /* Immediates can only be used as the second source of two-source
> +* instructions.  We have code in opt_algebraic to flip them as needed
> +* for most instructions.  However, it doesn't hurt anything to just do
> +* the right thing if we can detect it at the NIR level.
> +*/
> +   if ((nir_op_infos[instr->op].algebraic_properties & 
> NIR_OP_IS_COMMUTATIVE) &&
> +   nir_src_as_const_value(instr->src[0].src)) {
> +  std::swap(op[0], op[1]);

We don't use STL.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs: Disable opt_sampler_eot for textureGather

2015-05-08 Thread Neil Roberts
The opt_sampler_eot optimisation seems to break when the last
instruction is SHADER_OPCODE_TG4. A bunch of Piglit tests end up doing
this so it causes a lot of regressions. I can't find any documentation
or known workarounds to indicate that this is expected behaviour, but
considering that this is probably a pretty unlikely situation in a
real use case we might as well disable it in order to avoid the
regressions. In total this fixes 451 tests.

Reviewed-by: Ben Widawsky 
---

See here for some more discussion of this:

http://lists.freedesktop.org/archives/mesa-dev/2015-May/083640.html

As far as I can tell the Jenkins run mentioned in that email doesn't
seem to have any tests on Cherryview or Skylake so that probably
explains why it didn't pick up the regression.

 src/mesa/drivers/dri/i965/brw_fs.cpp | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 8dd680e..e9528e0 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2655,6 +2655,16 @@ fs_visitor::opt_sampler_eot()
if (unlikely(tex_inst->is_head_sentinel()) || !tex_inst->is_tex())
   return false;
 
+   /* This optimisation doesn't seem to work for textureGather for some
+* reason. I can't find any documentation or known workarounds to indicate
+* that this is expected, but considering that it is probably pretty
+* unlikely that a shader would directly write out the results from
+* textureGather we might as well just disable it.
+*/
+   if (tex_inst->opcode == SHADER_OPCODE_TG4 ||
+   tex_inst->opcode == SHADER_OPCODE_TG4_OFFSET)
+  return false;
+
/* If there's no header present, we need to munge the LOAD_PAYLOAD as well.
 * It's very likely to be the previous instruction.
 */
-- 
1.9.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:27 AM, Kenneth Graunke  wrote:
> On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
>> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand  wrote:
>> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
>> >
>> >total instructions in shared programs: 2724483 -> 2711790 (-0.47%)
>> >instructions in affected programs: 1860859 -> 1848166 (-0.68%)
>> >helped:4387
>> >HURT:  4758
>> >GAINED:1499
>> >
>> > The gained programs are ARB vertext programs that were previously going
>> > through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
>> > programs can go through the scalar backend so they show up as "gained" in
>> > the shader-db results.
>>
>> Again, I'm kind of confused and disappointed that we're just okay with
>> hurting 4700 programs without more analysis. I guess I'll go do
>> that...
>
> I took a stab at that tonight.  The good news is, the majority of the
> hurt is pretty stupid.  Indirect uniform address calculations involve
> a lot of integer multiplication by 4.
>
> For whatever reason, we're getting 4*x instead of x*4, which doesn't
> support immediates.  So we get:
>
> MOV tmp 4
> MUL dst tmp x
>
> Normally, constant propagation would commute the operands, giving us
> "MUL dst x 4" like we want.  But it sees integer multiplication and
> chickens out, due to the asymmetry on some platforms.

Right.  I just sent out a patch that puts immediates in src1 for
commutative ALU ops at the emit stage.  We probably still want to do
something in constant propagation in case we can fold something, but
it fixes the problem for now.  It also helps even more than the
shifting patch.

I don't know.  Maybe we just want to make constant propagation do the
right thing on BDW+.  Matt?
--jason

> I think we can extend that - on Broadwell it should work fine, and
> might work fine for 16-bit immediates on Gen7 and Cherryview, too.
>
> Alternatively, I wrote a nir_opt_algebraic_late optimization that turns
> 4*x into x << 2, which works around the problem, and is also apparently
> much better for R600.
>
> Statistics on the shift patch are:
>
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs: 1360411 -> 1316806 (-3.21%)
> helped:5772
> HURT:  0
>
> Statistics for GLSL IR vs. NIR+(4*x => x << 2):
>
> total instructions in shared programs: 7232451 -> 7175983 (-0.78%)
> instructions in affected programs: 1586917 -> 1530449 (-3.56%)
> helped:5780
> HURT:  1654
>
> which is much better.
>
> Looking at a couple of the shaders that are still worse off...it looks
> like a ton of Source shaders used to do MUL/ADD with an attribute and
> two immediates, and now are doing MOV/MOV/MAD.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Jason Ekstrand
On Fri, May 8, 2015 at 3:36 AM, Kenneth Graunke  wrote:
> According to Glenn, shifts on R600 have 5x the throughput as multiplies.
>
> Intel GPUs have strange integer multiplication restrictions - on most
> hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
> means the arguments aren't commutative, which can limit our constant
> propagation options.  SHL has no such restrictions.
>
> Shifting is probably reasonable on most people's hardware, so let's just
> do that.
>
> i965 shader-db results (using NIR for VS):
> total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
> instructions in affected programs: 1360411 -> 1316806 (-3.21%)
> helped:5772
> HURT:  0
>
> Signed-off-by: Kenneth Graunke 
> Cc: matts...@gmail.com
> Cc: ja...@jlekstrand.net
> ---
>  src/glsl/nir/nir_opt_algebraic.py | 5 +
>  1 file changed, 5 insertions(+)
>
> So...I found a bizarre issue with this patch.
>
>(('imul', 4, a), ('ishl', a, 2)),
>
> totally optimizes things.  However...
>
>(('imul', a, 4), ('ishl', a, 2)),
>
> doesn't seem to do anything, even though imul is commutative, and nir_search
> should totally handle that...
>
>  ▄▄  ▄▄▄▄    ▄   ▄▄
>  ██  ██   ▀▀▀██▀▀▀  ███  ██
>  ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
>   ██ ██ ██   ██  ██  ██   ▄██▀   ██
>   ███▀▀███   ██  ██   ██ ▀▀
>   ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
>   ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀
>
> If you know why, let me know, otherwise I may have to look into it when more
> awake.

I figured it out and I have a patch.  Unfortunately, it regresses a
few programs and looses 8 SIMD8 programs so I'm doing some more
investigation.  I'll send it out soon.

> diff --git a/src/glsl/nir/nir_opt_algebraic.py 
> b/src/glsl/nir/nir_opt_algebraic.py
> index 400d60e..350471f 100644
> --- a/src/glsl/nir/nir_opt_algebraic.py
> +++ b/src/glsl/nir/nir_opt_algebraic.py
> @@ -247,6 +247,11 @@ late_optimizations = [
> (('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
> (('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
> (('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
> +
> +   # Multiplication by 4 comes up fairly often in indirect offset 
> calculations.
> +   # Some GPUs have weird integer multiplication limitations, but shifts 
> should work
> +   # equally well everywhere.
> +   (('imul', 4, a), ('ishl', a, 2)),
>  ]
>
>  print nir_algebraic.AlgebraicPass("nir_opt_algebraic", 
> optimizations).render()
> --
> 2.4.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] i965/fs_nir: Put the immediate in src1 for commutative ALU ops

2015-05-08 Thread Jason Ekstrand
Shader-db results for fragment shaders on Broadwell:

   total instructions in shared programs: 4310987 -> 4310663 (-0.01%)
   instructions in affected programs: 43242 -> 42918 (-0.75%)
   helped:142
   HURT:  0

Shader-db results for vertex shaders on Broadwell:

   total instructions in shared programs: 2889581 -> 2844309 (-1.57%)
   instructions in affected programs: 1418720 -> 1373448 (-3.19%)
   helped:6139
   HURT:  0
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 12 
 1 file changed, 12 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index 555987d..161a262 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -21,6 +21,8 @@
  * IN THE SOFTWARE.
  */
 
+#include 
+
 #include "glsl/ir.h"
 #include "glsl/ir_optimization.h"
 #include "glsl/nir/glsl_to_nir.h"
@@ -662,6 +664,16 @@ fs_visitor::nir_emit_alu(nir_alu_instr *instr)
   op[i] = offset(op[i], instr->src[i].swizzle[channel]);
}
 
+   /* Immediates can only be used as the second source of two-source
+* instructions.  We have code in opt_algebraic to flip them as needed
+* for most instructions.  However, it doesn't hurt anything to just do
+* the right thing if we can detect it at the NIR level.
+*/
+   if ((nir_op_infos[instr->op].algebraic_properties & NIR_OP_IS_COMMUTATIVE) 
&&
+   nir_src_as_const_value(instr->src[0].src)) {
+  std::swap(op[0], op[1]);
+   }
+
switch (instr->op) {
case nir_op_i2f:
case nir_op_u2f:
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 5/9] egl/wayland: Implement DRI_PRIME support

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy  wrote:
> When the server gpu and requested gpu are different:
> . They likely don't support the same tiling modes
> . They likely do not have fast access to the same locations
>
> Thus we do:
> . render to a tiled buffer we do not share with the server
> . Copy the content at every swap to a buffer with no tiling
> that we share with the server.
>
> This is similar to the glx dri3 DRI_PRIME implementation.
>
> Signed-off-by: Axel Davy 
> ---

> --- a/src/egl/drivers/dri2/platform_wayland.c
> +++ b/src/egl/drivers/dri2/platform_wayland.c

> @@ -632,6 +658,8 @@ dri2_wl_swap_buffers_with_damage(_EGLDriver *drv,
>  {
> struct dri2_egl_display *dri2_dpy = dri2_egl_display(disp);
> struct dri2_egl_surface *dri2_surf = dri2_egl_surface(draw);
> +   struct dri2_egl_context *dri2_ctx;
> +   _EGLContext *ctx;
Nit: Move the variable declarations where they're used. Otherwise
static analysis tools flag lovely message(s).

> @@ -1084,6 +1123,24 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay 
> *disp)
> if (roundtrip(dri2_dpy) < 0 || !dri2_dpy->authenticated)
>goto cleanup_fd;
>
> +   dri2_dpy->fd = loader_get_user_preferred_fd(dri2_dpy->fd,
> +   &dri2_dpy->is_different_gpu);
> +   if (dri2_dpy->is_different_gpu) {
> +  free(dri2_dpy->device_name);
> +  dri2_dpy->device_name = loader_get_device_name_for_fd(dri2_dpy->fd);
> +  if (!dri2_dpy->device_name) {
> + _eglError(EGL_BAD_ALLOC, "wayland-egl: failed to get device name "
> +  "for requested GPU");
> + goto cleanup_fd;
> +  }
> +   }
> +
Shouldn't we authenticate with the correct gpu or master/render node ?
This implementation will auth with GPU1, and then use GPU2 which seems
a bit odd. I might be missing something ?


> @@ -1127,6 +1184,15 @@ dri2_initialize_wayland(_EGLDriver *drv, _EGLDisplay 
> *disp)
>goto cleanup_screen;
> }
>
> +   if (dri2_dpy->is_different_gpu &&
> +   (dri2_dpy->image->base.version < 9 ||
> +dri2_dpy->image->blitImage == NULL)) {
> +  _eglLog(_EGL_WARNING, "wayland-egl: Different GPU, but image extension 
> "
> +"version 9 or later not found, or blitImage not "
> +"implemented for this driver");
Nit: Alternative message:
"Different GPU selected, but the Image extension in the driver is not
compatible. Version 9 or later and blitImage() are required."


Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Connor Abbott
Update the comment in patch 1, and the entire thing is

Reviewed-by: Connor Abbott 

On Fri, May 8, 2015 at 12:40 PM, Francisco Jerez  wrote:
> v2: Undefine coordinate components not applicable to the target.
>
> ---
>  src/glsl/nir/glsl_to_nir.cpp | 126 
> +++
>  1 file changed, 115 insertions(+), 11 deletions(-)
>
> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> index f6b8331..fab4508 100644
> --- a/src/glsl/nir/glsl_to_nir.cpp
> +++ b/src/glsl/nir/glsl_to_nir.cpp
> @@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir)
>   op = nir_intrinsic_atomic_counter_inc_var;
>} else if (strcmp(ir->callee_name(), 
> "__intrinsic_atomic_predecrement") == 0) {
>   op = nir_intrinsic_atomic_counter_dec_var;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) {
> + op = nir_intrinsic_image_load;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) {
> + op = nir_intrinsic_image_store;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") 
> == 0) {
> + op = nir_intrinsic_image_atomic_add;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") 
> == 0) {
> + op = nir_intrinsic_image_atomic_min;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") 
> == 0) {
> + op = nir_intrinsic_image_atomic_max;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") 
> == 0) {
> + op = nir_intrinsic_image_atomic_and;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") == 
> 0) {
> + op = nir_intrinsic_image_atomic_or;
> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") 
> == 0) {
> + op = nir_intrinsic_image_atomic_xor;
> +  } else if (strcmp(ir->callee_name(), 
> "__intrinsic_image_atomic_exchange") == 0) {
> + op = nir_intrinsic_image_atomic_exchange;
> +  } else if (strcmp(ir->callee_name(), 
> "__intrinsic_image_atomic_comp_swap") == 0) {
> + op = nir_intrinsic_image_atomic_comp_swap;
>} else {
>   unreachable("not reached");
>}
>
>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
> -  ir_dereference *param =
> - (ir_dereference *) ir->actual_parameters.get_head();
> -  instr->variables[0] = evaluate_deref(&instr->instr, param);
> -  nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
> +
> +  switch (op) {
> +  case nir_intrinsic_atomic_counter_read_var:
> +  case nir_intrinsic_atomic_counter_inc_var:
> +  case nir_intrinsic_atomic_counter_dec_var: {
> + ir_dereference *param =
> +(ir_dereference *) ir->actual_parameters.get_head();
> + instr->variables[0] = evaluate_deref(&instr->instr, param);
> + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
> + break;
> +  }
> +  case nir_intrinsic_image_load:
> +  case nir_intrinsic_image_store:
> +  case nir_intrinsic_image_atomic_add:
> +  case nir_intrinsic_image_atomic_min:
> +  case nir_intrinsic_image_atomic_max:
> +  case nir_intrinsic_image_atomic_and:
> +  case nir_intrinsic_image_atomic_or:
> +  case nir_intrinsic_image_atomic_xor:
> +  case nir_intrinsic_image_atomic_exchange:
> +  case nir_intrinsic_image_atomic_comp_swap: {
> + nir_ssa_undef_instr *instr_undef =
> +nir_ssa_undef_instr_create(shader, 1);
> + nir_instr_insert_after_cf_list(this->cf_node_list,
> +&instr_undef->instr);
> +
> + /* Set the image variable dereference. */
> + exec_node *param = ir->actual_parameters.get_head();
> + ir_dereference *image = (ir_dereference *)param;
> + const glsl_type *type =
> +image->variable_referenced()->type->without_array();
> +
> + instr->variables[0] = evaluate_deref(&instr->instr, image);
> + param = param->get_next();
> +
> + /* Set the address argument, extending the coordinate vector to four
> +  * components.
> +  */
> + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
> + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
> nir_op_vec4);
> + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, 
> NULL);
> +
> + for (int i = 0; i < 4; i++) {
> +if (i < type->coordinate_components()) {
> +   instr_addr->src[i].src = src_addr;
> +   instr_addr->src[i].swizzle[0] = i;
> +} else {
> +   instr_addr->src[i].src = nir_src_for_ssa(&instr_undef->def);
> +}
> + }
> +
> + nir_instr_insert_after_cf_list(cf_node_list, &instr_addr->instr);
> + instr->src[0] = nir_src_for_ssa(&instr_addr->dest.dest.ssa);
> + para

Re: [Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value

2015-05-08 Thread Connor Abbott
Makes sense.

Reviewed-by: Connor Abbott 

On Fri, May 8, 2015 at 11:38 AM, Jason Ekstrand  wrote:
> Previously, this case was being handled in match_expression prior to
> calling match_value.  However, there is really no good reason for this
> given that match_value has all of the information it needs.  Also, they
> weren't being handled properly in the commutative case and putting it in
> match_value gives us that for free.
>
> Cc: Connor Abbott 
> ---
>  src/glsl/nir/nir_search.c | 16 
>  1 file changed, 8 insertions(+), 8 deletions(-)
>
> diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
> index 5ba0160..821b86d 100644
> --- a/src/glsl/nir/nir_search.c
> +++ b/src/glsl/nir/nir_search.c
> @@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr 
> *instr, unsigned src,
>  {
> uint8_t new_swizzle[4];
>
> +   /* If the source is an explicitly sized source, then we need to reset
> +* both the number of components and the swizzle.
> +*/
> +   if (nir_op_infos[instr->op].input_sizes[src] != 0) {
> +  num_components = nir_op_infos[instr->op].input_sizes[src];
> +  swizzle = identity_swizzle;
> +   }
> +
> for (int i = 0; i < num_components; ++i)
>new_swizzle[i] = instr->src[src].swizzle[swizzle[i]];
>
> @@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, 
> nir_alu_instr *instr,
>
> bool matched = true;
> for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
> -  /* If the source is an explicitly sized source, then we need to reset
> -   * both the number of components and the swizzle.
> -   */
> -  if (nir_op_infos[instr->op].input_sizes[i] != 0) {
> - num_components = nir_op_infos[instr->op].input_sizes[i];
> - swizzle = identity_swizzle;
> -  }
> -
>if (!match_value(expr->srcs[i], instr, i, num_components,
> swizzle, state)) {
>   matched = false;
> --
> 2.4.0
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCHv2 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Francisco Jerez
v2: Undefine coordinate components not applicable to the target.

---
 src/glsl/nir/glsl_to_nir.cpp | 126 +++
 1 file changed, 115 insertions(+), 11 deletions(-)

diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
index f6b8331..fab4508 100644
--- a/src/glsl/nir/glsl_to_nir.cpp
+++ b/src/glsl/nir/glsl_to_nir.cpp
@@ -614,27 +614,131 @@ nir_visitor::visit(ir_call *ir)
  op = nir_intrinsic_atomic_counter_inc_var;
   } else if (strcmp(ir->callee_name(), "__intrinsic_atomic_predecrement") 
== 0) {
  op = nir_intrinsic_atomic_counter_dec_var;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) {
+ op = nir_intrinsic_image_load;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) {
+ op = nir_intrinsic_image_store;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") == 
0) {
+ op = nir_intrinsic_image_atomic_add;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") == 
0) {
+ op = nir_intrinsic_image_atomic_min;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") == 
0) {
+ op = nir_intrinsic_image_atomic_max;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") == 
0) {
+ op = nir_intrinsic_image_atomic_and;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") == 
0) {
+ op = nir_intrinsic_image_atomic_or;
+  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") == 
0) {
+ op = nir_intrinsic_image_atomic_xor;
+  } else if (strcmp(ir->callee_name(), 
"__intrinsic_image_atomic_exchange") == 0) {
+ op = nir_intrinsic_image_atomic_exchange;
+  } else if (strcmp(ir->callee_name(), 
"__intrinsic_image_atomic_comp_swap") == 0) {
+ op = nir_intrinsic_image_atomic_comp_swap;
   } else {
  unreachable("not reached");
   }
 
   nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
-  ir_dereference *param =
- (ir_dereference *) ir->actual_parameters.get_head();
-  instr->variables[0] = evaluate_deref(&instr->instr, param);
-  nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
+
+  switch (op) {
+  case nir_intrinsic_atomic_counter_read_var:
+  case nir_intrinsic_atomic_counter_inc_var:
+  case nir_intrinsic_atomic_counter_dec_var: {
+ ir_dereference *param =
+(ir_dereference *) ir->actual_parameters.get_head();
+ instr->variables[0] = evaluate_deref(&instr->instr, param);
+ nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
+ break;
+  }
+  case nir_intrinsic_image_load:
+  case nir_intrinsic_image_store:
+  case nir_intrinsic_image_atomic_add:
+  case nir_intrinsic_image_atomic_min:
+  case nir_intrinsic_image_atomic_max:
+  case nir_intrinsic_image_atomic_and:
+  case nir_intrinsic_image_atomic_or:
+  case nir_intrinsic_image_atomic_xor:
+  case nir_intrinsic_image_atomic_exchange:
+  case nir_intrinsic_image_atomic_comp_swap: {
+ nir_ssa_undef_instr *instr_undef =
+nir_ssa_undef_instr_create(shader, 1);
+ nir_instr_insert_after_cf_list(this->cf_node_list,
+&instr_undef->instr);
+
+ /* Set the image variable dereference. */
+ exec_node *param = ir->actual_parameters.get_head();
+ ir_dereference *image = (ir_dereference *)param;
+ const glsl_type *type =
+image->variable_referenced()->type->without_array();
+
+ instr->variables[0] = evaluate_deref(&instr->instr, image);
+ param = param->get_next();
+
+ /* Set the address argument, extending the coordinate vector to four
+  * components.
+  */
+ const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
+ nir_alu_instr *instr_addr = nir_alu_instr_create(shader, nir_op_vec4);
+ nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, 
NULL);
+
+ for (int i = 0; i < 4; i++) {
+if (i < type->coordinate_components()) {
+   instr_addr->src[i].src = src_addr;
+   instr_addr->src[i].swizzle[0] = i;
+} else {
+   instr_addr->src[i].src = nir_src_for_ssa(&instr_undef->def);
+}
+ }
+
+ nir_instr_insert_after_cf_list(cf_node_list, &instr_addr->instr);
+ instr->src[0] = nir_src_for_ssa(&instr_addr->dest.dest.ssa);
+ param = param->get_next();
+
+ /* Set the sample argument, which is undefined for single-sample
+  * images.
+  */
+ if (type->sampler_dimensionality == GLSL_SAMPLER_DIM_MS) {
+instr->src[1] = evaluate_rvalue((ir_dereference *)param);
+param = param->get_next();
+ } else {
+instr->src[

Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Connor Abbott
On Fri, May 8, 2015 at 10:30 AM, Francisco Jerez  wrote:
> Connor Abbott  writes:
>
>> On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez  
>> wrote:
>>> ---
>>>  src/glsl/nir/glsl_to_nir.cpp | 125 
>>> +++
>>>  1 file changed, 114 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>>> index f6b8331..a01ab3b 100644
>>> --- a/src/glsl/nir/glsl_to_nir.cpp
>>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>>> @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir)
>>>   op = nir_intrinsic_atomic_counter_inc_var;
>>>} else if (strcmp(ir->callee_name(), 
>>> "__intrinsic_atomic_predecrement") == 0) {
>>>   op = nir_intrinsic_atomic_counter_dec_var;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) 
>>> {
>>> + op = nir_intrinsic_image_load;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 
>>> 0) {
>>> + op = nir_intrinsic_image_store;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_add;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_min;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_max;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_and;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_or;
>>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") 
>>> == 0) {
>>> + op = nir_intrinsic_image_atomic_xor;
>>> +  } else if (strcmp(ir->callee_name(), 
>>> "__intrinsic_image_atomic_exchange") == 0) {
>>> + op = nir_intrinsic_image_atomic_exchange;
>>> +  } else if (strcmp(ir->callee_name(), 
>>> "__intrinsic_image_atomic_comp_swap") == 0) {
>>> + op = nir_intrinsic_image_atomic_comp_swap;
>>>} else {
>>>   unreachable("not reached");
>>>}
>>>
>>>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
>>> -  ir_dereference *param =
>>> - (ir_dereference *) ir->actual_parameters.get_head();
>>> -  instr->variables[0] = evaluate_deref(&instr->instr, param);
>>> -  nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>>> +
>>> +  switch (op) {
>>> +  case nir_intrinsic_atomic_counter_read_var:
>>> +  case nir_intrinsic_atomic_counter_inc_var:
>>> +  case nir_intrinsic_atomic_counter_dec_var: {
>>> + ir_dereference *param =
>>> +(ir_dereference *) ir->actual_parameters.get_head();
>>> + instr->variables[0] = evaluate_deref(&instr->instr, param);
>>> + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>>> + break;
>>> +  }
>>> +  case nir_intrinsic_image_load:
>>> +  case nir_intrinsic_image_store:
>>> +  case nir_intrinsic_image_atomic_add:
>>> +  case nir_intrinsic_image_atomic_min:
>>> +  case nir_intrinsic_image_atomic_max:
>>> +  case nir_intrinsic_image_atomic_and:
>>> +  case nir_intrinsic_image_atomic_or:
>>> +  case nir_intrinsic_image_atomic_xor:
>>> +  case nir_intrinsic_image_atomic_exchange:
>>> +  case nir_intrinsic_image_atomic_comp_swap: {
>>> + nir_load_const_instr *instr_zero = 
>>> nir_load_const_instr_create(shader, 1);
>>> + instr_zero->value.u[0] = 0;
>>> + nir_instr_insert_after_cf_list(this->cf_node_list, 
>>> &instr_zero->instr);
>>> +
>>> + /* Set the image variable dereference. */
>>> + exec_node *param = ir->actual_parameters.get_head();
>>> + ir_dereference *image = (ir_dereference *)param;
>>> + const glsl_type *type =
>>> +image->variable_referenced()->type->without_array();
>>> +
>>> + instr->variables[0] = evaluate_deref(&instr->instr, image);
>>> + param = param->get_next();
>>> +
>>> + /* Set the address argument, extending the coordinate vector to 
>>> four
>>> +  * components.
>>> +  */
>>> + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
>>> + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
>>> nir_op_vec4);
>>> + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, 
>>> NULL);
>>> +
>>> + for (int i = 0; i < 4; i++) {
>>> +if (i < type->coordinate_components()) {
>>> +   instr_addr->src[i].src = src_addr;
>>> +   instr_addr->src[i].swizzle[0] = i;
>>> +} else {
>>> +   instr_addr->src[i].src = nir_src_for_ssa(&instr_zero->def);
>>
>> I think it would better convey the intent to create an

Re: [Mesa-dev] [PATCH 4/9] egl/wayland: Add support for render-nodes

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy  wrote:
> It is possible the server advertises a render-node.
> In that case no authentication is needed,
> and Gem names are forbidden.
>
> Signed-off-by: Axel Davy 
> ---

> diff --git a/src/egl/drivers/dri2/platform_wayland.c 
> b/src/egl/drivers/dri2/platform_wayland.c
> index a5bcf25..79989cb 100644
> --- a/src/egl/drivers/dri2/platform_wayland.c
> +++ b/src/egl/drivers/dri2/platform_wayland.c
> @@ -800,12 +800,33 @@ bad_format:
> return NULL;
>  }
>
> +static char
> +is_fd_render_node(int fd)
> +{
> +   struct stat render;
> +
> +   if (fstat(fd, &render))
> +  return 0;
> +
> +   if (!S_ISCHR(render.st_mode))
> +  return 0;
> +
> +   if (render.st_rdev & 0x80)
> +  return 1;
> +   return 0;
> +}
> +
Let's use drmGetNodeTypeFromFd(), rather than hard-coding this ?


> +static EGLBoolean
> +is_render_node_capable(struct dri2_egl_display *dri2_dpy)
> +{
> +   const __DRIextension **extensions;
> +   int i;
> +
> +   /* We cannot use Gem names with render-nodes, only prime fds (dma-buf).
> +* The server needs to accept them */
> +   if (!(dri2_dpy->capabilities & WL_DRM_CAPABILITY_PRIME))
> +  return EGL_FALSE;
> +
> +   /* Check the __DRIimage api is supported (this is required by our
> +* codepath without Gem names) */
> +   extensions = dri2_dpy->driver_extensions;
> +   for (i = 0; extensions[i]; i++) {
> +  if (strcmp(extensions[i]->name, __DRI_IMAGE_DRIVER) == 0)
> + return EGL_TRUE;
If memory serves me right__DRI_IMAGE_DRIVER is a cleaned up version of
__DRI_DRI2. Afaics the former is never looked up (or used) in the EGL
code, unlike the latter. So I'm suspecting that this check is a bit
off ?

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir/search: handle explicitly sized sources in match_value

2015-05-08 Thread Jason Ekstrand
Previously, this case was being handled in match_expression prior to
calling match_value.  However, there is really no good reason for this
given that match_value has all of the information it needs.  Also, they
weren't being handled properly in the commutative case and putting it in
match_value gives us that for free.

Cc: Connor Abbott 
---
 src/glsl/nir/nir_search.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/src/glsl/nir/nir_search.c b/src/glsl/nir/nir_search.c
index 5ba0160..821b86d 100644
--- a/src/glsl/nir/nir_search.c
+++ b/src/glsl/nir/nir_search.c
@@ -73,6 +73,14 @@ match_value(const nir_search_value *value, nir_alu_instr 
*instr, unsigned src,
 {
uint8_t new_swizzle[4];
 
+   /* If the source is an explicitly sized source, then we need to reset
+* both the number of components and the swizzle.
+*/
+   if (nir_op_infos[instr->op].input_sizes[src] != 0) {
+  num_components = nir_op_infos[instr->op].input_sizes[src];
+  swizzle = identity_swizzle;
+   }
+
for (int i = 0; i < num_components; ++i)
   new_swizzle[i] = instr->src[src].swizzle[swizzle[i]];
 
@@ -200,14 +208,6 @@ match_expression(const nir_search_expression *expr, 
nir_alu_instr *instr,
 
bool matched = true;
for (unsigned i = 0; i < nir_op_infos[instr->op].num_inputs; i++) {
-  /* If the source is an explicitly sized source, then we need to reset
-   * both the number of components and the swizzle.
-   */
-  if (nir_op_infos[instr->op].input_sizes[i] != 0) {
- num_components = nir_op_infos[instr->op].input_sizes[i];
- swizzle = identity_swizzle;
-  }
-
   if (!match_value(expr->srcs[i], instr, i, num_components,
swizzle, state)) {
  matched = false;
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 3/9] glx/dri3: Add additional check for gpu offloading case

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy  wrote:
> Checks blitImage is implemented.
> Initially having the __DRIimageExtension extension
> at version 9 at least meant blitImage was supported.
> However some implementations do advertise version >= 9
> without implementing it.
>
Afaict in the above mentioned case we'll just segfault. A worthy
mesa-stable material imho.

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/9] egl/wayland: properly destroy wayland objects

2015-05-08 Thread Emil Velikov
On 2 May 2015 at 11:15, Axel Davy  wrote:
> the wl_registry and the wl_queue allocated weren't destroyed.
>
Would this have any affect other than leaking memory ? If so can you
please add the mesa-stable tag (prior to pushing).

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] Mesa 10.6 release plan

2015-05-08 Thread Emil Velikov
Hello all,

On 21 April 2015 at 13:40, Emil Velikov  wrote:
> Hi all,
>
> Here is the preliminary release schedule for Mesa 10.6. Personally I
> would love if we can go up to Mesa 11.0 (Spinal Tap anyone ?) although
> it might happen with the September release.
>
> May 15th 2015 - Feature freeze/Release candidate 1
> May 22st 2015 - Release candidate 2
> May 29th 2015 - Release candidate 3
> June 5th 2015 - Release candidate 4/Mesa 10.6.0
>
> This means that we have roughly four weeks to get new features in, and
> another four weeks to get all the serious bugs sorted out.
>
> Does this sound reasonable with the different teams ? If anyone has
> something special in mind please speak up.
>
This is a gentle reminder that we have one week until the 10.6 branch
point. As far as I can see we have a few extensions which may or may
not make it in time. I would be fine if we need an extra week to
finish up any of the following extensions considering their importance
and/or how long people have been working on them. If you feel the same
way please speak up, otherwise the schedule will stay as outlined
previously.

GL_ARB_shader_subroutine
GL_ARB_tessellation_shader
GL_ARB_shader_image_load_store
GL_ARB_direct_state_access

Thanks
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/5] nir: Define image load, store and atomic intrinsics.

2015-05-08 Thread Francisco Jerez
Connor Abbott  writes:

> On IRC, Ken and I were discussing using a scheme inspired by SPIR-V,
> which has an OpImagePointer instruction that forms a pointer to the
> particular texel of the image as well as
> OpAtomic{Load,Store,Exchange,etc.} that operate on an image or shared
> buffer pointer. The advantages would be:
>
> * Makes translating from SPIR-V easier.
> * Reduces the number of intrinsics we need to add for SSBO support.
> * Reduces the combinatorial explosion enough that we can have separate
> versions for 2, 3, and 4 components and MS vs. non-MS without it being
> unbearable. I'm not sure how much of a benefit that would be though.
>
> The disadvantages I can think of are:
>
> * Doesn't actually save any code in the i965 backend, since we need to
> do different things depending on if the pointer is to an image or a
> shared buffer anyways.
> * We'd have to special case nir_convert_from_ssa to ignore the SSA
> value that's really a pointer since we don't have any real type-level
> support for pointers.
> * Since we lower to SSA before converting to i965, there are some ugly
> edge cases when the coordinate argument becomes part of a phi web and
> gets potentially overwritten before the instruction that uses the
> pointer.
>
Yeah, I actually played around with a SPIR-V-like approach, and decided
to give up the idea in the end mainly because of the disadvantages you
mention, because it's nothing close to what the back-ends will want.

Two other ideas occurred to me that could have made the back-end's life
easier, but it didn't seem like they were worth doing:

 - Write a driver-independent lowering pass to convert SPIR-V-style
   intrinsics into coordinate-based intrinsics while still in SSA form.
   In that case there's likely no point in having the SPIR-V-style
   intrinsics in the first place, no back-end I know of will prefer
   image intrinsics in that form at this point, and we can just let the
   SPIR-V front-end deal with this problem alone.

 - Actually agree on a representation for a texel pointer.  A texel
   pointer would be a triple of pointer-to-object, vec4 of coordinates
   and sample index, together with some static metadata determining the
   memory access properties.  Something more back-end-specific would
   likely work too.  In any case we would also have to agree on a
   representation for a pointer to an image/SSB object.  And we would
   need to type-check pointers correctly to prevent the optimizer from
   doing illegal transformations (e.g. a single store intrinsic that
   writes to either coherent or non-coherent memory depending on the
   parent block, or, if we adhere to the limitations of GLSL strictly, a
   single store intrinsic that might access images based on different
   array uniforms).

   It gets even more "interesting" if you consider the limitations some
   back-ends have about accessing images non-uniformly -- We would have
   to guarantee that the pointer-to-object inside the texel pointer has
   the same value for all thread invocations executing a given load,
   store or atomic intrinsic, what implies that we would have to forbid
   texel pointers in phi instructions unless it can be proven that
   either the control flow incident into the node is uniform *or* the
   pointer-to-object coming from all parent branches is the same.

Sounds like a lot of work for little benefit at this point.

> I don't have a preference one way or the other, and I guess we could
> always refactor it later if we wanted to, so assuming Ken is OK with
> this, then besides one minor comment on patch 4 the series is
>
> Reviewed-by: Connor Abbott 
>
Thanks.

> On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez  wrote:
>> ---
>>  src/glsl/nir/nir_intrinsics.h | 27 +++
>>  1 file changed, 27 insertions(+)
>>
>> diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
>> index 8e28765..4b13c75 100644
>> --- a/src/glsl/nir/nir_intrinsics.h
>> +++ b/src/glsl/nir/nir_intrinsics.h
>> @@ -89,6 +89,33 @@ ATOMIC(inc, 0)
>>  ATOMIC(dec, 0)
>>  ATOMIC(read, NIR_INTRINSIC_CAN_ELIMINATE)
>>
>> +/*
>> + * Image load, store and atomic intrinsics.
>> + *
>> + * All image intrinsics take an image target passed as a nir_variable.  
>> Image
>> + * variables contain a number of memory and layout qualifiers that influence
>> + * the semantics of the intrinsic.
>> + *
>> + * All image intrinsics take a four-coordinate vector and a sample index as
>> + * first two sources, determining the location within the image that will be
>> + * accessed by the intrinsic.  Components not applicable to the image target
>> + * in use are equal to zero by convention.  Image store takes an additional
>> + * four-component argument with the value to be written, and image atomic
>> + * operations take either one or two additional scalar arguments with the 
>> same
>> + * meaning as in the ARB_shader_image_load_store specification.
>> + */
>> +INTRINSIC(image_load, 2, ARR(4, 1),

Re: [Mesa-dev] [PATCH 4/5] nir: Translate image load, store and atomic intrinsics from GLSL IR.

2015-05-08 Thread Francisco Jerez
Connor Abbott  writes:

> On Tue, May 5, 2015 at 4:29 PM, Francisco Jerez  wrote:
>> ---
>>  src/glsl/nir/glsl_to_nir.cpp | 125 
>> +++
>>  1 file changed, 114 insertions(+), 11 deletions(-)
>>
>> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
>> index f6b8331..a01ab3b 100644
>> --- a/src/glsl/nir/glsl_to_nir.cpp
>> +++ b/src/glsl/nir/glsl_to_nir.cpp
>> @@ -614,27 +614,130 @@ nir_visitor::visit(ir_call *ir)
>>   op = nir_intrinsic_atomic_counter_inc_var;
>>} else if (strcmp(ir->callee_name(), 
>> "__intrinsic_atomic_predecrement") == 0) {
>>   op = nir_intrinsic_atomic_counter_dec_var;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_load") == 0) {
>> + op = nir_intrinsic_image_load;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_store") == 0) 
>> {
>> + op = nir_intrinsic_image_store;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_add") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_add;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_min") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_min;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_max") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_max;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_and") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_and;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_or") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_or;
>> +  } else if (strcmp(ir->callee_name(), "__intrinsic_image_atomic_xor") 
>> == 0) {
>> + op = nir_intrinsic_image_atomic_xor;
>> +  } else if (strcmp(ir->callee_name(), 
>> "__intrinsic_image_atomic_exchange") == 0) {
>> + op = nir_intrinsic_image_atomic_exchange;
>> +  } else if (strcmp(ir->callee_name(), 
>> "__intrinsic_image_atomic_comp_swap") == 0) {
>> + op = nir_intrinsic_image_atomic_comp_swap;
>>} else {
>>   unreachable("not reached");
>>}
>>
>>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
>> -  ir_dereference *param =
>> - (ir_dereference *) ir->actual_parameters.get_head();
>> -  instr->variables[0] = evaluate_deref(&instr->instr, param);
>> -  nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>> +
>> +  switch (op) {
>> +  case nir_intrinsic_atomic_counter_read_var:
>> +  case nir_intrinsic_atomic_counter_inc_var:
>> +  case nir_intrinsic_atomic_counter_dec_var: {
>> + ir_dereference *param =
>> +(ir_dereference *) ir->actual_parameters.get_head();
>> + instr->variables[0] = evaluate_deref(&instr->instr, param);
>> + nir_ssa_dest_init(&instr->instr, &instr->dest, 1, NULL);
>> + break;
>> +  }
>> +  case nir_intrinsic_image_load:
>> +  case nir_intrinsic_image_store:
>> +  case nir_intrinsic_image_atomic_add:
>> +  case nir_intrinsic_image_atomic_min:
>> +  case nir_intrinsic_image_atomic_max:
>> +  case nir_intrinsic_image_atomic_and:
>> +  case nir_intrinsic_image_atomic_or:
>> +  case nir_intrinsic_image_atomic_xor:
>> +  case nir_intrinsic_image_atomic_exchange:
>> +  case nir_intrinsic_image_atomic_comp_swap: {
>> + nir_load_const_instr *instr_zero = 
>> nir_load_const_instr_create(shader, 1);
>> + instr_zero->value.u[0] = 0;
>> + nir_instr_insert_after_cf_list(this->cf_node_list, 
>> &instr_zero->instr);
>> +
>> + /* Set the image variable dereference. */
>> + exec_node *param = ir->actual_parameters.get_head();
>> + ir_dereference *image = (ir_dereference *)param;
>> + const glsl_type *type =
>> +image->variable_referenced()->type->without_array();
>> +
>> + instr->variables[0] = evaluate_deref(&instr->instr, image);
>> + param = param->get_next();
>> +
>> + /* Set the address argument, extending the coordinate vector to 
>> four
>> +  * components.
>> +  */
>> + const nir_src src_addr = evaluate_rvalue((ir_dereference *)param);
>> + nir_alu_instr *instr_addr = nir_alu_instr_create(shader, 
>> nir_op_vec4);
>> + nir_ssa_dest_init(&instr_addr->instr, &instr_addr->dest.dest, 4, 
>> NULL);
>> +
>> + for (int i = 0; i < 4; i++) {
>> +if (i < type->coordinate_components()) {
>> +   instr_addr->src[i].src = src_addr;
>> +   instr_addr->src[i].swizzle[0] = i;
>> +} else {
>> +   instr_addr->src[i].src = nir_src_for_ssa(&instr_zero->def);
>
> I think it would better convey the intent to create an ssa_undef_instr
> and use that here instead of zero. Unless something else relies on the
> extra coordinates being zeroed?
>
Yeah, that would work too.  Zero makes some sense 

Re: [Mesa-dev] [PATCH 7/9] egl/wayland: assume EGL_WINDOW_BIT

2015-05-08 Thread Axel Davy

Le 06/05/2015 03:00, Dave Airlie a écrit :

On 2 May 2015 at 20:15, Axel Davy  wrote:

Only EGL_WINDOW_BIT is supported. Remove tests related.

Is this there no plans to support pixmap/pbuffer/ or any of the other bits?

Seems like a step in the wrong direction if we really should be supporting
other things than WINDOW_BIT in the future.

Dave.

I double checked, and it really seems pbuffers are not supported by the 
current mesa implementation for wayland.

However the spec doesn't tell they shouldn't be supported.

What about changing  the commit message to:
egl/wayland: simplify dri2_wl_create_surface

This function is always used with EGL_WINDOW_BIT. Pixmaps are forbidden
for Wayland, and PBuffers are unimplemented.



I'm also fine with dropping the patch.

Yours,

Axel Davy

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress

2015-05-08 Thread Pohjolainen, Topi
On Fri, May 08, 2015 at 03:23:52PM +0300, Juha-Pekka Heikkil? wrote:
>perjantai 8. toukokuuta 2015 Ian Romanick  kirjoitti:
> 
>  On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote:
>  > On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote:
>  >> Stop context creation if something failed. If something errored
>  >> during context creation we'd segfault. Now will clean up and
>  >> return error.
>  >>
>  >> Signed-off-by: Juha-Pekka Heikkila 
>  >> ---
>  >>  src/mesa/main/shared.c | 66
>  +++---
>  >>  1 file changed, 62 insertions(+), 4 deletions(-)
>  >>
>  >> diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c
>  >> index 0b76cc0..cc05b05 100644
>  >> --- a/src/mesa/main/shared.c
>  >> +++ b/src/mesa/main/shared.c
>  >> @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
>  >>
>  >> mtx_init(&shared->Mutex, mtx_plain);
>  >>
>  >> +   /* Mutex and timestamp for texobj state validation */
>  >> +   mtx_init(&shared->TexMutex, mtx_recursive);
>  >> +   shared->TextureStateStamp = 0;
>  >
>  > Do you really need to move this here?
> 
>  I was going to ask the same thing.  I think moving it here means that it
>  can be unconditionally mtx_destroy'ed in the error path below.
> 
>Yes, Ian got it correct. When moving mutex creation here there is no need
>to go checking about it if need to clean up. I though this makes things
>more clean and simple.

As it can't fail, I would have moved it right before the return in the
success path. Saves you from adding the mutex tear-down in the failure path.

> 
> 
>   
> 
>  >> +
>  >> shared->DisplayList = _mesa_NewHashTable();
>  >> +   if (!shared->DisplayList)
>  >> +  goto error_out;
>  >> +
>  >> shared->TexObjects = _mesa_NewHashTable();
>  >> +   if (!shared->TexObjects)
>  >> +  goto error_out;
>  >> +
>  >> shared->Programs = _mesa_NewHashTable();
>  >> +   if (!shared->Programs)
>  >> +  goto error_out;
>  >>
>  >> shared->DefaultVertexProgram =
>  >>gl_vertex_program(ctx->Driver.NewProgram(ctx,
>  >> @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
>  >> 
>   GL_FRAGMENT_PROGRAM_ARB, 0));
>  >>
>  >> shared->ATIShaders = _mesa_NewHashTable();
>  >> +   if (!shared->ATIShaders)
>  >> +  goto error_out;
>  >> +
>  >> shared->DefaultFragmentShader =
>  _mesa_new_ati_fragment_shader(ctx, 0);
>  >>
>  >> shared->ShaderObjects = _mesa_NewHashTable();
>  >> +   if (!shared->ShaderObjects)
>  >> +  goto error_out;
>  >>
>  >> shared->BufferObjects = _mesa_NewHashTable();
>  >> +   if (!shared->BufferObjects)
>  >> +  goto error_out;
>  >>
>  >> /* GL_ARB_sampler_objects */
>  >> shared->SamplerObjects = _mesa_NewHashTable();
>  >> +   if (!shared->SamplerObjects)
>  >> +  goto error_out;
>  >>
>  >> /* Allocate the default buffer object */
>  >> shared->NullBufferObj = ctx->Driver.NewBufferObject(ctx, 0);
>  >> +   if (!shared->NullBufferObj)
>  >> +   goto error_out;
>  >>
>  >> /* Create default texture objects */
>  >> for (i = 0; i < NUM_TEXTURE_TARGETS; i++) {
>  >> @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context
>  *ctx)
>  >>};
>  >>STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS);
>  >>shared->DefaultTex[i] = ctx->Driver.NewTextureObject(ctx, 0,
>  targets[i]);
>  >> +
>  >> +  if (!shared->DefaultTex[i])
>  >> +  goto error_out;
>  >> }
>  >>
>  >> /* sanity check */
>  >> assert(shared->DefaultTex[TEXTURE_1D_INDEX]->RefCount == 1);
>  >>
>  >> -   /* Mutex and timestamp for texobj state validation */
>  >> -   mtx_init(&shared->TexMutex, mtx_recursive);
>  >> -   shared->TextureStateStamp = 0;
>  >> -
>  >> shared->FrameBuffers = _mesa_NewHashTable();
>  >> +   if (!shared->FrameBuffers)
>  >> +  goto error_out;
>  >> +
>  >> shared->RenderBuffers = _mesa_NewHashTable();
>  >> +   if (!shared->RenderBuffers)
>  >> +  goto error_out;
>  >>
>  >> shared->SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer,
>  >>_mesa_key_pointer_equal);
>  >> +   if (!shared->SyncObjects)
>  >> +  goto error_out;
>  >>
>  >> return shared;
>  >> +
>  >> +error_out:
>  >> +   for (i = 0; i < NUM_TEXTURE_TARGETS; i++) {
>  >> +  if (shared->DefaultTex[i]) {
>  >> + ctx->Driver.DeleteTexture(ctx, shared->DefaultTex[i]);
> 

Re: [Mesa-dev] [PATCH 11/27] i965: Store gather table information in the program data

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 06:17 PM, Pohjolainen, Topi wrote:
> On Tue, Apr 28, 2015 at 11:08:08PM +0300, Abdiel Janulgue wrote:
>> The resource streamer is able to gather and pack sparsely-located
>> constant data from any buffer object by a referring to a gather table
>> This patch adds support for keeping track of these constant data
>> fetches into a gather table.
>>
>> The gather table is generated from two sources. Ordinary uniform fetches
>> are stored first. These are then combined with a separate table containing
>> UBO entries. The separate entry for UBOs is needed to make it easier to
>> generate the gather mask when combining and packing the constant data.
>>
>> Signed-off-by: Abdiel Janulgue 
>> ---
>>  src/mesa/drivers/dri/i965/brw_context.h  |  9 +
>>  src/mesa/drivers/dri/i965/brw_gs.c   |  4 
>>  src/mesa/drivers/dri/i965/brw_program.c  |  5 +
>>  src/mesa/drivers/dri/i965/brw_shader.cpp |  4 +++-
>>  src/mesa/drivers/dri/i965/brw_shader.h   | 11 +++
>>  src/mesa/drivers/dri/i965/brw_vs.c   |  5 +
>>  src/mesa/drivers/dri/i965/brw_wm.c   |  5 +
>>  7 files changed, 42 insertions(+), 1 deletion(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.h 
>> b/src/mesa/drivers/dri/i965/brw_context.h
>> index 7fd49e9..e25c64d 100644
>> --- a/src/mesa/drivers/dri/i965/brw_context.h
>> +++ b/src/mesa/drivers/dri/i965/brw_context.h
>> @@ -355,9 +355,12 @@ struct brw_stage_prog_data {
>>  
>> GLuint nr_params;   /**< number of float params/constants */
>> GLuint nr_pull_params;
>> +   GLuint nr_ubo_params;
>> +   GLuint nr_gather_table;
> 
> I would introduce these as non gl-types - we are trying to move away from
> them. Perhaps change "nr_params" and "nr_pull_params" while you are at it.
> 
>>  
>> unsigned curb_read_length;
>> unsigned total_scratch;
>> +   unsigned max_ubo_const_block;
>>  
>> /**
>>  * Register where the thread expects to find input data from the URB
>> @@ -375,6 +378,12 @@ struct brw_stage_prog_data {
>>  */
>> const gl_constant_value **param;
>> const gl_constant_value **pull_param;
>> +   struct {
>> +  int reg;
>> +  unsigned channel_mask;
>> +  unsigned const_block;
>> +  unsigned const_offset;
>> +   } *gather_table;
>>  };
> 
> Below in brw_shader.h you do:
> 
>struct gather_table {
>   int reg;
>   unsigned channel_mask;
>   unsigned const_block;
>   unsigned const_offset;
>};
>gather_table *ubo_gather_table;
> 
> Why not here?

I guess you're right. Yep, I can probably re-use the same definition in
the next version. Thanks!

> 
>>  
>>  /* Data about a particular attempt to compile a program.  Note that
>> diff --git a/src/mesa/drivers/dri/i965/brw_gs.c 
>> b/src/mesa/drivers/dri/i965/brw_gs.c
>> index bea90d8..97658d5 100644
>> --- a/src/mesa/drivers/dri/i965/brw_gs.c
>> +++ b/src/mesa/drivers/dri/i965/brw_gs.c
>> @@ -70,6 +70,10 @@ brw_compile_gs_prog(struct brw_context *brw,
>> c.prog_data.base.base.pull_param =
>>rzalloc_array(NULL, const gl_constant_value *, param_count);
>> c.prog_data.base.base.nr_params = param_count;
>> +   c.prog_data.base.base.nr_gather_table = 0;
>> +   c.prog_data.base.base.gather_table =
>> +  rzalloc_size(NULL, sizeof(*c.prog_data.base.base.gather_table) *
>> +   (c.prog_data.base.base.nr_params + 
>> c.prog_data.base.base.nr_ubo_params));
> 
> Wrap this line.
> 
>>  
>> if (brw->gen >= 7) {
>>if (gp->program.OutputType == GL_POINTS) {
>> diff --git a/src/mesa/drivers/dri/i965/brw_program.c 
>> b/src/mesa/drivers/dri/i965/brw_program.c
>> index 81a0c19..f27c799 100644
>> --- a/src/mesa/drivers/dri/i965/brw_program.c
>> +++ b/src/mesa/drivers/dri/i965/brw_program.c
>> @@ -573,6 +573,10 @@ brw_stage_prog_data_compare(const struct 
>> brw_stage_prog_data *a,
>> if (memcmp(a->pull_param, b->pull_param, a->nr_pull_params * sizeof(void 
>> *)))
>>return false;
>>  
>> +   if (memcmp(a->gather_table, b->gather_table,
>> +  a->nr_gather_table * sizeof(*a->gather_table)))
>> +  return false;
>> +
>> return true;
>>  }
>>  
>> @@ -583,6 +587,7 @@ brw_stage_prog_data_free(const void *p)
>>  
>> ralloc_free(prog_data->param);
>> ralloc_free(prog_data->pull_param);
>> +   ralloc_free(prog_data->gather_table);
>>  }
>>  
>>  void
>> diff --git a/src/mesa/drivers/dri/i965/brw_shader.cpp 
>> b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> index 0d6ac0c..8769f67 100644
>> --- a/src/mesa/drivers/dri/i965/brw_shader.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_shader.cpp
>> @@ -739,11 +739,13 @@ backend_visitor::backend_visitor(struct brw_context 
>> *brw,
>>   prog(prog),
>>   stage_prog_data(stage_prog_data),
>>   cfg(NULL),
>> - stage(stage)
>> + stage(stage),
>> + ubo_gather_table(NULL)
>>  {
>> debug_enabled = INTEL_DEBUG & intel_debug_flag_for_shader_stage(stage);
>> stage_name = _mesa_shader_stage_to_string(stage);

Re: [Mesa-dev] [PATCH 13/13] mesa/main: Verify context creation on progress

2015-05-08 Thread Juha-Pekka Heikkilä
perjantai 8. toukokuuta 2015 Ian Romanick  kirjoitti:

> On 05/07/2015 05:21 AM, Pohjolainen, Topi wrote:
> > On Tue, May 05, 2015 at 02:25:29PM +0300, Juha-Pekka Heikkila wrote:
> >> Stop context creation if something failed. If something errored
> >> during context creation we'd segfault. Now will clean up and
> >> return error.
> >>
> >> Signed-off-by: Juha-Pekka Heikkila  >
> >> ---
> >>  src/mesa/main/shared.c | 66
> +++---
> >>  1 file changed, 62 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/src/mesa/main/shared.c b/src/mesa/main/shared.c
> >> index 0b76cc0..cc05b05 100644
> >> --- a/src/mesa/main/shared.c
> >> +++ b/src/mesa/main/shared.c
> >> @@ -64,9 +64,21 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
> >>
> >> mtx_init(&shared->Mutex, mtx_plain);
> >>
> >> +   /* Mutex and timestamp for texobj state validation */
> >> +   mtx_init(&shared->TexMutex, mtx_recursive);
> >> +   shared->TextureStateStamp = 0;
> >
> > Do you really need to move this here?
>
> I was going to ask the same thing.  I think moving it here means that it
> can be unconditionally mtx_destroy'ed in the error path below.


Yes, Ian got it correct. When moving mutex creation here there is no need
to go checking about it if need to clean up. I though this makes things
more clean and simple.


>
>
>> +
> >> shared->DisplayList = _mesa_NewHashTable();
> >> +   if (!shared->DisplayList)
> >> +  goto error_out;
> >> +
> >> shared->TexObjects = _mesa_NewHashTable();
> >> +   if (!shared->TexObjects)
> >> +  goto error_out;
> >> +
> >> shared->Programs = _mesa_NewHashTable();
> >> +   if (!shared->Programs)
> >> +  goto error_out;
> >>
> >> shared->DefaultVertexProgram =
> >>gl_vertex_program(ctx->Driver.NewProgram(ctx,
> >> @@ -76,17 +88,28 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
> >>
>  GL_FRAGMENT_PROGRAM_ARB, 0));
> >>
> >> shared->ATIShaders = _mesa_NewHashTable();
> >> +   if (!shared->ATIShaders)
> >> +  goto error_out;
> >> +
> >> shared->DefaultFragmentShader = _mesa_new_ati_fragment_shader(ctx,
> 0);
> >>
> >> shared->ShaderObjects = _mesa_NewHashTable();
> >> +   if (!shared->ShaderObjects)
> >> +  goto error_out;
> >>
> >> shared->BufferObjects = _mesa_NewHashTable();
> >> +   if (!shared->BufferObjects)
> >> +  goto error_out;
> >>
> >> /* GL_ARB_sampler_objects */
> >> shared->SamplerObjects = _mesa_NewHashTable();
> >> +   if (!shared->SamplerObjects)
> >> +  goto error_out;
> >>
> >> /* Allocate the default buffer object */
> >> shared->NullBufferObj = ctx->Driver.NewBufferObject(ctx, 0);
> >> +   if (!shared->NullBufferObj)
> >> +   goto error_out;
> >>
> >> /* Create default texture objects */
> >> for (i = 0; i < NUM_TEXTURE_TARGETS; i++) {
> >> @@ -107,22 +130,57 @@ _mesa_alloc_shared_state(struct gl_context *ctx)
> >>};
> >>STATIC_ASSERT(ARRAY_SIZE(targets) == NUM_TEXTURE_TARGETS);
> >>shared->DefaultTex[i] = ctx->Driver.NewTextureObject(ctx, 0,
> targets[i]);
> >> +
> >> +  if (!shared->DefaultTex[i])
> >> +  goto error_out;
> >> }
> >>
> >> /* sanity check */
> >> assert(shared->DefaultTex[TEXTURE_1D_INDEX]->RefCount == 1);
> >>
> >> -   /* Mutex and timestamp for texobj state validation */
> >> -   mtx_init(&shared->TexMutex, mtx_recursive);
> >> -   shared->TextureStateStamp = 0;
> >> -
> >> shared->FrameBuffers = _mesa_NewHashTable();
> >> +   if (!shared->FrameBuffers)
> >> +  goto error_out;
> >> +
> >> shared->RenderBuffers = _mesa_NewHashTable();
> >> +   if (!shared->RenderBuffers)
> >> +  goto error_out;
> >>
> >> shared->SyncObjects = _mesa_set_create(NULL, _mesa_hash_pointer,
> >>_mesa_key_pointer_equal);
> >> +   if (!shared->SyncObjects)
> >> +  goto error_out;
> >>
> >> return shared;
> >> +
> >> +error_out:
> >> +   for (i = 0; i < NUM_TEXTURE_TARGETS; i++) {
> >> +  if (shared->DefaultTex[i]) {
> >> + ctx->Driver.DeleteTexture(ctx, shared->DefaultTex[i]);
> >> +  }
> >> +   }
> >> +
> >> +   _mesa_reference_buffer_object(ctx, &shared->NullBufferObj, NULL);
> >> +
> >> +   _mesa_DeleteHashTable(shared->RenderBuffers);
> >> +   _mesa_DeleteHashTable(shared->FrameBuffers);
> >> +   _mesa_DeleteHashTable(shared->SamplerObjects);
> >> +   _mesa_DeleteHashTable(shared->BufferObjects);
> >> +   _mesa_DeleteHashTable(shared->ShaderObjects);
> >> +   _mesa_DeleteHashTable(shared->ATIShaders);
> >> +   _mesa_DeleteHashTable(shared->Programs);
> >> +   _mesa_DeleteHashTable(shared->TexObjects);
> >> +   _mesa_DeleteHashTable(shared->DisplayList);
> >> +
> >> +   _mesa_reference_vertprog(ctx, &shared->DefaultVertexProgram, NULL);
> >> +   _mesa_reference_geomprog(ctx, &shared->DefaultGeometryProgram,
> NULL);
> >> +   _mesa_reference_fragprog(ctx, &shared->DefaultFragmentProgram,
> NULL);
> >> +
> >> +   mtx_

Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 04:43 PM, Pohjolainen, Topi wrote:
> On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote:
>> This patch implements the binding table enable command which is also
>> used to allocate a binding table pool where hardware-generated
>> binding table entries are flushed into. Each binding table offset in
>> the binding table pool is unique per each shader stage that are
>> enabled within a batch.
>>
>> Also insert the required brw_tracked_state objects to enable
>> hw-generated binding tables in normal render path.
>>
>> Signed-off-by: Abdiel Janulgue 
>> ---
>>  src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 
>> ++
>>  src/mesa/drivers/dri/i965/brw_context.c|  4 ++
>>  src/mesa/drivers/dri/i965/brw_context.h|  5 ++
>>  src/mesa/drivers/dri/i965/brw_state.h  |  7 +++
>>  src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
>>  src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
>>  6 files changed, 92 insertions(+)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
>> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
>> index 459165a..a58e32e 100644
>> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
>> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
>> @@ -44,6 +44,11 @@
>>  #include "brw_state.h"
>>  #include "intel_batchbuffer.h"
>>  
>> +/* Somehow the hw-binding table pool offset must start here, otherwise
>> + * the GPU will hang
>> + */
>> +#define HW_BT_START_OFFSET 256;
> 
> I think we want to understand this a little better before enabling...
> 
>> +
>>  /**
>>   * Upload a shader stage's binding table as indirect state.
>>   *
>> @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = {
>> .emit = brw_gs_upload_binding_table,
>>  };
>>  
>> +/**
>> + * Hardware-generated binding tables for the resource streamer
>> + */
>> +void
>> +gen7_disable_hw_binding_tables(struct brw_context *brw)
>> +{
>> +   BEGIN_BATCH(3);
>> +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
>> +   OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, 
>> BRW_HW_BINDING_TABLE_ENABLE) |
>> + brw->is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0);
>> +   OUT_BATCH(0);
>> +   ADVANCE_BATCH();
>> +
>> +   /* Pipe control workaround */
>> +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
>> +}
>> +
>> +void
>> +gen7_enable_hw_binding_tables(struct brw_context *brw)
>> +{
>> +   if (!brw->has_resource_streamer) {
>> +  gen7_disable_hw_binding_tables(brw);
> 
> I started wondering why we really need this - RS is disabled by default and
> we haven't needed to do anything to disable it before.
> 
>> +  return;
>> +   }
>> +
>> +   if (!brw->hw_bt_pool.bo) {
>> +  /* From the BSpec, 3D Pipeline > Resource Streamer > Hardware Binding 
>> Tables:
>> +   *
>> +   *  "A maximum of 16,383 Binding tables are allowed in any batch 
>> buffer."
>> +   */
>> +  int max_size = 16383 * 4;
> 
> But does it really need this much all the time? I guess I need to go and
> read the spec.

This is actually just one re-usable buffer object sticking around for
the lifetime of the context. Compare this with creating lots of bo
every-time we enable the resource streamer. I think it helps with
reducing the amount of relocations we have.

> 
>> +  brw->hw_bt_pool.bo = drm_intel_bo_alloc(brw->bufmgr, "hw_bt",
>> +  max_size, 64);
>> +  brw->hw_bt_pool.next_offset = HW_BT_START_OFFSET;
>> +   }
>> +
>> +   uint32_t dw1 = SET_FIELD(BRW_HW_BINDING_TABLE_ON, 
>> BRW_HW_BINDING_TABLE_ENABLE);
>> +   if (brw->is_haswell)
>> +  dw1 |= SET_FIELD(GEN7_MOCS_L3, GEN7_HW_BT_MOCS) | 
>> HSW_HW_BINDING_TABLE_RESERVED;
> 
> These are overflowing 80 columns.
> 
>> +
>> +   BEGIN_BATCH(3);
>> +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
>> +   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0, dw1);
>> +   OUT_RELOC(brw->hw_bt_pool.bo, I915_GEM_DOMAIN_SAMPLER, 0,
>> + brw->hw_bt_pool.bo->size);
>> +   ADVANCE_BATCH();
>> +
>> +   /* Pipe control workaround */
>> +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
> 
> Would you have a spec reference for this?

3D-Media-GPGPU Engine > Resource Streamer [HSW+] > Hardware Binding
Tables [HSW+] > Programming note

"When switching between HW and SW binding table generation, SW must
issue a state cache invalidate."

> 
>> +}
>> +
>> +void
>> +gen7_reset_rs_pool_offsets(struct brw_context *brw)
>> +{
>> +   brw->hw_bt_pool.next_offset = HW_BT_START_OFFSET;
>> +}
>> +
>> +const struct brw_tracked_state gen7_hw_binding_tables = {
>> +   .dirty = {
>> +  .mesa = 0,
>> +  .brw = BRW_NEW_BATCH,
>> +   },
>> +   .emit = gen7_enable_hw_binding_tables
>> +};
>> +
>>  /** @} */
>>  
>>  /**
>> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
>> b/src/mesa/drivers/dri/i965/brw_context.c
>> index c7e1e81..9c7ccae 100

Re: [Mesa-dev] [PATCH 26/27] i965: Disable gather push constants for null constants

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 05:48 PM, Pohjolainen, Topi wrote:
> On Tue, Apr 28, 2015 at 11:08:23PM +0300, Abdiel Janulgue wrote:
>> Programming null constants with gather constant tables seems to
>> be unsupported and results in a GPU lockup even with the prescribed
>> GPU workarounds in the bspec. Found out by trial and error that
>> disabling HW gather constant when the constant state for a stage
>> needs to be nullified is the only way to go around the issue.
> 
> Just a general question. We keep resource streamer itself always enabled
> (except for blorp of course). Does it still do something meaningful without
> gather constants or should we disable them both?
> 

The resource streamer is just an infrastructure containing the specific
optimizations hw-generated binding tables and gather constants. The
dependency below flows from left to right.

RS_on ringbuffer --> hw-binding tables --> gather constants

So switching on gather constants require HW-gen binding tables be
enabled as well. But hw-generated binding table is not required to be
switched off for gather constants to be turned off.

The only problem with disabling both is the additional required state
setup packets to tear-down and then re-enable everything again which
increases the size of the command buffers for every 3D primitive

-Abdiel


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/27] i965: Enable hardware-generated binding tables on render path.

2015-05-08 Thread Abdiel Janulgue


On 05/07/2015 05:46 PM, Pohjolainen, Topi wrote:
> On Thu, May 07, 2015 at 04:43:21PM +0300, Pohjolainen, Topi wrote:
>> On Tue, Apr 28, 2015 at 11:08:00PM +0300, Abdiel Janulgue wrote:
>>> This patch implements the binding table enable command which is also
>>> used to allocate a binding table pool where hardware-generated
>>> binding table entries are flushed into. Each binding table offset in
>>> the binding table pool is unique per each shader stage that are
>>> enabled within a batch.
>>>
>>> Also insert the required brw_tracked_state objects to enable
>>> hw-generated binding tables in normal render path.
>>>
>>> Signed-off-by: Abdiel Janulgue 
>>> ---
>>>  src/mesa/drivers/dri/i965/brw_binding_tables.c | 70 
>>> ++
>>>  src/mesa/drivers/dri/i965/brw_context.c|  4 ++
>>>  src/mesa/drivers/dri/i965/brw_context.h|  5 ++
>>>  src/mesa/drivers/dri/i965/brw_state.h  |  7 +++
>>>  src/mesa/drivers/dri/i965/brw_state_upload.c   |  2 +
>>>  src/mesa/drivers/dri/i965/intel_batchbuffer.c  |  4 ++
>>>  6 files changed, 92 insertions(+)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_binding_tables.c 
>>> b/src/mesa/drivers/dri/i965/brw_binding_tables.c
>>> index 459165a..a58e32e 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_binding_tables.c
>>> +++ b/src/mesa/drivers/dri/i965/brw_binding_tables.c
>>> @@ -44,6 +44,11 @@
>>>  #include "brw_state.h"
>>>  #include "intel_batchbuffer.h"
>>>  
>>> +/* Somehow the hw-binding table pool offset must start here, otherwise
>>> + * the GPU will hang
>>> + */
>>> +#define HW_BT_START_OFFSET 256;
>>
>> I think we want to understand this a little better before enabling...

Actually, now that I remember I had my notes somewhere when I first
enabled this hw-binding table over a year ago. I dug it up and 256 is
the actually the size of a single stage's "hw-binding table state"
expressed in hw-binding table format. Details:

From the Bspec 3DSTATE_BINDING_TABLE_POINTERS_x > Pointer to PS Binding
Table section lists the format as:

"SurfaceStateOffset[16:6]BINDING_TABLE_STATE*256 When
HW-generated binding table is enabled"

So this is 16-bits[1] x 256 = 512 bytes.

Now this offset must be expressed in "Pointer to PS Binding Table" using
the hw-generated binding table format which must be aligned to 16:6.
However the bit entry field in dw1 of 3DSTATE_BINDING_TABLE_POINTERS_x
must be set within 15:5 so this value should be >> 1. Hence, the 256
(similar case is evident on function gen7_update_binding_table() in
patch 4 of this series).

Seems the RS hardware is extremely intolerant of even slight variations
hence the hungs when this is not followed closely.

In the next version, I can make the magic numbers a bit more clearer.

[1] 3D-Media-GPGPU Engine > Shared Functions > 3D Sampler > State > HW
Generated BINDING_TABLE_STATE

>>
>>> +
>>>  /**
>>>   * Upload a shader stage's binding table as indirect state.
>>>   *
>>> @@ -163,6 +168,71 @@ const struct brw_tracked_state brw_gs_binding_table = {
>>> .emit = brw_gs_upload_binding_table,
>>>  };
>>>  
>>> +/**
>>> + * Hardware-generated binding tables for the resource streamer
>>> + */
>>> +void
>>> +gen7_disable_hw_binding_tables(struct brw_context *brw)
>>> +{
>>> +   BEGIN_BATCH(3);
>>> +   OUT_BATCH(_3DSTATE_BINDING_TABLE_POOL_ALLOC << 16 | (3 - 2));
>>> +   OUT_BATCH(SET_FIELD(BRW_HW_BINDING_TABLE_OFF, 
>>> BRW_HW_BINDING_TABLE_ENABLE) |
>>> + brw->is_haswell ? HSW_HW_BINDING_TABLE_RESERVED : 0);
>>> +   OUT_BATCH(0);
>>> +   ADVANCE_BATCH();
>>> +
>>> +   /* Pipe control workaround */
>>> +   brw_emit_pipe_control_flush(brw, PIPE_CONTROL_STATE_CACHE_INVALIDATE);
>>> +}
>>> +
>>> +void
>>> +gen7_enable_hw_binding_tables(struct brw_context *brw)
>>> +{
>>> +   if (!brw->has_resource_streamer) {
>>> +  gen7_disable_hw_binding_tables(brw);
>>
>> I started wondering why we really need this - RS is disabled by default and
>> we haven't needed to do anything to disable it before.
> 
> Right, patch number eight gave me the answer, we want to disable it for blorp.
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [RFC PATCH] nir: Transform 4*x into x << 2 during late optimizations.

2015-05-08 Thread Kenneth Graunke
According to Glenn, shifts on R600 have 5x the throughput as multiplies.

Intel GPUs have strange integer multiplication restrictions - on most
hardware, MUL actually only does a 32-bit x 16-bit multiply.  This
means the arguments aren't commutative, which can limit our constant
propagation options.  SHL has no such restrictions.

Shifting is probably reasonable on most people's hardware, so let's just
do that.

i965 shader-db results (using NIR for VS):
total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
instructions in affected programs: 1360411 -> 1316806 (-3.21%)
helped:5772
HURT:  0

Signed-off-by: Kenneth Graunke 
Cc: matts...@gmail.com
Cc: ja...@jlekstrand.net
---
 src/glsl/nir/nir_opt_algebraic.py | 5 +
 1 file changed, 5 insertions(+)

So...I found a bizarre issue with this patch.

   (('imul', 4, a), ('ishl', a, 2)),

totally optimizes things.  However...

   (('imul', a, 4), ('ishl', a, 2)),

doesn't seem to do anything, even though imul is commutative, and nir_search
should totally handle that...

 ▄▄  ▄▄▄▄    ▄   ▄▄
 ██  ██   ▀▀▀██▀▀▀  ███  ██
 ▀█▄ ██ ▄█▀      ██ ▄█▀  ██
  ██ ██ ██   ██  ██  ██   ▄██▀   ██
  ███▀▀███   ██  ██   ██ ▀▀
  ███  ███  ▄██  ██▄ ██   ▄▄ ▄▄
  ▀▀▀  ▀▀▀  ▀▀▀▀ ▀▀   ▀▀ ▀▀

If you know why, let me know, otherwise I may have to look into it when more
awake.

diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index 400d60e..350471f 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -247,6 +247,11 @@ late_optimizations = [
(('fge', ('fadd', a, b), 0.0), ('fge', a, ('fneg', b))),
(('feq', ('fadd', a, b), 0.0), ('feq', a, ('fneg', b))),
(('fne', ('fadd', a, b), 0.0), ('fne', a, ('fneg', b))),
+
+   # Multiplication by 4 comes up fairly often in indirect offset calculations.
+   # Some GPUs have weird integer multiplication limitations, but shifts 
should work
+   # equally well everywhere.
+   (('imul', 4, a), ('ishl', a, 2)),
 ]
 
 print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
-- 
2.4.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] main: glGetIntegeri_v fails for GL_VERTEX_BINDING_STRIDE

2015-05-08 Thread Tapani Pälli

That is strange, it was introduced in fb370f89d but then has gone missing ..

Reviewed-by: Tapani Pälli 


On 05/07/2015 06:13 PM, Marta Lofstedt wrote:

The return type for GL_VERTEX_BINDING_STRIDE is missing,
this cause glGetIntegeri_v to fail.

Signed-off-by: Marta Lofstedt 
---
  src/mesa/main/get.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
index 6fc0f3f..9fb8fba 100644
--- a/src/mesa/main/get.c
+++ b/src/mesa/main/get.c
@@ -1959,6 +1959,7 @@ find_value_indexed(const char *func, GLenum pname, GLuint 
index, union value *v)
if (index >= ctx->Const.Program[MESA_SHADER_VERTEX].MaxAttribs)
goto invalid_value;
v->value_int = 
ctx->Array.VAO->VertexBinding[VERT_ATTRIB_GENERIC(index)].Stride;
+  return TYPE_INT;

 /* ARB_shader_image_load_store */
 case GL_IMAGE_BINDING_NAME: {


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965: Use NIR by default for vertex shaders on GEN8+

2015-05-08 Thread Kenneth Graunke
On Thursday, May 07, 2015 06:17:46 PM Matt Turner wrote:
> On Thu, May 7, 2015 at 4:50 PM, Jason Ekstrand  wrote:
> > GLSL IR vs. NIR shader-db results for SIMD8 vertex shaders on Broadwell:
> >
> >total instructions in shared programs: 2724483 -> 2711790 (-0.47%)
> >instructions in affected programs: 1860859 -> 1848166 (-0.68%)
> >helped:4387
> >HURT:  4758
> >GAINED:1499
> >
> > The gained programs are ARB vertext programs that were previously going
> > through the vec4 backend.  Now that we have prog_to_nir, ARB vertex
> > programs can go through the scalar backend so they show up as "gained" in
> > the shader-db results.
> 
> Again, I'm kind of confused and disappointed that we're just okay with
> hurting 4700 programs without more analysis. I guess I'll go do
> that...

I took a stab at that tonight.  The good news is, the majority of the
hurt is pretty stupid.  Indirect uniform address calculations involve
a lot of integer multiplication by 4.

For whatever reason, we're getting 4*x instead of x*4, which doesn't
support immediates.  So we get:

MOV tmp 4
MUL dst tmp x

Normally, constant propagation would commute the operands, giving us
"MUL dst x 4" like we want.  But it sees integer multiplication and
chickens out, due to the asymmetry on some platforms.

I think we can extend that - on Broadwell it should work fine, and
might work fine for 16-bit immediates on Gen7 and Cherryview, too.

Alternatively, I wrote a nir_opt_algebraic_late optimization that turns
4*x into x << 2, which works around the problem, and is also apparently
much better for R600.

Statistics on the shift patch are:

total instructions in shared programs: 7432587 -> 7388982 (-0.59%)
instructions in affected programs: 1360411 -> 1316806 (-3.21%)
helped:5772
HURT:  0

Statistics for GLSL IR vs. NIR+(4*x => x << 2):

total instructions in shared programs: 7232451 -> 7175983 (-0.78%)
instructions in affected programs: 1586917 -> 1530449 (-3.56%)
helped:5780
HURT:  1654

which is much better.

Looking at a couple of the shaders that are still worse off...it looks
like a ton of Source shaders used to do MUL/ADD with an attribute and
two immediates, and now are doing MOV/MOV/MAD.


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v2 1/6] mesa/es3.1: enable GL_ARB_shader_image_load_store for gles3.1

2015-05-08 Thread marta . lofstedt
>
>
> On 05/08/2015 12:13 AM, Ian Romanick wrote:
>> On 05/07/2015 12:57 AM, Marta Lofstedt wrote:
>>> From: Marta Lofstedt 
>>>
>>> v2: only expose enums from GL_ARB_shader_image_load_store
>>> for gles 3.1 and GL core
>>>
>>> Signed-off-by: Marta Lofstedt 
>>> ---
>>>   src/mesa/main/get.c  |  6 ++
>>>   src/mesa/main/get_hash_params.py | 17 -
>>>   2 files changed, 14 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/src/mesa/main/get.c b/src/mesa/main/get.c
>>> index 9898197..73739b6 100644
>>> --- a/src/mesa/main/get.c
>>> +++ b/src/mesa/main/get.c
>>> @@ -355,6 +355,12 @@ static const int extra_ARB_draw_indirect_es31[] =
>>> {
>>>  EXTRA_END
>>>   };
>>>
>>> +static const int extra_ARB_shader_image_load_store_es31[] = {
>>> +   EXT(ARB_shader_image_load_store),
>>> +   EXTRA_API_ES31,
>>
>> I think you're missing the patch that adds EXTRA_API_ES31.  Did you
>> forget to send that one out?
>
> Marta's series builds on top of my patch here that adds EXTRA_API_ES31:
>
> http://lists.freedesktop.org/archives/mesa-dev/2015-May/083593.html
>
>> Also, on a few of these patches, I think the old, non-_es31 set of
>> requirements can be removed due to no longer being used.
>>

Ian, are you referring to the stuff that is left with the
extra_ARB_shader_atomic_counters_and_geometry_shader: do you want me to
remove those and it's companion struct in get.c?

>>> +   EXTRA_END
>>> +};
>>> +
>>>   EXTRA_EXT(ARB_texture_cube_map);
>>>   EXTRA_EXT(EXT_texture_array);
>>>   EXTRA_EXT(NV_fog_distance);
>>> diff --git a/src/mesa/main/get_hash_params.py
>>> b/src/mesa/main/get_hash_params.py
>>> index 513d5d2..85c2494 100644
>>> --- a/src/mesa/main/get_hash_params.py
>>> +++ b/src/mesa/main/get_hash_params.py
>>> @@ -413,6 +413,14 @@ descriptor=[
>>>   { "apis": ["GL_CORE", "GLES3"], "params": [
>>>   # GL_ARB_draw_indirect / GLES 3.1
>>> [ "DRAW_INDIRECT_BUFFER_BINDING", "LOC_CUSTOM, TYPE_INT, 0,
>>> extra_ARB_draw_indirect_es31" ],
>>> +# GL_ARB_shader_image_load_store / GLES 3.1
>>> +  [ "MAX_IMAGE_UNITS", "CONTEXT_INT(Const.MaxImageUnits),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS",
>>> "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_IMAGE_SAMPLES", "CONTEXT_INT(Const.MaxImageSamples),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_VERTEX_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_GEOMETRY_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_FRAGMENT_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store_es31"],
>>> +  [ "MAX_COMBINED_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.MaxCombinedImageUniforms),
>>> extra_ARB_shader_image_load_store_es31"],
>>>   ]},
>>>
>>>   # Remaining enums are only in OpenGL
>>> @@ -780,15 +788,6 @@ descriptor=[
>>> [ "MAX_VERTEX_ATTRIB_RELATIVE_OFFSET",
>>> "CONTEXT_ENUM(Const.MaxVertexAttribRelativeOffset), NO_EXTRA" ],
>>> [ "MAX_VERTEX_ATTRIB_BINDINGS",
>>> "CONTEXT_ENUM(Const.MaxVertexAttribBindings), NO_EXTRA" ],
>>>
>>> -# GL_ARB_shader_image_load_store
>>> -  [ "MAX_IMAGE_UNITS", "CONTEXT_INT(Const.MaxImageUnits),
>>> extra_ARB_shader_image_load_store"],
>>> -  [ "MAX_COMBINED_IMAGE_UNITS_AND_FRAGMENT_OUTPUTS",
>>> "CONTEXT_INT(Const.MaxCombinedImageUnitsAndFragmentOutputs),
>>> extra_ARB_shader_image_load_store"],
>>> -  [ "MAX_IMAGE_SAMPLES", "CONTEXT_INT(Const.MaxImageSamples),
>>> extra_ARB_shader_image_load_store"],
>>> -  [ "MAX_VERTEX_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_VERTEX].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store"],
>>> -  [ "MAX_GEOMETRY_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_GEOMETRY].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store_and_geometry_shader"],
>>> -  [ "MAX_FRAGMENT_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.Program[MESA_SHADER_FRAGMENT].MaxImageUniforms),
>>> extra_ARB_shader_image_load_store"],
>>> -  [ "MAX_COMBINED_IMAGE_UNIFORMS",
>>> "CONTEXT_INT(Const.MaxCombinedImageUniforms),
>>> extra_ARB_shader_image_load_store"],
>>> -
>>>   # GL_ARB_compute_shader
>>> [ "MAX_COMPUTE_WORK_GROUP_INVOCATIONS",
>>> "CONTEXT_INT(Const.MaxComputeWorkGroupInvocations),
>>> extra_ARB_compute_shader" ],
>>> [ "MAX_COMPUTE_UNIFORM_BLOCKS", "CONST(MAX_COMPUTE_UNIFORM_BLOCKS),
>>> extra_ARB_compute_shader" ],
>>>
>>
>> ___
>> mesa-dev mailing list
>> mesa-dev@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
>>
>

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] nir: fix sampler lowering pass for arrays

2015-05-08 Thread Tapani Pälli



On 05/08/2015 09:56 AM, Pohjolainen, Topi wrote:

On Fri, May 08, 2015 at 09:51:54AM +0300, Tapani P?lli wrote:

This fixes bugs with special cases where we have arrays of
structures containing samplers or arrays of samplers.

I've verified that patch results in calculating same index value as
returned by _mesa_get_sampler_uniform_value for IR. Patch makes
following ES3 conformance test pass:

ES3-CTS.shaders.struct.uniform.sampler_array_fragment

Signed-off-by: Tapani Pälli 
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=90114
---
  src/glsl/nir/nir_lower_samplers.cpp | 6 +-
  1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_samplers.cpp 
b/src/glsl/nir/nir_lower_samplers.cpp
index cf8ab83..9859cc0 100644
--- a/src/glsl/nir/nir_lower_samplers.cpp
+++ b/src/glsl/nir/nir_lower_samplers.cpp
@@ -78,7 +78,11 @@ lower_sampler(nir_tex_instr *instr, const struct 
gl_shader_program *shader_progr
   instr->sampler_index *= glsl_get_length(deref->type);
   switch (deref_array->deref_array_type) {
   case nir_deref_array_type_direct:
-instr->sampler_index += deref_array->base_offset;
+
+/* If this is an array of samplers. */


Above the case is for arrays and below you check for the sampler. This
comment does not tell much extra :)


Yeah, not sure how to change it. What I want to state here is that only 
for arrays of samplers we need to do this, otherwise we don't. The only 
other case really is array of structs that contain sampler so maybe I 
should state that instead?




+if (deref->child->type->base_type == GLSL_TYPE_SAMPLER)
+   instr->sampler_index += deref_array->base_offset;
+
  if (deref_array->deref.child)
 ralloc_asprintf_append(&name, "[%u]", 
deref_array->base_offset);
  break;
--
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev