Re: [Mesa-dev] [PATCH 2/2] nir/lower_vec_to_movs: Don't emit unneeded movs

2015-09-22 Thread Jason Ekstrand
On Sep 22, 2015 10:01 PM, "Jason Ekstrand"  wrote:
>
> It's possible that, if a vecN operation is involved in a phi node, that we
> could end up moving from a register to itself.  If swizzling is involved,
> we need to emit the move but.  However, if there is no swizzling, then the
> mov is a no-op and we might as well not bother emitting it.
>
> Shader-db results on Haswell:
>
>total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
>instructions in affected programs: 184780 -> 181802 (-1.61%)
>helped:838
>HURT:  0

By the way, I have absolutely no idea why this helps more than Matt's patch
to delete these in register coalesce.

> ---
>  src/glsl/nir/nir_lower_vec_to_movs.c | 19 ++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
>
> diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c
b/src/glsl/nir/nir_lower_vec_to_movs.c
> index 287f2bf..2039891 100644
> --- a/src/glsl/nir/nir_lower_vec_to_movs.c
> +++ b/src/glsl/nir/nir_lower_vec_to_movs.c
> @@ -83,7 +83,24 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx,
nir_shader *shader)
>}
> }
>
> -   nir_instr_insert_before(&vec->instr, &mov->instr);
> +   /* In some situations (if the vecN is involved in a phi-web), we can
end
> +* up with a mov from a register to itself.  Some of those channels
may end
> +* up doing nothing and there's no reason to have them as part of the
mov.
> +*/
> +   if (src_matches_dest_reg(&mov->dest.dest, &mov->src[0].src) &&
> +   !mov->src[0].abs && !mov->src[0].negate) {
> +  for (unsigned i = 0; i < 4; i++) {
> + if (mov->src[0].swizzle[i] == i)
> +mov->dest.write_mask &= ~(1 << i);
> +  }
> +   }
> +
> +   /* Only emit the instruction if it actually does something */
> +   if (mov->dest.write_mask) {
> +  nir_instr_insert_before(&vec->instr, &mov->instr);
> +   } else {
> +  ralloc_free(mov);
> +   }
>
> return mov->dest.write_mask;
>  }
> --
> 2.5.0.400.gff86faf
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] nir/lower_vec_to_movs: Properly handle source modifiers on vecN ops

2015-09-22 Thread Jason Ekstrand
I don't know of any piglit tests that are currently broken.  However, there
is nothing stopping a vecN instruction from getting source modifiers and
lower_vec_to_movs is run after we lower to source modifiers.
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index 622e59c..287f2bf 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -68,12 +68,16 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
 
mov->dest.write_mask = (1u << start_idx);
mov->src[0].swizzle[start_idx] = vec->src[start_idx].swizzle[0];
+   mov->src[0].negate = vec->src[start_idx].negate;
+   mov->src[0].abs = vec->src[start_idx].abs;
 
for (unsigned i = start_idx + 1; i < 4; i++) {
   if (!(vec->dest.write_mask & (1 << i)))
  continue;
 
-  if (nir_srcs_equal(vec->src[i].src, vec->src[start_idx].src)) {
+  if (nir_srcs_equal(vec->src[i].src, vec->src[start_idx].src) &&
+  vec->src[i].negate == vec->src[start_idx].negate &&
+  vec->src[i].abs == vec->src[start_idx].abs) {
  mov->dest.write_mask |= (1 << i);
  mov->src[0].swizzle[i] = vec->src[i].swizzle[0];
   }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/2] nir/lower_vec_to_movs: Don't emit unneeded movs

2015-09-22 Thread Jason Ekstrand
It's possible that, if a vecN operation is involved in a phi node, that we
could end up moving from a register to itself.  If swizzling is involved,
we need to emit the move but.  However, if there is no swizzling, then the
mov is a no-op and we might as well not bother emitting it.

Shader-db results on Haswell:

   total instructions in shared programs: 6262536 -> 6259558 (-0.05%)
   instructions in affected programs: 184780 -> 181802 (-1.61%)
   helped:838
   HURT:  0
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index 287f2bf..2039891 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -83,7 +83,24 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, 
nir_shader *shader)
   }
}
 
-   nir_instr_insert_before(&vec->instr, &mov->instr);
+   /* In some situations (if the vecN is involved in a phi-web), we can end
+* up with a mov from a register to itself.  Some of those channels may end
+* up doing nothing and there's no reason to have them as part of the mov.
+*/
+   if (src_matches_dest_reg(&mov->dest.dest, &mov->src[0].src) &&
+   !mov->src[0].abs && !mov->src[0].negate) {
+  for (unsigned i = 0; i < 4; i++) {
+ if (mov->src[0].swizzle[i] == i)
+mov->dest.write_mask &= ~(1 << i);
+  }
+   }
+
+   /* Only emit the instruction if it actually does something */
+   if (mov->dest.write_mask) {
+  nir_instr_insert_before(&vec->instr, &mov->instr);
+   } else {
+  ralloc_free(mov);
+   }
 
return mov->dest.write_mask;
 }
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/11] nir: Add unit tests for control flow graphs.

2015-09-22 Thread Connor Abbott
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke  wrote:
> Signed-off-by: Kenneth Graunke 
> ---
>  src/glsl/Makefile.am  |  14 +++
>  src/glsl/nir/tests/control_flow_tests.cpp | 155 
> ++
>  2 files changed, 169 insertions(+)
>  create mode 100644 src/glsl/nir/tests/control_flow_tests.cpp
>
> For now, this only adds a single test.  It was broken before this series,
> and now it works.  We certainly need to add more tests; I'm not sure whether
> it makes sense to commit this now or later.
>
> Suggestions for mean things to test are welcome.


Oh boy, I've thought of so many in the past... if you're not inspired,
I'd suggest going through nir_control_flow.c and figuring out how each
of the possible paths can be taken, especially in the lower-level
routines. If it's not relatively obvious or likely to get hit (and I
know for a fact these cases do exist), then we should probably have a
unit test that exercises it. It's too late tonight, but I should
probably do that as well. Also, it might also be a good idea to also
test the bug fixed in patch 03. I don't think adding extra tests
should hold back this getting committed, though -- leaving it around
uncommitted will probably increase the chance that it becomes stale
and make it harder for others (aka me and possibly Jason) to add
tests.

>
> diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
> index 1aa9caa..3265391 100644
> --- a/src/glsl/Makefile.am
> +++ b/src/glsl/Makefile.am
> @@ -50,12 +50,14 @@ EXTRA_DIST = tests glcpp/tests README TODO glcpp/README   
>   \
> nir/nir_opcodes_c.py\
> nir/nir_opcodes_h.py\
> nir/nir_opt_algebraic.py\
> +   nir/tests   \
> SConscript
>
>  include Makefile.sources
>
>  TESTS = glcpp/tests/glcpp-test \
> glcpp/tests/glcpp-test-cr-lf\
> +nir/tests/control_flow_tests   \
> tests/blob-test \
> tests/general-ir-test   \
> tests/optimization-test \
> @@ -70,6 +72,7 @@ noinst_LTLIBRARIES = libnir.la libglsl.la libglcpp.la
>  check_PROGRAMS =   \
> glcpp/glcpp \
> glsl_test   \
> +   nir/tests/control_flow_tests\
> tests/blob-test \
> tests/general-ir-test   \
> tests/sampler-types-test\
> @@ -263,3 +266,14 @@ nir/nir_opcodes.c: nir/nir_opcodes.py 
> nir/nir_opcodes_c.py
>  nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py
> $(MKDIR_GEN)
> $(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@
> +
> +nir_tests_control_flow_tests_SOURCES = \
> +   nir/tests/control_flow_tests.cpp
> +nir_tests_control_flow_tests_CFLAGS =  \
> +   $(PTHREAD_CFLAGS)
> +nir_tests_control_flow_tests_LDADD =   \
> +   $(top_builddir)/src/gtest/libgtest.la   \
> +   $(top_builddir)/src/glsl/libnir.la  \
> +   $(top_builddir)/src/libglsl_util.la \
> +   $(top_builddir)/src/util/libmesautil.la \
> +   $(PTHREAD_LIBS)
> diff --git a/src/glsl/nir/tests/control_flow_tests.cpp 
> b/src/glsl/nir/tests/control_flow_tests.cpp
> new file mode 100644
> index 000..b9f90e6
> --- /dev/null
> +++ b/src/glsl/nir/tests/control_flow_tests.cpp
> @@ -0,0 +1,155 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +#include 
> +#include "nir.h"
>

Re: [Mesa-dev] [PATCH 07/11] nir/cf: Fix dominance metadata in the dead control flow pass.

2015-09-22 Thread Connor Abbott
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke  wrote:
> The NIR control flow modification API churns the block structure,
> splitting blocks, stitching them back together, and so on.  Preserving
> information about block dominance is hard (and probably not worthwhile).
>
> This patch makes nir_cf_extract() throw away all metadata, like we do
> when adding/removing jumps.
>
> We then make the dead control flow pass compute dominance information
> right before it uses it.  This is necessary because earlier work by the
> pass may have invalidated it.
>
> Signed-off-by: Kenneth Graunke 

Reviewed-by: Connor Abbott 

> ---
>  src/glsl/nir/nir_control_flow.c | 3 +++
>  src/glsl/nir/nir_opt_dead_cf.c  | 7 ---
>  2 files changed, 7 insertions(+), 3 deletions(-)
>
> New patch!
>
> diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
> index e027766..8ec1252 100644
> --- a/src/glsl/nir/nir_control_flow.c
> +++ b/src/glsl/nir/nir_control_flow.c
> @@ -742,6 +742,9 @@ nir_cf_extract(nir_cf_list *extracted, nir_cursor begin, 
> nir_cursor end)
> extracted->impl = nir_cf_node_get_function(&block_begin->cf_node);
> exec_list_make_empty(&extracted->list);
>
> +   /* Dominance and other block-related information is toast. */
> +   nir_metadata_preserve(extracted->impl, nir_metadata_none);
> +
> nir_cf_node *cf_node = &block_begin->cf_node;
> nir_cf_node *cf_node_end = &block_end->cf_node;
> while (true) {
> diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
> index 317bbc5..0d4819b 100644
> --- a/src/glsl/nir/nir_opt_dead_cf.c
> +++ b/src/glsl/nir/nir_opt_dead_cf.c
> @@ -203,6 +203,10 @@ loop_is_dead(nir_loop *loop)
>   NULL))
>return false;
>
> +   nir_function_impl *impl = nir_cf_node_get_function(&loop->cf_node);
> +   nir_metadata_require(impl, nir_metadata_live_variables |
> +  nir_metadata_dominance);
> +
> for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) {
>nir_foreach_instr(cur, instr) {
>   if (!nir_foreach_ssa_def(instr, def_not_live_out, after))
> @@ -332,9 +336,6 @@ dead_cf_list(struct exec_list *list, bool 
> *list_ends_in_jump)
>  static bool
>  opt_dead_cf_impl(nir_function_impl *impl)
>  {
> -   nir_metadata_require(impl, nir_metadata_live_variables |
> -  nir_metadata_dominance);
> -
> bool dummy;
> bool progress = dead_cf_list(&impl->body, &dummy);
>
> --
> 2.5.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/11] nir/cf: Alter block successors before adding a fake link.

2015-09-22 Thread Connor Abbott
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke  wrote:
> Consider the case of "while (...) { break }".  Or in NIR:
>
> block block_0 (0x7ab640):
> ...
> /* succs: block_1 */
> loop {
> block block_1:
> /* preds: block_0 */
> break
> /* succs: block_2 */
> }
> block block_2:
>
> Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
> Unfortunately, it would mangle the predecessors and successors.
>
> Here, block_2->predecessors->entries == 1, so we would create a fake
> link, setting block_1->successors[1] = block_2, and adding block_1 to
> block_2's predecessor set.  This is illegal: a block cannot specify the
> same successor twice.  In particular, adding the predecessor would have
> no effect, as it was already present in the set.
>
> We'd then call unlink_block_successors(), which would delete the fake
> link and remove block_1 from block_2's predecessor set.  It would then
> delete successors[0], and attempt to remove block_1 from block_2's
> predecessor set a second time...except that it wouldn't be present,
> triggering an assertion failure.
>
> The fix appears to be simple: move the fake link creation after we
> recreate the block's successors.  Since we're inspecting "next" later,
> If we've created a new infinite
> loop, the code following the loop will be dead, and thus 0 predecessors.
> So we create a new fake successor.
>
> simply unlink the block's successors and
> recreate them to point at the correct blocks first.  Then, add the fake
> link.  In the above example, removing the break would cause block_1 to
> have itself as a successor (as it becomes an infinite loop), and then
>
> Signed-off-by: Kenneth Graunke 
> ---
>  src/glsl/nir/nir_control_flow.c | 30 ++
>  1 file changed, 14 insertions(+), 16 deletions(-)
>
> New patch!
>
> diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
> index 2b23f38..a59add5 100644
> --- a/src/glsl/nir/nir_control_flow.c
> +++ b/src/glsl/nir/nir_control_flow.c
> @@ -551,31 +551,29 @@ remove_phi_src(nir_block *block, nir_block *pred)
>  static void
>  unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors)
>  {
> +   nir_block *next = block->successors[0];
> +
> if (block->successors[0])
>remove_phi_src(block->successors[0], block);
> if (block->successors[1])
>remove_phi_src(block->successors[1], block);
>
> -   if (type == nir_jump_break) {
> -  nir_block *next = block->successors[0];
> +   unlink_block_successors(block);
> +   if (add_normal_successors)
> +  block_add_normal_succs(block);
>
> -  if (next->predecessors->entries == 1) {
> - nir_loop *loop =
> -nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node));
> +   if (type == nir_jump_break && next->predecessors->entries == 0) {
> +  nir_loop *loop =
> + nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node));
>
> - /* insert fake link */
> - nir_cf_node *last = nir_loop_last_cf_node(loop);
> - assert(last->type == nir_cf_node_block);
> - nir_block *last_block = nir_cf_node_as_block(last);
> +  /* insert fake link */

I was a little terse here with this comment, and based on how long it
took to actually fix this, there seems to be a lot going on that one
could easily miss from reading the code. The commit message has a
pretty good summary of the problem, but perhaps we should at least add
a comment here explaining that we can't do this until now since if the
break is at the end of the loop then last_block == block and we'd be
inserting the same successor twice and generally screwing things up.
Other than that, this patch and the previous one are

Reviewed-by: Connor Abbott 

> +  nir_cf_node *last = nir_loop_last_cf_node(loop);
> +  assert(last->type == nir_cf_node_block);
> +  nir_block *last_block = nir_cf_node_as_block(last);
>
> - last_block->successors[1] = next;
> - block_add_pred(next, last_block);
> -  }
> +  last_block->successors[1] = next;
> +  block_add_pred(next, last_block);
> }
> -
> -   unlink_block_successors(block);
> -   if (add_normal_successors)
> -  block_add_normal_succs(block);
>  }
>
>  void
> --
> 2.5.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/mesa: keep track of saturated writes when eliminating dead code

2015-09-22 Thread Ilia Mirkin
It doesn't matter whether a write is saturated or not, in another
implementation it might even have been a separate opcode. This code was
most likely copied from the copy-propagation pass (where one does have
to distinguish saturation).

Signed-off-by: Ilia Mirkin 
---

Haven't run this through piglit yet, but happened upon it when I was
debugging the other issue. (Which turned out to be an odd dependence
on this pass by the regular emitter code.)

 src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp 
b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
index 6762566..ee0acb5 100644
--- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
+++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
@@ -4228,8 +4228,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void)
*/
   for (unsigned i = 0; i < ARRAY_SIZE(inst->dst); i++) {
  if (inst->dst[i].file == PROGRAM_TEMPORARY &&
- !inst->dst[i].reladdr &&
- !inst->saturate) {
+ !inst->dst[i].reladdr) {
 for (int c = 0; c < 4; c++) {
if (inst->dst[i].writemask & (1 << c)) {
   if (writes[4 * inst->dst[i].index + c]) {
-- 
2.4.9

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/11] nir/cf: Don't break outer-block successors in split_block_beginning().

2015-09-22 Thread Jason Ekstrand
First three are

Reviewed-by: Jason Ekstrand 

I'm going to need more brain power for the rest.
--Jason

On Tue, Sep 22, 2015 at 8:01 PM, Kenneth Graunke  wrote:
> Consider the following NIR:
>
>block block_0;
>/* succs: block_1 block_2 */
>if (...) {
>   block block_1;
>   ...
>} else {
>   block block_2;
>}
>
> Calling split_block_beginning() on block_1 would break block_0's
> successors:  link_block() sets both successors of a block, so calling
> link_block(block_0, new_block, NULL) would throw away the second
> successor, leaving only /* succ: new_block */.  This is invalid: the
> block before an if statement must have two successors.
>
> Changing the call to link_block(pred, new_block, pred->successors[0])
> would correctly leave both successors in place, but because unlink_block
> may shift successor[1] to successor[0], it may not preserve the original
> order.  NIR maintains a convention that successor[0] must point to the
> "then" block, while successor[1] points to the "else" block, so we need
> to take care to preserve this ordering.
>
> This patch creates a new function that swaps out one successor for
> another, preserving the ordering.  It then uses this to fix the issue.
>
> Signed-off-by: Kenneth Graunke 
> Reviewed-by: Connor Abbott 
> ---
>  src/glsl/nir/nir_control_flow.c | 21 ++---
>  1 file changed, 18 insertions(+), 3 deletions(-)
>
> ***UNCHANGED SINCE THE FIRST SEND***
>
> diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
> index 43e4e43..e2a151d 100644
> --- a/src/glsl/nir/nir_control_flow.c
> +++ b/src/glsl/nir/nir_control_flow.c
> @@ -200,6 +200,23 @@ link_block_to_non_block(nir_block *block, nir_cf_node 
> *node)
>  }
>
>  /**
> + * Replace a block's successor with a different one.
> + */
> +static void
> +replace_successor(nir_block *block, nir_block *old_succ, nir_block *new_succ)
> +{
> +   if (block->successors[0] == old_succ) {
> +  block->successors[0] = new_succ;
> +   } else {
> +  assert(block->successors[1] == old_succ);
> +  block->successors[1] = new_succ;
> +   }
> +
> +   block_remove_pred(old_succ, block);
> +   block_add_pred(new_succ, block);
> +}
> +
> +/**
>   * Takes a basic block and inserts a new empty basic block before it, making 
> its
>   * predecessors point to the new block. This essentially splits the block 
> into
>   * an empty header and a body so that another non-block CF node can be 
> inserted
> @@ -217,9 +234,7 @@ split_block_beginning(nir_block *block)
> struct set_entry *entry;
> set_foreach(block->predecessors, entry) {
>nir_block *pred = (nir_block *) entry->key;
> -
> -  unlink_blocks(pred, block);
> -  link_blocks(pred, new_block, NULL);
> +  replace_successor(pred, block, new_block);
> }
>
> /* Any phi nodes must stay part of the new block, or else their
> --
> 2.5.3
>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: correctly detect inactive UBO arrays

2015-09-22 Thread Timothy Arceri
On Wed, 2015-09-23 at 13:23 +1000, Timothy Arceri wrote:
> Previously the code was trying to get the packing type from the array
> not the
> interface.
> 
> Cc: Ian Romanick 
> Cc: Antia Puentes 

I meant to add that there is a piglit test here [1] and that this fixes
the second bug mentioned in the bug report [2].

[1] 
http://lists.freedesktop.org/archives/piglit/2015-September/017247.html
[2] https://bugs.freedesktop.org/show_bug.cgi?id=83508


> ---
>  src/glsl/link_uniform_block_active_visitor.cpp | 6 ++
>  src/glsl/opt_dead_code.cpp | 7 ++-
>  2 files changed, 4 insertions(+), 9 deletions(-)
> 
> diff --git a/src/glsl/link_uniform_block_active_visitor.cpp
> b/src/glsl/link_uniform_block_active_visitor.cpp
> index 5102947..72e0782 100644
> --- a/src/glsl/link_uniform_block_active_visitor.cpp
> +++ b/src/glsl/link_uniform_block_active_visitor.cpp
> @@ -77,9 +77,6 @@
> link_uniform_block_active_visitor::visit(ir_variable *var)
> if (!var->is_in_buffer_block())
>return visit_continue;
>  
> -   const glsl_type *const block_type = var->is_interface_instance()
> -  ? var->type : var->get_interface_type();
> -
> /* Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec
> says:
>  *
>  * "All members of a named uniform block declared with a
> shared or
> @@ -88,7 +85,8 @@
> link_uniform_block_active_visitor::visit(ir_variable *var)
>  * also considered active, even if no member of the block is
>  * referenced."
>  */
> -   if (block_type->interface_packing ==
> GLSL_INTERFACE_PACKING_PACKED)
> +   if (var->get_interface_type()->interface_packing ==
> +   GLSL_INTERFACE_PACKING_PACKED)
>return visit_continue;
>  
> /* Process the block.  Bail if there was an error.
> diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp
> index e4bf874..2cb7f41 100644
> --- a/src/glsl/opt_dead_code.cpp
> +++ b/src/glsl/opt_dead_code.cpp
> @@ -119,11 +119,8 @@ do_dead_code(exec_list *instructions, bool
> uniform_locations_assigned)
>   * layouts, do not eliminate it.
>   */
>  if (entry->var->is_in_buffer_block()) {
> -   const glsl_type *const block_type =
> -  entry->var->is_interface_instance()
> -  ? entry->var->type : entry->var
> ->get_interface_type();
> -
> -   if (block_type->interface_packing !=
> GLSL_INTERFACE_PACKING_PACKED)
> +   if (entry->var->get_interface_type()
> ->interface_packing !=
> +   GLSL_INTERFACE_PACKING_PACKED)
>continue;
>  }
>  
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction

2015-09-22 Thread Jason Ekstrand
On Tue, Sep 22, 2015 at 7:50 PM, Kenneth Graunke  wrote:
> On Tuesday, September 22, 2015 06:18:12 PM Jason Ekstrand wrote:
>> We don't use any of the code after the switch and, as far as I can tell, it
>> wouldn't work anyway.  The only reason this wasn't causing us problems is
>> that it's all dead and DCE cleans it up.
>
> Huh, really?  Right after that switch statement I see:
>
>if (instr->dest.dest.ssa.num_components == 1)
>   return;

Right.

> and reductions should always produce a single component, so this change
> seems like it should do nothing.  It seems like a reasonable change
> regardless...but...

Sure.  I'll change the commit message.  That said, I think my
confusion is, itself, enough justification for the change.

>> ---
>>  src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
>> b/src/glsl/nir/nir_lower_alu_to_scalar.c
>> index 5ef5ec2..84d4943 100644
>> --- a/src/glsl/nir/nir_lower_alu_to_scalar.c
>> +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
>> @@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder 
>> *b)
>> case name##3: \
>> case name##4: \
>>lower_reduction(instr, chan, merge, b); \
>> -  break;
>> +  return;
>>
>> switch (instr->op) {
>> case nir_op_vec4:
>>
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: correctly detect inactive UBO arrays

2015-09-22 Thread Timothy Arceri
Previously the code was trying to get the packing type from the array not the
interface.

Cc: Ian Romanick 
Cc: Antia Puentes 
---
 src/glsl/link_uniform_block_active_visitor.cpp | 6 ++
 src/glsl/opt_dead_code.cpp | 7 ++-
 2 files changed, 4 insertions(+), 9 deletions(-)

diff --git a/src/glsl/link_uniform_block_active_visitor.cpp 
b/src/glsl/link_uniform_block_active_visitor.cpp
index 5102947..72e0782 100644
--- a/src/glsl/link_uniform_block_active_visitor.cpp
+++ b/src/glsl/link_uniform_block_active_visitor.cpp
@@ -77,9 +77,6 @@ link_uniform_block_active_visitor::visit(ir_variable *var)
if (!var->is_in_buffer_block())
   return visit_continue;
 
-   const glsl_type *const block_type = var->is_interface_instance()
-  ? var->type : var->get_interface_type();
-
/* Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec says:
 *
 * "All members of a named uniform block declared with a shared or
@@ -88,7 +85,8 @@ link_uniform_block_active_visitor::visit(ir_variable *var)
 * also considered active, even if no member of the block is
 * referenced."
 */
-   if (block_type->interface_packing == GLSL_INTERFACE_PACKING_PACKED)
+   if (var->get_interface_type()->interface_packing ==
+   GLSL_INTERFACE_PACKING_PACKED)
   return visit_continue;
 
/* Process the block.  Bail if there was an error.
diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp
index e4bf874..2cb7f41 100644
--- a/src/glsl/opt_dead_code.cpp
+++ b/src/glsl/opt_dead_code.cpp
@@ -119,11 +119,8 @@ do_dead_code(exec_list *instructions, bool 
uniform_locations_assigned)
  * layouts, do not eliminate it.
  */
 if (entry->var->is_in_buffer_block()) {
-   const glsl_type *const block_type =
-  entry->var->is_interface_instance()
-  ? entry->var->type : entry->var->get_interface_type();
-
-   if (block_type->interface_packing != 
GLSL_INTERFACE_PACKING_PACKED)
+   if (entry->var->get_interface_type()->interface_packing !=
+   GLSL_INTERFACE_PACKING_PACKED)
   continue;
 }
 
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 04/11] nir/cf: Conditionally do block_add_normal_succs() in unlink_jump();

2015-09-22 Thread Kenneth Graunke
There is a bug where we mess up predecessors/successors due to the
ordering of unlinking/recreating edges/adding fake edges.  In order to
fix that, I need everything in one routine.

However, calling block_add_normal_succs() isn't safe from
cleanup_cf_node() - it would crash trying to insert phi undefs.
So unfortunately I need to add a parameter.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_control_flow.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

New patch!

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index e2a151d..2b23f38 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -548,8 +548,8 @@ remove_phi_src(nir_block *block, nir_block *pred)
  * infinite loops. Note that the jump to be eliminated may be free-floating.
  */
 
-static
-void unlink_jump(nir_block *block, nir_jump_type type)
+static void
+unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors)
 {
if (block->successors[0])
   remove_phi_src(block->successors[0], block);
@@ -574,14 +574,14 @@ void unlink_jump(nir_block *block, nir_jump_type type)
}
 
unlink_block_successors(block);
+   if (add_normal_successors)
+  block_add_normal_succs(block);
 }
 
 void
 nir_handle_remove_jump(nir_block *block, nir_jump_type type)
 {
-   unlink_jump(block, type);
-
-   block_add_normal_succs(block);
+   unlink_jump(block, type, true);
 
nir_function_impl *impl = nir_cf_node_get_function(&block->cf_node);
nir_metadata_preserve(impl, nir_metadata_none);
@@ -689,7 +689,7 @@ cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl)
   nir_foreach_instr_safe(block, instr) {
  if (instr->type == nir_instr_type_jump) {
 nir_jump_type jump_type = nir_instr_as_jump(instr)->type;
-unlink_jump(block, jump_type);
+unlink_jump(block, jump_type, false);
  } else {
 nir_foreach_ssa_def(instr, replace_ssa_def_uses, impl);
 nir_instr_remove(instr);
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 10/11] i965/gs: Use new NIR intrinsics.

2015-09-22 Thread Kenneth Graunke
By performing the vertex counting in NIR, we're able to elide a ton of
useless safety checks around every EmitVertex() call:

total instructions in shared programs: 3952 -> 3720 (-5.87%)
instructions in affected programs: 3491 -> 3259 (-6.65%)
helped:11
HURT:  0

Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621)
on Haswell GT3e at 1024x768.

This should also make it easier to implement Broadwell's "Static Vertex
Count" feature someday.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_nir.c   |  5 
 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 13 +--
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 28 ---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 28 ++-
 4 files changed, 48 insertions(+), 26 deletions(-)

***UNCHANGED SINCE THE FIRST SEND***

diff --git a/src/mesa/drivers/dri/i965/brw_nir.c 
b/src/mesa/drivers/dri/i965/brw_nir.c
index b47b87e..1d4f6ab 100644
--- a/src/mesa/drivers/dri/i965/brw_nir.c
+++ b/src/mesa/drivers/dri/i965/brw_nir.c
@@ -96,6 +96,11 @@ brw_create_nir(struct brw_context *brw,
}
nir_validate_shader(nir);
 
+   if (stage == MESA_SHADER_GEOMETRY) {
+  nir_lower_gs_intrinsics(nir);
+  nir_validate_shader(nir);
+   }
+
nir_lower_global_vars_to_local(nir);
nir_validate_shader(nir);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
index 8a8dd57..4f4e1e1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp
@@ -92,16 +92,25 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr 
*instr)
src_reg src;
 
switch (instr->intrinsic) {
-   case nir_intrinsic_emit_vertex: {
+   case nir_intrinsic_emit_vertex_with_counter: {
+  this->vertex_count =
+ retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD);
   int stream_id = instr->const_index[0];
   gs_emit_vertex(stream_id);
   break;
}
 
-   case nir_intrinsic_end_primitive:
+   case nir_intrinsic_end_primitive_with_counter:
+  this->vertex_count =
+ retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD);
   gs_end_primitive();
   break;
 
+   case nir_intrinsic_set_vertex_count:
+  this->vertex_count =
+ retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD);
+  break;
+
case nir_intrinsic_load_invocation_id: {
   src_reg invocation_id =
  src_reg(nir_system_values[SYSTEM_VALUE_INVOCATION_ID]);
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index b9694f6..7a5b945 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -484,14 +484,6 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id)
if (stream_id > 0 && shader_prog->TransformFeedback.NumVarying == 0)
   return;
 
-   /* To ensure that we don't output more vertices than the shader specified
-* using max_vertices, do the logic inside a conditional of the form "if
-* (vertex_count < MAX)"
-*/
-   unsigned num_output_vertices = c->gp->program.VerticesOut;
-   emit(CMP(dst_null_d(), this->vertex_count,
-src_reg(num_output_vertices), BRW_CONDITIONAL_L));
-   emit(IF(BRW_PREDICATE_NORMAL));
{
   /* If we're outputting 32 control data bits or less, then we can wait
* until the shader is over to output them all.  Otherwise we need to
@@ -562,12 +554,7 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id)
   this->current_annotation = "emit vertex: Stream control data bits";
   set_stream_control_data_bits(stream_id);
   }
-
-  this->current_annotation = "emit vertex: increment vertex count";
-  emit(ADD(dst_reg(this->vertex_count), this->vertex_count,
-   src_reg(1u)));
}
-   emit(BRW_OPCODE_ENDIF);
 
this->current_annotation = NULL;
 }
@@ -575,7 +562,22 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id)
 void
 vec4_gs_visitor::visit(ir_emit_vertex *ir)
 {
+   /* To ensure that we don't output more vertices than the shader specified
+* using max_vertices, do the logic inside a conditional of the form "if
+* (vertex_count < MAX)"
+*/
+   unsigned num_output_vertices = c->gp->program.VerticesOut;
+   emit(CMP(dst_null_d(), this->vertex_count,
+src_reg(num_output_vertices), BRW_CONDITIONAL_L));
+   emit(IF(BRW_PREDICATE_NORMAL));
+
gs_emit_vertex(ir->stream_id());
+
+   this->current_annotation = "emit vertex: increment vertex count";
+   emit(ADD(dst_reg(this->vertex_count), this->vertex_count,
+src_reg(1u)));
+
+   emit(BRW_OPCODE_ENDIF);
 }
 
 void
diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp
index 68e443d..5cfff7b 100644

[Mesa-dev] [PATCH 02/11] nir/cf: Make a helper function for removing a predecessor.

2015-09-22 Thread Kenneth Graunke
I need to do this in a second place, and I'd rather make a helper
function than cut and paste the code.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Connor Abbott 
---
 src/glsl/nir/nir_control_flow.c | 16 +++-
 1 file changed, 11 insertions(+), 5 deletions(-)

***UNCHANGED SINCE THE FIRST SEND***

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index 768dfd2..43e4e43 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -60,6 +60,16 @@ block_add_pred(nir_block *block, nir_block *pred)
_mesa_set_add(block->predecessors, pred);
 }
 
+static inline void
+block_remove_pred(nir_block *block, nir_block *pred)
+{
+   struct set_entry *entry = _mesa_set_search(block->predecessors, pred);
+
+   assert(entry);
+
+   _mesa_set_remove(block->predecessors, entry);
+}
+
 static void
 link_blocks(nir_block *pred, nir_block *succ1, nir_block *succ2)
 {
@@ -83,11 +93,7 @@ unlink_blocks(nir_block *pred, nir_block *succ)
   pred->successors[1] = NULL;
}
 
-   struct set_entry *entry = _mesa_set_search(succ->predecessors, pred);
-
-   assert(entry);
-
-   _mesa_set_remove(succ->predecessors, entry);
+   block_remove_pred(succ, pred);
 }
 
 static void
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 01/11] nir: Validate that a block doesn't have two identical successors.

2015-09-22 Thread Kenneth Graunke
This is invalid, and causes disasters if we try to unlink successors:
removing the first will work, but removing the second copy will fail
because the block isn't in the successor's predecessor set any longer.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Connor Abbott 
---
 src/glsl/nir/nir_validate.c | 1 +
 1 file changed, 1 insertion(+)

***UNCHANGED SINCE THE FIRST SEND***

I want to commit all the NIR control flow fixes together, as I haven't
tested individual commits - only that the whole group collectively works
and fixes bugs.

diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c
index 9938c0e..1c9993a 100644
--- a/src/glsl/nir/nir_validate.c
+++ b/src/glsl/nir/nir_validate.c
@@ -586,6 +586,7 @@ validate_block(nir_block *block, validate_state *state)
}
 
assert(block->successors[0] != NULL);
+   assert(block->successors[0] != block->successors[1]);
 
for (unsigned i = 0; i < 2; i++) {
   if (block->successors[i] != NULL) {
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 07/11] nir/cf: Fix dominance metadata in the dead control flow pass.

2015-09-22 Thread Kenneth Graunke
The NIR control flow modification API churns the block structure,
splitting blocks, stitching them back together, and so on.  Preserving
information about block dominance is hard (and probably not worthwhile).

This patch makes nir_cf_extract() throw away all metadata, like we do
when adding/removing jumps.

We then make the dead control flow pass compute dominance information
right before it uses it.  This is necessary because earlier work by the
pass may have invalidated it.

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_control_flow.c | 3 +++
 src/glsl/nir/nir_opt_dead_cf.c  | 7 ---
 2 files changed, 7 insertions(+), 3 deletions(-)

New patch!

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index e027766..8ec1252 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -742,6 +742,9 @@ nir_cf_extract(nir_cf_list *extracted, nir_cursor begin, 
nir_cursor end)
extracted->impl = nir_cf_node_get_function(&block_begin->cf_node);
exec_list_make_empty(&extracted->list);
 
+   /* Dominance and other block-related information is toast. */
+   nir_metadata_preserve(extracted->impl, nir_metadata_none);
+
nir_cf_node *cf_node = &block_begin->cf_node;
nir_cf_node *cf_node_end = &block_end->cf_node;
while (true) {
diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c
index 317bbc5..0d4819b 100644
--- a/src/glsl/nir/nir_opt_dead_cf.c
+++ b/src/glsl/nir/nir_opt_dead_cf.c
@@ -203,6 +203,10 @@ loop_is_dead(nir_loop *loop)
  NULL))
   return false;
 
+   nir_function_impl *impl = nir_cf_node_get_function(&loop->cf_node);
+   nir_metadata_require(impl, nir_metadata_live_variables |
+  nir_metadata_dominance);
+
for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) {
   nir_foreach_instr(cur, instr) {
  if (!nir_foreach_ssa_def(instr, def_not_live_out, after))
@@ -332,9 +336,6 @@ dead_cf_list(struct exec_list *list, bool 
*list_ends_in_jump)
 static bool
 opt_dead_cf_impl(nir_function_impl *impl)
 {
-   nir_metadata_require(impl, nir_metadata_live_variables |
-  nir_metadata_dominance);
-
bool dummy;
bool progress = dead_cf_list(&impl->body, &dummy);
 
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 05/11] nir/cf: Alter block successors before adding a fake link.

2015-09-22 Thread Kenneth Graunke
Consider the case of "while (...) { break }".  Or in NIR:

block block_0 (0x7ab640):
...
/* succs: block_1 */
loop {
block block_1:
/* preds: block_0 */
break
/* succs: block_2 */
}
block block_2:

Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break.
Unfortunately, it would mangle the predecessors and successors.

Here, block_2->predecessors->entries == 1, so we would create a fake
link, setting block_1->successors[1] = block_2, and adding block_1 to
block_2's predecessor set.  This is illegal: a block cannot specify the
same successor twice.  In particular, adding the predecessor would have
no effect, as it was already present in the set.

We'd then call unlink_block_successors(), which would delete the fake
link and remove block_1 from block_2's predecessor set.  It would then
delete successors[0], and attempt to remove block_1 from block_2's
predecessor set a second time...except that it wouldn't be present,
triggering an assertion failure.

The fix appears to be simple: move the fake link creation after we
recreate the block's successors.  Since we're inspecting "next" later,
If we've created a new infinite
loop, the code following the loop will be dead, and thus 0 predecessors.
So we create a new fake successor.

simply unlink the block's successors and
recreate them to point at the correct blocks first.  Then, add the fake
link.  In the above example, removing the break would cause block_1 to
have itself as a successor (as it becomes an infinite loop), and then

Signed-off-by: Kenneth Graunke 
---
 src/glsl/nir/nir_control_flow.c | 30 ++
 1 file changed, 14 insertions(+), 16 deletions(-)

New patch!

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index 2b23f38..a59add5 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -551,31 +551,29 @@ remove_phi_src(nir_block *block, nir_block *pred)
 static void
 unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors)
 {
+   nir_block *next = block->successors[0];
+
if (block->successors[0])
   remove_phi_src(block->successors[0], block);
if (block->successors[1])
   remove_phi_src(block->successors[1], block);
 
-   if (type == nir_jump_break) {
-  nir_block *next = block->successors[0];
+   unlink_block_successors(block);
+   if (add_normal_successors)
+  block_add_normal_succs(block);
 
-  if (next->predecessors->entries == 1) {
- nir_loop *loop =
-nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node));
+   if (type == nir_jump_break && next->predecessors->entries == 0) {
+  nir_loop *loop =
+ nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node));
 
- /* insert fake link */
- nir_cf_node *last = nir_loop_last_cf_node(loop);
- assert(last->type == nir_cf_node_block);
- nir_block *last_block = nir_cf_node_as_block(last);
+  /* insert fake link */
+  nir_cf_node *last = nir_loop_last_cf_node(loop);
+  assert(last->type == nir_cf_node_block);
+  nir_block *last_block = nir_cf_node_as_block(last);
 
- last_block->successors[1] = next;
- block_add_pred(next, last_block);
-  }
+  last_block->successors[1] = next;
+  block_add_pred(next, last_block);
}
-
-   unlink_block_successors(block);
-   if (add_normal_successors)
-  block_add_normal_succs(block);
 }
 
 void
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 03/11] nir/cf: Don't break outer-block successors in split_block_beginning().

2015-09-22 Thread Kenneth Graunke
Consider the following NIR:

   block block_0;
   /* succs: block_1 block_2 */
   if (...) {
  block block_1;
  ...
   } else {
  block block_2;
   }

Calling split_block_beginning() on block_1 would break block_0's
successors:  link_block() sets both successors of a block, so calling
link_block(block_0, new_block, NULL) would throw away the second
successor, leaving only /* succ: new_block */.  This is invalid: the
block before an if statement must have two successors.

Changing the call to link_block(pred, new_block, pred->successors[0])
would correctly leave both successors in place, but because unlink_block
may shift successor[1] to successor[0], it may not preserve the original
order.  NIR maintains a convention that successor[0] must point to the
"then" block, while successor[1] points to the "else" block, so we need
to take care to preserve this ordering.

This patch creates a new function that swaps out one successor for
another, preserving the ordering.  It then uses this to fix the issue.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Connor Abbott 
---
 src/glsl/nir/nir_control_flow.c | 21 ++---
 1 file changed, 18 insertions(+), 3 deletions(-)

***UNCHANGED SINCE THE FIRST SEND***

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index 43e4e43..e2a151d 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -200,6 +200,23 @@ link_block_to_non_block(nir_block *block, nir_cf_node 
*node)
 }
 
 /**
+ * Replace a block's successor with a different one.
+ */
+static void
+replace_successor(nir_block *block, nir_block *old_succ, nir_block *new_succ)
+{
+   if (block->successors[0] == old_succ) {
+  block->successors[0] = new_succ;
+   } else {
+  assert(block->successors[1] == old_succ);
+  block->successors[1] = new_succ;
+   }
+
+   block_remove_pred(old_succ, block);
+   block_add_pred(new_succ, block);
+}
+
+/**
  * Takes a basic block and inserts a new empty basic block before it, making 
its
  * predecessors point to the new block. This essentially splits the block into
  * an empty header and a body so that another non-block CF node can be inserted
@@ -217,9 +234,7 @@ split_block_beginning(nir_block *block)
struct set_entry *entry;
set_foreach(block->predecessors, entry) {
   nir_block *pred = (nir_block *) entry->key;
-
-  unlink_blocks(pred, block);
-  link_blocks(pred, new_block, NULL);
+  replace_successor(pred, block, new_block);
}
 
/* Any phi nodes must stay part of the new block, or else their
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 08/11] nir: Add unit tests for control flow graphs.

2015-09-22 Thread Kenneth Graunke
Signed-off-by: Kenneth Graunke 
---
 src/glsl/Makefile.am  |  14 +++
 src/glsl/nir/tests/control_flow_tests.cpp | 155 ++
 2 files changed, 169 insertions(+)
 create mode 100644 src/glsl/nir/tests/control_flow_tests.cpp

For now, this only adds a single test.  It was broken before this series,
and now it works.  We certainly need to add more tests; I'm not sure whether
it makes sense to commit this now or later.

Suggestions for mean things to test are welcome.

diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am
index 1aa9caa..3265391 100644
--- a/src/glsl/Makefile.am
+++ b/src/glsl/Makefile.am
@@ -50,12 +50,14 @@ EXTRA_DIST = tests glcpp/tests README TODO glcpp/README 
\
nir/nir_opcodes_c.py\
nir/nir_opcodes_h.py\
nir/nir_opt_algebraic.py\
+   nir/tests   \
SConscript
 
 include Makefile.sources
 
 TESTS = glcpp/tests/glcpp-test \
glcpp/tests/glcpp-test-cr-lf\
+nir/tests/control_flow_tests   \
tests/blob-test \
tests/general-ir-test   \
tests/optimization-test \
@@ -70,6 +72,7 @@ noinst_LTLIBRARIES = libnir.la libglsl.la libglcpp.la
 check_PROGRAMS =   \
glcpp/glcpp \
glsl_test   \
+   nir/tests/control_flow_tests\
tests/blob-test \
tests/general-ir-test   \
tests/sampler-types-test\
@@ -263,3 +266,14 @@ nir/nir_opcodes.c: nir/nir_opcodes.py nir/nir_opcodes_c.py
 nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py
$(MKDIR_GEN)
$(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@
+
+nir_tests_control_flow_tests_SOURCES = \
+   nir/tests/control_flow_tests.cpp
+nir_tests_control_flow_tests_CFLAGS =  \
+   $(PTHREAD_CFLAGS)
+nir_tests_control_flow_tests_LDADD =   \
+   $(top_builddir)/src/gtest/libgtest.la   \
+   $(top_builddir)/src/glsl/libnir.la  \
+   $(top_builddir)/src/libglsl_util.la \
+   $(top_builddir)/src/util/libmesautil.la \
+   $(PTHREAD_LIBS)
diff --git a/src/glsl/nir/tests/control_flow_tests.cpp 
b/src/glsl/nir/tests/control_flow_tests.cpp
new file mode 100644
index 000..b9f90e6
--- /dev/null
+++ b/src/glsl/nir/tests/control_flow_tests.cpp
@@ -0,0 +1,155 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
+ * DEALINGS IN THE SOFTWARE.
+ */
+#include 
+#include "nir.h"
+#include "nir_builder.h"
+
+class nir_cf_test : public ::testing::Test {
+protected:
+   nir_cf_test();
+   ~nir_cf_test();
+
+   nir_builder b;
+   nir_shader *shader;
+   nir_function_impl *impl;
+};
+
+nir_cf_test::nir_cf_test()
+{
+   static const nir_shader_compiler_options options = { };
+   shader = nir_shader_create(NULL, MESA_SHADER_VERTEX, &options);
+   nir_function *func = nir_function_create(shader, "main");
+   nir_function_overload *overload = nir_function_overload_create(func);
+   impl = nir_function_impl_create(overload);
+
+   nir_builder_init(&b, impl);
+}
+
+nir_cf_test::~nir_cf_test()
+{
+   ralloc_free(shader);
+}
+
+TEST_F(nir_cf_test, delete_break_in_loop)
+{
+   /* Create IR:
+*
+* while (...) { break; }
+*/
+   nir_loop *loop = nir_loop_create(shader);
+   nir_cf_node_insert(nir_after_cf_list(&impl->body), &loop->cf_node);
+
+   b.cursor = nir_after_cf_list(&loop->body);
+
+   nir_jump_instr *jump = nir_jump_instr_create(shader, nir_jum

[Mesa-dev] [PATCH 06/11] nir/cf: Fix unlink_block_successors to actually unlink the second one.

2015-09-22 Thread Kenneth Graunke
Calling unlink_blocks(block, block->successors[0]) will successfully
unlink the first successor, but then will shift block->successors[1]
down to block->successor[0].  So the successors[1] != NULL check will
always fail.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Connor Abbott 
---
 src/glsl/nir/nir_control_flow.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

***UNCHANGED SINCE THE FIRST SEND***

diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c
index a59add5..e027766 100644
--- a/src/glsl/nir/nir_control_flow.c
+++ b/src/glsl/nir/nir_control_flow.c
@@ -99,10 +99,10 @@ unlink_blocks(nir_block *pred, nir_block *succ)
 static void
 unlink_block_successors(nir_block *block)
 {
-   if (block->successors[0] != NULL)
-  unlink_blocks(block, block->successors[0]);
if (block->successors[1] != NULL)
   unlink_blocks(block, block->successors[1]);
+   if (block->successors[0] != NULL)
+  unlink_blocks(block, block->successors[0]);
 }
 
 static void
-- 
2.5.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 09/11] nir: Add new GS intrinsics that maintain a count of emitted vertices.

2015-09-22 Thread Kenneth Graunke
This patch also introduces a lowering pass to convert the simple GS
intrinsics to the new ones.  See the comments above that for the
rationale behind the new intrinsics.

This should be useful for i965; it's a generic enough mechanism that I
could see other drivers potentially using it as well, so I don't feel
too bad about putting it in the generic code.

v2:
- Use nir_after_block_before_jump for the cursor (caught by Jason
  Ekstrand - I'd mistakenly used nir_after_block when rebasing this
  code onto the new NIR control flow API).
- Remove the old emit_vertex intrinsic at the end, rather than in
  the middle (requested by Jason).
- Use state->... directly rather than locals (requested by Jason).
- Report progress from nir_lower_gs_intrinsics() (requested by me).
- Remove "Authors:" section from file comment (requested by
  Michael Schellenberger Costa).

Signed-off-by: Kenneth Graunke 
Reviewed-by: Jason Ekstrand  [v1]
---
 src/glsl/Makefile.sources  |   1 +
 src/glsl/nir/nir.h |   2 +
 src/glsl/nir/nir_intrinsics.h  |  21 
 src/glsl/nir/nir_lower_gs_intrinsics.c | 218 +
 4 files changed, 242 insertions(+)
 create mode 100644 src/glsl/nir/nir_lower_gs_intrinsics.c

Revised patch.  This GS series was blocked on fixing the NIR CF bugs, but
I'm hoping that we'll be able to land it now.

diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
index f7c69f4..a8f4994 100644
--- a/src/glsl/Makefile.sources
+++ b/src/glsl/Makefile.sources
@@ -37,6 +37,7 @@ NIR_FILES = \
nir/nir_lower_atomics.c \
nir/nir_lower_clip.c \
nir/nir_lower_global_vars_to_local.c \
+   nir/nir_lower_gs_intrinsics.c \
nir/nir_lower_load_const_to_scalar.c \
nir/nir_lower_locals_to_regs.c \
nir/nir_lower_idiv.c \
diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h
index 619a363..4f45770 100644
--- a/src/glsl/nir/nir.h
+++ b/src/glsl/nir/nir.h
@@ -1883,6 +1883,8 @@ void nir_lower_two_sided_color(nir_shader *shader);
 void nir_lower_atomics(nir_shader *shader);
 void nir_lower_to_source_mods(nir_shader *shader);
 
+bool nir_lower_gs_intrinsics(nir_shader *shader);
+
 bool nir_normalize_cubemap_coords(nir_shader *shader);
 
 void nir_live_variables_impl(nir_function_impl *impl);
diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h
index 70cae42..b21460d 100644
--- a/src/glsl/nir/nir_intrinsics.h
+++ b/src/glsl/nir/nir_intrinsics.h
@@ -79,9 +79,30 @@ BARRIER(memory_barrier)
 /** A conditional discard, with a single boolean source. */
 INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, 0)
 
+/**
+ * Basic Geometry Shader intrinsics.
+ *
+ * emit_vertex implements GLSL's EmitStreamVertex() built-in.  It takes a 
single
+ * index, which is the stream ID to write to.
+ *
+ * end_primitive implements GLSL's EndPrimitive() built-in.
+ */
 INTRINSIC(emit_vertex,   0, ARR(), false, 0, 0, 1, 0)
 INTRINSIC(end_primitive, 0, ARR(), false, 0, 0, 1, 0)
 
+/**
+ * Geometry Shader intrinsics with a vertex count.
+ *
+ * Alternatively, drivers may implement these intrinsics, and use
+ * nir_lower_gs_intrinsics() to convert from the basic intrinsics.
+ *
+ * These maintain a count of the number of vertices emitted, as an additional
+ * unsigned integer source.
+ */
+INTRINSIC(emit_vertex_with_counter, 1, ARR(1), false, 0, 0, 1, 0)
+INTRINSIC(end_primitive_with_counter, 1, ARR(1), false, 0, 0, 1, 0)
+INTRINSIC(set_vertex_count, 1, ARR(1), false, 0, 0, 0, 0)
+
 /*
  * Atomic counters
  *
diff --git a/src/glsl/nir/nir_lower_gs_intrinsics.c 
b/src/glsl/nir/nir_lower_gs_intrinsics.c
new file mode 100644
index 000..2ee4e5c
--- /dev/null
+++ b/src/glsl/nir/nir_lower_gs_intrinsics.c
@@ -0,0 +1,218 @@
+/*
+ * Copyright © 2015 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ */
+
+#include "nir.h"
+#include "nir_builder.h"
+
+/**

[Mesa-dev] [PATCH 11/11] i965/gs: Fix extra level of indentation left by the previous commit.

2015-09-22 Thread Kenneth Graunke
I left a bunch of code indented a level in the previous patch to make
the diff easier to read.  But now we should fix that.

Signed-off-by: Kenneth Graunke 
Reviewed-by: Jason Ekstrand 
---
 src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 124 +++---
 src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 102 +-
 2 files changed, 111 insertions(+), 115 deletions(-)

***UNCHANGED SINCE THE FIRST SEND***

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
index 7a5b945..3cb1b4c 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp
@@ -484,76 +484,74 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id)
if (stream_id > 0 && shader_prog->TransformFeedback.NumVarying == 0)
   return;
 
-   {
-  /* If we're outputting 32 control data bits or less, then we can wait
-   * until the shader is over to output them all.  Otherwise we need to
-   * output them as we go.  Now is the time to do it, since we're about to
-   * output the vertex_count'th vertex, so it's guaranteed that the
-   * control data bits associated with the (vertex_count - 1)th vertex are
-   * correct.
+   /* If we're outputting 32 control data bits or less, then we can wait
+* until the shader is over to output them all.  Otherwise we need to
+* output them as we go.  Now is the time to do it, since we're about to
+* output the vertex_count'th vertex, so it's guaranteed that the
+* control data bits associated with the (vertex_count - 1)th vertex are
+* correct.
+*/
+   if (c->control_data_header_size_bits > 32) {
+  this->current_annotation = "emit vertex: emit control data bits";
+  /* Only emit control data bits if we've finished accumulating a batch
+   * of 32 bits.  This is the case when:
+   *
+   * (vertex_count * bits_per_vertex) % 32 == 0
+   *
+   * (in other words, when the last 5 bits of vertex_count *
+   * bits_per_vertex are 0).  Assuming bits_per_vertex == 2^n for some
+   * integer n (which is always the case, since bits_per_vertex is
+   * always 1 or 2), this is equivalent to requiring that the last 5-n
+   * bits of vertex_count are 0:
+   *
+   * vertex_count & (2^(5-n) - 1) == 0
+   *
+   * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is
+   * equivalent to:
+   *
+   * vertex_count & (32 / bits_per_vertex - 1) == 0
*/
-  if (c->control_data_header_size_bits > 32) {
- this->current_annotation = "emit vertex: emit control data bits";
- /* Only emit control data bits if we've finished accumulating a batch
-  * of 32 bits.  This is the case when:
-  *
-  * (vertex_count * bits_per_vertex) % 32 == 0
-  *
-  * (in other words, when the last 5 bits of vertex_count *
-  * bits_per_vertex are 0).  Assuming bits_per_vertex == 2^n for some
-  * integer n (which is always the case, since bits_per_vertex is
-  * always 1 or 2), this is equivalent to requiring that the last 5-n
-  * bits of vertex_count are 0:
-  *
-  * vertex_count & (2^(5-n) - 1) == 0
-  *
-  * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is
-  * equivalent to:
-  *
-  * vertex_count & (32 / bits_per_vertex - 1) == 0
+  vec4_instruction *inst =
+ emit(AND(dst_null_d(), this->vertex_count,
+  (uint32_t) (32 / c->control_data_bits_per_vertex - 1)));
+  inst->conditional_mod = BRW_CONDITIONAL_Z;
+
+  emit(IF(BRW_PREDICATE_NORMAL));
+  {
+ /* If vertex_count is 0, then no control data bits have been
+  * accumulated yet, so we skip emitting them.
   */
- vec4_instruction *inst =
-emit(AND(dst_null_d(), this->vertex_count,
- (uint32_t) (32 / c->control_data_bits_per_vertex - 1)));
- inst->conditional_mod = BRW_CONDITIONAL_Z;
-
+ emit(CMP(dst_null_d(), this->vertex_count, 0u,
+  BRW_CONDITIONAL_NEQ));
  emit(IF(BRW_PREDICATE_NORMAL));
- {
-/* If vertex_count is 0, then no control data bits have been
- * accumulated yet, so we skip emitting them.
- */
-emit(CMP(dst_null_d(), this->vertex_count, 0u,
- BRW_CONDITIONAL_NEQ));
-emit(IF(BRW_PREDICATE_NORMAL));
-emit_control_data_bits();
-emit(BRW_OPCODE_ENDIF);
-
-/* Reset control_data_bits to 0 so we can start accumulating a new
- * batch.
- *
- * Note: in the case where vertex_count == 0, this neutralizes the
- * effect of any call to EndPrimitive() that the shader may have
- * made before outputting its first vertex.
-

Re: [Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction

2015-09-22 Thread Kenneth Graunke
On Tuesday, September 22, 2015 06:18:12 PM Jason Ekstrand wrote:
> We don't use any of the code after the switch and, as far as I can tell, it
> wouldn't work anyway.  The only reason this wasn't causing us problems is
> that it's all dead and DCE cleans it up.

Huh, really?  Right after that switch statement I see:

   if (instr->dest.dest.ssa.num_components == 1)
  return;

and reductions should always produce a single component, so this change
seems like it should do nothing.  It seems like a reasonable change
regardless...but...

> ---
>  src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
> b/src/glsl/nir/nir_lower_alu_to_scalar.c
> index 5ef5ec2..84d4943 100644
> --- a/src/glsl/nir/nir_lower_alu_to_scalar.c
> +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
> @@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
> case name##3: \
> case name##4: \
>lower_reduction(instr, chan, merge, b); \
> -  break;
> +  return;
>  
> switch (instr->op) {
> case nir_op_vec4:
> 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph

2015-09-22 Thread Kenneth Graunke
On Tuesday, September 22, 2015 06:18:16 PM Jason Ekstrand wrote:
> ---
>  src/mesa/program/prog_to_nir.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c
> index ec61100..1bd735a 100644
> --- a/src/mesa/program/prog_to_nir.c
> +++ b/src/mesa/program/prog_to_nir.c
> @@ -527,8 +527,7 @@ ptn_dp4(nir_builder *b, nir_alu_dest dest, nir_ssa_def 
> **src)
>  static void
>  ptn_dph(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src)
>  {
> -   nir_ssa_def *dp3 = nir_fdot3(b, src[0], src[1]);
> -   ptn_move_dest(b, dest, nir_fadd(b, dp3, ptn_channel(b, src[1], W)));
> +   ptn_move_dest(b, dest, nir_fdph(b, src[0], src[1]));
>  }
>  
>  static void
> 

I verified that both freedreno and vc4 use nir_lower_alu_to_scalar(),
so they shouldn't need any new code to handle fdph.  It might make sense
to make tgsi_to_nir use fdph as well, but it's fairly moot since neither
consumer will actually see dot products.

Series is:
Reviewed-by: Kenneth Graunke 


signature.asc
Description: This is a digitally signed message part.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph

2015-09-22 Thread Matt Turner
On Tue, Sep 22, 2015 at 6:18 PM, Jason Ekstrand  wrote:
> ---

Thanks Jason! The whole series is

Reviewed-by: Matt Turner 

Feel free to add these shader-db results to this commit message:

instructions in affected programs: 72 -> 56 (-22.22%)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 5/6] nir/lower_alu_to_scalar: Add support for nir_op_fdph

2015-09-22 Thread Jason Ekstrand
---
 src/glsl/nir/nir_lower_alu_to_scalar.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
b/src/glsl/nir/nir_lower_alu_to_scalar.c
index 84d4943..9313fc0 100644
--- a/src/glsl/nir/nir_lower_alu_to_scalar.c
+++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
@@ -112,6 +112,24 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder 
*b)
*/
   return;
 
+   case nir_op_fdph: {
+  nir_ssa_def *sum[4];
+  for (unsigned i = 0; i < 3; i++) {
+ sum[i] = nir_fmul(b, nir_channel(b, instr->src[0].src.ssa,
+  instr->src[0].swizzle[i]),
+  nir_channel(b, instr->src[1].src.ssa,
+  instr->src[1].swizzle[i]));
+  }
+  sum[3] = nir_channel(b, instr->src[1].src.ssa, instr->src[1].swizzle[3]);
+
+  nir_ssa_def *val = nir_fadd(b, nir_fadd(b, sum[0], sum[1]),
+ nir_fadd(b, sum[2], sum[3]));
+
+  nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val));
+  nir_instr_remove(&instr->instr);
+  return;
+   }
+
   LOWER_REDUCTION(nir_op_fdot, nir_op_fmul, nir_op_fadd);
   LOWER_REDUCTION(nir_op_ball_fequal, nir_op_feq, nir_op_iand);
   LOWER_REDUCTION(nir_op_ball_iequal, nir_op_ieq, nir_op_iand);
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/6] i965/vec4: Add support for fdph_replicated

2015-09-22 Thread Jason Ekstrand
---
 src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 5 +
 1 file changed, 5 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
index 20c063d..c681ae4 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
@@ -1268,6 +1268,11 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr)
   inst->saturate = instr->dest.saturate;
   break;
 
+   case nir_op_fdph_replicated:
+  inst = emit(BRW_OPCODE_DPH, dst, op[0], op[1]);
+  inst->saturate = instr->dest.saturate;
+  break;
+
case nir_op_bany2:
case nir_op_bany3:
case nir_op_bany4: {
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/6] nir: Add fdph and fdph_replicated opcodes

2015-09-22 Thread Jason Ekstrand
---
 src/glsl/nir/nir_lower_vec_to_movs.c | 3 ++-
 src/glsl/nir/nir_opcodes.py  | 5 +
 src/glsl/nir/nir_opt_algebraic.py| 1 +
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c 
b/src/glsl/nir/nir_lower_vec_to_movs.c
index b7ee4e8..622e59c 100644
--- a/src/glsl/nir/nir_lower_vec_to_movs.c
+++ b/src/glsl/nir/nir_lower_vec_to_movs.c
@@ -89,7 +89,8 @@ has_replicated_dest(nir_alu_instr *alu)
 {
return alu->op == nir_op_fdot_replicated2 ||
   alu->op == nir_op_fdot_replicated3 ||
-  alu->op == nir_op_fdot_replicated4;
+  alu->op == nir_op_fdot_replicated4 ||
+  alu->op == nir_op_fdph_replicated;
 }
 
 /* Attempts to coalesce the "move" from the given source of the vec to the
diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py
index 495d109..f2d584f 100644
--- a/src/glsl/nir/nir_opcodes.py
+++ b/src/glsl/nir/nir_opcodes.py
@@ -456,6 +456,11 @@ binop_reduce("fdot", 1, tfloat, tfloat, "{src0} * {src1}", 
"{src0} + {src1}",
 binop_reduce("fdot_replicated", 4, tfloat, tfloat,
  "{src0} * {src1}", "{src0} + {src1}", "{src}")
 
+opcode("fdph", 1, tfloat, [3, 4], [tfloat, tfloat], "",
+   "src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w")
+opcode("fdph_replicated", 4, tfloat, [3, 4], [tfloat, tfloat], "",
+   "src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w")
+
 binop("fmin", tfloat, "", "fminf(src0, src1)")
 binop("imin", tint, commutative + associative, "src1 > src0 ? src0 : src1")
 binop("umin", tunsigned, commutative + associative, "src1 > src0 ? src0 : 
src1")
diff --git a/src/glsl/nir/nir_opt_algebraic.py 
b/src/glsl/nir/nir_opt_algebraic.py
index 43558a5..585e5e0 100644
--- a/src/glsl/nir/nir_opt_algebraic.py
+++ b/src/glsl/nir/nir_opt_algebraic.py
@@ -244,6 +244,7 @@ late_optimizations = [
(('fdot2', a, b), ('fdot_replicated2', a, b), 'options->fdot_replicates'),
(('fdot3', a, b), ('fdot_replicated3', a, b), 'options->fdot_replicates'),
(('fdot4', a, b), ('fdot_replicated4', a, b), 'options->fdot_replicates'),
+   (('fdph', a, b), ('fdph_replicated', a, b), 'options->fdot_replicates'),
 ]
 
 print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render()
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph

2015-09-22 Thread Jason Ekstrand
---
 src/mesa/program/prog_to_nir.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c
index ec61100..1bd735a 100644
--- a/src/mesa/program/prog_to_nir.c
+++ b/src/mesa/program/prog_to_nir.c
@@ -527,8 +527,7 @@ ptn_dp4(nir_builder *b, nir_alu_dest dest, nir_ssa_def 
**src)
 static void
 ptn_dph(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src)
 {
-   nir_ssa_def *dp3 = nir_fdot3(b, src[0], src[1]);
-   ptn_move_dest(b, dest, nir_fadd(b, dp3, ptn_channel(b, src[1], W)));
+   ptn_move_dest(b, dest, nir_fdph(b, src[0], src[1]));
 }
 
 static void
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/6] nir/lower_alu_to_scalar: Use the builder

2015-09-22 Thread Jason Ekstrand
---
 src/glsl/nir/nir_lower_alu_to_scalar.c | 47 --
 1 file changed, 22 insertions(+), 25 deletions(-)

diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
b/src/glsl/nir/nir_lower_alu_to_scalar.c
index 710bb37..5ef5ec2 100644
--- a/src/glsl/nir/nir_lower_alu_to_scalar.c
+++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
@@ -22,6 +22,7 @@
  */
 
 #include "nir.h"
+#include "nir_builder.h"
 
 /** @file nir_lower_alu_to_scalar.c
  *
@@ -38,13 +39,13 @@ nir_alu_ssa_dest_init(nir_alu_instr *instr, unsigned 
num_components)
 
 static void
 lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op,
-void *mem_ctx)
+nir_builder *builder)
 {
unsigned num_components = nir_op_infos[instr->op].input_sizes[0];
 
nir_ssa_def *last = NULL;
for (unsigned i = 0; i < num_components; i++) {
-  nir_alu_instr *chan = nir_alu_instr_create(mem_ctx, chan_op);
+  nir_alu_instr *chan = nir_alu_instr_create(builder->shader, chan_op);
   nir_alu_ssa_dest_init(chan, 1);
   nir_alu_src_copy(&chan->src[0], &instr->src[0], chan);
   chan->src[0].swizzle[0] = chan->src[0].swizzle[i];
@@ -54,18 +55,13 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, 
nir_op merge_op,
  chan->src[1].swizzle[0] = chan->src[1].swizzle[i];
   }
 
-  nir_instr_insert_before(&instr->instr, &chan->instr);
+  nir_builder_instr_insert(builder, &chan->instr);
 
   if (i == 0) {
  last = &chan->dest.dest.ssa;
   } else {
- nir_alu_instr *merge = nir_alu_instr_create(mem_ctx, merge_op);
- nir_alu_ssa_dest_init(merge, 1);
- merge->dest.write_mask = 1;
- merge->src[0].src = nir_src_for_ssa(last);
- merge->src[1].src = nir_src_for_ssa(&chan->dest.dest.ssa);
- nir_instr_insert_before(&instr->instr, &merge->instr);
- last = &merge->dest.dest.ssa;
+ last = nir_build_alu(builder, merge_op,
+  last, &chan->dest.dest.ssa, NULL, NULL);
   }
}
 
@@ -75,7 +71,7 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op 
merge_op,
 }
 
 static void
-lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx)
+lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
 {
unsigned num_src = nir_op_infos[instr->op].num_inputs;
unsigned i, chan;
@@ -83,11 +79,13 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx)
assert(instr->dest.dest.is_ssa);
assert(instr->dest.write_mask != 0);
 
+   b->cursor = nir_before_instr(&instr->instr);
+
 #define LOWER_REDUCTION(name, chan, merge) \
case name##2: \
case name##3: \
case name##4: \
-  lower_reduction(instr, chan, merge, mem_ctx); \
+  lower_reduction(instr, chan, merge, b); \
   break;
 
switch (instr->op) {
@@ -134,16 +132,13 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void 
*mem_ctx)
   return;
 
unsigned num_components = instr->dest.dest.ssa.num_components;
-   static const nir_op nir_op_map[] = {nir_op_vec2, nir_op_vec3, nir_op_vec4};
-   nir_alu_instr *vec_instr =
-  nir_alu_instr_create(mem_ctx, nir_op_map[num_components - 2]);
-   nir_alu_ssa_dest_init(vec_instr, num_components);
+   nir_ssa_def *comps[] = { NULL, NULL, NULL, NULL };
 
for (chan = 0; chan < 4; chan++) {
   if (!(instr->dest.write_mask & (1 << chan)))
  continue;
 
-  nir_alu_instr *lower = nir_alu_instr_create(mem_ctx, instr->op);
+  nir_alu_instr *lower = nir_alu_instr_create(b->shader, instr->op);
   for (i = 0; i < num_src; i++) {
  /* We only handle same-size-as-dest (input_sizes[] == 0) or scalar
   * args (input_sizes[] == 1).
@@ -159,25 +154,24 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void 
*mem_ctx)
 
   nir_alu_ssa_dest_init(lower, 1);
   lower->dest.saturate = instr->dest.saturate;
-  vec_instr->src[chan].src = nir_src_for_ssa(&lower->dest.dest.ssa);
+  comps[chan] = &lower->dest.dest.ssa;
 
-  nir_instr_insert_before(&instr->instr, &lower->instr);
+  nir_builder_instr_insert(b, &lower->instr);
}
 
-   nir_instr_insert_before(&instr->instr, &vec_instr->instr);
+   nir_ssa_def *vec = nir_vec(b, comps, num_components);
 
-   nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa,
-nir_src_for_ssa(&vec_instr->dest.dest.ssa));
+   nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec));
 
nir_instr_remove(&instr->instr);
 }
 
 static bool
-lower_alu_to_scalar_block(nir_block *block, void *data)
+lower_alu_to_scalar_block(nir_block *block, void *builder)
 {
nir_foreach_instr_safe(block, instr) {
   if (instr->type == nir_instr_type_alu)
- lower_alu_instr_scalar(nir_instr_as_alu(instr), data);
+ lower_alu_instr_scalar(nir_instr_as_alu(instr), builder);
}
 
return true;
@@ -186,7 +180,10 @@ lower_alu_to_scalar_block(nir_block *block, void *data)
 static void
 nir_lower_alu_to_scalar_impl(nir_functi

[Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction

2015-09-22 Thread Jason Ekstrand
We don't use any of the code after the switch and, as far as I can tell, it
wouldn't work anyway.  The only reason this wasn't causing us problems is
that it's all dead and DCE cleans it up.
---
 src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c 
b/src/glsl/nir/nir_lower_alu_to_scalar.c
index 5ef5ec2..84d4943 100644
--- a/src/glsl/nir/nir_lower_alu_to_scalar.c
+++ b/src/glsl/nir/nir_lower_alu_to_scalar.c
@@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b)
case name##3: \
case name##4: \
   lower_reduction(instr, chan, merge, b); \
-  break;
+  return;
 
switch (instr->op) {
case nir_op_vec4:
-- 
2.5.0.400.gff86faf

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)

2015-09-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92072

--- Comment #4 from Ilia Mirkin  ---
FYI Marek has a fix: http://patchwork.freedesktop.org/patch/60146/

I tested it with this trace, and it seems to fix it.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] st/mesa: fix front buffer regression after dropping st_validate_state in Blit

2015-09-22 Thread Ilia Mirkin
On Tue, Sep 22, 2015 at 8:40 PM, Marek Olšák  wrote:
> From: Marek Olšák 
>
> Broken by: d082c5324914212f76e45be497229c7a0681f706
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072
>
> Cc: mesa-sta...@lists.freedesktop.org

Tested-by: Ilia Mirkin 

Sorry, no idea what this defined business is, don't have time to
investigate it right now. Can you put a Cc: "10.6 11.0" tag on there
in case Emil should make another 10.6 release?

Thanks for tracking this down!

> ---
>  src/mesa/state_tracker/st_cb_blit.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/src/mesa/state_tracker/st_cb_blit.c 
> b/src/mesa/state_tracker/st_cb_blit.c
> index 4fdef7f..59c4dcc 100644
> --- a/src/mesa/state_tracker/st_cb_blit.c
> +++ b/src/mesa/state_tracker/st_cb_blit.c
> @@ -229,6 +229,7 @@ st_BlitFramebuffer(struct gl_context *ctx,
>st_adjust_blit_for_msaa_resolve(&blit);
>
>st->pipe->blit(st->pipe, &blit);
> +  dstRb->defined = true;
> }
>  }
>   }
> @@ -266,6 +267,7 @@ st_BlitFramebuffer(struct gl_context *ctx,
>st_adjust_blit_for_msaa_resolve(&blit);
>
>st->pipe->blit(st->pipe, &blit);
> +  dstRb->defined = true;
> }
>  }
>   }
> --
> 2.1.4
>
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] st/mesa: fix front buffer regression after dropping st_validate_state in Blit

2015-09-22 Thread Marek Olšák
From: Marek Olšák 

Broken by: d082c5324914212f76e45be497229c7a0681f706
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072

Cc: mesa-sta...@lists.freedesktop.org
---
 src/mesa/state_tracker/st_cb_blit.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/mesa/state_tracker/st_cb_blit.c 
b/src/mesa/state_tracker/st_cb_blit.c
index 4fdef7f..59c4dcc 100644
--- a/src/mesa/state_tracker/st_cb_blit.c
+++ b/src/mesa/state_tracker/st_cb_blit.c
@@ -229,6 +229,7 @@ st_BlitFramebuffer(struct gl_context *ctx,
   st_adjust_blit_for_msaa_resolve(&blit);
 
   st->pipe->blit(st->pipe, &blit);
+  dstRb->defined = true;
}
 }
  }
@@ -266,6 +267,7 @@ st_BlitFramebuffer(struct gl_context *ctx,
   st_adjust_blit_for_msaa_resolve(&blit);
 
   st->pipe->blit(st->pipe, &blit);
+  dstRb->defined = true;
}
 }
  }
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5 02/70] i965: Use 16-byte offset alignment for shader storage buffers

2015-09-22 Thread Ben Widawsky
On Tue, Sep 15, 2015 at 11:37:22AM -0700, Kristian Høgsberg wrote:
> On Thu, Sep 10, 2015 at 03:35:18PM +0200, Iago Toral Quiroga wrote:
> > This is the same we do for other things like uniforms because it ensures
> > optimal performance.
> > 
> > Reviewed-by: Jordan Justen 
> > ---
> >  src/mesa/drivers/dri/i965/brw_context.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> > b/src/mesa/drivers/dri/i965/brw_context.c
> > index 907b2a0..c8e8a68 100644
> > --- a/src/mesa/drivers/dri/i965/brw_context.c
> > +++ b/src/mesa/drivers/dri/i965/brw_context.c
> > @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context 
> > *brw)
> >  * However, unaligned accesses are slower, so enforce buffer alignment.
> >  */
> > ctx->Const.UniformBufferOffsetAlignment = 16;
> > +   ctx->Const.ShaderStorageBufferOffsetAlignment = 16;
> 
> This should be a cacheline (64 bytes) so that we can safely have the
> CPU and GPU writing the same SSBO on non-cachecoherent systems (our
> Atom CPUs).  With UBOs, the GPU never writes, so there's no
> problem. For an SSBO, the GPU and the CPU can be updating disjoint
> regions of the buffer simultaneously and that will break if the
> regions overlap the same cacheline.
> 
> With that change,
> 
> Reviewed-by: Kristian Høgsberg 
> 

I know nothing about SSBOs...

I agree with Kristian assuming something in the API disallows simultaneous
access to the same region from CPU and GPU - in which case, cacheline alignment
doesn't buy anything. Also, when you make the change, please add a comment about
why its 64.

> > ctx->Const.TextureBufferOffsetAlignment = 16;
> > ctx->Const.MaxTextureBufferSize = 128 * 1024 * 1024;
> >  
> > -- 
> > 1.9.1
> > 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] i965/cs: Implement DispatchComputeIndirect support

2015-09-22 Thread Matt Turner
On Sat, Sep 19, 2015 at 3:50 PM, Jordan Justen
 wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_compute.c | 57 
> ++---
>  src/mesa/drivers/dri/i965/brw_defines.h |  2 ++
>  src/mesa/drivers/dri/i965/intel_reg.h   |  5 +++
>  3 files changed, 60 insertions(+), 4 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
> b/src/mesa/drivers/dri/i965/brw_compute.c
> index 5693ab5..5641823 100644
> --- a/src/mesa/drivers/dri/i965/brw_compute.c
> +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> @@ -31,14 +31,46 @@
>  #include "brw_draw.h"
>  #include "brw_state.h"
>  #include "intel_batchbuffer.h"
> +#include "intel_buffer_objects.h"
>  #include "brw_defines.h"
>
>
>  static void
> -brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
> +brw_emit_gpgpu_walker(struct brw_context *brw,
> +  const void *compute_param,
> +  bool indirect)
>  {
> const struct brw_cs_prog_data *prog_data = brw->cs.prog_data;
>
> +   const GLuint *num_groups;
> +   uint32_t indirect_flag;
> +
> +   if (!indirect) {
> +  num_groups = (const GLuint *)compute_param;
> +  indirect_flag = 0;
> +   } else {
> +  GLintptr indirect_offset = *(GLintptr*)compute_param;

"GLintptr *" (space between type and *)
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 4/4] i965/cs: Implement DispatchComputeIndirect support

2015-09-22 Thread Kristian Høgsberg
On Sat, Sep 19, 2015 at 03:50:49PM -0700, Jordan Justen wrote:
> Signed-off-by: Jordan Justen 
> ---
>  src/mesa/drivers/dri/i965/brw_compute.c | 57 
> ++---
>  src/mesa/drivers/dri/i965/brw_defines.h |  2 ++
>  src/mesa/drivers/dri/i965/intel_reg.h   |  5 +++
>  3 files changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_compute.c 
> b/src/mesa/drivers/dri/i965/brw_compute.c
> index 5693ab5..5641823 100644
> --- a/src/mesa/drivers/dri/i965/brw_compute.c
> +++ b/src/mesa/drivers/dri/i965/brw_compute.c
> @@ -31,14 +31,46 @@
>  #include "brw_draw.h"
>  #include "brw_state.h"
>  #include "intel_batchbuffer.h"
> +#include "intel_buffer_objects.h"
>  #include "brw_defines.h"
>  
>  
>  static void
> -brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups)
> +brw_emit_gpgpu_walker(struct brw_context *brw,
> +  const void *compute_param,
> +  bool indirect)
>  {
> const struct brw_cs_prog_data *prog_data = brw->cs.prog_data;
>  
> +   const GLuint *num_groups;
> +   uint32_t indirect_flag;
> +
> +   if (!indirect) {
> +  num_groups = (const GLuint *)compute_param;
> +  indirect_flag = 0;
> +   } else {
> +  GLintptr indirect_offset = *(GLintptr*)compute_param;

I would call this as

   brw_dispatch_compute_common(ctx, indirect, true);

from brw_dispatch_compute_indirect() instead of passing the address of
indirect and then just say

  GLintptr indirect_offset = (GLintptr)compute_param;

here.  GLintptr is sized so that that's guaranteed to work.

With that, series

Reviewed-by: Kristian Høgsberg 

> +  static const GLuint indirect_group_counts[3] = { 0, 0, 0 };
> +  num_groups = indirect_group_counts;
> +
> +  struct gl_buffer_object *indirect_buffer = 
> brw->ctx.DispatchIndirectBuffer;
> +  drm_intel_bo *bo = intel_bufferobj_buffer(brw,
> +intel_buffer_object(indirect_buffer),
> +indirect_offset, 3 * sizeof(GLuint));
> +
> +  indirect_flag = GEN7_GPGPU_INDIRECT_PARAMETER_ENABLE;
> +
> +  brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo,
> +I915_GEM_DOMAIN_VERTEX, 0,
> +indirect_offset + 0);
> +  brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo,
> +I915_GEM_DOMAIN_VERTEX, 0,
> +indirect_offset + 4);
> +  brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo,
> +I915_GEM_DOMAIN_VERTEX, 0,
> +indirect_offset + 8);
> +   }
> +
> const unsigned simd_size = prog_data->simd_size;
> unsigned group_size = prog_data->local_size[0] *
>prog_data->local_size[1] * prog_data->local_size[2];
> @@ -52,7 +84,7 @@ brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint 
> *num_groups)
>  
> uint32_t dwords = brw->gen < 8 ? 11 : 15;
> BEGIN_BATCH(dwords);
> -   OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2));
> +   OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2) | indirect_flag);
> OUT_BATCH(0);
> if (brw->gen >= 8) {
>OUT_BATCH(0); /* Indirect Data Length */
> @@ -83,7 +115,9 @@ brw_emit_gpgpu_walker(struct brw_context *brw, const 
> GLuint *num_groups)
>  
>  
>  static void
> -brw_dispatch_compute(struct gl_context *ctx, const GLuint *num_groups)
> +brw_dispatch_compute_common(struct gl_context *ctx,
> +const void *compute_param,
> +bool indirect)
>  {
> struct brw_context *brw = brw_context(ctx);
> int estimated_buffer_space_needed;
> @@ -117,7 +151,7 @@ brw_dispatch_compute(struct gl_context *ctx, const GLuint 
> *num_groups)
> brw->no_batch_wrap = true;
> brw_upload_compute_state(brw);
>  
> -   brw_emit_gpgpu_walker(brw, num_groups);
> +   brw_emit_gpgpu_walker(brw, compute_param, indirect);
>  
> brw->no_batch_wrap = false;
>  
> @@ -155,9 +189,24 @@ brw_dispatch_compute(struct gl_context *ctx, const 
> GLuint *num_groups)
>  */
>  }
>  
> +static void
> +brw_dispatch_compute(struct gl_context *ctx, const GLuint *num_groups) {
> +   brw_dispatch_compute_common(ctx,
> +   num_groups,
> +   false);
> +}
> +
> +static void
> +brw_dispatch_compute_indirect(struct gl_context *ctx, GLintptr indirect)
> +{
> +   brw_dispatch_compute_common(ctx,
> +   &indirect,
> +   true);
> +}
>  
>  void
>  brw_init_compute_functions(struct dd_function_table *functions)
>  {
> functions->DispatchCompute = brw_dispatch_compute;
> +   functions->DispatchComputeIndirect = brw_dispatch_compute_indirect;
>  }
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 8fc8ceb..2de51d0 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defi

Re: [Mesa-dev] [PATCH 12/12] i965/skl+: Enable support for 16x multisampling

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:14PM +0100, Neil Roberts wrote:
> ---
>  src/mesa/drivers/dri/i965/brw_context.c  | 6 ++
>  src/mesa/drivers/dri/i965/intel_screen.c | 5 -
>  2 files changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_context.c 
> b/src/mesa/drivers/dri/i965/brw_context.c
> index 7c1c133..c05fb74 100644
> --- a/src/mesa/drivers/dri/i965/brw_context.c
> +++ b/src/mesa/drivers/dri/i965/brw_context.c
> @@ -83,6 +83,12 @@ brw_query_samples_for_format(struct gl_context *ctx, 
> GLenum target,
>  
> switch (brw->gen) {
> case 9:
> +  samples[0] = 16;
> +  samples[1] = 8;
> +  samples[2] = 4;
> +  samples[3] = 2;
> +  return 4;
> +
> case 8:
>samples[0] = 8;
>samples[1] = 4;
> diff --git a/src/mesa/drivers/dri/i965/intel_screen.c 
> b/src/mesa/drivers/dri/i965/intel_screen.c
> index 1783835..f971797 100644
> --- a/src/mesa/drivers/dri/i965/intel_screen.c
> +++ b/src/mesa/drivers/dri/i965/intel_screen.c
> @@ -1178,12 +1178,15 @@ intel_detect_timestamp(struct intel_screen *screen)
>  const int*
>  intel_supported_msaa_modes(const struct intel_screen  *screen)
>  {
> +   static const int gen9_modes[] = {16, 8, 4, 2, 0, -1};
> static const int gen8_modes[] = {8, 4, 2, 0, -1};
> static const int gen7_modes[] = {8, 4, 0, -1};
> static const int gen6_modes[] = {4, 0, -1};
> static const int gen4_modes[] = {0, -1};
>  
> -   if (screen->devinfo->gen >= 8) {
> +   if (screen->devinfo->gen >= 9) {
> +  return gen9_modes;
> +   } else if (screen->devinfo->gen >= 8) {
>return gen8_modes;
> } else if (screen->devinfo->gen >= 7) {
>return gen7_modes;

I think you also should add a case in get_fast_clear_rect() (even though we
don't have fast clears turned on yet). It's in my fast clear branch I believe,
but it makes more sense in this series.

Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/12] meta: Support 16x MSAA in the multisample scaled blit shader

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:13PM +0100, Neil Roberts wrote:
> I'm not too sure about the expression used to index into sample_map in
> the shader. It looks like if fract(coord.x) and fract(coord.y) are
> close to 1.0 then it would index outside of the array. However the
> code for 4 and 8 has the same problem and the results seems to look
> reasonable. It might make more sense to change it to something like
> this:
> 
>  sample_map[int(4 * fract(coord.x)) + 4 * int(fract(coord.y) * 4)]
> ---
>  src/mesa/drivers/common/meta.h |  2 ++
>  src/mesa/drivers/common/meta_blit.c|  8 ++--
>  src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 ++
>  src/mesa/main/mtypes.h | 15 ++-
>  4 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
> index fe43915..e42ddc8 100644
> --- a/src/mesa/drivers/common/meta.h
> +++ b/src/mesa/drivers/common/meta.h
> @@ -285,9 +285,11 @@ enum blit_msaa_shader {
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_MSAA_SHADER_COUNT,
>  };
>  
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index a41fe42..b545c37 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -86,8 +86,8 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> while (samples >> (shader_offset + 1)) {
>shader_offset++;
> }
> -   /* Update the assert if we plan to support more than 8X MSAA. */
> -   assert(shader_offset > 0 && shader_offset < 4);
> +   /* Update the assert if we plan to support more than 16X MSAA. */
> +   assert(shader_offset > 0 && shader_offset <= 4);
>  
> assert(target == GL_TEXTURE_2D_MULTISAMPLE ||
>target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY);
> @@ -132,6 +132,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context 
> *ctx,
>sample_number =  "sample_map[int(2 * fract(coord.x) + 8 * 
> fract(coord.y))]";
>sample_map = ctx->Const.SampleMap8x;
>break;
> +   case 16:
> +  sample_number =  "sample_map[int(4 * fract(coord.x) + 16 * 
> fract(coord.y))]";
> +  sample_map = ctx->Const.SampleMap16x;
> +  break;
> default:
>sample_number = NULL;
>sample_map = NULL;
> diff --git a/src/mesa/drivers/dri/i965/gen6_multisample_state.c 
> b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> index 49c6eba..8eb620d 100644
> --- a/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> @@ -91,6 +91,17 @@ gen6_get_sample_position(struct gl_context *ctx,
>   *   | 6 | 7 |  | 7 | 1 |
>   *   -  -
>   *
> + * 16X MSAA sample index layout  16x MSAA sample number layout
> + * --
> + * | 0 | 1 | 2 | 3 ||15 |10 | 9 | 7 |
> + * --
> + * | 4 | 5 | 6 | 7 || 4 | 1 | 3 |13 |
> + * --
> + * | 8 | 9 |10 |11 ||12 | 2 | 0 | 6 |
> + * --
> + * |12 |13 |14 |15 ||11 | 8 | 5 |14 |
> + * --
> + *
>   * A sample map is used to map sample indices to sample numbers.
>   */
>  void
> @@ -99,10 +110,13 @@ gen6_set_sample_maps(struct gl_context *ctx)
> uint8_t map_2x[2] = {0, 1};
> uint8_t map_4x[4] = {0, 1, 2, 3};
> uint8_t map_8x[8] = {5, 2, 4, 6, 0, 3, 7, 1};
> +   uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13,
> +   12, 2, 0, 6, 11, 8, 5, 14 };
>  
> memcpy(ctx->Const.SampleMap2x, map_2x, sizeof(map_2x));
> memcpy(ctx->Const.SampleMap4x, map_4x, sizeof(map_4x));
> memcpy(ctx->Const.SampleMap8x, map_8x, sizeof(map_8x));
> +   memcpy(ctx->Const.SampleMap16x, map_16x, sizeof(map_16x));
>  }
>  
>  /**
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index fac45aa..e0a06f4 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@ -3550,11 +3550,24 @@ struct gl_constants
>  * below:
>  *SampleMap8x = {a, b, c, d, e, f, g, h};
>  *
> -* Follow the logic for other sample counts.
> +* Follow the logic for sample counts 2-8.
> +*
> +* For 16x the sample indices layout as a 4x4 grid

Re: [Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code

2015-09-22 Thread Jason Ekstrand
On Tue, Sep 22, 2015 at 4:29 PM, Matt Turner  wrote:
> On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand  wrote:
>> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand  wrote:
>>>
>>> On Sep 21, 2015 5:45 PM, "Matt Turner"  wrote:

 On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand 
 wrote:
 > At this point, piglit is the same as for GLSL and the shader-db numbers
 > are
 > looking pretty good.  On SNB, GLSL vs. NIR for vec4 programs is:
 >
 >total instructions in shared programs: 2020573 -> 1822601 (-9.80%)
 >instructions in affected programs: 1883334 -> 1685362 (-10.51%)
 >helped:13328
 >HURT:  3594
 >
 > and there are patches on the list that improve this to
 >
 >total instructions in shared programs: 2020283 -> 1805487 (-10.63%)
 >instructions in affected programs: 1855759 -> 1640963 (-11.57%)
 >helped:14142
 >HURT:  2346

 Wow, that's great. I didn't realize we were that close.

 That said, I don't feel like we're /quite/ ready for this (especially
 with outstanding optimization patches on the list). I'm not sure what
 patches are pending.
>>>
>>> Only two: the one you sent today and Alejandro's patch to make copy
>>> propagation less type-sensitive.
>>>
 Some things I've seen in digging through hurt programs today:

 portal-2/high/5134 emits:

 vec1 ssa_53 = flog2 ssa_52
 vec1 ssa_54 = flog2 ssa_52.y
 vec1 ssa_55 = flog2 ssa_52.z
 vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w
 vec3 ssa_57 = fmul ssa_56, ssa_3
 vec1 ssa_58 = fexp2 ssa_57
 vec1 ssa_59 = fexp2 ssa_57.y
 vec1 ssa_60 = fexp2 ssa_57.z
 vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w

 which we didn't transform into a vec3 pow with or without NIR but we
 really should. Why isn't NIR able to handle this?
>>
>> Ken and I were talking about this today.  What it comes down to is
>> that no one has written the pass yet.  We haven't done that many
>> vector optimizations to date.
>
> Yeah, I'm not concerned about this. We weren't doing it before either,
> so it's not a regression.

Yeah, writing a vectorizor is something that should probably be done
but isn't any more urgent than any other "make vec4 better" thing.

 (also, why isn't
 ".x" printed when the use of an ssa value scalar, e.g., in the
 assignment of ssa_58 the RHS should use ssa_57.x).
>>
>> It doesn't print the identity swizzle.
>
> Patch sent.
>
 We generate worse code for all_equal/any_nequal/any.
>>
>> Yes, we should fix that.  Suggestions/patches welcome, I don't have
>> any hot ideas at the moment.
>>
 book-of-unwritten-tales/original/vp-33 (a vertex program) emits uses
 DPH and NIR doesn't have DPH. NIR should probably grow a DPH
 instruction even if we don't have an optimization to recognize
 open-coded DPH.
>>
>> We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we
>> really wanted to.  The long-term solution is probably to add swizzle
>> support to nir_search but that's going to be a real bear.  How would
>> having the nir_op_dph instruction help if we can't recognize it?
>
> Because the ARB vertex program language and the Mesa IR that we
> translate from both have DPH already. We're just throwing it away
> because NIR doesn't have DPH.

Right.  Should be easy enough to add.  I can do that if you'd like or
you can; I don't care.

 Lots of things hurt because of lack of global copy/constant
 propagation. I think NIR often emits the constant loads in blocks
 earlier than their uses and the backend optimizations aren't able to
 cope. See team-fortress-2/2197 for example (search for 953267991D, the
 hex value for 0.0001F).
>>
>> Hrm...  One option would be to copy-prop load_const in emit_alu.  This
>> should be easy enough to do if we detect that it's a 2-src and one
>> source is an immediate.  We could also do global copy-prop but I don't
>> know how hard that is.
>>
 I remember this issue from the FS/NIR backend as well, but dota-2/504
 (and others) emit:

 mad(8)  g16<1>.xF  g11<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
 mad(8)  g19<1>.xF  g10<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
 mad(8)  g22<1>.xF  g9<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
 mad(8)  g25<1>.xF  g8<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
 mad(8)  g28<1>.xF  g7<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
 mad(8)  g31<1>.xF  g6<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF

 where the multiplication is duplicated. I can't remember what we decided.
>>
>> If I remember correctly, it came down to "optimization is hard" and we
>> said "good enough" about our current heuristics.
>
> I dug out the old threads, but all I found w

Re: [Mesa-dev] [PATCH 10/12] i965/meta: Support 16x MSAA in the meta stencil blit

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:12PM +0100, Neil Roberts wrote:
> The destination rectangle is now drawn at 4x4 the size and the shader
> code to calculate the sample number is adjusted accordingly.
> ---
>  src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 22 +-
>  1 file changed, 17 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c 
> b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> index cbbb919..4e9aa94 100644
> --- a/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> +++ b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c
> @@ -163,6 +163,13 @@ static const char *fs_tmpl =
> "  txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n"
> "  txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n"
> "  sample_index = (X & 0x4) | (Y & 0x2) | ((X & 0x2) >> 1);\n"
> +   "  break;\n"
> +   "   case 16:\n"
> +   "  txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n"
> +   "  txl_coords.y = ((Y & int(0xfff8)) >> 2) | (Y & int(0x1));\n"
> +   "  sample_index = (((Y & 0x4) << 1) | (X & 0x4) | (Y & 0x2) |\n"
> +   "  ((X & 0x2) >> 1));\n"
> +   "  break;\n"
> "   }\n"
> "}\n"
> "\n"
> @@ -313,11 +320,16 @@ adjust_msaa(struct blit_dims *dims, int num_samples)
>dims->dst_x0 *= 2;
>dims->dst_x1 *= 2;
> } else if (num_samples) {
> -  const int x_num_samples = num_samples / 2;
> -  dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0 * x_num_samples, 
> num_samples);
> -  dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0 * 2, 4);
> -  dims->dst_x1 = ALIGN(dims->dst_x1 * x_num_samples, num_samples);
> -  dims->dst_y1 = ALIGN(dims->dst_y1 * 2, 4);
> +  const int y_num_samples = num_samples >= 16 ? 4 : 2;
> +  const int x_num_samples = num_samples / y_num_samples;
> +  dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0 * x_num_samples,
> +   x_num_samples * 2);
> +  dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0 * y_num_samples,
> +   y_num_samples * 2);
> +  dims->dst_x1 = ALIGN(dims->dst_x1 * x_num_samples,
> +   x_num_samples * 2);
> +  dims->dst_y1 = ALIGN(dims->dst_y1 * y_num_samples,
> +   y_num_samples * 2);
> }
>  }
>  

I am in the middle of implementing stencil blits for gen9+ with stencil_export.
I think this goes away if we do that, right (since only MSAA 16x is gen9+ only)?

BTW, I don't actually care to understand what's going on here, hopefully my
acked-by is good enough...
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code

2015-09-22 Thread Matt Turner
On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand  wrote:
> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand  wrote:
>>
>> On Sep 21, 2015 5:45 PM, "Matt Turner"  wrote:
>>>
>>> On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand 
>>> wrote:
>>> > At this point, piglit is the same as for GLSL and the shader-db numbers
>>> > are
>>> > looking pretty good.  On SNB, GLSL vs. NIR for vec4 programs is:
>>> >
>>> >total instructions in shared programs: 2020573 -> 1822601 (-9.80%)
>>> >instructions in affected programs: 1883334 -> 1685362 (-10.51%)
>>> >helped:13328
>>> >HURT:  3594
>>> >
>>> > and there are patches on the list that improve this to
>>> >
>>> >total instructions in shared programs: 2020283 -> 1805487 (-10.63%)
>>> >instructions in affected programs: 1855759 -> 1640963 (-11.57%)
>>> >helped:14142
>>> >HURT:  2346
>>>
>>> Wow, that's great. I didn't realize we were that close.
>>>
>>> That said, I don't feel like we're /quite/ ready for this (especially
>>> with outstanding optimization patches on the list). I'm not sure what
>>> patches are pending.
>>
>> Only two: the one you sent today and Alejandro's patch to make copy
>> propagation less type-sensitive.
>>
>>> Some things I've seen in digging through hurt programs today:
>>>
>>> portal-2/high/5134 emits:
>>>
>>> vec1 ssa_53 = flog2 ssa_52
>>> vec1 ssa_54 = flog2 ssa_52.y
>>> vec1 ssa_55 = flog2 ssa_52.z
>>> vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w
>>> vec3 ssa_57 = fmul ssa_56, ssa_3
>>> vec1 ssa_58 = fexp2 ssa_57
>>> vec1 ssa_59 = fexp2 ssa_57.y
>>> vec1 ssa_60 = fexp2 ssa_57.z
>>> vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w
>>>
>>> which we didn't transform into a vec3 pow with or without NIR but we
>>> really should. Why isn't NIR able to handle this?
>
> Ken and I were talking about this today.  What it comes down to is
> that no one has written the pass yet.  We haven't done that many
> vector optimizations to date.

Yeah, I'm not concerned about this. We weren't doing it before either,
so it's not a regression.

>>> (also, why isn't
>>> ".x" printed when the use of an ssa value scalar, e.g., in the
>>> assignment of ssa_58 the RHS should use ssa_57.x).
>
> It doesn't print the identity swizzle.

Patch sent.

>>> We generate worse code for all_equal/any_nequal/any.
>
> Yes, we should fix that.  Suggestions/patches welcome, I don't have
> any hot ideas at the moment.
>
>>> book-of-unwritten-tales/original/vp-33 (a vertex program) emits uses
>>> DPH and NIR doesn't have DPH. NIR should probably grow a DPH
>>> instruction even if we don't have an optimization to recognize
>>> open-coded DPH.
>
> We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we
> really wanted to.  The long-term solution is probably to add swizzle
> support to nir_search but that's going to be a real bear.  How would
> having the nir_op_dph instruction help if we can't recognize it?

Because the ARB vertex program language and the Mesa IR that we
translate from both have DPH already. We're just throwing it away
because NIR doesn't have DPH.

>>> Lots of things hurt because of lack of global copy/constant
>>> propagation. I think NIR often emits the constant loads in blocks
>>> earlier than their uses and the backend optimizations aren't able to
>>> cope. See team-fortress-2/2197 for example (search for 953267991D, the
>>> hex value for 0.0001F).
>
> Hrm...  One option would be to copy-prop load_const in emit_alu.  This
> should be easy enough to do if we detect that it's a 2-src and one
> source is an immediate.  We could also do global copy-prop but I don't
> know how hard that is.
>
>>> I remember this issue from the FS/NIR backend as well, but dota-2/504
>>> (and others) emit:
>>>
>>> mad(8)  g16<1>.xF  g11<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g19<1>.xF  g10<4,4,1>.xF  g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g22<1>.xF  g9<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g25<1>.xF  g8<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g28<1>.xF  g7<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>> mad(8)  g31<1>.xF  g6<4,4,1>.xF   g12<4,4,1>.xF  g2<4,4,1>.xF
>>>
>>> where the multiplication is duplicated. I can't remember what we decided.
>
> If I remember correctly, it came down to "optimization is hard" and we
> said "good enough" about our current heuristics.

I dug out the old threads, but all I found was a snarky reply.

Patch sent.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const

2015-09-22 Thread Jason Ekstrand
On Tue, Sep 22, 2015 at 4:04 PM, Matt Turner  wrote:
> On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev  wrote:
>> When both fadd and fmul instructions have at least one operand that is a
>> constant and it is only used once, the total number of instructions can
>> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
>> the constants will be progagated as immediate operands of fmul and fadd.
>>
>> This patch modifies opt_peephole_ffma pass to detect this situation and
>> bails-out fusing fmul+fadd into ffma.
>>
>> As shown in shader-db results below, it seems to help a good bunch. However,
>> there are some caveats:
>>
>> * It seems i965 specific, so I'm not sure if modifying the NIR pass
>> directly is desired, as opposed to moving this to the backend.
>>
>> * There are still a high number of HURTs, but these could be reduced by being
>> more specific in the conditions to bailout.
>>
>> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
>> instructions in affected programs: 604918 -> 598406 (-1.08%)
>> helped:4633
>> HURT:  804
>> GAINED:0
>> LOST:  0
>> ---
>
> Interesting -- yeah, I've thought about doing this as well. It was
> more difficult before because with GLSL IR (where I was trying to do
> it) it wasn't possible to determine if the constant was used by
> multiple 3-src instructions. Actually, your check might be able to be
> more refined to consider only uses of 3-src instructions.
>
> But that's getting kind of hardware-specific.

If we want to move this into the i965 driver we can.  I think we're
the only users.  That would completely get rid of
hardware-specificness issues.
--Jason

> Perhaps another approach would be to modify the
> opt_combine_constants() pass to split MADs under some circumstances --
> e.g., it accounts for the only use of a constant we would otherwise
> have to promote. But of course we don't have that pass for the vec4
> backend.
>
> In the mean time, I've sent a related patch that may be of interest:
> "[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds."
>
> This patch, applied on top of mine gives these results on Haswell:
>
> Total:
> total instructions in shared programs: 6595563 -> 6584885 (-0.16%)
> instructions in affected programs: 1183608 -> 1172930 (-0.90%)
> helped:8074
> HURT:  842
> GAINED:4
>
> FS:
> total instructions in shared programs: 4863484 -> 4859884 (-0.07%)
> instructions in affected programs: 554042 -> 550442 (-0.65%)
> helped:3072
> HURT:  38
> GAINED:4
>
> VS:
> total instructions in shared programs: 1729224 -> 1722146 (-0.41%)
> instructions in affected programs: 629566 -> 622488 (-1.12%)
> total loops in shared programs:221 -> 221 (0.00%)
> helped:5002
> HURT:  804
>
> Another thing to consider for the vec4 backend is that vec4 uniforms
> have to be unpacked for use by 3-src instructions (see the
> VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations,
> but they often do account for increases in instruction counts.
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 08/12] i965: Support allocating the MCS buffer for 16x MSAA

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:10PM +0100, Neil Roberts wrote:
> When 16 samples are used the MCS buffer needs 64 bits per pixel.
> ---
>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
> b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> index 0cb0632..9faafb4 100644
> --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
> @@ -1374,6 +1374,12 @@ intel_miptree_alloc_mcs(struct brw_context *brw,
> */
>format = MESA_FORMAT_R_UINT32;
>break;
> +   case 16:
> +  /* 64 bits/pixel are required for MCS data when using 16x MSAA (4 bits
> +   * for each sample).
> +   */
> +  format = MESA_FORMAT_RG_UINT32;
> +  break;
> default:
>unreachable("Unrecognized sample count in intel_miptree_alloc_mcs");
> };

This and the previous are:
Reviewed-by: Ben Widawsky 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 09/12] i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:11PM +0100, Neil Roberts wrote:
> In order to accomodate 16x MSAA, the starting sample pair index is now
> 3 bits rather than 2 on SKL+.

Hooray for bad docs. "PS Thread Payload for Normal Dispatch": SSPI is 2 bits
(R0.0.7:6) with 1 reserved for expansion.

> ---
>  src/mesa/drivers/dri/i965/brw_fs.cpp | 9 -
>  1 file changed, 8 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 82f49b4..bd9bcdd 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -1300,9 +1300,16 @@ fs_visitor::emit_sampleid_setup()
> * are sample 1 of subspan 0; the third group is sample 0 of
> * subspan 1, and finally sample 1 of subspan 1.
> */
> +
> +  /* SKL+ has an extra bit for the Starting Sample Pair Index to
> +   * accomodate 16x MSAA.
> +   */
> +  unsigned sspi_bits = devinfo->gen >= 9 ? 3 : 2;
> +  unsigned sspi_mask = ((1 << sspi_bits) - 1) << 6;
> +

I think using the bits makes this unnecessarily hard to read.
unsigned sspi_mask = devinfo->gen >= 9 ? 0x1c0 : 0xc0;

>abld.exec_all()
>.AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)),
> -   fs_reg(0xc0));
> +   fs_reg(sspi_mask));
>abld.exec_all().SHR(t1, t1, fs_reg(5));
>  
>/* This works for both SIMD8 and SIMD16 */

I'm really sketchy on the details of how this actually works. Mostly, I get what
this code is doing, and I agree that the only difference for 16x should be the
extra bit (because SSPI needs to go up to 7 for SIMD8).

So take my:
Reviewed-by: Ben Widawsky 

with a grain of salt.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const

2015-09-22 Thread Matt Turner
On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev  wrote:
> When both fadd and fmul instructions have at least one operand that is a
> constant and it is only used once, the total number of instructions can
> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because
> the constants will be progagated as immediate operands of fmul and fadd.
>
> This patch modifies opt_peephole_ffma pass to detect this situation and
> bails-out fusing fmul+fadd into ffma.
>
> As shown in shader-db results below, it seems to help a good bunch. However,
> there are some caveats:
>
> * It seems i965 specific, so I'm not sure if modifying the NIR pass
> directly is desired, as opposed to moving this to the backend.
>
> * There are still a high number of HURTs, but these could be reduced by being
> more specific in the conditions to bailout.
>
> total instructions in shared programs: 1683959 -> 1677447 (-0.39%)
> instructions in affected programs: 604918 -> 598406 (-1.08%)
> helped:4633
> HURT:  804
> GAINED:0
> LOST:  0
> ---

Interesting -- yeah, I've thought about doing this as well. It was
more difficult before because with GLSL IR (where I was trying to do
it) it wasn't possible to determine if the constant was used by
multiple 3-src instructions. Actually, your check might be able to be
more refined to consider only uses of 3-src instructions.

But that's getting kind of hardware-specific.

Perhaps another approach would be to modify the
opt_combine_constants() pass to split MADs under some circumstances --
e.g., it accounts for the only use of a constant we would otherwise
have to promote. But of course we don't have that pass for the vec4
backend.

In the mean time, I've sent a related patch that may be of interest:
"[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds."

This patch, applied on top of mine gives these results on Haswell:

Total:
total instructions in shared programs: 6595563 -> 6584885 (-0.16%)
instructions in affected programs: 1183608 -> 1172930 (-0.90%)
helped:8074
HURT:  842
GAINED:4

FS:
total instructions in shared programs: 4863484 -> 4859884 (-0.07%)
instructions in affected programs: 554042 -> 550442 (-0.65%)
helped:3072
HURT:  38
GAINED:4

VS:
total instructions in shared programs: 1729224 -> 1722146 (-0.41%)
instructions in affected programs: 629566 -> 622488 (-1.12%)
total loops in shared programs:221 -> 221 (0.00%)
helped:5002
HURT:  804

Another thing to consider for the vec4 backend is that vec4 uniforms
have to be unpacked for use by 3-src instructions (see the
VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations,
but they often do account for increases in instruction counts.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] mesa: Reduce libGL.so binary size by about 15%

2015-09-22 Thread Ian Romanick
On 09/17/2015 03:19 PM, Arlie Davis wrote:
> Ok, here's v2 of the change, with the suggested edits.

So... I think this code is fine, and I admire the effort.  I have a
couple concerns.

1. We have no way to test this, so it's quite possible something was broken.

2. This function is only used in the OSX builds.  Jeremy is the
maintainer for those builds, so I've added him to the CC list.

For every non-OSX build, we should just stop linking
src/mapi/glapi/glapi_gentable.c.  I thought maybe the X sever used it,
but I couldn't find any evidence of that.

If this is still a viable route, I have a few suggestions of follow-on
patches...

I guess this patch is

Reviewed-by: Ian Romanick 

but I really think we need to get Jeremy's approval before pushing it.

> From 5f393faa058f453408dfc640eecae3fe6335dfed Mon Sep 17 00:00:00 2001
> From: Arlie Davis 
> Date: Tue, 15 Sep 2015 09:58:34 -0700
> Subject: [PATCH] This patch significantly reduces the size of the libGL.so
>  binary. It does not change the (externally visible) behavior of libGL.so at
>  all.
> 
> gl_gentable.py generates a function, _glapi_create_table_from_handle.
> This function allocates a large dispatch table, consisting of 1300 or so
> function pointers, and fills this dispatch table by doing symbol lookups
> on a given shared library.  Previously, gl_gentable.py would generate a
> single, very large _glapi_create_table_from_handle function, with a short
> cluster of lines for each entry point (function).  The idiom it generates
> was a NULL check, a call to snprintf, a call to dlsym / GetProcAddress,
> and then a store into the dispatch table.  Since this function processes
> a large number of entry points, this code is duplicated many times over.
> 
> We can encode the same information much more compactly, by using a lookup
> table.  The previous total size of _glapi_create_table_from_handle on x64
> was 125848 bytes.  By using a lookup table, the size of
> _glapi_create_table_from_handle (and the related lookup tables) is reduced
> to 10840 bytes.  In other words, this enormous function is reduced by 91%.
> The size of the entire libGL.so binary (measured when stripped) itself drops
> by 15%.
> 
> So the purpose of this change is to reduce the binary size, which frees up
> disk space, memory, etc.
> ---
>  src/mapi/glapi/gen/gl_gentable.py | 57 
> ---
>  1 file changed, 41 insertions(+), 16 deletions(-)
> 
> diff --git a/src/mapi/glapi/gen/gl_gentable.py 
> b/src/mapi/glapi/gen/gl_gentable.py
> index 1b3eb72..7cd475a 100644
> --- a/src/mapi/glapi/gen/gl_gentable.py
> +++ b/src/mapi/glapi/gen/gl_gentable.py
> @@ -113,6 +113,9 @@ __glapi_gentable_set_remaining_noop(struct _glapi_table 
> *disp) {
>  dispatch[i] = p.v;
>  }
>  
> +"""
> +
> +footer = """
>  struct _glapi_table *
>  _glapi_create_table_from_handle(void *handle, const char *symbol_prefix) {
>  struct _glapi_table *disp = calloc(_glapi_get_dispatch_table_size(), 
> sizeof(_glapi_proc));
> @@ -123,27 +126,28 @@ _glapi_create_table_from_handle(void *handle, const 
> char *symbol_prefix) {
>  
>  if(symbol_prefix == NULL)
>  symbol_prefix = "";
> -"""
>  
> -footer = """
> -__glapi_gentable_set_remaining_noop(disp);
> -
> -return disp;
> -}
> -"""
> +/* Note: This code relies on _glapi_table_func_names being sorted by the
> + * entry point index of each function.
> + */
> +for (int func_index = 0; func_index < GLAPI_TABLE_COUNT; ++func_index) {
> +const char *name = _glapi_table_func_names[func_index];
> +void ** procp = &((void **)disp)[func_index];
>  
> -body_template = """
> -if(!disp->%(name)s) {
> -void ** procp = (void **) &disp->%(name)s;
> -snprintf(symboln, sizeof(symboln), "%%s%(entry_point)s", 
> symbol_prefix);
> +snprintf(symboln, sizeof(symboln), \"%s%s\", symbol_prefix, name);
>  #ifdef _WIN32
>  *procp = GetProcAddress(handle, symboln);
>  #else
>  *procp = dlsym(handle, symboln);
>  #endif
>  }
> +__glapi_gentable_set_remaining_noop(disp);
> +
> +return disp;
> +}
>  """
>  
> +
>  class PrintCode(gl_XML.gl_print_base):
>  
>  def __init__(self):
> @@ -180,12 +184,33 @@ class PrintCode(gl_XML.gl_print_base):
>  
>  
>  def printBody(self, api):
> -for f in api.functionIterateByOffset():
> -for entry_point in f.entry_points:
> -vars = { 'entry_point' : entry_point,
> - 'name' : f.name }
>  
> -print body_template % vars
> +# Determine how many functions have a defined offset.
> +func_count = 0
> +for f in api.functions_by_name.itervalues():
> +if f.offset != -1:
> +func_count += 1
> +
> +# Build the mapping from offset to function name.
> +funcnames = [None] * func_count
> +for f in api.functions_by_name.itervalues():
> +if f.offset != -1:
> 

[Mesa-dev] [PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds.

2015-09-22 Thread Matt Turner
total instructions in shared programs: 6596689 -> 6595563 (-0.02%)
instructions in affected programs: 103154 -> 102028 (-1.09%)
helped:253
HURT:  217

It's kind of a wash in terms of programs helped/hurt, but of the
programs helped 169 are by more than 10%.
---
I tried values of 2-6, and 4 seemed to be the best. I can provide
full shader-db result files if other people want to investigate.

 src/glsl/nir/nir_opt_peephole_ffma.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c 
b/src/glsl/nir/nir_opt_peephole_ffma.c
index 4f0f0da..3e8a34f 100644
--- a/src/glsl/nir/nir_opt_peephole_ffma.c
+++ b/src/glsl/nir/nir_opt_peephole_ffma.c
@@ -39,7 +39,7 @@ struct peephole_ffma_state {
 };
 
 static inline bool
-are_all_uses_fadd(nir_ssa_def *def)
+are_all_uses_fadd(nir_ssa_def *def, unsigned *num_uses)
 {
if (!list_empty(&def->if_uses))
   return false;
@@ -53,6 +53,7 @@ are_all_uses_fadd(nir_ssa_def *def)
   nir_alu_instr *use_alu = nir_instr_as_alu(use_instr);
   switch (use_alu->op) {
   case nir_op_fadd:
+ (*num_uses)++;
  break; /* This one's ok */
 
   case nir_op_imov:
@@ -60,7 +61,7 @@ are_all_uses_fadd(nir_ssa_def *def)
   case nir_op_fneg:
   case nir_op_fabs:
  assert(use_alu->dest.dest.is_ssa);
- if (!are_all_uses_fadd(&use_alu->dest.dest.ssa))
+ if (!are_all_uses_fadd(&use_alu->dest.dest.ssa, num_uses))
 return false;
  break;
 
@@ -101,15 +102,18 @@ get_mul_for_src(nir_alu_src *src, int num_components,
   *abs = true;
   break;
 
-   case nir_op_fmul:
-  /* Only absorb a fmul into a ffma if the fmul is is only used in fadd
-   * operations.  This prevents us from being too aggressive with our
-   * fusing which can actually lead to more instructions.
+   case nir_op_fmul: {
+  /* Only fuse an fmul into an ffma if its result is used by not more than
+   * four fadd operations. This prevents us from too aggressively fusing
+   * operations which can actually lead to more instructions or many ffma
+   * operations performing the same multiply.
*/
-  if (!are_all_uses_fadd(&alu->dest.dest.ssa))
+
+  unsigned num_uses = 0;
+  if (!are_all_uses_fadd(&alu->dest.dest.ssa, &num_uses) || num_uses > 4)
  return NULL;
   break;
-
+   }
default:
   return NULL;
}
-- 
2.4.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH resend] r600g: update num_dw in scissor_enable workaround

2015-09-22 Thread Grazvydas Ignotas
"r600g: apply disable workaround on all scissors" forgot to update
num_dw, fix it.

Fixes: fbb423b433 "r600g: apply disable workaround on all scissors"
Reported-and-tested-by: Markus Trippelsdorf 
Signed-off-by: Grazvydas Ignotas 
---
resending with s-o-b

 src/gallium/drivers/r600/r600_state_common.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/gallium/drivers/r600/r600_state_common.c 
b/src/gallium/drivers/r600/r600_state_common.c
index 21c89dc..efce852 100644
--- a/src/gallium/drivers/r600/r600_state_common.c
+++ b/src/gallium/drivers/r600/r600_state_common.c
@@ -367,6 +367,7 @@ static void r600_bind_rs_state(struct pipe_context *ctx, 
void *state)
rs->scissor_enable != rctx->scissor.enable) {
rctx->scissor.enable = rs->scissor_enable;
rctx->scissor.dirty_mask = (1 << R600_MAX_VIEWPORTS) - 1;
+   rctx->scissor.atom.num_dw = R600_MAX_VIEWPORTS * 4;
r600_mark_atom_dirty(rctx, &rctx->scissor.atom);
}
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5 00/70] ARB_shader_storage_buffer_object (mesa, i965)

2015-09-22 Thread Kristian Høgsberg
On Mon, Sep 21, 2015 at 12:39:14PM +0200, Iago Toral wrote:
> Hi Kristian,
> 
> On Fri, 2015-09-18 at 16:56 -0700, Kristian Høgsberg wrote:
> > On Thu, Sep 10, 2015 at 03:35:16PM +0200, Iago Toral Quiroga wrote:
> > > Hi,
> > > 
> > > this is the latest version of the ARB_shader_storage_buffer_object
> > > implementation. A good part of the frontend bits for this are already in
> > > master, but this adds some more missing pieces, specifically std430 and
> > > memory qualifiers. Additionally, this provides the i965 implementation.
> > 
> > I've gone through all patches in the series and I replied to patches
> > where I had comments.  Overall, the series look good and with the
> > comments addressed, I'm ready to give my Reviewed-by for the series.
> > I want to take a closer look at the atomics lowering in patches 49+,
> > but I'm done for today.  Base on the quick look-through I did, I don't
> > expect to find any showstoppers there.
> 
> great, thanks for reviewing this! We will send new versions of the
> patches for which you had comments or reply to your comments otherwise.

I had a look at the atomics lowering and that looks fine. Except for
patch 40/70, which I hope we can clean up a little bit, all updated
patches and explanation in 38/70 look good.

Kristian

> Iago
> 
> > Here's a summary of what I found:
> > 
> > [PATCH v5 01/70] mesa: set MAX_SHADER_STORAGE_BUFFERS to 15.
> > 
> >   Update limit to 16 and drop the comment
> > 
> > [PATCH v5 02/70] i965: Use 16-byte offset alignment for shader storage 
> > buffers
> > 
> >   ctx->Const.ShaderStorageBufferOffsetAlignment should be 64
> > 
> > [PATCH v5 23/70] glsl: refactor parser processing of an interface block 
> > definition
> > 
> >   Clarify that the commit is just moving code.
> > 
> > [PATCH v5 26/70] glsl: Add parser/compiler support for std430 interface 
> > packing qualifier
> > 
> >   Update the error to also mention shader storage blocks, not just ubos?
> > 
> > [PATCH v5 28/70] glsl: add std430 interface packing support to ssbo related 
> > operations
> > 
> >   Why are we passing false for is_std430 here (emit_access in
> >   handle_rvalue)?  We use handle_rvalue for both UBO and SSBO loads,
> >   right?  Also, for consistency, I'd prefer if we could just pass
> >   'packing' around instead of is_std430.
> > 
> > [PATCH v5 29/70] glsl: a shader storage buffer must be smaller than the 
> > maximum size allowed
> > 
> >   Two chunks look like they should be their own patch ("Add unsized
> >   array support to glsl_type::std140_size" or such).
> > 
> > [PATCH v5 38/70] i965/nir/vec4: Implement nir_intrinsic_store_ssbo
> > 
> >   Shouldn't this be 'skipped_channels += num_channels;' to handle write 
> > mask reg.yw?
> > 
> > [PATCH v5 40/70] nir: Implement __intrinsic_load_ssbo
> > 
> >   Refactor handling of cmp instruction for converting to bool
> > 
> > [PATCH v5 54/70] glsl: First argument to atomic functions must be a buffer 
> > variable
> > 
> >   Nitpick: move check that only looks at first element in list to after loop
> > 
> > Also, I expect that before we land this series (thought that shouldn't
> > be far off), we'll have deleted the vec4 GLSL IR visitor so we can
> > drop these patches (I didn't review them):
> > 
> > [PATCH v5 16/70] i965/vec4: Implement ir_unop_get_buffer_size
> > [PATCH v5 39/70] i965/vec4: Implement __intrinsic_store_ssbo
> > [PATCH v5 43/70] i965/vec4: Implement __intrinsic_load_ssbo
> > [PATCH v5 53/70] i965/vec4: Implement lowered SSBO atomic intrinsics
> > 
> > I wrote the initial prototype of the SSBO functionality, but I don't
> > recall writing:
> > 
> > [PATCH v5 45/70] glsl: atomic counters can be declared as buffer-qualified 
> > variables
> > 
> > I don't think I did anything for atomics in my patch. Feel free to
> > take ownership of that one and add my Reviewed-by.
> > 
> > thanks,
> > Kristian
> > 
> 
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 1/8] radeonsi: Allocate buffers for DCC.

2015-09-22 Thread Marek Olšák
This looks good. Just some nits below.

On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen
 wrote:
> As the alignment requirements can be 32 KiB or more, also adding
> an aligned buffer creation function.
>
> Signed-off-by: Bas Nieuwenhuizen 
> ---
>  src/gallium/drivers/radeon/r600_buffer_common.c | 20 +++
>  src/gallium/drivers/radeon/r600_pipe_common.h   |  6 
>  src/gallium/drivers/radeon/r600_texture.c   | 18 ++
>  src/gallium/drivers/radeon/radeon_winsys.h  |  5 +++
>  src/gallium/winsys/amdgpu/drm/amdgpu_surface.c  | 48 
> ++---
>  5 files changed, 92 insertions(+), 5 deletions(-)
>
> diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c 
> b/src/gallium/drivers/radeon/r600_buffer_common.c
> index cb9809f..d869856 100644
> --- a/src/gallium/drivers/radeon/r600_buffer_common.c
> +++ b/src/gallium/drivers/radeon/r600_buffer_common.c
> @@ -422,6 +422,26 @@ struct pipe_resource *r600_buffer_create(struct 
> pipe_screen *screen,
> return &rbuffer->b.b;
>  }
>
> +struct pipe_resource * r600_aligned_buffer_create(struct pipe_screen *screen,
> + unsigned bind,
> + unsigned usage,
> + unsigned size,
> + unsigned alignment)
> +{
> +   struct pipe_resource buffer;

Missing space after the declaration.

> +   memset(&buffer, 0, sizeof buffer);
> +   buffer.target = PIPE_BUFFER;
> +   buffer.format = PIPE_FORMAT_R8_UNORM;
> +   buffer.bind = bind;
> +   buffer.usage = usage;
> +   buffer.flags = 0;
> +   buffer.width0 = size;
> +   buffer.height0 = 1;
> +   buffer.depth0 = 1;
> +   buffer.array_size = 1;
> +   return r600_buffer_create(screen, &buffer, alignment);

This should use tabs for indentation.


> +}
> +
>  struct pipe_resource *
>  r600_buffer_from_user_memory(struct pipe_screen *screen,
>  const struct pipe_resource *templ,
> diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h 
> b/src/gallium/drivers/radeon/r600_pipe_common.h
> index d22c230..4bbecca 100644
> --- a/src/gallium/drivers/radeon/r600_pipe_common.h
> +++ b/src/gallium/drivers/radeon/r600_pipe_common.h
> @@ -212,6 +212,7 @@ struct r600_texture {
> struct r600_fmask_info  fmask;
> struct r600_cmask_info  cmask;
> struct r600_resource*cmask_buffer;
> +   struct r600_resource*dcc_buffer;
> unsignedcb_color_info; /* fast clear enable 
> bit */
> unsignedcolor_clear_value[2];
>
> @@ -488,6 +489,11 @@ bool r600_init_resource(struct r600_common_screen 
> *rscreen,
>  struct pipe_resource *r600_buffer_create(struct pipe_screen *screen,
>  const struct pipe_resource *templ,
>  unsigned alignment);
> +struct pipe_resource * r600_aligned_buffer_create(struct pipe_screen *screen,
> + unsigned bind,
> + unsigned usage,
> + unsigned size,
> + unsigned alignment);
>  struct pipe_resource *
>  r600_buffer_from_user_memory(struct pipe_screen *screen,
>  const struct pipe_resource *templ,
> diff --git a/src/gallium/drivers/radeon/r600_texture.c 
> b/src/gallium/drivers/radeon/r600_texture.c
> index 89f18fb..46e735e 100644
> --- a/src/gallium/drivers/radeon/r600_texture.c
> +++ b/src/gallium/drivers/radeon/r600_texture.c
> @@ -268,6 +268,7 @@ static void r600_texture_destroy(struct pipe_screen 
> *screen,
> if (rtex->cmask_buffer != &rtex->resource) {
> pipe_resource_reference((struct 
> pipe_resource**)&rtex->cmask_buffer, NULL);
> }
> +   pipe_resource_reference((struct pipe_resource**)&rtex->dcc_buffer, 
> NULL);
> pb_reference(&resource->buf, NULL);
> FREE(rtex);
>  }
> @@ -482,6 +483,20 @@ static void r600_texture_alloc_cmask_separate(struct 
> r600_common_screen *rscreen
> rtex->cb_color_info |= EG_S_028C70_FAST_CLEAR(1);
>  }
>
> +static void vi_texture_alloc_dcc_separate(struct r600_common_screen *rscreen,
> + struct r600_texture *rtex)
> +{
> +   rtex->dcc_buffer = (struct r600_resource *)
> +   r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM,
> +  PIPE_USAGE_DEFAULT, 
> rtex->surface.dcc_size, rtex->surface.dcc_alignment);
> +   if (rtex->dcc_buffer == NULL) {
> +   return;
> +   }
> +
> +   r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0, 
> rtex->surface.dcc_size,
> +0x, true);
> +}
> +
>  static unsign

Re: [Mesa-dev] [PATCH v2] i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed

2015-09-22 Thread Matt Turner
On Tue, Sep 22, 2015 at 9:53 AM, Antia Puentes  wrote:
> Gen6 MATH instructions can not execute in align16 mode, so swizzles or
> writemasking are not allowed.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
> ---
>
> I have tried to find an example where the writemask check is strictly needed
> in order to avoid an incorrect register coalescing, but I have failed.
>
> If I have something like this in my shader:
>
>   in vec4 attr_pos;
>
>   void main() {
> vec4 a;
> a.xy = sin(attr_pos);
> ..
>   }
>
> It is translated into the following vec4 instructions:
>   sin vgrf4.0:F, vgrf3.xyzw:F
>   mov vgrf0.0.xy:F, vgrf4.xyzw:F <-- (added by emit_math to fix the 
> writemasking issue)
>
> And converted by the opt_reduce_swizzle optimization into:
>   sin vgrf4.0:F, vgrf3.xyzw:F
>   mov vgrf0.0.xy:F, vgrf4.xyyy:F
>
> The swizzle in the MOV instruction is not the identity anymore, so the method 
> "can_reswizzle"
> in opt_register_coalesce will return FALSE because of the swizzle-check that 
> my first
> version of the patch added.
>
> Out of curiosity, I disabled the opt_reduce_swizzle optimization to see what 
> happens.
> Still "can_reswizzle" returns FALSE preventing the incorrect register 
> coalescing,
> because the math function sets more things than the MOV instruction; i.e.
> the check "(dst.writemask & ~swizzle_mask) is positive.
>
> Although I have not found an example where the writemask-check is crucial, 
> that does not neccessarily
> mean that it does not exists. I am sending the second version of the patch in 
> case you prefer to have
> that check in place.

Thanks Antia. This seems like a good thing to do.

>  src/mesa/drivers/dri/i965/brw_ir_vec4.h |  3 ++-
>  src/mesa/drivers/dri/i965/brw_vec4.cpp  | 11 +--
>  2 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
> b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> index 966a410..a48bb68 100644
> --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
> @@ -175,7 +175,8 @@ public:
>
> bool is_send_from_grf();
> unsigned regs_read(unsigned arg) const;
> -   bool can_reswizzle(int dst_writemask, int swizzle, int swizzle_mask);
> +   bool can_reswizzle(const struct brw_device_info *devinfo, int 
> dst_writemask,
> +  int swizzle, int swizzle_mask);
> void reswizzle(int dst_writemask, int swizzle);
> bool can_do_source_mods(const struct brw_device_info *devinfo);
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index ed49cd3..ae695f1 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -941,10 +941,17 @@ vec4_visitor::opt_set_dependency_control()
>  }
>
>  bool
> -vec4_instruction::can_reswizzle(int dst_writemask,
> +vec4_instruction::can_reswizzle(const struct brw_device_info *devinfo,
> +int dst_writemask,
>  int swizzle,
>  int swizzle_mask)
>  {
> +   /* Gen6 MATH instructions can not execute in align16 mode, so swizzles
> +* or writemasking are not allowed. */

*/ goes on its own line in Mesa.

With that,

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/12] i965/fs/skl+: Use lcd2dms_w instead of lcd2dms

2015-09-22 Thread Ian Romanick
On 09/17/2015 09:00 AM, Neil Roberts wrote:
> In order to support 16x MSAA, skl+ has a wider version of lcd2dms that
> takes two parameters for the MCS data. This patch makes it allocate a
> register that is twice as big for the MCS data and then always use
> the wider version.
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h|  4 
>  src/mesa/drivers/dri/i965/brw_disasm.c |  1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 27 
> --
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  5 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  9 ++---
>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  5 +
>  6 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 8fc8ceb..2d5b67d 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -969,6 +969,8 @@ enum opcode {
> FS_OPCODE_TXB_LOGICAL,
> SHADER_OPCODE_TXF_CMS,
> SHADER_OPCODE_TXF_CMS_LOGICAL,
> +   SHADER_OPCODE_TXF_CMS_W,
> +   SHADER_OPCODE_TXF_CMS_W_LOGICAL,
> SHADER_OPCODE_TXF_UMS,
> SHADER_OPCODE_TXF_UMS_LOGICAL,
> SHADER_OPCODE_TXF_MCS,
> @@ -1517,10 +1519,12 @@ enum brw_message_target {
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO   17
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18
>  #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
> +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS   29
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS   30
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS   31
>  
> +

Spurious whitespace change.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 11/12] meta: Support 16x MSAA in the multisample scaled blit shader

2015-09-22 Thread Ian Romanick
On 09/17/2015 09:00 AM, Neil Roberts wrote:
> I'm not too sure about the expression used to index into sample_map in
> the shader. It looks like if fract(coord.x) and fract(coord.y) are
> close to 1.0 then it would index outside of the array. However the
> code for 4 and 8 has the same problem and the results seems to look
> reasonable. It might make more sense to change it to something like
> this:
> 
>  sample_map[int(4 * fract(coord.x)) + 4 * int(fract(coord.y) * 4)]

Yeah... Jason also noticed some strange things in this area.  I've got a
patch out
(http://lists.freedesktop.org/archives/mesa-dev/2015-September/095026.html)
that also fixes a bug here.  I'm going to dig a bit deeper if you want
to leave it as-is for now.

This patch is

Reviewed-by: Ian Romanick 

> ---
>  src/mesa/drivers/common/meta.h |  2 ++
>  src/mesa/drivers/common/meta_blit.c|  8 ++--
>  src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 ++
>  src/mesa/main/mtypes.h | 15 ++-
>  4 files changed, 36 insertions(+), 3 deletions(-)
> 
> diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h
> index fe43915..e42ddc8 100644
> --- a/src/mesa/drivers/common/meta.h
> +++ b/src/mesa/drivers/common/meta.h
> @@ -285,9 +285,11 @@ enum blit_msaa_shader {
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE,
> BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> +   BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE,
> BLIT_MSAA_SHADER_COUNT,
>  };
>  
> diff --git a/src/mesa/drivers/common/meta_blit.c 
> b/src/mesa/drivers/common/meta_blit.c
> index a41fe42..b545c37 100644
> --- a/src/mesa/drivers/common/meta_blit.c
> +++ b/src/mesa/drivers/common/meta_blit.c
> @@ -86,8 +86,8 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
> while (samples >> (shader_offset + 1)) {
>shader_offset++;
> }
> -   /* Update the assert if we plan to support more than 8X MSAA. */
> -   assert(shader_offset > 0 && shader_offset < 4);
> +   /* Update the assert if we plan to support more than 16X MSAA. */
> +   assert(shader_offset > 0 && shader_offset <= 4);
>  
> assert(target == GL_TEXTURE_2D_MULTISAMPLE ||
>target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY);
> @@ -132,6 +132,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context 
> *ctx,
>sample_number =  "sample_map[int(2 * fract(coord.x) + 8 * 
> fract(coord.y))]";
>sample_map = ctx->Const.SampleMap8x;
>break;
> +   case 16:
> +  sample_number =  "sample_map[int(4 * fract(coord.x) + 16 * 
> fract(coord.y))]";
> +  sample_map = ctx->Const.SampleMap16x;
> +  break;
> default:
>sample_number = NULL;
>sample_map = NULL;
> diff --git a/src/mesa/drivers/dri/i965/gen6_multisample_state.c 
> b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> index 49c6eba..8eb620d 100644
> --- a/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> +++ b/src/mesa/drivers/dri/i965/gen6_multisample_state.c
> @@ -91,6 +91,17 @@ gen6_get_sample_position(struct gl_context *ctx,
>   *   | 6 | 7 |  | 7 | 1 |
>   *   -  -
>   *
> + * 16X MSAA sample index layout  16x MSAA sample number layout
> + * --
> + * | 0 | 1 | 2 | 3 ||15 |10 | 9 | 7 |
> + * --
> + * | 4 | 5 | 6 | 7 || 4 | 1 | 3 |13 |
> + * --
> + * | 8 | 9 |10 |11 ||12 | 2 | 0 | 6 |
> + * --
> + * |12 |13 |14 |15 ||11 | 8 | 5 |14 |
> + * --
> + *
>   * A sample map is used to map sample indices to sample numbers.
>   */
>  void
> @@ -99,10 +110,13 @@ gen6_set_sample_maps(struct gl_context *ctx)
> uint8_t map_2x[2] = {0, 1};
> uint8_t map_4x[4] = {0, 1, 2, 3};
> uint8_t map_8x[8] = {5, 2, 4, 6, 0, 3, 7, 1};
> +   uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13,
> +   12, 2, 0, 6, 11, 8, 5, 14 };
>  
> memcpy(ctx->Const.SampleMap2x, map_2x, sizeof(map_2x));
> memcpy(ctx->Const.SampleMap4x, map_4x, sizeof(map_4x));
> memcpy(ctx->Const.SampleMap8x, map_8x, sizeof(map_8x));
> +   memcpy(ctx->Const.SampleMap16x, map_16x, sizeof(map_16x));
>  }
>  
>  /**
> diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h
> index fac45aa..e0a06f4 100644
> --- a/src/mesa/main/mtypes.h
> +++ b/src/mesa/main/mtypes.h
> @@

[Mesa-dev] [PATCH] meta: Handle array textures in scaled MSAA blits

2015-09-22 Thread Ian Romanick
From: Ian Romanick 

The old code had some significant problems with respect to
sampler2DArray textures.  The biggest problem was that some of the code
would use vec3 for the texture coordinate type, and other parts of the
code would use vec2.  The resulting shader would not even compile.
Since there were not tests for this path, nobody noticed.

The input to the fragment shader is always treated as a vec3.  If the
source data is only vec2, the vertex puller will supply 0 for the .z
component.  The texture coordinate passed to the fragment shader is
always a vec2 that comes from the .xy part of the vertex shader input.
The layer, taken from the .z of the vertex shader input is passed
separately as a flat integer.  If the generated fragment shader does not
use the layer integer, the GLSL linker will eliminate all the dead code
in the vertex shader.

Fixes the new piglit tests "blit-scaled samples=2 with
gl_texture_2d_multisample_array", etc. on i965.

Note for stable maintainer: This patch may depend on 46037237, and that
patch should be safe for stable.

Signed-off-by: Ian Romanick 
Cc: Anuj Phogat 
Cc: Topi Pohjolainen 
Cc: Jordan Justen 
Cc: "10.6 11.0" 
---
This is currently running through our Jenkins server.  There were no
regressions on my BDW system.

 src/mesa/drivers/common/meta_blit.c | 35 ---
 1 file changed, 20 insertions(+), 15 deletions(-)

diff --git a/src/mesa/drivers/common/meta_blit.c 
b/src/mesa/drivers/common/meta_blit.c
index a41fe42..5972a5a 100644
--- a/src/mesa/drivers/common/meta_blit.c
+++ b/src/mesa/drivers/common/meta_blit.c
@@ -71,9 +71,7 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
char *sample_map_str = rzalloc_size(mem_ctx, 1);
char *sample_map_expr = rzalloc_size(mem_ctx, 1);
char *texel_fetch_macro = rzalloc_size(mem_ctx, 1);
-   const char *vs_source;
const char *sampler_array_suffix = "";
-   const char *texcoord_type = "vec2";
float y_scale;
enum blit_msaa_shader shader_index;
 
@@ -99,7 +97,6 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
   shader_index += BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE -
   BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE;
   sampler_array_suffix = "Array";
-  texcoord_type = "vec3";
}
 
if (blit->msaa_shaders[shader_index]) {
@@ -150,28 +147,37 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
   "   const int sample_map[%d] = int[%d](%s);\n",
   samples, samples, sample_map_str);
 
-   ralloc_asprintf_append(&texel_fetch_macro,
-  "#define TEXEL_FETCH(coord) texelFetch(texSampler, 
i%s(coord), %s);\n",
-  texcoord_type, sample_number);
+   if (target == GL_TEXTURE_2D_MULTISAMPLE) {
+  ralloc_asprintf_append(&texel_fetch_macro,
+ "#define TEXEL_FETCH(coord) 
texelFetch(texSampler, ivec2(coord), %s);\n",
+ sample_number);
+   } else {
+  ralloc_asprintf_append(&texel_fetch_macro,
+ "#define TEXEL_FETCH(coord) 
texelFetch(texSampler, ivec3(coord, layer), %s);\n",
+ sample_number);
+   }
 
-   vs_source = ralloc_asprintf(mem_ctx,
+   static const char vs_source[] =
"#version 130\n"
"in vec2 position;\n"
-   "in %s textureCoords;\n"
-   "out %s texCoords;\n"
+   "in vec3 textureCoords;\n"
+   "out vec2 texCoords;\n"
+   "flat out int layer;\n"
"void main()\n"
"{\n"
-   "   texCoords = textureCoords;\n"
+   "   texCoords = textureCoords.xy;\n"
+   "   layer = int(textureCoords.z);\n"
"   gl_Position = vec4(position, 0.0, 1.0);\n"
-   "}\n",
-   texcoord_type,
-   texcoord_type);
+   "}\n"
+  ;
+
fs_source = ralloc_asprintf(mem_ctx,
"#version 130\n"
"#extension GL_ARB_texture_multisample : 
enable\n"
"uniform sampler2DMS%s texSampler;\n"
"uniform float src_width, src_height;\n"
-   "in %s texCoords;\n"
+   "in vec2 texCoords;\n"
+   "flat in int layer;\n"
"out vec4 out_color;\n"
"\n"
"void main()\n"
@@ -212,7 +218,6 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
   

[Mesa-dev] [PATCH V3 8/8] i965: Rename intel_miptree_get_dimensions_for_image()

2015-09-22 Thread Anuj Phogat
This function isn't specific to miptrees. So, drop the "miptree"
from function name.

V3: Add a comment explaining how the 1D Array texture height and
depth is interpreted by Intel hardware.

Cc: Chad Versace 
Signed-off-by: Anuj Phogat 
---
 src/mesa/drivers/dri/i965/intel_fbo.c  |  2 +-
 src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 12 +---
 src/mesa/drivers/dri/i965/intel_mipmap_tree.h  |  4 ++--
 src/mesa/drivers/dri/i965/intel_tex_image.c|  3 +--
 src/mesa/drivers/dri/i965/intel_tex_validate.c |  3 +--
 5 files changed, 14 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c 
b/src/mesa/drivers/dri/i965/intel_fbo.c
index b216055..6b2349e 100644
--- a/src/mesa/drivers/dri/i965/intel_fbo.c
+++ b/src/mesa/drivers/dri/i965/intel_fbo.c
@@ -1027,7 +1027,7 @@ intel_renderbuffer_move_to_temp(struct brw_context *brw,
uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD |
MIPTREE_LAYOUT_TILING_ANY;
 
-   intel_miptree_get_dimensions_for_image(rb->TexImage, &width, &height, 
&depth);
+   intel_get_image_dims(rb->TexImage, &width, &height, &depth);
 
new_mt = intel_miptree_create(brw, rb->TexImage->TexObject->Target,
  intel_image->base.Base.TexFormat,
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
index 28d5031..0d1da33 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c
@@ -921,12 +921,18 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
*mt = NULL;
 }
 
+
 void
-intel_miptree_get_dimensions_for_image(struct gl_texture_image *image,
-   int *width, int *height, int *depth)
+intel_get_image_dims(struct gl_texture_image *image,
+ int *width, int *height, int *depth)
 {
switch (image->TexObject->Target) {
case GL_TEXTURE_1D_ARRAY:
+  /* For a 1D Array texture the OpenGL API will treat the image height as
+   * the number of array slices. For Intel hardware, we treat the 1D array
+   * as a 2D Array with a height of 1. So, here we want to swap image
+   * height and depth.
+   */
   *width = image->Width;
   *height = 1;
   *depth = image->Height;
@@ -970,7 +976,7 @@ intel_miptree_match_image(struct intel_mipmap_tree *mt,
if (image->TexFormat != mt_format)
   return false;
 
-   intel_miptree_get_dimensions_for_image(image, &width, &height, &depth);
+   intel_get_image_dims(image, &width, &height, &depth);
 
if (mt->target == GL_TEXTURE_CUBE_MAP)
   depth = 6;
diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h 
b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
index bd77f95..7610d75 100644
--- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
+++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h
@@ -619,8 +619,8 @@ intel_miptree_get_image_offset(const struct 
intel_mipmap_tree *mt,
   GLuint *x, GLuint *y);
 
 void
-intel_miptree_get_dimensions_for_image(struct gl_texture_image *image,
-   int *width, int *height, int *depth);
+intel_get_image_dims(struct gl_texture_image *image,
+ int *width, int *height, int *depth);
 
 void
 intel_get_tile_masks(uint32_t tiling, uint32_t tr_mode, uint32_t cpp,
diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c 
b/src/mesa/drivers/dri/i965/intel_tex_image.c
index 93a8cde..34b91e8 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_image.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_image.c
@@ -42,8 +42,7 @@ intel_miptree_create_for_teximage(struct brw_context *brw,
int width, height, depth;
GLuint i;
 
-   intel_miptree_get_dimensions_for_image(&intelImage->base.Base,
-  &width, &height, &depth);
+   intel_get_image_dims(&intelImage->base.Base, &width, &height, &depth);
 
DBG("%s\n", __func__);
 
diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c 
b/src/mesa/drivers/dri/i965/intel_tex_validate.c
index d3fb252..2cf9c13 100644
--- a/src/mesa/drivers/dri/i965/intel_tex_validate.c
+++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c
@@ -128,8 +128,7 @@ intel_finalize_mipmap_tree(struct brw_context *brw, GLuint 
unit)
/* May need to create a new tree:
 */
if (!intelObj->mt) {
-  intel_miptree_get_dimensions_for_image(&firstImage->base.Base,
-&width, &height, &depth);
+  intel_get_image_dims(&firstImage->base.Base, &width, &height, &depth);
 
   perf_debug("Creating new %s %dx%dx%d %d-level miptree to handle "
  "finalized texture miptree.\n",
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH V2 8/8] i965: Rename intel_miptree_get_dimensions_for_image()

2015-09-22 Thread Anuj Phogat
On Wed, Sep 16, 2015 at 4:30 PM, Chad Versace  wrote:
> On Wed 19 Aug 2015, Anuj Phogat wrote:
>> This function isn't specific to miptrees. So, drop the "miptree"
>> from function name.
>>
>> Signed-off-by: Anuj Phogat 
>> ---
>>  src/mesa/drivers/dri/i965/intel_fbo.c  | 2 +-
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.c  | 6 +++---
>>  src/mesa/drivers/dri/i965/intel_mipmap_tree.h  | 4 ++--
>>  src/mesa/drivers/dri/i965/intel_tex_image.c| 3 +--
>>  src/mesa/drivers/dri/i965/intel_tex_validate.c | 3 +--
>>  5 files changed, 8 insertions(+), 10 deletions(-)
>
>
>> @@ -928,8 +928,8 @@ intel_miptree_release(struct intel_mipmap_tree **mt)
>>  }
>>
>>  void
>> -intel_miptree_get_dimensions_for_image(struct gl_texture_image *image,
>> -   int *width, int *height, int *depth)
>> +intel_get_image_dims(struct gl_texture_image *image,
>> + int *width, int *height, int *depth)
>>  {
>> switch (image->TexObject->Target) {
>> case GL_TEXTURE_1D_ARRAY:
>
> True, the function isn't specific to miptrees. But it *is* specific to
> Intel's RENDER_SURFACE_STATE, as it translates the image's (width,
> height, depth), from the perspective of the OpenGL API, to the needs of
> Intel hardware.
>
> Now that 'miptree' is removed from the function name, the function name
> looks like a mere getter. In that case, it's not clear why the caller
> cannot just access image->width, image->height, and image->depth
> directly.
>
> So that we all don't forget why this function exists next year, please copy
> into the function the relevant portions of this comment from
> intel_miptree_create_layout():
>
>   /* For a 1D Array texture the OpenGL API will treat the height0
>* parameter as the number of array slices. For Intel hardware, we treat
>* the 1D array as a 2D Array with a height of 1.
>*
>* So, when we first come through this path to create a 1D Array
>* texture, height0 stores the number of slices, and depth0 is 1. In
>* this case, we want to swap height0 and depth0.
>*
>* Since some miptrees will be created based on the base miptree, we may
>* come through this path and see height0 as 1 and depth0 being the
>* number of slices. In this case we don't need to do the swap.
>*/
>
> With such a comment, I think this patch will be ok.
>
I'll send out V3 with added comment.

> By the way, the height<->depth adjustment that intel_miptree_create_layout()
> performs directly beneath that comment, that adjustment duplicates the
> height<-> adjustment done by intel_get_image_dims(). That means you may be 
> able
> to eliminate intel_get_image_dims() completely, and rely on
> intel_miptree_create_layout() to do the adjustment for you. I say "may" 
> because
> I haven't investigated it closely enough to be confident.
After looking at the usage of intel_get_image_dims() at multiple places, I
couldn't see an easy way to remove the function. So, I'll keep it unchanged
at the moment.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)

2015-09-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92072

--- Comment #3 from Samuel Pitoiset  ---
I can also reproduce this bug on GF119 (nvc0 driver) with master, and
(obviously) reverting that commit fixes the issue, but the assert in
ext_framebuffer_blit/blit-early is back.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 06/12] i965/fs: Add a sampler program key for whether the texture is 16x MSAA

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:08PM +0100, Neil Roberts wrote:
> When 16x MSAA is used for sampling with texelFetch the compiler needs
> to use a different instruction which passes more arguments for the MCS
> data. Previously on skl+ it was unconditionally using this new
> instruction. However since 16x MSAA is probably going to be pretty
> rare, it is probably worthwhile to avoid using this instruction for
> the other sample counts. In order to do that this patch adds a new
> member to brw_sampler_prog_key_data to track when a sampler refers to
> a buffer with 16 samples.
> 
> Note that this isn't done for the vec4 backend because it wouldn't
> change how many registers it uses.

Hmm. I was of the other opinion actually, I'd prefer to always use the
new instruction to limit the variability between tests. I suppose if we're often
passing the LOD parameter that is problematic, but we don't do that either,
right?

> ---
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +-
>  src/mesa/drivers/dri/i965/brw_program.h  | 7 +++
>  src/mesa/drivers/dri/i965/brw_wm.c   | 8 
>  3 files changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> index ea7b3eb..7937c48 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp
> @@ -287,7 +287,7 @@ fs_visitor::emit_texture(ir_texture_opcode op,
>opcode = SHADER_OPCODE_TXF_LOGICAL;
>break;
> case ir_txf_ms:
> -  if (devinfo->gen >= 9)
> +  if ((key_tex->msaa_16 & (1 << sampler)))
>   opcode = SHADER_OPCODE_TXF_CMS_W_LOGICAL;
>else
>   opcode = SHADER_OPCODE_TXF_CMS_LOGICAL;
> diff --git a/src/mesa/drivers/dri/i965/brw_program.h 
> b/src/mesa/drivers/dri/i965/brw_program.h
> index 00e8f3f..b27ab7f 100644
> --- a/src/mesa/drivers/dri/i965/brw_program.h
> +++ b/src/mesa/drivers/dri/i965/brw_program.h
> @@ -72,6 +72,13 @@ struct brw_sampler_prog_key_data {
> uint32_t compressed_multisample_layout_mask;
>  
> /**
> +* Whether this sampler is using 16x multisampling. If so fetching from
> +* this sampler will be handled with a different instruction, ld2ms_w
> +* instead of ld2ms.
> +*/
> +   uint32_t msaa_16;
> +
> +   /**
>  * For Sandybridge, which shader w/a we need for gather quirks.
>  */
> enum gen6_gather_sampler_wa gen6_gather_wa[MAX_SAMPLERS];
> diff --git a/src/mesa/drivers/dri/i965/brw_wm.c 
> b/src/mesa/drivers/dri/i965/brw_wm.c
> index 35c0908..2d6f10d 100644
> --- a/src/mesa/drivers/dri/i965/brw_wm.c
> +++ b/src/mesa/drivers/dri/i965/brw_wm.c
> @@ -292,6 +292,9 @@ brw_debug_recompile_sampler_key(struct brw_context *brw,
> found |= key_debug(brw, "compressed multisample layout",
>old_key->compressed_multisample_layout_mask,
>key->compressed_multisample_layout_mask);
> +   found |= key_debug(brw, "16x msaa",
> +  old_key->msaa_16,
> +  key->msaa_16);
>  
> for (unsigned int i = 0; i < MAX_SAMPLERS; i++) {
>found |= key_debug(brw, "textureGather workarounds",
> @@ -451,6 +454,11 @@ brw_populate_sampler_prog_key_data(struct gl_context 
> *ctx,
>   if (brw->gen >= 7 &&
>   intel_tex->mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) {
>  key->compressed_multisample_layout_mask |= 1 << s;
> +
> +if (intel_tex->mt->num_samples >= 16) {
> +   assert(brw->gen >= 9);
> +   key->msaa_16 |= 1 << s;
> +}
>   }
>}
> }
> -- 
> 1.9.3
> 
> ___
> mesa-dev mailing list
> mesa-dev@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 05/12] i965/vec4/skl+: Use lcd2dms_w instead of lcd2dms

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:07PM +0100, Neil Roberts wrote:
> In order to support 16x MSAA, skl+ has a wider version of lcd2dms that
> takes two parameters for the MCS data. The MCS data in the response
> still fits in a single register so we just need to ensure we copy both
> values rather than just the lower one.
> ---
>  src/mesa/drivers/dri/i965/brw_vec4.cpp   |  1 +
>  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp |  5 +
>  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp   | 14 --
>  3 files changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> index ed49cd3..22c8d01 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
> @@ -327,6 +327,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst)
> case SHADER_OPCODE_TXD:
> case SHADER_OPCODE_TXF:
> case SHADER_OPCODE_TXF_CMS:
> +   case SHADER_OPCODE_TXF_CMS_W:
> case SHADER_OPCODE_TXF_MCS:
> case SHADER_OPCODE_TXS:
> case SHADER_OPCODE_TG4:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> index 1950333..c4ddff9 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp
> @@ -259,6 +259,10 @@ vec4_generator::generate_tex(vec4_instruction *inst,
>case SHADER_OPCODE_TXF:
>msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD;
>break;
> +  case SHADER_OPCODE_TXF_CMS_W:
> + assert(devinfo->gen >= 9);
> + msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W;
> + break;
>case SHADER_OPCODE_TXF_CMS:
>   if (devinfo->gen >= 7)
>  msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS;
> @@ -1372,6 +1376,7 @@ vec4_generator::generate_code(const cfg_t *cfg)
>case SHADER_OPCODE_TXD:
>case SHADER_OPCODE_TXF:
>case SHADER_OPCODE_TXF_CMS:
> +  case SHADER_OPCODE_TXF_CMS_W:
>case SHADER_OPCODE_TXF_MCS:
>case SHADER_OPCODE_TXL:
>case SHADER_OPCODE_TXS:
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> index 0465770..79f92d8 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
> @@ -2545,7 +2545,8 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
> case ir_txl: opcode = SHADER_OPCODE_TXL; break;
> case ir_txd: opcode = SHADER_OPCODE_TXD; break;
> case ir_txf: opcode = SHADER_OPCODE_TXF; break;
> -   case ir_txf_ms: opcode = SHADER_OPCODE_TXF_CMS; break;
> +   case ir_txf_ms: opcode = (devinfo->gen >= 9 ? SHADER_OPCODE_TXF_CMS_W :
> + SHADER_OPCODE_TXF_CMS); break;
> case ir_txs: opcode = SHADER_OPCODE_TXS; break;
> case ir_tg4: opcode = offset_value.file != BAD_FILE
>   ? SHADER_OPCODE_TG4_OFFSET : SHADER_OPCODE_TG4; 
> break;
> @@ -2637,7 +2638,16 @@ vec4_visitor::emit_texture(ir_texture_opcode op,
>} else if (op == ir_txf_ms) {
>   emit(MOV(dst_reg(MRF, param_base + 1, sample_index.type, 
> WRITEMASK_X),
>sample_index));
> - if (devinfo->gen >= 7) {
> + if (opcode == SHADER_OPCODE_TXF_CMS_W) {
> +/* MCS data is stored in the first two channels of ‘mcs’, but we
> + * need to get it into the .y and .z channels of the second vec4
> + * of params.
> + */

I don't understand what you mean by the "second vec4". Do you just mean the .y
and .z channel?

> +mcs.swizzle = BRW_SWIZZLE4(0, 0, 1, 1);

I don't really understand how the swizzle stuff works. Quick explanation on why
it's not BRW_SWIZZLE_XYXY would be greatly appreciated.

> +emit(MOV(dst_reg(MRF, param_base + 1,
> + glsl_type::uint_type, WRITEMASK_YZ),
> + mcs));
> + } else if (devinfo->gen >= 7) {
>  /* MCS data is in the first channel of `mcs`, but we need to get 
> it into
>   * the .y channel of the second vec4 of params, so replicate .x 
> across
>   * the whole vec4 and then mask off everything except .y
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 04/12] i965/fs/skl+: Use lcd2dms_w instead of lcd2dms

2015-09-22 Thread Ben Widawsky
On Thu, Sep 17, 2015 at 05:00:06PM +0100, Neil Roberts wrote:
> In order to support 16x MSAA, skl+ has a wider version of lcd2dms that
> takes two parameters for the MCS data. This patch makes it allocate a
> register that is twice as big for the MCS data and then always use
> the wider version.
> ---
>  src/mesa/drivers/dri/i965/brw_defines.h|  4 
>  src/mesa/drivers/dri/i965/brw_disasm.c |  1 +
>  src/mesa/drivers/dri/i965/brw_fs.cpp   | 27 
> --
>  src/mesa/drivers/dri/i965/brw_fs_generator.cpp |  5 +
>  src/mesa/drivers/dri/i965/brw_fs_visitor.cpp   |  9 ++---
>  src/mesa/drivers/dri/i965/brw_shader.cpp   |  5 +
>  6 files changed, 46 insertions(+), 5 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/i965/brw_defines.h 
> b/src/mesa/drivers/dri/i965/brw_defines.h
> index 8fc8ceb..2d5b67d 100644
> --- a/src/mesa/drivers/dri/i965/brw_defines.h
> +++ b/src/mesa/drivers/dri/i965/brw_defines.h
> @@ -969,6 +969,8 @@ enum opcode {
> FS_OPCODE_TXB_LOGICAL,
> SHADER_OPCODE_TXF_CMS,
> SHADER_OPCODE_TXF_CMS_LOGICAL,
> +   SHADER_OPCODE_TXF_CMS_W,
> +   SHADER_OPCODE_TXF_CMS_W_LOGICAL,
> SHADER_OPCODE_TXF_UMS,
> SHADER_OPCODE_TXF_UMS_LOGICAL,
> SHADER_OPCODE_TXF_MCS,
> @@ -1517,10 +1519,12 @@ enum brw_message_target {
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO   17
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18
>  #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20
> +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS   29
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS   30
>  #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS   31
>  
> +
>  /* for GEN5 only */
>  #define BRW_SAMPLER_SIMD_MODE_SIMD4X2   0
>  #define BRW_SAMPLER_SIMD_MODE_SIMD8 1
> diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c 
> b/src/mesa/drivers/dri/i965/brw_disasm.c
> index db23a18..5aca728 100644
> --- a/src/mesa/drivers/dri/i965/brw_disasm.c
> +++ b/src/mesa/drivers/dri/i965/brw_disasm.c
> @@ -622,6 +622,7 @@ static const char *const gen5_sampler_msg_type[] = {
> [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO]   = "gather4_po",
> [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C] = "gather4_po_c",
> [HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE] = "sample_d_c",
> +   [GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W] = "ld2dms_w",
> [GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS]   = "ld_mcs",
> [GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS]   = "ld2dms",
> [GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS]   = "ld2dss",
> diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
> b/src/mesa/drivers/dri/i965/brw_fs.cpp
> index 32a2adf..82f49b4 100644
> --- a/src/mesa/drivers/dri/i965/brw_fs.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
> @@ -697,6 +697,7 @@ fs_inst::components_read(unsigned i) const
> case SHADER_OPCODE_TXS_LOGICAL:
> case FS_OPCODE_TXB_LOGICAL:
> case SHADER_OPCODE_TXF_CMS_LOGICAL:
> +   case SHADER_OPCODE_TXF_CMS_W_LOGICAL:
> case SHADER_OPCODE_TXF_UMS_LOGICAL:
> case SHADER_OPCODE_TXF_MCS_LOGICAL:
> case SHADER_OPCODE_LOD_LOGICAL:
> @@ -712,6 +713,9 @@ fs_inst::components_read(unsigned i) const
>/* Texture offset. */
>else if (i == 7)
>   return 2;
> +  /* MCS */
> +  else if (i == 5 && opcode == SHADER_OPCODE_TXF_CMS_W_LOGICAL)
> + return 2;
>else
>   return 1;
>  
> @@ -873,6 +877,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst)
> case SHADER_OPCODE_TXD:
> case SHADER_OPCODE_TXF:
> case SHADER_OPCODE_TXF_CMS:
> +   case SHADER_OPCODE_TXF_CMS_W:
> case SHADER_OPCODE_TXF_MCS:
> case SHADER_OPCODE_TG4:
> case SHADER_OPCODE_TG4_OFFSET:
> @@ -3816,17 +3821,31 @@ lower_sampler_logical_send_gen7(const fs_builder 
> &bld, fs_inst *inst, opcode op,
>coordinate_done = true;
>break;
> case SHADER_OPCODE_TXF_CMS:
> +   case SHADER_OPCODE_TXF_CMS_W:
> case SHADER_OPCODE_TXF_UMS:
> case SHADER_OPCODE_TXF_MCS:
> -  if (op == SHADER_OPCODE_TXF_UMS || op == SHADER_OPCODE_TXF_CMS) {
> +  if (op == SHADER_OPCODE_TXF_UMS ||
> +  op == SHADER_OPCODE_TXF_CMS ||
> +  op == SHADER_OPCODE_TXF_CMS_W) {
>   bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), 
> sample_index);
>   length++;
>}
>  
> -  if (op == SHADER_OPCODE_TXF_CMS) {
> +  if (op == SHADER_OPCODE_TXF_CMS || op == SHADER_OPCODE_TXF_CMS_W) {
>   /* Data from the multisample control surface. */
>   bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), mcs);
>   length++;
> +
> + /* On Gen9+ we'll use lcd2ms_w instead which has two registers for
> +  * the MCS data.
> +  */
> + if (op == SHADER_OPCODE_TXF_CMS_W) {
> +bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD),
> +mcs.file == IMM ?
> +mcs :
> +  

Re: [Mesa-dev] [PATCH v5 40/70] nir: Implement __intrinsic_load_ssbo

2015-09-22 Thread Kristian Høgsberg
On Mon, Sep 21, 2015 at 11:03:41AM +0200, Samuel Iglesias Gonsálvez wrote:
> 
> 
> On 19/09/15 00:30, Kristian Høgsberg wrote:
> > On Thu, Sep 10, 2015 at 03:35:56PM +0200, Iago Toral Quiroga wrote:
> >> v2:
> >> - Fix ssbo loads with boolean variables.
> >>
> >> Reviewed-by: Connor Abbott 
> >> ---
> >>  src/glsl/nir/glsl_to_nir.cpp| 80 
> >> -
> >>  src/glsl/nir/nir_intrinsics.h   |  2 +-
> >>  src/glsl/nir/nir_lower_phis_to_scalar.c |  2 +
> >>  3 files changed, 81 insertions(+), 3 deletions(-)
> >>
> >> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp
> >> index 6f1e20a..cb7b196 100644
> >> --- a/src/glsl/nir/glsl_to_nir.cpp
> >> +++ b/src/glsl/nir/glsl_to_nir.cpp
> >> @@ -646,11 +646,14 @@ nir_visitor::visit(ir_call *ir)
> >>   op = nir_intrinsic_image_size;
> >>} else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == 
> >> 0) {
> >>   op = nir_intrinsic_store_ssbo;
> >> +  } else if (strcmp(ir->callee_name(), "__intrinsic_load_ssbo") == 0) 
> >> {
> >> + op = nir_intrinsic_load_ssbo;
> >>} else {
> >>   unreachable("not reached");
> >>}
> >>  
> >>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op);
> >> +  nir_alu_instr *load_ssbo_compare;
> >>  
> >>switch (op) {
> >>case nir_intrinsic_atomic_counter_read_var:
> >> @@ -776,11 +779,75 @@ nir_visitor::visit(ir_call *ir)
> >>   instr->src[1] = evaluate_rvalue(block);
> >>   break;
> >>}
> >> +  case nir_intrinsic_load_ssbo: {
> >> + exec_node *param = ir->actual_parameters.get_head();
> >> + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue();
> >> +
> >> + param = param->get_next();
> >> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue();
> >> +
> >> + /* Check if we need the indirect version */
> >> + ir_constant *const_offset = offset->as_constant();
> >> + if (!const_offset) {
> >> +op = nir_intrinsic_load_ssbo_indirect;
> >> +ralloc_free(instr);
> >> +instr = nir_intrinsic_instr_create(shader, op);
> >> +instr->src[1] = evaluate_rvalue(offset);
> >> +instr->const_index[0] = 0;
> >> + } else {
> >> +instr->const_index[0] = const_offset->value.u[0];
> >> + }
> >> +
> >> + instr->src[0] = evaluate_rvalue(block);
> >> +
> >> + const glsl_type *type = ir->return_deref->var->type;
> >> + instr->num_components = type->vector_elements;
> >> +
> >> + /* Setup destination register */
> >> + nir_ssa_dest_init(&instr->instr, &instr->dest,
> >> +   type->vector_elements, NULL);
> >> +
> >> + /* Insert the created nir instruction now since in the case of 
> >> boolean
> >> +  * result we will need to emit another instruction after it
> >> +  */
> >> + nir_instr_insert_after_cf_list(this->cf_node_list, 
> >> &instr->instr);
> > 
> > I'd prefer moving this insert into the GLSL_TYPE_BOOL condition below...
> > 
> >> + /*
> >> +  * In SSBO/UBO's, a true boolean value is any non-zero value, 
> >> but we
> >> +  * consider a true boolean to be ~0. Fix this up with a != 0
> >> +  * comparison.
> >> +  */
> >> + if (type->base_type == GLSL_TYPE_BOOL) {
> > 
> > ... that is, here...
> > 
> >> +nir_load_const_instr *const_zero =
> >> +   nir_load_const_instr_create(shader, 1);
> >> +const_zero->value.u[0] = 0;
> >> +nir_instr_insert_after_cf_list(this->cf_node_list,
> >> +   &const_zero->instr);
> >> +
> >> +load_ssbo_compare = nir_alu_instr_create(shader, nir_op_ine);
> > 
> > and then, insteed of introducing and using 'load_ssbo_compare' here,
> > re-use 'instr' for the compare instruction...
> > 
> 
> I see why you suggest these changes but I don't like them:
> 
> instr is a pointer to nir_instrinsic_instr, while the compare
> instruction is a nir_alu_instr*, so although we are lucky because some
> fields are at the same offsets for both struct base addresses(*), this
> could change in the future.

Right, they're different types... it's still unfortunate that the ssbo
special case spills out of the nir_intrinsic_load_ssbo case in the
switch statement.

Instead of jumping through hoops to share the one
nir_instr_insert_after_cf_list call between the switch cases, maybe
it's more straightforward to move that call into each case.  That way
the code can explicitly decide how order the intrinsic call and other
helper instructions.  Then set a nir_dest pointer for the
ir->return_deref handling after the switch.  We can initialize the
nir_dest pointers to &instr->dest before the switch for the default
case.

Kristian

> Even if I introduce load_ssbo_compare and, af

Re: [Mesa-dev] [PATCH v5 38/70] i965/nir/vec4: Implement nir_intrinsic_store_ssbo

2015-09-22 Thread Kristian Høgsberg
On Mon, Sep 21, 2015 at 12:13:27PM +0200, Iago Toral wrote:
> On Fri, 2015-09-18 at 13:02 -0700, Kristian Høgsberg wrote:
> > On Thu, Sep 10, 2015 at 03:35:54PM +0200, Iago Toral Quiroga wrote:
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 148 
> > > +
> > >  1 file changed, 148 insertions(+)
> > > 
> > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp 
> > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > index f47b029..450441d 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp
> > > @@ -23,8 +23,13 @@
> > >  
> > >  #include "brw_nir.h"
> > >  #include "brw_vec4.h"
> > > +#include "brw_vec4_builder.h"
> > > +#include "brw_vec4_surface_builder.h"
> > >  #include "glsl/ir_uniform.h"
> > >  
> > > +using namespace brw;
> > > +using namespace brw::surface_access;
> > > +
> > >  namespace brw {
> > >  
> > >  void
> > > @@ -556,6 +561,149 @@ 
> > > vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr)
> > >break;
> > > }
> > >  
> > > +   case nir_intrinsic_store_ssbo_indirect:
> > > +  has_indirect = true;
> > > +  /* fallthrough */
> > > +   case nir_intrinsic_store_ssbo: {
> > > +  assert(devinfo->gen >= 7);
> > > +
> > > +  /* Block index */
> > > +  src_reg surf_index;
> > > +  nir_const_value *const_uniform_block =
> > > + nir_src_as_const_value(instr->src[1]);
> > > +  if (const_uniform_block) {
> > > + unsigned index = prog_data->base.binding_table.ubo_start +
> > > +  const_uniform_block->u[0];
> > > + surf_index = src_reg(index);
> > > + brw_mark_surface_used(&prog_data->base, index);
> > > +  } else {
> > > + surf_index = src_reg(this, glsl_type::uint_type);
> > > + emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[1], 1),
> > > +  src_reg(prog_data->base.binding_table.ubo_start)));
> > > + surf_index = emit_uniformize(surf_index);
> > > +
> > > + brw_mark_surface_used(&prog_data->base,
> > > +   prog_data->base.binding_table.ubo_start +
> > > +   shader_prog->NumUniformBlocks - 1);
> > > +  }
> > > +
> > > +  /* Offset */
> > > +  src_reg offset_reg = src_reg(this, glsl_type::uint_type);
> > > +  unsigned const_offset_bytes = 0;
> > > +  if (has_indirect) {
> > > + emit(MOV(dst_reg(offset_reg), get_nir_src(instr->src[2], 1)));
> > > +  } else {
> > > + const_offset_bytes = instr->const_index[0];
> > > + emit(MOV(dst_reg(offset_reg), src_reg(const_offset_bytes)));
> > > +  }
> > > +
> > > +  /* Value */
> > > +  src_reg val_reg = get_nir_src(instr->src[0], 4);
> > > +
> > > +  /* Writemask */
> > > +  unsigned write_mask = instr->const_index[1];
> > > +
> > > +  /* IvyBridge does not have a native SIMD4x2 untyped write message 
> > > so untyped
> > > +   * writes will use SIMD8 mode. In order to hide this and keep 
> > > symmetry across
> > > +   * typed and untyped messages and across hardware platforms, the
> > > +   * current implementation of the untyped messages will 
> > > transparently convert
> > > +   * the SIMD4x2 payload into an equivalent SIMD8 payload by 
> > > transposing it
> > > +   * and enabling only channel X on the SEND instruction.
> > > +   *
> > > +   * The above, works well for full vector writes, but not for 
> > > partial writes
> > > +   * where we want to write some channels and not others, like when 
> > > we have
> > > +   * code such as v.xyw = vec3(1,2,4). Because the untyped write 
> > > messages are
> > > +   * quite restrictive with regards to the channel enables we can 
> > > configure in
> > > +   * the message descriptor (not all combinations are allowed) we 
> > > cannot simply
> > > +   * implement these scenarios with a single message while keeping 
> > > the
> > > +   * aforementioned symmetry in the implementation. For now we de 
> > > decided that
> > > +   * it is better to keep the symmetry to reduce complexity, so in 
> > > situations
> > > +   * such as the one described we end up emitting two untyped write 
> > > messages
> > > +   * (one for xy and another for w).
> > > +   *
> > > +   * The code below packs consecutive channels into a single write 
> > > message,
> > > +   * detects gaps in the vector write and if needed, sends a second 
> > > message
> > > +   * with the remaining channels. If in the future we decide that we 
> > > want to
> > > +   * emit a single message at the expense of losing the symmetry in 
> > > the
> > > +   * implementation we can:
> > > +   *
> > > +   * 1) For IvyBridge: Only use the red channel of the untyped write 
> > > SIMD8
> > > +   *message payload. In this mode we can write up to 8 offsets 
> > > and dwords
> 

[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)

2015-09-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92072

Ilia Mirkin  changed:

   What|Removed |Added

Summary|Wine breackage  since   |Wine breakage since
   |504903b827604f1a630a335d142 |d082c5324 (st/mesa: don't
   |31f88c2cf36be (st/mesa: |call st_validate_state in
   |don't call  |BlitFramebuffer)
   |st_validate_state in|
   |BlitFramebuffer)|

--- Comment #2 from Ilia Mirkin  ---
(Updating commit id in subject to upstream commit, not stable branch)

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [Bug 92072] Wine breackage since 504903b827604f1a630a335d14231f88c2cf36be (st/mesa: don't call st_validate_state in BlitFramebuffer)

2015-09-22 Thread bugzilla-daemon
https://bugs.freedesktop.org/show_bug.cgi?id=92072

Ilia Mirkin  changed:

   What|Removed |Added

  Component|Drivers/DRI/nouveau |Mesa core
   Assignee|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop.
   |rg  |org
 QA Contact|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop.
   |rg  |org

--- Comment #1 from Ilia Mirkin  ---
This appears to affect all gallium drivers -- Glenn confirmed this was also
showing black on r600. i965 is unaffected.

-- 
You are receiving this mail because:
You are the QA Contact for the bug.
You are the assignee for the bug.
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.

2015-09-22 Thread Alejandro Piñeiro


On 22/09/15 19:27, Jason Ekstrand wrote:
> On Tue, Sep 22, 2015 at 9:17 AM, Alejandro Piñeiro  
> wrote:
>>
>> On 22/09/15 18:09, Jason Ekstrand wrote:
>>> On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro  
>>> wrote:
 Now it is more similar to brw_fs_copy_propagation, with three
 clear stages:

 1) Build up the value we are propagating as if it were the source of a
 single MOV:
 2) Check that we can propagate that value
 3) Build the final value

 Previously everything was somewhat messed up, making the
 implementation on some specific cases, like knowing if you can
 propagate from a previous instruction even with type mismatches even
 messier (for example, with the need of maintaining more of one
 has_source_modifiers). The refactoring clears stuff, and gives
 support to this mentioned use case without doing anything extra
 (for example, only one has_source_modifiers is used).

 Shader-db results for vec4 programs on Haswell:
 total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
 instructions in affected programs: 739837 -> 725032 (-2.00%)
 helped:6237
 HURT:  0

 v2: using 'arg' index to get the from inst was wrong, as pointed
 by Jason Ekstrand
 v3: rebased against last change on the previous patch of the series
 v4: don't need to track instructions on struct copy_entry, as we
 only set the source on a direct copy, as pointed by Jason
 v5: change the approach for a refactoring, as pointed by Jason
 ---

 Just the refactoring suggested at v4 review is enough to get the use
 case was dealing with at the beginning solved. It would be hard
 to split this patch in two (refactoring+solve issue).

 Tested with piglit without any regression. Needed to update shader-db
 numbers after last Matt Turner's improvements. I think that the fact
 of going from 30 HURT (v4) to 0 (v5) is Matt's merit.

 I added comments to clearly mark the different stages mentioned at
 v4 review (1, 2 and 3), to make the review easier, but if the patch
 gets approved, they can go away.
>>> I'd like to keep them only without the "1)", "2)", etc.
>> Ok.
>>
 The change itself doesn't explain clearly how it got solved, but the
 resulting code is clearer. Thanks for the thoroughly review.

  .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 
 ++
  1 file changed, 18 insertions(+), 10 deletions(-)

 diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
 b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 index 1522eea..8a0bd72 100644
 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
 @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info 
 *devinfo,
 vec4_instruction *inst,
 int arg, struct copy_entry *entry)
  {
 +   /* 1) Build up the value we are propagating as if it were the source 
 of a
 +* single MOV
 +*/
 /* For constant propagation, we only handle the same constant
  * across all 4 channels.  Some day, we should handle the 8-bit
  * float vector format, which would let us constant propagate
>>> Can we please kill this comment while we're here?  It seems to have
>>> gotten copied+pasted from constant propagation and makes no sense in
>>> this context.
>> Ok.
>>
>>> With that, and the updated comments,
>>>
>>> Reviewed-by: Jason Ekstrand 
>>>
>>> I'm also running it through our CI system as I type.  I'll let you
>>> know how that goes when I get to my office in about an hour.
>> Ok. I will wait before pushing. I should be still around.
>>
>>> Do you have push access yet?  If not, I can make the trivial changes
>>> and push this for you.
>>> --Jason
>> Yes, I have push access, so I can make the changes myself. As mentioned
>> I will wait for the CI system outcome.
> CI gives the green light.  Push it!  Thanks for all your hard work and 
> patience!
> --Jason

Pushed! Thanks!

-- 
Alejandro Piñeiro (apinhe...@igalia.com)

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] nir: Always print non-identity swizzles.

2015-09-22 Thread Matt Turner
Previously we would not print a swizzle on ssa_52 when only its .x
component is used (as seen in the definition of ssa_53):

   vec3 ssa_52 = fadd ssa_51, ssa_51
   vec1 ssa_53 = flog2 ssa_52
   vec1 ssa_54 = flog2 ssa_52.y
   vec1 ssa_55 = flog2 ssa_52.z

But this makes the interpretation of the RHS of the definition difficult
to understand and dependent on the size of the LHS. Just print swizzles
when they are not the identity swizzle, so the previous example is now
printed as:

   vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz
   vec1 ssa_53 = flog2 ssa_52.x
   vec1 ssa_54 = flog2 ssa_52.y
   vec1 ssa_55 = flog2 ssa_52.z
---
 src/glsl/nir/nir_print.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/src/glsl/nir/nir_print.c b/src/glsl/nir/nir_print.c
index a19aa8b..af78b0a 100644
--- a/src/glsl/nir/nir_print.c
+++ b/src/glsl/nir/nir_print.c
@@ -158,17 +158,25 @@ print_alu_src(nir_alu_instr *instr, unsigned src, 
print_state *state)
print_src(&instr->src[src].src, state);
 
bool print_swizzle = false;
+   unsigned used_channels = 0;
+
for (unsigned i = 0; i < 4; i++) {
   if (!nir_alu_instr_channel_used(instr, src, i))
  continue;
 
+  used_channels++;
+
   if (instr->src[src].swizzle[i] != i) {
  print_swizzle = true;
  break;
   }
}
 
-   if (print_swizzle) {
+   unsigned live_channels = instr->src[src].src.is_ssa
+  ? instr->src[src].src.ssa->num_components
+  : instr->src[src].src.reg.reg->num_components;
+
+   if (print_swizzle || used_channels != live_channels) {
   fprintf(fp, ".");
   for (unsigned i = 0; i < 4; i++) {
  if (!nir_alu_instr_channel_used(instr, src, i))
-- 
2.4.6

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.

2015-09-22 Thread Jason Ekstrand
On Tue, Sep 22, 2015 at 9:17 AM, Alejandro Piñeiro  wrote:
>
>
> On 22/09/15 18:09, Jason Ekstrand wrote:
>> On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro  
>> wrote:
>>> Now it is more similar to brw_fs_copy_propagation, with three
>>> clear stages:
>>>
>>> 1) Build up the value we are propagating as if it were the source of a
>>> single MOV:
>>> 2) Check that we can propagate that value
>>> 3) Build the final value
>>>
>>> Previously everything was somewhat messed up, making the
>>> implementation on some specific cases, like knowing if you can
>>> propagate from a previous instruction even with type mismatches even
>>> messier (for example, with the need of maintaining more of one
>>> has_source_modifiers). The refactoring clears stuff, and gives
>>> support to this mentioned use case without doing anything extra
>>> (for example, only one has_source_modifiers is used).
>>>
>>> Shader-db results for vec4 programs on Haswell:
>>> total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
>>> instructions in affected programs: 739837 -> 725032 (-2.00%)
>>> helped:6237
>>> HURT:  0
>>>
>>> v2: using 'arg' index to get the from inst was wrong, as pointed
>>> by Jason Ekstrand
>>> v3: rebased against last change on the previous patch of the series
>>> v4: don't need to track instructions on struct copy_entry, as we
>>> only set the source on a direct copy, as pointed by Jason
>>> v5: change the approach for a refactoring, as pointed by Jason
>>> ---
>>>
>>> Just the refactoring suggested at v4 review is enough to get the use
>>> case was dealing with at the beginning solved. It would be hard
>>> to split this patch in two (refactoring+solve issue).
>>>
>>> Tested with piglit without any regression. Needed to update shader-db
>>> numbers after last Matt Turner's improvements. I think that the fact
>>> of going from 30 HURT (v4) to 0 (v5) is Matt's merit.
>>>
>>> I added comments to clearly mark the different stages mentioned at
>>> v4 review (1, 2 and 3), to make the review easier, but if the patch
>>> gets approved, they can go away.
>> I'd like to keep them only without the "1)", "2)", etc.
>
> Ok.
>
>>> The change itself doesn't explain clearly how it got solved, but the
>>> resulting code is clearer. Thanks for the thoroughly review.
>>>
>>>  .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 
>>> ++
>>>  1 file changed, 18 insertions(+), 10 deletions(-)
>>>
>>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
>>> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>>> index 1522eea..8a0bd72 100644
>>> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>>> @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info 
>>> *devinfo,
>>> vec4_instruction *inst,
>>> int arg, struct copy_entry *entry)
>>>  {
>>> +   /* 1) Build up the value we are propagating as if it were the source of 
>>> a
>>> +* single MOV
>>> +*/
>>> /* For constant propagation, we only handle the same constant
>>>  * across all 4 channels.  Some day, we should handle the 8-bit
>>>  * float vector format, which would let us constant propagate
>> Can we please kill this comment while we're here?  It seems to have
>> gotten copied+pasted from constant propagation and makes no sense in
>> this context.
>
> Ok.
>
>>
>> With that, and the updated comments,
>>
>> Reviewed-by: Jason Ekstrand 
>>
>> I'm also running it through our CI system as I type.  I'll let you
>> know how that goes when I get to my office in about an hour.
>
> Ok. I will wait before pushing. I should be still around.
>
>> Do you have push access yet?  If not, I can make the trivial changes
>> and push this for you.
>> --Jason
>
> Yes, I have push access, so I can make the changes myself. As mentioned
> I will wait for the CI system outcome.

CI gives the green light.  Push it!  Thanks for all your hard work and patience!
--Jason

>>
>>> @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info 
>>> *devinfo,
>>> for (int i = 0; i < 4; i++) {
>>>s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i);
>>> }
>>> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
>>> -   BRW_SWIZZLE4(s[0], s[1], s[2], 
>>> s[3]));
>>> +   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
>>>
>>> +   /* 2) Check that we can propagate that value */
>>> if (value.file != UNIFORM &&
>>> value.file != GRF &&
>>> value.file != ATTR)
>>> @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info 
>>> *devinfo,
>>>return false;
>>> }
>>>
>>> -   if (inst->src[arg].abs) {
>>> -  value.negate = false;
>>> -  value.abs = true;
>>> -   }
>>> -   if (inst->src[arg].negate)
>>> -  value.negate = !value.ne

[Mesa-dev] [PATCH v2] i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed

2015-09-22 Thread Antia Puentes
Gen6 MATH instructions can not execute in align16 mode, so swizzles or
writemasking are not allowed.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033
---

I have tried to find an example where the writemask check is strictly needed
in order to avoid an incorrect register coalescing, but I have failed.

If I have something like this in my shader:

  in vec4 attr_pos;

  void main() {
vec4 a;
a.xy = sin(attr_pos);
..
  }

It is translated into the following vec4 instructions:
  sin vgrf4.0:F, vgrf3.xyzw:F
  mov vgrf0.0.xy:F, vgrf4.xyzw:F <-- (added by emit_math to fix the 
writemasking issue)

And converted by the opt_reduce_swizzle optimization into:
  sin vgrf4.0:F, vgrf3.xyzw:F
  mov vgrf0.0.xy:F, vgrf4.xyyy:F

The swizzle in the MOV instruction is not the identity anymore, so the method 
"can_reswizzle"
in opt_register_coalesce will return FALSE because of the swizzle-check that my 
first
version of the patch added.

Out of curiosity, I disabled the opt_reduce_swizzle optimization to see what 
happens.
Still "can_reswizzle" returns FALSE preventing the incorrect register 
coalescing,
because the math function sets more things than the MOV instruction; i.e.
the check "(dst.writemask & ~swizzle_mask) is positive.

Although I have not found an example where the writemask-check is crucial, that 
does not neccessarily
mean that it does not exists. I am sending the second version of the patch in 
case you prefer to have
that check in place.

 src/mesa/drivers/dri/i965/brw_ir_vec4.h |  3 ++-
 src/mesa/drivers/dri/i965/brw_vec4.cpp  | 11 +--
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h 
b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
index 966a410..a48bb68 100644
--- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h
+++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h
@@ -175,7 +175,8 @@ public:
 
bool is_send_from_grf();
unsigned regs_read(unsigned arg) const;
-   bool can_reswizzle(int dst_writemask, int swizzle, int swizzle_mask);
+   bool can_reswizzle(const struct brw_device_info *devinfo, int dst_writemask,
+  int swizzle, int swizzle_mask);
void reswizzle(int dst_writemask, int swizzle);
bool can_do_source_mods(const struct brw_device_info *devinfo);
 
diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4.cpp
index ed49cd3..ae695f1 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp
@@ -941,10 +941,17 @@ vec4_visitor::opt_set_dependency_control()
 }
 
 bool
-vec4_instruction::can_reswizzle(int dst_writemask,
+vec4_instruction::can_reswizzle(const struct brw_device_info *devinfo,
+int dst_writemask,
 int swizzle,
 int swizzle_mask)
 {
+   /* Gen6 MATH instructions can not execute in align16 mode, so swizzles
+* or writemasking are not allowed. */
+   if (devinfo->gen == 6 && is_math() &&
+   (swizzle != BRW_SWIZZLE_XYZW || dst_writemask != WRITEMASK_XYZW))
+  return false;
+
/* If this instruction sets anything not referenced by swizzle, then we'd
 * totally break it when we reswizzle.
 */
@@ -1077,7 +1084,7 @@ vec4_visitor::opt_register_coalesce()
break;
 
 /* If we can't handle the swizzle, bail. */
-if (!scan_inst->can_reswizzle(inst->dst.writemask,
+if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask,
   inst->src[0].swizzle,
   chans_needed)) {
break;
-- 
2.1.0

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.

2015-09-22 Thread Alejandro Piñeiro


On 22/09/15 18:09, Jason Ekstrand wrote:
> On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro  
> wrote:
>> Now it is more similar to brw_fs_copy_propagation, with three
>> clear stages:
>>
>> 1) Build up the value we are propagating as if it were the source of a
>> single MOV:
>> 2) Check that we can propagate that value
>> 3) Build the final value
>>
>> Previously everything was somewhat messed up, making the
>> implementation on some specific cases, like knowing if you can
>> propagate from a previous instruction even with type mismatches even
>> messier (for example, with the need of maintaining more of one
>> has_source_modifiers). The refactoring clears stuff, and gives
>> support to this mentioned use case without doing anything extra
>> (for example, only one has_source_modifiers is used).
>>
>> Shader-db results for vec4 programs on Haswell:
>> total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
>> instructions in affected programs: 739837 -> 725032 (-2.00%)
>> helped:6237
>> HURT:  0
>>
>> v2: using 'arg' index to get the from inst was wrong, as pointed
>> by Jason Ekstrand
>> v3: rebased against last change on the previous patch of the series
>> v4: don't need to track instructions on struct copy_entry, as we
>> only set the source on a direct copy, as pointed by Jason
>> v5: change the approach for a refactoring, as pointed by Jason
>> ---
>>
>> Just the refactoring suggested at v4 review is enough to get the use
>> case was dealing with at the beginning solved. It would be hard
>> to split this patch in two (refactoring+solve issue).
>>
>> Tested with piglit without any regression. Needed to update shader-db
>> numbers after last Matt Turner's improvements. I think that the fact
>> of going from 30 HURT (v4) to 0 (v5) is Matt's merit.
>>
>> I added comments to clearly mark the different stages mentioned at
>> v4 review (1, 2 and 3), to make the review easier, but if the patch
>> gets approved, they can go away.
> I'd like to keep them only without the "1)", "2)", etc.

Ok.

>> The change itself doesn't explain clearly how it got solved, but the
>> resulting code is clearer. Thanks for the thoroughly review.
>>
>>  .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 
>> ++
>>  1 file changed, 18 insertions(+), 10 deletions(-)
>>
>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
>> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>> index 1522eea..8a0bd72 100644
>> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
>> @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
>> vec4_instruction *inst,
>> int arg, struct copy_entry *entry)
>>  {
>> +   /* 1) Build up the value we are propagating as if it were the source of a
>> +* single MOV
>> +*/
>> /* For constant propagation, we only handle the same constant
>>  * across all 4 channels.  Some day, we should handle the 8-bit
>>  * float vector format, which would let us constant propagate
> Can we please kill this comment while we're here?  It seems to have
> gotten copied+pasted from constant propagation and makes no sense in
> this context.

Ok.

>
> With that, and the updated comments,
>
> Reviewed-by: Jason Ekstrand 
>
> I'm also running it through our CI system as I type.  I'll let you
> know how that goes when I get to my office in about an hour.

Ok. I will wait before pushing. I should be still around.

> Do you have push access yet?  If not, I can make the trivial changes
> and push this for you.
> --Jason

Yes, I have push access, so I can make the changes myself. As mentioned
I will wait for the CI system outcome.

>
>> @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
>> for (int i = 0; i < 4; i++) {
>>s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i);
>> }
>> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
>> -   BRW_SWIZZLE4(s[0], s[1], s[2], 
>> s[3]));
>> +   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
>>
>> +   /* 2) Check that we can propagate that value */
>> if (value.file != UNIFORM &&
>> value.file != GRF &&
>> value.file != ATTR)
>> @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info 
>> *devinfo,
>>return false;
>> }
>>
>> -   if (inst->src[arg].abs) {
>> -  value.negate = false;
>> -  value.abs = true;
>> -   }
>> -   if (inst->src[arg].negate)
>> -  value.negate = !value.negate;
>> -
>> bool has_source_modifiers = value.negate || value.abs;
>>
>> /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
>> @@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info 
>> *devinfo,
>>}
>> }
>>
>> +   /* 3) Build the final v

Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.

2015-09-22 Thread Jason Ekstrand
On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro  wrote:
> Now it is more similar to brw_fs_copy_propagation, with three
> clear stages:
>
> 1) Build up the value we are propagating as if it were the source of a
> single MOV:
> 2) Check that we can propagate that value
> 3) Build the final value
>
> Previously everything was somewhat messed up, making the
> implementation on some specific cases, like knowing if you can
> propagate from a previous instruction even with type mismatches even
> messier (for example, with the need of maintaining more of one
> has_source_modifiers). The refactoring clears stuff, and gives
> support to this mentioned use case without doing anything extra
> (for example, only one has_source_modifiers is used).
>
> Shader-db results for vec4 programs on Haswell:
> total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
> instructions in affected programs: 739837 -> 725032 (-2.00%)
> helped:6237
> HURT:  0
>
> v2: using 'arg' index to get the from inst was wrong, as pointed
> by Jason Ekstrand
> v3: rebased against last change on the previous patch of the series
> v4: don't need to track instructions on struct copy_entry, as we
> only set the source on a direct copy, as pointed by Jason
> v5: change the approach for a refactoring, as pointed by Jason
> ---
>
> Just the refactoring suggested at v4 review is enough to get the use
> case was dealing with at the beginning solved. It would be hard
> to split this patch in two (refactoring+solve issue).
>
> Tested with piglit without any regression. Needed to update shader-db
> numbers after last Matt Turner's improvements. I think that the fact
> of going from 30 HURT (v4) to 0 (v5) is Matt's merit.
>
> I added comments to clearly mark the different stages mentioned at
> v4 review (1, 2 and 3), to make the review easier, but if the patch
> gets approved, they can go away.

I'd like to keep them only without the "1)", "2)", etc.

> The change itself doesn't explain clearly how it got solved, but the
> resulting code is clearer. Thanks for the thoroughly review.
>
>  .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 
> ++
>  1 file changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> index 1522eea..8a0bd72 100644
> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
> @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> vec4_instruction *inst,
> int arg, struct copy_entry *entry)
>  {
> +   /* 1) Build up the value we are propagating as if it were the source of a
> +* single MOV
> +*/
> /* For constant propagation, we only handle the same constant
>  * across all 4 channels.  Some day, we should handle the 8-bit
>  * float vector format, which would let us constant propagate

Can we please kill this comment while we're here?  It seems to have
gotten copied+pasted from constant propagation and makes no sense in
this context.

With that, and the updated comments,

Reviewed-by: Jason Ekstrand 

I'm also running it through our CI system as I type.  I'll let you
know how that goes when I get to my office in about an hour.

Do you have push access yet?  If not, I can make the trivial changes
and push this for you.
--Jason

> @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
> for (int i = 0; i < 4; i++) {
>s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i);
> }
> -   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> -   BRW_SWIZZLE4(s[0], s[1], s[2], s[3]));
> +   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
>
> +   /* 2) Check that we can propagate that value */
> if (value.file != UNIFORM &&
> value.file != GRF &&
> value.file != ATTR)
> @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
>return false;
> }
>
> -   if (inst->src[arg].abs) {
> -  value.negate = false;
> -  value.abs = true;
> -   }
> -   if (inst->src[arg].negate)
> -  value.negate = !value.negate;
> -
> bool has_source_modifiers = value.negate || value.abs;
>
> /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
> @@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info *devinfo,
>}
> }
>
> +   /* 3) Build the final value */
> +   if (inst->src[arg].abs) {
> +  value.negate = false;
> +  value.abs = true;
> +   }
> +   if (inst->src[arg].negate)
> +  value.negate = !value.negate;
> +
> +   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
> +   value.swizzle);
> if (has_source_modifiers &&
> value.type 

Re: [Mesa-dev] [PATCH 1/4] mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter

2015-09-22 Thread Matt Turner
The series is

Reviewed-by: Matt Turner 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 4/4] st/mesa: remove st_bind_framebuffer()

2015-09-22 Thread Brian Paul
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer
is null, Mesa won't try to use it.
---
 src/mesa/state_tracker/st_cb_fbo.c | 12 
 1 file changed, 12 deletions(-)

diff --git a/src/mesa/state_tracker/st_cb_fbo.c 
b/src/mesa/state_tracker/st_cb_fbo.c
index 5707590..9d06a23 100644
--- a/src/mesa/state_tracker/st_cb_fbo.c
+++ b/src/mesa/state_tracker/st_cb_fbo.c
@@ -388,17 +388,6 @@ st_new_renderbuffer_fb(enum pipe_format format, int 
samples, boolean sw)
 
 
 /**
- * Called via ctx->Driver.BindFramebufferEXT().
- */
-static void
-st_bind_framebuffer(struct gl_context *ctx, GLenum target,
-struct gl_framebuffer *fb, struct gl_framebuffer *fbread)
-{
-   /* no-op */
-}
-
-
-/**
  * Create or update the pipe_surface of a FBO renderbuffer.
  * This is usually called after st_finalize_texture.
  */
@@ -839,7 +828,6 @@ void st_init_fbo_functions(struct dd_function_table 
*functions)
 {
functions->NewFramebuffer = st_new_framebuffer;
functions->NewRenderbuffer = st_new_renderbuffer;
-   functions->BindFramebuffer = st_bind_framebuffer;
functions->FramebufferRenderbuffer = _mesa_FramebufferRenderbuffer_sw;
functions->RenderTexture = st_render_texture;
functions->FinishRenderTexture = st_finish_render_texture;
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/4] mesa: const-qualify _mesa_base_tex_format() ctx param

2015-09-22 Thread Brian Paul
---
 src/mesa/main/teximage.c | 2 +-
 src/mesa/main/teximage.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c
index 8913a72..9bc176a 100644
--- a/src/mesa/main/teximage.c
+++ b/src/mesa/main/teximage.c
@@ -130,7 +130,7 @@ adjust_for_oes_float_texture(GLenum format, GLenum type)
  * texture format and env mode determine the arithmetic used.
  */
 GLint
-_mesa_base_tex_format( struct gl_context *ctx, GLint internalFormat )
+_mesa_base_tex_format(const struct gl_context *ctx, GLint internalFormat)
 {
switch (internalFormat) {
case GL_ALPHA:
diff --git a/src/mesa/main/teximage.h b/src/mesa/main/teximage.h
index a4736b5..a434720 100644
--- a/src/mesa/main/teximage.h
+++ b/src/mesa/main/teximage.h
@@ -60,7 +60,7 @@ _mesa_is_zero_size_texture(const struct gl_texture_image 
*texImage)
 /*@{*/
 
 extern GLint
-_mesa_base_tex_format( struct gl_context *ctx, GLint internalFormat );
+_mesa_base_tex_format(const struct gl_context *ctx, GLint internalFormat);
 
 
 extern GLboolean
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/4] mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter

2015-09-22 Thread Brian Paul
---
 src/mesa/main/bufferobj.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c
index 23da83e..f87cea9 100644
--- a/src/mesa/main/bufferobj.c
+++ b/src/mesa/main/bufferobj.c
@@ -246,7 +246,7 @@ bufferobj_range_mapped(const struct gl_buffer_object *obj,
  */
 static bool
 buffer_object_subdata_range_good(struct gl_context *ctx,
- struct gl_buffer_object *bufObj,
+ const struct gl_buffer_object *bufObj,
  GLintptr offset, GLsizeiptr size,
  bool mappedRange, const char *caller)
 {
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 3/4] mesa: const-qualify _mesa_is_legal_tex_storage_format ctx param

2015-09-22 Thread Brian Paul
---
 src/mesa/main/texstorage.c | 3 ++-
 src/mesa/main/texstorage.h | 3 ++-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/src/mesa/main/texstorage.c b/src/mesa/main/texstorage.c
index a29175e..9fd969f 100644
--- a/src/mesa/main/texstorage.c
+++ b/src/mesa/main/texstorage.c
@@ -202,7 +202,8 @@ update_fbo_texture(struct gl_context *ctx, struct 
gl_texture_object *texObj)
 
 
 GLboolean
-_mesa_is_legal_tex_storage_format(struct gl_context *ctx, GLenum 
internalformat)
+_mesa_is_legal_tex_storage_format(const struct gl_context *ctx,
+  GLenum internalformat)
 {
/* check internal format - note that only sized formats are allowed */
switch (internalformat) {
diff --git a/src/mesa/main/texstorage.h b/src/mesa/main/texstorage.h
index 033ecb7..e80a9ff 100644
--- a/src/mesa/main/texstorage.h
+++ b/src/mesa/main/texstorage.h
@@ -111,7 +111,8 @@ _mesa_TextureStorage3DEXT(GLuint texture, GLenum target, 
GLsizei levels,
   GLsizei width, GLsizei height, GLsizei depth);
 
 extern GLboolean
-_mesa_is_legal_tex_storage_format(struct gl_context *ctx, GLenum 
internalformat);
+_mesa_is_legal_tex_storage_format(const struct gl_context *ctx,
+  GLenum internalformat);
 
 extern GLboolean
 _mesa_AllocTextureStorage_sw(struct gl_context *ctx,
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.

2015-09-22 Thread Alejandro Piñeiro
Now it is more similar to brw_fs_copy_propagation, with three
clear stages:

1) Build up the value we are propagating as if it were the source of a
single MOV:
2) Check that we can propagate that value
3) Build the final value

Previously everything was somewhat messed up, making the
implementation on some specific cases, like knowing if you can
propagate from a previous instruction even with type mismatches even
messier (for example, with the need of maintaining more of one
has_source_modifiers). The refactoring clears stuff, and gives
support to this mentioned use case without doing anything extra
(for example, only one has_source_modifiers is used).

Shader-db results for vec4 programs on Haswell:
total instructions in shared programs: 1683842 -> 1669037 (-0.88%)
instructions in affected programs: 739837 -> 725032 (-2.00%)
helped:6237
HURT:  0

v2: using 'arg' index to get the from inst was wrong, as pointed
by Jason Ekstrand
v3: rebased against last change on the previous patch of the series
v4: don't need to track instructions on struct copy_entry, as we
only set the source on a direct copy, as pointed by Jason
v5: change the approach for a refactoring, as pointed by Jason
---

Just the refactoring suggested at v4 review is enough to get the use
case was dealing with at the beginning solved. It would be hard
to split this patch in two (refactoring+solve issue).

Tested with piglit without any regression. Needed to update shader-db
numbers after last Matt Turner's improvements. I think that the fact
of going from 30 HURT (v4) to 0 (v5) is Matt's merit. 

I added comments to clearly mark the different stages mentioned at
v4 review (1, 2 and 3), to make the review easier, but if the patch
gets approved, they can go away.

The change itself doesn't explain clearly how it got solved, but the
resulting code is clearer. Thanks for the thoroughly review.

 .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 ++
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
index 1522eea..8a0bd72 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp
@@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
vec4_instruction *inst,
int arg, struct copy_entry *entry)
 {
+   /* 1) Build up the value we are propagating as if it were the source of a
+* single MOV
+*/
/* For constant propagation, we only handle the same constant
 * across all 4 channels.  Some day, we should handle the 8-bit
 * float vector format, which would let us constant propagate
@@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo,
for (int i = 0; i < 4; i++) {
   s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i);
}
-   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
-   BRW_SWIZZLE4(s[0], s[1], s[2], s[3]));
+   value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]);
 
+   /* 2) Check that we can propagate that value */
if (value.file != UNIFORM &&
value.file != GRF &&
value.file != ATTR)
@@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info *devinfo,
   return false;
}
 
-   if (inst->src[arg].abs) {
-  value.negate = false;
-  value.abs = true;
-   }
-   if (inst->src[arg].negate)
-  value.negate = !value.negate;
-
bool has_source_modifiers = value.negate || value.abs;
 
/* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on
@@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info *devinfo,
   }
}
 
+   /* 3) Build the final value */
+   if (inst->src[arg].abs) {
+  value.negate = false;
+  value.abs = true;
+   }
+   if (inst->src[arg].negate)
+  value.negate = !value.negate;
+
+   value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle,
+   value.swizzle);
if (has_source_modifiers &&
value.type != inst->src[arg].type) {
   /* We are propagating source modifiers from a MOV with a different
@@ -387,8 +393,10 @@ try_copy_propagate(const struct brw_device_info *devinfo,
  inst->src[i].type = value.type;
   }
   inst->dst.type = value.type;
-   } else
+   } else {
   value.type = inst->src[arg].type;
+   }
+
inst->src[arg] = value;
return true;
 }
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h

2015-09-22 Thread Emil Velikov
On 22 September 2015 at 13:43, Predut, Marius  wrote:
> In the upstream code , 'nr' variable is used intensively in many places for 
> things like :
> for (j = 0; j + 2 < count; j += nr ) {
>  nr = MIN2( currentsz, count - j );
>
> But first time when it is used it isn't initialized.
> I was unable to find  a place (a macro or something)  where  "nr" is 
> initialized.
> I suppose it is done somewhere because if NOT, THIS will be very strange how 
> this code  works.
> (Every time "nr" became "currentsz" because all the time nr is a big stack 
> value, so this is the reason for why code works?)
>
While I cannot quote the C spec, I'm pretty sure that  j += nr, will
be executed after nr = MIN2( currentsz, count - j ).

The following simple test confirms it.

int main(void) {
int j, nr;

for (j = 0; j < 10; printf("nr %d\n", nr), j += nr)
nr = 1, printf("hello world %d\n", j);
}

hello world 0
nr 1
hello world 1
...


Cheers,
Emil
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h

2015-09-22 Thread Predut, Marius
In the upstream code , 'nr' variable is used intensively in many places for 
things like :
for (j = 0; j + 2 < count; j += nr ) {
 nr = MIN2( currentsz, count - j );

But first time when it is used it isn't initialized.
I was unable to find  a place (a macro or something)  where  "nr" is 
initialized.
I suppose it is done somewhere because if NOT, THIS will be very strange how 
this code  works.
(Every time "nr" became "currentsz" because all the time nr is a big stack 
value, so this is the reason for why code works?)

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of Ian 
Romanick
Sent: Tuesday, September 15, 2015 5:04 PM
To: Brian Paul; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h

On 09/14/2015 07:04 PM, Brian Paul wrote:
> On 09/14/2015 07:01 PM, Ian Romanick wrote:
>> I looked at t_dd_dmatmp.h after the previous discussions about fixing 
>> the "count" problem.  It was a mess.  The first 5 patches fix the bug.
>> The remaining 19 patches delete dead code and make the file adhere to 
>> Mesa's coding standards.
>>
>> t_dd_triemit.h and t_dd_dmatmp2.h could probably use some similar 
>> clean up.
>>
>> I don't have easy access to i915, radeon, or r200 hardware right now, 
>> so I am only able to compile test this series.  I won't be able to 
>> test it on hardware until after XDC (next week).
> 
> I have a minor comment on patch 10 which could be addressed in a follow-on.
> 
> Patch 19 does more than formatting changes (the second to last hunk is 
> a code transformation, but looks correct).

I was putting that (and the changes from if-statements to ?: in other
patches) under the loose umbrella of "formatting fixes". :)

> Anyway, the series looks good to me (though I can't test it either).

Marius noticed a problem in one of the patches, so I'm definitely going to wait 
until I can test both i915 and radeon before pushing.

> Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 03/24] glsl: Add support for linking uniform arrays of arrays

2015-09-22 Thread Samuel Iglesias Gonsálvez
Reviewed-by: Samuel Iglesias Gonsálvez 

On 17/09/15 09:02, Timothy Arceri wrote:
> V2: Handle arrays of arrays in the same way structures are handled
> 
> The ARB_arrays_of_arrays spec doesn't give very many details on how
> AoA uniforms are intended to be implemented. However in the
> ARB_program_interface_query spec there are details that show AoA are
> intended to be handled in a similar way to structs.
> 
> Issues 7 from the ARB_program_interface_query spec:
> 
>  We define rules consistent with our enumeration rules for
>  other complex types.  For existing one-dimensional arrays, we enumerate
>  a single entry if the array is an array of basic types, or separate
>  entries for each array element if the array is an array of structures.
>  We follow similar rules here.  For a uniform array such as:
> 
>uniform vec4 a[5][4][3];
> 
>  we enumerate twenty different entries ("a[0][0][0]" through
>  "a[4][3][0]"), each of which is treated as an array with three elements.
>  This is morally equivalent to what you'd get if you worked around the
>  limitation in current GLSL via:
> 
> struct ArrayBottom { vec4 c[3]; };
> struct ArrayMid{ ArrayBottom b[3]; };
> uniform ArrayMid   a[5];
> 
>  which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]".
> ---
>  src/glsl/link_uniform_initializers.cpp |  4 +++-
>  src/glsl/link_uniforms.cpp | 13 +
>  2 files changed, 12 insertions(+), 5 deletions(-)
> 
> diff --git a/src/glsl/link_uniform_initializers.cpp 
> b/src/glsl/link_uniform_initializers.cpp
> index 05000fc..f64ba1b 100644
> --- a/src/glsl/link_uniform_initializers.cpp
> +++ b/src/glsl/link_uniform_initializers.cpp
> @@ -179,6 +179,7 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program 
> *prog,
>   const char *name, const glsl_type *type,
>  ir_constant *val, unsigned int boolean_true)
>  {
> +   const glsl_type *t_without_array = type->without_array();
> if (type->is_record()) {
>ir_constant *field_constant;
>  
> @@ -193,7 +194,8 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program 
> *prog,
>field_constant = (ir_constant *)field_constant->next;
>}
>return;
> -   } else if (type->is_array() && type->fields.array->is_record()) {
> +   } else if (t_without_array->is_record() ||
> +  (type->is_array() && type->fields.array->is_array())) {
>const glsl_type *const element_type = type->fields.array;
>  
>for (unsigned int i = 0; i < type->length; i++) {
> diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp
> index 6108270..bdc28cc 100644
> --- a/src/glsl/link_uniforms.cpp
> +++ b/src/glsl/link_uniforms.cpp
> @@ -143,7 +143,8 @@ program_resource_visitor::process(ir_variable *var)
>recursion(var->type, &name, strlen(name), row_major, NULL, false,
>  record_array_count);
>ralloc_free(name);
> -   } else if (t->without_array()->is_record()) {
> +   } else if (t_without_array->is_record() ||
> +  (t->is_array() && t->fields.array->is_array())) {
>char *name = ralloc_strdup(NULL, var->name);
>recursion(var->type, &name, strlen(name), row_major, NULL, false,
>  record_array_count);
> @@ -223,7 +224,8 @@ program_resource_visitor::recursion(const glsl_type *t, 
> char **name,
>   this->leave_record(t, *name, row_major);
>}
> } else if (t->without_array()->is_record() ||
> -  t->without_array()->is_interface()) {
> +  t->without_array()->is_interface() ||
> +  (t->is_array() && t->fields.array->is_array())) {
>if (record_type == NULL && t->fields.array->is_record())
>   record_type = t->fields.array;
>  
> @@ -366,6 +368,7 @@ private:
> {
>assert(!type->without_array()->is_record());
>assert(!type->without_array()->is_interface());
> +  assert(!(type->is_array() && type->fields.array->is_array()));
>  
>(void) row_major;
>  
> @@ -690,6 +693,7 @@ private:
> {
>assert(!type->without_array()->is_record());
>assert(!type->without_array()->is_interface());
> +  assert(!(type->is_array() && type->fields.array->is_array()));
>  
>unsigned id;
>bool found = this->map->get(id, name);
> @@ -769,7 +773,7 @@ private:
>  
>if (type->is_array()) {
>   this->uniforms[id].array_stride =
> -glsl_align(type->fields.array->std140_size(row_major), 16);
> +glsl_align(type->without_array()->std140_size(row_major), 16);
>} else {
>   this->uniforms[id].array_stride = 0;
>}
> @@ -923,7 +927,8 @@ link_update_uniform_buffer_variables(struct gl_shader 
> *shader)
>  
>if (var->type->is_record()) {
>   sentinel = '.';
> -  } else if (var->type->without_array()->is_record()) {
> +  } else if (var->type->is_array() && 
> (var->type->fields.array->is_array()
> + 

Re: [Mesa-dev] [PATCH 23/24] i965: enable ARB_arrays_of_arrays

2015-09-22 Thread Samuel Iglesias Gonsálvez
Reviewed-by: Samuel Iglesias Gonsálvez 

On 17/09/15 09:03, Timothy Arceri wrote:
> ---
>  src/mesa/drivers/dri/i965/intel_extensions.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c 
> b/src/mesa/drivers/dri/i965/intel_extensions.c
> index e6d39e0..9f9c6ed 100644
> --- a/src/mesa/drivers/dri/i965/intel_extensions.c
> +++ b/src/mesa/drivers/dri/i965/intel_extensions.c
> @@ -174,6 +174,7 @@ intelInitExtensions(struct gl_context *ctx)
>  
> assert(brw->gen >= 4);
>  
> +   ctx->Extensions.ARB_arrays_of_arrays = true;
> ctx->Extensions.ARB_buffer_storage = true;
> ctx->Extensions.ARB_clear_texture = true;
> ctx->Extensions.ARB_clip_control = true;
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 25/24] glsl: add std140 layout support for AoA

2015-09-22 Thread Samuel Iglesias Gonsálvez
Reviewed-by: Samuel Iglesias Gonsálvez 

On 20/09/15 14:07, Timothy Arceri wrote:
> ---
>  I noticed this problem after adding AoA support [1] to Ian's random UBO test
>  script [2].
> 
>  [1] http://patchwork.freedesktop.org/patch/59956/
>  [2] http://cgit.freedesktop.org/~idr/piglit/log/?h=ubo-lolz
> 
>  src/glsl/glsl_types.cpp | 13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
> index 86f0ea5..952bd0a 100644
> --- a/src/glsl/glsl_types.cpp
> +++ b/src/glsl/glsl_types.cpp
> @@ -1310,8 +1310,8 @@ glsl_type::std140_size(bool row_major) const
>unsigned int array_len;
>  
>if (this->is_array()) {
> -  element_type = this->fields.array;
> -  array_len = this->length;
> +  element_type = this->without_array();
> +  array_len = this->arrays_of_arrays_size();
>} else {
>element_type = this;
>array_len = 1;
> @@ -1344,12 +1344,13 @@ glsl_type::std140_size(bool row_major) const
>  *  the array are laid out in order, according to rule (9).
>  */
> if (this->is_array()) {
> -  if (this->fields.array->is_record()) {
> -  return this->length * this->fields.array->std140_size(row_major);
> +  if (this->without_array()->is_record()) {
> +  return this->arrays_of_arrays_size() *
> +this->without_array()->std140_size(row_major);
>} else {
>unsigned element_base_align =
> - this->fields.array->std140_base_alignment(row_major);
> -  return this->length * MAX2(element_base_align, 16);
> + this->without_array()->std140_base_alignment(row_major);
> +  return this->arrays_of_arrays_size() * MAX2(element_base_align, 16);
>}
> }
>  
> 
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h

2015-09-22 Thread Predut, Marius
In the upstream code , 'nr' variable is used intensively in many places for 
things like :
for (j = 0; j + 2 < count; j += nr ) {
 nr = MIN2( currentsz, count - j );

But first time when it is used it isn't initialized.
I was unable to find  a place (a macro or something)  where  "nr" is 
initialized.
I suppose it is done somewhere because if NOT, THIS will be very strange how 
this code  works.
(Every time "nr" became "currentsz" because all the time nr is a big stack 
value, so this is the reason for why code works?)

-Original Message-
From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of Ian 
Romanick
Sent: Tuesday, September 15, 2015 5:04 PM
To: Brian Paul; mesa-dev@lists.freedesktop.org
Subject: Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h

On 09/14/2015 07:04 PM, Brian Paul wrote:
> On 09/14/2015 07:01 PM, Ian Romanick wrote:
>> I looked at t_dd_dmatmp.h after the previous discussions about fixing 
>> the "count" problem.  It was a mess.  The first 5 patches fix the bug.
>> The remaining 19 patches delete dead code and make the file adhere to 
>> Mesa's coding standards.
>>
>> t_dd_triemit.h and t_dd_dmatmp2.h could probably use some similar 
>> clean up.
>>
>> I don't have easy access to i915, radeon, or r200 hardware right now, 
>> so I am only able to compile test this series.  I won't be able to 
>> test it on hardware until after XDC (next week).
> 
> I have a minor comment on patch 10 which could be addressed in a follow-on.
> 
> Patch 19 does more than formatting changes (the second to last hunk is 
> a code transformation, but looks correct).

I was putting that (and the changes from if-statements to ?: in other
patches) under the loose umbrella of "formatting fixes". :)

> Anyway, the series looks good to me (though I can't test it either).

Marius noticed a problem in one of the patches, so I'm definitely going to wait 
until I can test both i915 and radeon before pushing.

> Reviewed-by: Brian Paul 

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev
___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] glsl: bail out early in _mesa_ShaderSource if no shaderobj

2015-09-22 Thread Iago Toral
On Tue, 2015-09-22 at 14:34 +0300, Tapani Pälli wrote:
> Patch fixes a crash in conformance test that tries out different
> invalid arguments for glShaderSource and glGetShaderSource:
> 
>ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle
> 
> This is a regression from commit:
>04e201d0c02cd30ace5c6fe80e9f021ebb733682
> 
> Additions in v2 also fix following failing deqp test:
>dEQP-GLES[2|3].functional.negative_api.shader.shader_source

Nice! I wasn't expecting that :)

Reviewed-by: Iago Toral Quiroga 

> v2: cleanup function, do check earlier (Iago Toral)
> 
> Signed-off-by: Tapani Pälli 
> ---
>  src/mesa/main/shaderapi.c | 20 +---
>  1 file changed, 9 insertions(+), 11 deletions(-)
> 
> diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
> index f31980b..edc23bc 100644
> --- a/src/mesa/main/shaderapi.c
> +++ b/src/mesa/main/shaderapi.c
> @@ -931,13 +931,9 @@ get_shader_source(struct gl_context *ctx, GLuint shader, 
> GLsizei maxLength,
>   * glShaderSource[ARB].
>   */
>  static void
> -shader_source(struct gl_context *ctx, GLuint shader, const GLchar *source)
> +shader_source(struct gl_shader *sh, const GLchar *source)
>  {
> -   struct gl_shader *sh;
> -
> -   sh = _mesa_lookup_shader_err(ctx, shader, "glShaderSource");
> -   if (!sh)
> -  return;
> +   assert(sh);
>  
> /* free old shader source string and install new one */
> free((void *)sh->Source);
> @@ -1639,13 +1635,17 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei 
> count,
> GLint *offsets;
> GLsizei i, totalLength;
> GLcharARB *source;
> +   struct gl_shader *sh;
>  
>  #if defined(HAVE_SHA1)
> GLcharARB *replacement;
> -   struct gl_shader *sh;
>  #endif /* HAVE_SHA1 */
>  
> -   if (!shaderObj || string == NULL) {
> +   sh = _mesa_lookup_shader_err(ctx, shaderObj, "glShaderSourceARB");
> +   if (!sh)
> +  return;
> +
> +   if (string == NULL) {
>_mesa_error(ctx, GL_INVALID_VALUE, "glShaderSourceARB");
>return;
> }
> @@ -1697,8 +1697,6 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count,
> source[totalLength - 2] = '\0';
>  
>  #if defined(HAVE_SHA1)
> -   sh = _mesa_lookup_shader(ctx, shaderObj);
> -
> /* Dump original shader source to MESA_SHADER_DUMP_PATH and replace
>  * if corresponding entry found from MESA_SHADER_READ_PATH.
>  */
> @@ -1711,7 +1709,7 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count,
> }
>  #endif /* HAVE_SHA1 */
>  
> -   shader_source(ctx, shaderObj, source);
> +   shader_source(sh, source);
>  
> free(offsets);
>  }


___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH 0/5] Enable up to 24 MRF registers in gen6

2015-09-22 Thread Iago Toral
On Tue, 2015-09-22 at 08:10 +0200, Iago Toral wrote:
> Hi Mark,
> 
> On Mon, 2015-09-21 at 17:45 -0700, Mark Janes wrote:
> > Hi Iago,
> > 
> > According to my tests, this patch series fixes the gles2/gles3
> > "functional.uniform_api.random.23" tests in dEQP, on sandybridge.
> > 
> > Do you see the same results?  Should this patch series be applied to the
> > stable branch?
> 
> I can try to verify this, but I would not be surprised if that was the
> case. This seems to fix a couple of bugs in bugzilla and it fixes a lot
> of piglit tests in SNB when we force spilling, so if that test is
> triggering spilling, there is a good chance that it had the same problem
> and this patch fixed it. I'll try to confirm this today and let you
> know.

Yes, I see the same result. Also, I verified that the test fails without
my patches and that the reason it failed before is the same as all the
other cases that my patches fix: if fails to link because it can't use
m14 for spilling.

Iago

> As for whether this should go to stable, I suppose it depends on whether
> we are confident that this won't break anything else... if you have not
> seen any more issues with piglit/deqp on any other platform I guess
> that's a good sign, but I'll let Ken make that call.
> 
> Iago
> 
> > thanks,
> > 
> > Mark
> > 
> > Iago Toral Quiroga  writes:
> > 
> > > There are some bug reports about shaders failing to compile in gen6
> > > because MRF 14 is used when we need to spill. For example:
> > > https://bugs.freedesktop.org/show_bug.cgi?id=86469
> > > https://bugs.freedesktop.org/show_bug.cgi?id=90631
> > >
> > > Discussion in bugzilla pointed to the fact that gen6 might actually have
> > > 24 MRF registers available instead of 16, so we could use other MRF
> > > registers and avoid these conflicts (we still need to investigate why
> > > some shaders need up to MRF 14 anyway, since this is not expected).
> > >
> > > Notice that the hardware docs are not clear about this fact:
> > >
> > > SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device
> > > Hardware" says "Number per Thread" - "24 registers"
> > >
> > > However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says:
> > >
> > > "Normal threads should construct their messages in m1..m15. (...)
> > > Regardless of actual hardware implementation, the thread should
> > > not assume th at MRF addresses above m15 wrap to legal MRF registers."
> > >
> > > Therefore experimentation was necessary to evaluate if we had these extra
> > > MRF registers available or not. This was tested in gen6 using MRF
> > > registers 21..23 for spilling and doing a full piglit run (all.py) forcing
> > > spilling of everything on the FS backend. It was also tested by doing
> > > spilling of everything on both the FS and the VS backends with a piglit 
> > > run
> > > of shader.py. In both cases no regressions were observed. In fact, many of
> > > these tests where helped in the cases where we forced spilling, since that
> > > triggered the same underlying problem described in the bug reports. Here 
> > > are
> > > some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on
> > > gen6 hardware:
> > >
> > > Using MRFs 13..15 for spilling:
> > > crash: 2, fail: 113, pass: 6621, skip: 5461
> > > 
> > > Using MRFs 21..23 for spilling:
> > > crash: 2, fail: 12, pass: 6722, skip: 5461
> > >
> > > We might want to test this further with other instances of gen6 hardware
> > > though... I am not sure that we can safely conclude that all 
> > > implementations
> > > of gen6 hardware have 24 MRF registers from my tests on just one 
> > > particular
> > > SandyBridge laptop.
> > >
> > > Iago Toral Quiroga (5):
> > >   i965: Move MRF register asserts out of brw_reg.h
> > >   i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation
> > >   i965/fs: Use MRF registers 21-23 for spilling in gen6
> > >   i965/vec4: Use MRF registers 21-23 for spilling in gen6
> > >   i965: Maximum allowed size of SEND messages is 15 (4 bits)
> > >
> > >  src/mesa/drivers/dri/i965/brw_eu_emit.c| 11 +
> > >  src/mesa/drivers/dri/i965/brw_fs.cpp   |  4 ++--
> > >  src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 15 
> > >  src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp  | 27 
> > > --
> > >  src/mesa/drivers/dri/i965/brw_inst.h   |  3 +++
> > >  src/mesa/drivers/dri/i965/brw_ir_vec4.h|  2 +-
> > >  src/mesa/drivers/dri/i965/brw_reg.h|  9 
> > >  .../drivers/dri/i965/brw_schedule_instructions.cpp |  4 ++--
> > >  src/mesa/drivers/dri/i965/brw_vec4_generator.cpp   | 10 +---
> > >  src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +++-
> > >  10 files changed, 61 insertions(+), 39 deletions(-)
> > >
> > > -- 
> > > 1.9.1
> > >
> > > ___
> > > mesa-dev mailing list
> > > mesa-dev@lists.freedesktop.org
> > > http://lists.free

[Mesa-dev] [PATCH 3/3] i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place

2015-09-22 Thread Iago Toral Quiroga
That should make tracking where we do spills and pull loads a bit easier.
---
 src/mesa/drivers/dri/i965/brw_fs.cpp  | 2 --
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 --
 src/mesa/drivers/dri/i965/brw_inst.h  | 6 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 3 ---
 4 files changed, 6 insertions(+), 7 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index e42cc94..2d146ac 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -50,8 +50,6 @@
 #include "glsl/glsl_types.h"
 #include "program/sampler.h"
 
-#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13)
-
 using namespace brw;
 
 void
diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
index 6900cee..c3a037b 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp
@@ -30,8 +30,6 @@
 #include "glsl/glsl_types.h"
 #include "glsl/ir_optimization.h"
 
-#define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13)
-
 using namespace brw;
 
 static void
diff --git a/src/mesa/drivers/dri/i965/brw_inst.h 
b/src/mesa/drivers/dri/i965/brw_inst.h
index c5132ba..ab37b70 100644
--- a/src/mesa/drivers/dri/i965/brw_inst.h
+++ b/src/mesa/drivers/dri/i965/brw_inst.h
@@ -42,6 +42,12 @@ extern "C" {
 /** Maximum SEND message length */
 #define BRW_MAX_MSG_LENGTH 15
 
+/** First MRF register used by pull loads */
+#define FIRST_SPILL_MRF(gen) ((gen) == 6 ? 21 : 13)
+
+/** First MRF register used by spills */
+#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13)
+
 /* brw_context.h has a forward declaration of brw_inst, so name the struct. */
 typedef struct brw_inst {
uint64_t data[2];
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 17516f9..4cd2aed 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -26,9 +26,6 @@
 #include "glsl/ir_uniform.h"
 #include "program/sampler.h"
 
-#define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13)
-#define FIRST_PULL_LOAD_MRF(gen) (gen == 6 ? 16 : 13)
-
 namespace brw {
 
 vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst,
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 0/3] Make pull constant loads in gen6 start at MRFs 16/17

2015-09-22 Thread Iago Toral Quiroga
Originally, these could conflict with our spills, but now that we moved the
latter to MRFs 21..23, that is no longer the case. Still, in gen6 we
now use MRFs 1..15 for URB writes, so we probably want to make our pull
constant loads out of that MRF space (currently, they start at MRFs 13/14).

Even if we do not want to do this for some reason, I still think we should
at least apply the first patch, since that plugs a hardcoded array size of
16 MRF registers. For some reason this only became a problem when I tried
to move pull constant loads to MRFs > 15 and not when I did the same for
spills, but it looks like the right thing to do in any case.

I tested this on SandyBridge and IvyBridge and did not observe any
regressions in piglit.

Iago Toral Quiroga (3):
  i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6
  i965: make pull constant loads in gen6 start at MRFs 16/17
  i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in
one place

 src/mesa/drivers/dri/i965/brw_fs.cpp  | 6 +++---
 src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 --
 src/mesa/drivers/dri/i965/brw_inst.h  | 6 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 4 +---
 4 files changed, 10 insertions(+), 8 deletions(-)

-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 2/3] i965: make pull constant loads in gen6 start at MRFs 16/17

2015-09-22 Thread Iago Toral Quiroga
So they do not conflict with our (un)spills (MRF 21..23) or our
URB writes (MRF 1..15)
---
 src/mesa/drivers/dri/i965/brw_fs.cpp   | 6 --
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 81fe7f5..e42cc94 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -50,6 +50,8 @@
 #include "glsl/glsl_types.h"
 #include "program/sampler.h"
 
+#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13)
+
 using namespace brw;
 
 void
@@ -210,7 +212,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder 
&bld,
inst->regs_written = regs_written;
 
if (devinfo->gen < 7) {
-  inst->base_mrf = 13;
+  inst->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen);
   inst->header_size = 1;
   if (devinfo->gen == 4)
  inst->mlen = 3;
@@ -3008,7 +3010,7 @@ fs_visitor::lower_uniform_pull_constant_loads()
   * else does except for register spill/unspill, which generates and
   * uses its MRF within a single IR instruction.
   */
- inst->base_mrf = 14;
+ inst->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen) + 1;
  inst->mlen = 1;
   }
}
diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp 
b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
index 514ccd6..17516f9 100644
--- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
+++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp
@@ -27,6 +27,7 @@
 #include "program/sampler.h"
 
 #define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13)
+#define FIRST_PULL_LOAD_MRF(gen) (gen == 6 ? 16 : 13)
 
 namespace brw {
 
@@ -1386,7 +1387,7 @@ vec4_visitor::emit_pull_constant_load_reg(dst_reg dst,
dst,
surf_index,
offset_reg);
-  pull->base_mrf = FIRST_SPILL_MRF(devinfo->gen) + 1;
+  pull->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen) + 1;
   pull->mlen = 1;
}
 
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/3] i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6

2015-09-22 Thread Iago Toral Quiroga
---
 src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp 
b/src/mesa/drivers/dri/i965/brw_fs.cpp
index 225a312..81fe7f5 100644
--- a/src/mesa/drivers/dri/i965/brw_fs.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs.cpp
@@ -2687,7 +2687,7 @@ fs_visitor::emit_repclear_shader()
 bool
 fs_visitor::remove_duplicate_mrf_writes()
 {
-   fs_inst *last_mrf_move[16];
+   fs_inst *last_mrf_move[BRW_MAX_MRF(devinfo->gen)];
bool progress = false;
 
/* Need to update the MRF tracking for compressed instructions. */
-- 
1.9.1

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: bail out early in _mesa_ShaderSource if no shaderobj

2015-09-22 Thread Tapani Pälli
Patch fixes a crash in conformance test that tries out different
invalid arguments for glShaderSource and glGetShaderSource:

   ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle

This is a regression from commit:
   04e201d0c02cd30ace5c6fe80e9f021ebb733682

Additions in v2 also fix following failing deqp test:
   dEQP-GLES[2|3].functional.negative_api.shader.shader_source

v2: cleanup function, do check earlier (Iago Toral)

Signed-off-by: Tapani Pälli 
---
 src/mesa/main/shaderapi.c | 20 +---
 1 file changed, 9 insertions(+), 11 deletions(-)

diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c
index f31980b..edc23bc 100644
--- a/src/mesa/main/shaderapi.c
+++ b/src/mesa/main/shaderapi.c
@@ -931,13 +931,9 @@ get_shader_source(struct gl_context *ctx, GLuint shader, 
GLsizei maxLength,
  * glShaderSource[ARB].
  */
 static void
-shader_source(struct gl_context *ctx, GLuint shader, const GLchar *source)
+shader_source(struct gl_shader *sh, const GLchar *source)
 {
-   struct gl_shader *sh;
-
-   sh = _mesa_lookup_shader_err(ctx, shader, "glShaderSource");
-   if (!sh)
-  return;
+   assert(sh);
 
/* free old shader source string and install new one */
free((void *)sh->Source);
@@ -1639,13 +1635,17 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count,
GLint *offsets;
GLsizei i, totalLength;
GLcharARB *source;
+   struct gl_shader *sh;
 
 #if defined(HAVE_SHA1)
GLcharARB *replacement;
-   struct gl_shader *sh;
 #endif /* HAVE_SHA1 */
 
-   if (!shaderObj || string == NULL) {
+   sh = _mesa_lookup_shader_err(ctx, shaderObj, "glShaderSourceARB");
+   if (!sh)
+  return;
+
+   if (string == NULL) {
   _mesa_error(ctx, GL_INVALID_VALUE, "glShaderSourceARB");
   return;
}
@@ -1697,8 +1697,6 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count,
source[totalLength - 2] = '\0';
 
 #if defined(HAVE_SHA1)
-   sh = _mesa_lookup_shader(ctx, shaderObj);
-
/* Dump original shader source to MESA_SHADER_DUMP_PATH and replace
 * if corresponding entry found from MESA_SHADER_READ_PATH.
 */
@@ -1711,7 +1709,7 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count,
}
 #endif /* HAVE_SHA1 */
 
-   shader_source(ctx, shaderObj, source);
+   shader_source(sh, source);
 
free(offsets);
 }
-- 
2.4.3

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] Apologies for forgetting to add Reviewed by Tapani

2015-09-22 Thread Lofstedt, Marta
Hi all,

I apologize for forgetting to add Tapani as the reviewer for the patches I just 
merged:
commit 6c3de8996fbe9447e092cc75ccdd6f720fabaf4d
commit cf293e518ebd847cb28e03d4378679c47548206d
commit 419210005a84f1f26da353b945b3f783d53fa56a

The patches was reviewed here:
http://lists.freedesktop.org/archives/mesa-dev/2015-September/093432.html

BR,
Marta

---
Intel Sweden AB 
Registered Office: Knarrarnasgatan 15, 164 40 Kista, Stockholm, Sweden 
Registration Number: 556189-6027

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


Re: [Mesa-dev] [PATCH] i965/vec4: Don't coalesce registers in gen6 math ops if reswizzling needed

2015-09-22 Thread Antía Puentes
On mar, 2015-09-22 at 11:22 +0200, Antía Puentes wrote:
> I realized that if writemasking is also unsupported I should probably
> update the patch to avoid coalescing if the dst_writemask parameter is
> different from XYWZ. The condition in "vec4_instruction::can_reswizzle"
> would be something like:
> 
>   /* Gen6 MATH instructions can not execute in align16 mode, so swizzles
>* or writemasking are not allowed. */
>if (devinfo->gen == 6 && is_math() &&
>swizzle != BRW_SWIZZLE_XYZW &&
>dst_writemask != WRITEMASK_XYWZ)
>   return false;

Ups, I meant: (swizzle != BRW_SWIZZLE_XYZW || dst_writemask !=
WRITEMASK_XYWZ)
Sorry for the noise.

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH 1/2] glsl: fix indention in glsl_types.cpp

2015-09-22 Thread Samuel Iglesias Gonsalvez
No functional changes.

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/glsl/glsl_types.cpp | 140 
 1 file changed, 70 insertions(+), 70 deletions(-)

diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index 97c79fa..b4525eb 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -45,8 +45,8 @@ glsl_type::init_ralloc_type_ctx(void)
 }
 
 glsl_type::glsl_type(GLenum gl_type,
-glsl_base_type base_type, unsigned vector_elements,
-unsigned matrix_columns, const char *name) :
+ glsl_base_type base_type, unsigned vector_elements,
+ unsigned matrix_columns, const char *name) :
gl_type(gl_type),
base_type(base_type),
sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
@@ -69,8 +69,8 @@ glsl_type::glsl_type(GLenum gl_type,
 }
 
 glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type,
-enum glsl_sampler_dim dim, bool shadow, bool array,
-unsigned type, const char *name) :
+ enum glsl_sampler_dim dim, bool shadow, bool array,
+ unsigned type, const char *name) :
gl_type(gl_type),
base_type(base_type),
sampler_dimensionality(dim), sampler_shadow(shadow),
@@ -96,7 +96,7 @@ glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type,
 }
 
 glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields,
-const char *name) :
+ const char *name) :
gl_type(0),
base_type(GLSL_TYPE_STRUCT),
sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
@@ -112,12 +112,12 @@ glsl_type::glsl_type(const glsl_struct_field *fields, 
unsigned num_fields,
assert(name != NULL);
this->name = ralloc_strdup(this->mem_ctx, name);
this->fields.structure = ralloc_array(this->mem_ctx,
-glsl_struct_field, length);
+ glsl_struct_field, length);
 
for (i = 0; i < length; i++) {
   this->fields.structure[i].type = fields[i].type;
   this->fields.structure[i].name = ralloc_strdup(this->fields.structure,
-fields[i].name);
+ fields[i].name);
   this->fields.structure[i].location = fields[i].location;
   this->fields.structure[i].interpolation = fields[i].interpolation;
   this->fields.structure[i].centroid = fields[i].centroid;
@@ -130,7 +130,7 @@ glsl_type::glsl_type(const glsl_struct_field *fields, 
unsigned num_fields,
 }
 
 glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields,
-enum glsl_interface_packing packing, const char *name) :
+ enum glsl_interface_packing packing, const char *name) :
gl_type(0),
base_type(GLSL_TYPE_INTERFACE),
sampler_dimensionality(0), sampler_shadow(0), sampler_array(0),
@@ -146,11 +146,11 @@ glsl_type::glsl_type(const glsl_struct_field *fields, 
unsigned num_fields,
assert(name != NULL);
this->name = ralloc_strdup(this->mem_ctx, name);
this->fields.structure = ralloc_array(this->mem_ctx,
-glsl_struct_field, length);
+ glsl_struct_field, length);
for (i = 0; i < length; i++) {
   this->fields.structure[i].type = fields[i].type;
   this->fields.structure[i].name = ralloc_strdup(this->fields.structure,
-fields[i].name);
+ fields[i].name);
   this->fields.structure[i].location = fields[i].location;
   this->fields.structure[i].interpolation = fields[i].interpolation;
   this->fields.structure[i].centroid = fields[i].centroid;
@@ -186,8 +186,8 @@ glsl_type::contains_sampler() const
   return this->fields.array->contains_sampler();
} else if (this->is_record()) {
   for (unsigned int i = 0; i < this->length; i++) {
-if (this->fields.structure[i].type->contains_sampler())
-   return true;
+ if (this->fields.structure[i].type->contains_sampler())
+return true;
   }
   return false;
} else {
@@ -203,8 +203,8 @@ glsl_type::contains_integer() const
   return this->fields.array->contains_integer();
} else if (this->is_record()) {
   for (unsigned int i = 0; i < this->length; i++) {
-if (this->fields.structure[i].type->contains_integer())
-   return true;
+ if (this->fields.structure[i].type->contains_integer())
+return true;
   }
   return false;
} else {
@@ -219,8 +219,8 @@ glsl_type::contains_double() const
   return this->fields.array->contains_double();
} else if (this->is_record()) {
   for (unsigned int i = 0; i < this->length; i++) {
-if (th

[Mesa-dev] [PATCH 2/2] glsl: Add unsized array support to glsl_type::std140_size()

2015-09-22 Thread Samuel Iglesias Gonsalvez
---
 src/glsl/glsl_types.cpp | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index b4525eb..07d7248 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -1351,7 +1351,7 @@ glsl_type::std140_size(bool row_major) const
 * rounded up to the next multiple of the base alignment of the
 * structure.
 */
-   if (this->is_record()) {
+   if (this->is_record() || this->is_interface()) {
   unsigned size = 0;
   unsigned max_align = 0;
 
@@ -1367,6 +1367,11 @@ glsl_type::std140_size(bool row_major) const
 
  const struct glsl_type *field_type = this->fields.structure[i].type;
  unsigned align = field_type->std140_base_alignment(field_row_major);
+
+ /* Ignore unsized arrays when calculating size */
+ if (field_type->is_unsized_array())
+continue;
+
  size = glsl_align(size, align);
  size += field_type->std140_size(field_row_major);
 
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Add parser/compiler support for std430 interface packing qualifier

2015-09-22 Thread Samuel Iglesias Gonsalvez
v2:
- Fix a missing check in has_layout()

v3:
- Mention shader storage block in error message for layout qualifiers
  (Kristian).

Signed-off-by: Samuel Iglesias Gonsalvez 
Reviewed-by: Jordan Justen 
---
 src/glsl/ast.h   |  1 +
 src/glsl/ast_to_hir.cpp  | 21 +
 src/glsl/ast_type.cpp|  2 ++
 src/glsl/glsl_parser.yy  |  2 ++
 src/glsl/glsl_types.h|  3 ++-
 src/glsl/link_uniform_blocks.cpp | 15 ---
 src/mesa/main/mtypes.h   |  3 ++-
 7 files changed, 38 insertions(+), 9 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index cca32b3..4c31436 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -491,6 +491,7 @@ struct ast_type_qualifier {
 /** \name Layout qualifiers for GL_ARB_uniform_buffer_object */
 /** \{ */
  unsigned std140:1;
+ unsigned std430:1;
  unsigned shared:1;
  unsigned packed:1;
  unsigned column_major:1;
diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp
index 92038a6..d6071ef 100644
--- a/src/glsl/ast_to_hir.cpp
+++ b/src/glsl/ast_to_hir.cpp
@@ -2920,11 +2920,13 @@ apply_type_qualifier_to_variable(const struct 
ast_type_qualifier *qual,
var->data.depth_layout = ir_depth_layout_none;
 
if (qual->flags.q.std140 ||
+   qual->flags.q.std430 ||
qual->flags.q.packed ||
qual->flags.q.shared) {
   _mesa_glsl_error(loc, state,
-   "uniform block layout qualifiers std140, packed, and "
-   "shared can only be applied to uniform blocks, not "
+   "uniform and shader storage block layout qualifiers "
+   "std140, std430, packed, and shared can only be "
+   "applied to uniform or shader storage blocks, not "
"members");
}
 
@@ -5691,12 +5693,14 @@ ast_process_structure_or_interface_block(exec_list 
*instructions,
  const struct ast_type_qualifier *const qual =
 & decl_list->type->qualifier;
  if (qual->flags.q.std140 ||
+ qual->flags.q.std430 ||
  qual->flags.q.packed ||
  qual->flags.q.shared) {
 _mesa_glsl_error(&loc, state,
  "uniform/shader storage block layout qualifiers "
- "std140, packed, and shared can only be applied "
- "to uniform/shader storage blocks, not members");
+ "std140, std430, packed, and shared can only be "
+ "applied to uniform/shader storage blocks, not "
+ "members");
  }
 
  if (qual->flags.q.constant) {
@@ -5908,6 +5912,13 @@ ast_interface_block::hir(exec_list *instructions,
this->block_name);
}
 
+   if (!this->layout.flags.q.buffer &&
+   this->layout.flags.q.std430) {
+  _mesa_glsl_error(&loc, state,
+   "std430 storage block layout qualifier is supported "
+   "only for shader storage blocks");
+   }
+
/* The ast_interface_block has a list of ast_declarator_lists.  We
 * need to turn those into ir_variables with an association
 * with this uniform block.
@@ -5917,6 +5928,8 @@ ast_interface_block::hir(exec_list *instructions,
   packing = GLSL_INTERFACE_PACKING_SHARED;
} else if (this->layout.flags.q.packed) {
   packing = GLSL_INTERFACE_PACKING_PACKED;
+   } else if (this->layout.flags.q.std430) {
+  packing = GLSL_INTERFACE_PACKING_STD430;
} else {
   /* The default layout is std140.
*/
diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp
index a4671e2..08a4504 100644
--- a/src/glsl/ast_type.cpp
+++ b/src/glsl/ast_type.cpp
@@ -65,6 +65,7 @@ ast_type_qualifier::has_layout() const
   || this->flags.q.depth_less
   || this->flags.q.depth_unchanged
   || this->flags.q.std140
+  || this->flags.q.std430
   || this->flags.q.shared
   || this->flags.q.column_major
   || this->flags.q.row_major
@@ -123,6 +124,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc,
ubo_layout_mask.flags.q.std140 = 1;
ubo_layout_mask.flags.q.packed = 1;
ubo_layout_mask.flags.q.shared = 1;
+   ubo_layout_mask.flags.q.std430 = 1;
 
ast_type_qualifier ubo_binding_mask;
ubo_binding_mask.flags.i = 0;
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index 028974e..4cb018a 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -1199,6 +1199,8 @@ layout_qualifier_id:
 $$.flags.q.std140 = 1;
  } else if (match_layout_qualifier($1, "shared", state) == 0) {
 $$.flags.q.shared = 1;
+ } else if (match_layout_qualifier($1, "std430", state) == 0) {
+$$.flags.q.std430 = 1;
  } else if (match_layout_qualifier($1, "column_major", state) == 0) 

[Mesa-dev] [PATCH] i965/fs/nir: implement nir_intrinsic_get_buffer_size

2015-09-22 Thread Samuel Iglesias Gonsalvez
v2:
- Remove inst->regs_written assignment as the instruction only
  writes to one register.

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 24 
 1 file changed, 24 insertions(+)

diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp 
b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
index e4ddadc..97aef61 100644
--- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
+++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp
@@ -1734,6 +1734,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, 
nir_intrinsic_instr *instr
   break;
}
 
+   case nir_intrinsic_get_buffer_size: {
+  nir_const_value *const_uniform_block = 
nir_src_as_const_value(instr->src[0]);
+  unsigned ubo_index = const_uniform_block ? const_uniform_block->u[0] : 0;
+  int reg_width = dispatch_width / 8;
+
+  assert(shader->base.UniformBlocks[ubo_index].IsShaderStorage);
+
+  /* Set LOD = 0 */
+  fs_reg source = fs_reg(0);
+
+  int mlen = 1 * reg_width;
+  fs_reg src_payload = fs_reg(GRF, alloc.allocate(mlen),
+  BRW_REGISTER_TYPE_UD);
+  bld.LOAD_PAYLOAD(src_payload, &source, 1, 0);
+
+  fs_reg surf_index = fs_reg(prog_data->binding_table.ubo_start + 
ubo_index);
+  fs_inst *inst = bld.emit(FS_OPCODE_GET_BUFFER_SIZE, dest,
+   src_payload, surf_index);
+  inst->header_size = 0;
+  inst->mlen = mlen;
+  bld.emit(inst);
+  break;
+   }
+
default:
   unreachable("unknown intrinsic");
}
-- 
2.1.4

___
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev


[Mesa-dev] [PATCH] glsl: Move interface block processing to glsl_parser_extras.cpp

2015-09-22 Thread Samuel Iglesias Gonsalvez
No functional changes.
---
 src/glsl/ast.h  |   5 ++
 src/glsl/glsl_parser.yy | 127 +---
 src/glsl/glsl_parser_extras.cpp | 122 ++
 3 files changed, 128 insertions(+), 126 deletions(-)

diff --git a/src/glsl/ast.h b/src/glsl/ast.h
index 335f426..cca32b3 100644
--- a/src/glsl/ast.h
+++ b/src/glsl/ast.h
@@ -1172,4 +1172,9 @@ extern void
 check_builtin_array_max_size(const char *name, unsigned size,
  YYLTYPE loc, struct _mesa_glsl_parse_state 
*state);
 
+extern void _mesa_ast_process_interface_block(YYLTYPE *locp,
+  _mesa_glsl_parse_state *state,
+  ast_interface_block *const block,
+  const struct ast_type_qualifier 
q);
+
 #endif /* AST_H */
diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy
index 42108a3..7f00929 100644
--- a/src/glsl/glsl_parser.yy
+++ b/src/glsl/glsl_parser.yy
@@ -2634,132 +2634,7 @@ basic_interface_block:
   block->block_name = $2;
   block->declarations.push_degenerate_list_at_head(& $4->link);
 
-  if ($1.flags.q.buffer) {
- if (!state->has_shader_storage_buffer_objects()) {
-_mesa_glsl_error(& @1, state,
- "#version 430 / 
GL_ARB_shader_storage_buffer_object "
- "required for defining shader storage blocks");
- } else if (state->ARB_shader_storage_buffer_object_warn) {
-_mesa_glsl_warning(& @1, state,
-   "#version 430 / 
GL_ARB_shader_storage_buffer_object "
-   "required for defining shader storage blocks");
- }
-  } else if ($1.flags.q.uniform) {
- if (!state->has_uniform_buffer_objects()) {
-_mesa_glsl_error(& @1, state,
- "#version 140 / GL_ARB_uniform_buffer_object "
- "required for defining uniform blocks");
- } else if (state->ARB_uniform_buffer_object_warn) {
-_mesa_glsl_warning(& @1, state,
-   "#version 140 / GL_ARB_uniform_buffer_object "
-   "required for defining uniform blocks");
- }
-  } else {
- if (state->es_shader || state->language_version < 150) {
-_mesa_glsl_error(& @1, state,
- "#version 150 required for using "
- "interface blocks");
- }
-  }
-
-  /* From the GLSL 1.50.11 spec, section 4.3.7 ("Interface Blocks"):
-   * "It is illegal to have an input block in a vertex shader
-   *  or an output block in a fragment shader"
-   */
-  if ((state->stage == MESA_SHADER_VERTEX) && $1.flags.q.in) {
- _mesa_glsl_error(& @1, state,
-  "`in' interface block is not allowed for "
-  "a vertex shader");
-  } else if ((state->stage == MESA_SHADER_FRAGMENT) && $1.flags.q.out) {
- _mesa_glsl_error(& @1, state,
-  "`out' interface block is not allowed for "
-  "a fragment shader");
-  }
-
-  /* Since block arrays require names, and both features are added in
-   * the same language versions, we don't have to explicitly
-   * version-check both things.
-   */
-  if (block->instance_name != NULL) {
- state->check_version(150, 300, & @1, "interface blocks with "
-   "an instance name are not allowed");
-  }
-
-  uint64_t interface_type_mask;
-  struct ast_type_qualifier temp_type_qualifier;
-
-  /* Get a bitmask containing only the in/out/uniform/buffer
-   * flags, allowing us to ignore other irrelevant flags like
-   * interpolation qualifiers.
-   */
-  temp_type_qualifier.flags.i = 0;
-  temp_type_qualifier.flags.q.uniform = true;
-  temp_type_qualifier.flags.q.buffer = true;
-  temp_type_qualifier.flags.q.in = true;
-  temp_type_qualifier.flags.q.out = true;
-  interface_type_mask = temp_type_qualifier.flags.i;
-
-  /* Get the block's interface qualifier.  The interface_qualifier
-   * production rule guarantees that only one bit will be set (and
-   * it will be in/out/uniform).
-   */
-  uint64_t block_interface_qualifier = $1.flags.i;
-
-  block->layout.flags.i |= block_interface_qualifier;
-
-  if (state->stage == MESA_SHADER_GEOMETRY &&
-  state->has_explicit_attrib_stream()) {
- /* Assign global layout's stream value. */
- block->layout.flags.q.stream = 1;
- block->layout.flags.q.explicit_stream = 0;
- block->layout.stream = state->out_qualifier->stream;
-  }
-
-  foreach_list_typed (ast_declarator_list, member, link, 
&block->declarations) {
- ast_type_qual

[Mesa-dev] [PATCH] glsl: Add std430 related member functions to glsl_type class

2015-09-22 Thread Samuel Iglesias Gonsalvez
They are used to calculate size, base alignment and array stride values
for a glsl_type following std430 rules.

v2:
- Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan)

Signed-off-by: Samuel Iglesias Gonsalvez 
Reviewed-by: Jordan Justen 
---
 src/glsl/glsl_types.cpp | 207 
 src/glsl/glsl_types.h   |  19 +
 2 files changed, 226 insertions(+)

diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp
index 07d7248..93034a6 100644
--- a/src/glsl/glsl_types.cpp
+++ b/src/glsl/glsl_types.cpp
@@ -1388,6 +1388,213 @@ glsl_type::std140_size(bool row_major) const
return -1;
 }
 
+unsigned
+glsl_type::std430_base_alignment(bool row_major) const
+{
+
+   unsigned N = is_double() ? 8 : 4;
+
+   /* (1) If the member is a scalar consuming  basic machine units, the
+* base alignment is .
+*
+* (2) If the member is a two- or four-component vector with components
+* consuming  basic machine units, the base alignment is 2 or
+* 4, respectively.
+*
+* (3) If the member is a three-component vector with components consuming
+*  basic machine units, the base alignment is 4.
+*/
+   if (this->is_scalar() || this->is_vector()) {
+  switch (this->vector_elements) {
+  case 1:
+ return N;
+  case 2:
+ return 2 * N;
+  case 3:
+  case 4:
+ return 4 * N;
+  }
+   }
+
+   /* OpenGL 4.30 spec, section 7.6.2.2 "Standard Uniform Block Layout":
+*
+* "When using the std430 storage layout, shader storage blocks will be
+* laid out in buffer storage identically to uniform and shader storage
+* blocks using the std140 layout, except that the base alignment and
+* stride of arrays of scalars and vectors in rule 4 and of structures
+* in rule 9 are not rounded up a multiple of the base alignment of a vec4.
+*/
+
+   /* (1) If the member is a scalar consuming  basic machine units, the
+* base alignment is .
+*
+* (2) If the member is a two- or four-component vector with components
+* consuming  basic machine units, the base alignment is 2 or
+* 4, respectively.
+*
+* (3) If the member is a three-component vector with components consuming
+*  basic machine units, the base alignment is 4.
+*/
+   if (this->is_array())
+  return this->fields.array->std430_base_alignment(row_major);
+
+   /* (5) If the member is a column-major matrix with  columns and
+*  rows, the matrix is stored identically to an array of
+*  column vectors with  components each, according to
+* rule (4).
+*
+* (7) If the member is a row-major matrix with  columns and 
+* rows, the matrix is stored identically to an array of 
+* row vectors with  components each, according to rule (4).
+*/
+   if (this->is_matrix()) {
+  const struct glsl_type *vec_type, *array_type;
+  int c = this->matrix_columns;
+  int r = this->vector_elements;
+
+  if (row_major) {
+ vec_type = get_instance(base_type, c, 1);
+ array_type = glsl_type::get_array_instance(vec_type, r);
+  } else {
+ vec_type = get_instance(base_type, r, 1);
+ array_type = glsl_type::get_array_instance(vec_type, c);
+  }
+
+  return array_type->std430_base_alignment(false);
+   }
+
+  /* (9) If the member is a structure, the base alignment of the
+* structure is , where  is the largest base alignment
+* value of any of its members, and rounded up to the base
+* alignment of a vec4. The individual members of this
+* sub-structure are then assigned offsets by applying this set
+* of rules recursively, where the base offset of the first
+* member of the sub-structure is equal to the aligned offset
+* of the structure. The structure may have padding at the end;
+* the base offset of the member following the sub-structure is
+* rounded up to the next multiple of the base alignment of the
+* structure.
+*/
+   if (this->is_record()) {
+  unsigned base_alignment = 0;
+  for (unsigned i = 0; i < this->length; i++) {
+ bool field_row_major = row_major;
+ const enum glsl_matrix_layout matrix_layout =
+glsl_matrix_layout(this->fields.structure[i].matrix_layout);
+ if (matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR) {
+field_row_major = true;
+ } else if (matrix_layout == GLSL_MATRIX_LAYOUT_COLUMN_MAJOR) {
+field_row_major = false;
+ }
+
+ const struct glsl_type *field_type = this->fields.structure[i].type;
+ base_alignment = MAX2(base_alignment,
+   
field_type->std430_base_alignment(field_row_major));
+  }
+  return base_alignment;
+   }
+   assert(!"not reached");
+   return -1;
+}
+
+unsigned
+glsl_type::std430_array_stride(bool row_major) con

[Mesa-dev] [PATCH] glsl: Add parser/compiler support for unsized array's length()

2015-09-22 Thread Samuel Iglesias Gonsalvez
The unsized array length is computed with the following formula:

array.length() =
   max((buffer_object_size - offset_of_array) / stride_of_array, 0)

Of these, only the buffer size needs to be provided by the backends, the
frontend already knows the values of the two other variables.

This patch identifies the cases where we need to get the length of an
unsized array, injecting ir_unop_ssbo_unsized_array_length expressions
that will be lowered (in a later patch) to inject the formula mentioned
above.

It also adds the ir_unop_get_buffer_size expression that drivers will
implement to provide the buffer length.

v2:
- Do not define a triop that will force backends to implement the
  entire formula, they should only need to provide the buffer size
  since the other values are known by the frontend (Curro).

v3:
- Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead
  of using state->ARB_shader_storage_buffer_object_enable (Tapani).

Signed-off-by: Samuel Iglesias Gonsalvez 
---
 src/glsl/ast_function.cpp | 13 +
 src/glsl/ir.cpp   |  7 +++
 src/glsl/ir.h | 19 ++-
 src/glsl/ir_validate.cpp  | 11 +++
 src/glsl/link_uniforms.cpp| 10 --
 .../drivers/dri/i965/brw_fs_channel_expressions.cpp   |  2 ++
 src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp|  8 
 src/mesa/program/ir_to_mesa.cpp   |  2 ++
 src/mesa/state_tracker/st_glsl_to_tgsi.cpp|  5 +
 9 files changed, 70 insertions(+), 7 deletions(-)

diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp
index 803edf5..ff5ecb9 100644
--- a/src/glsl/ast_function.cpp
+++ b/src/glsl/ast_function.cpp
@@ -1593,11 +1593,16 @@ ast_function_expression::handle_method(exec_list 
*instructions,
 
   if (op->type->is_array()) {
  if (op->type->is_unsized_array()) {
-_mesa_glsl_error(&loc, state, "length called on unsized array");
-goto fail;
+if (!state->has_shader_storage_buffer_objects()) {
+   _mesa_glsl_error(&loc, state, "length called on unsized array"
+ " only available with "
+ 
"ARB_shader_storage_buffer_object");
+}
+/* Calculate length of an unsized array in run-time */
+result = new(ctx) ir_expression(ir_unop_ssbo_unsized_array_length, 
op);
+ } else {
+result = new(ctx) ir_constant(op->type->array_size());
  }
-
- result = new(ctx) ir_constant(op->type->array_size());
   } else if (op->type->is_vector()) {
  if (state->ARB_shading_language_420pack_enable) {
 /* .length() returns int. */
diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp
index b9df976..2c45b9e 100644
--- a/src/glsl/ir.cpp
+++ b/src/glsl/ir.cpp
@@ -342,6 +342,11 @@ ir_expression::ir_expression(int op, ir_rvalue *op0)
   op0->type->vector_elements, 1);
   break;
 
+   case ir_unop_get_buffer_size:
+   case ir_unop_ssbo_unsized_array_length:
+  this->type = glsl_type::int_type;
+  break;
+
default:
   assert(!"not reached: missing automatic type setup for ir_expression");
   this->type = op0->type;
@@ -571,6 +576,8 @@ static const char *const operator_strs[] = {
"noise",
"subroutine_to_int",
"interpolate_at_centroid",
+   "get_buffer_size",
+   "ssbo_unsized_array_length",
"+",
"-",
"*",
diff --git a/src/glsl/ir.h b/src/glsl/ir.h
index 48b6795..43a2bf0 100644
--- a/src/glsl/ir.h
+++ b/src/glsl/ir.h
@@ -1425,9 +1425,26 @@ enum ir_expression_operation {
ir_unop_interpolate_at_centroid,
 
/**
+* Ask the driver for the total size of a buffer block.
+*
+* operand0 is the ir_constant buffer block index in the linked shader.
+*/
+   ir_unop_get_buffer_size,
+
+   /**
+* Calculate length of an unsized array inside a buffer block.
+* This opcode is going to be replaced in a lowering pass inside
+* the linker.
+*
+* operand0 is the unsized array's ir_value for the calculation
+* of its length.
+*/
+   ir_unop_ssbo_unsized_array_length,
+
+   /**
 * A sentinel marking the last of the unary operations.
 */
-   ir_last_unop = ir_unop_interpolate_at_centroid,
+   ir_last_unop = ir_unop_ssbo_unsized_array_length,
 
ir_binop_add,
ir_binop_sub,
diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp
index 3f0dea7..935571a 100644
--- a/src/glsl/ir_validate.cpp
+++ b/src/glsl/ir_validate.cpp
@@ -409,6 +409,17 @@ ir_validate::visit_leave(ir_expression *ir)
   assert(ir->operands[0]->type->is_float());
   break;
 
+   case ir_unop_get_buffer_size:
+  assert(ir->type == glsl_type::int_type);
+  assert(ir->operands[0]->type ==

[Mesa-dev] [PATCH] glsl: add std430 interface packing support to ssbo related operations

2015-09-22 Thread Samuel Iglesias Gonsalvez
v2:
- Get interface packing information from interface's type, not the
  variable type.
- Simplify is_std430 condition in emit_access() for readability (Jordan)
- Add a commment explaing why array of three-component vector case is
  different in std430 than the rest of cases.
- Add calls to std430_array_stride().

v3:
- Simplify size_mul change for std430's case (Jordan)
- Fix commit log lines length (Jordan)
- Pass 'packing' instead of 'is_std430' to emit_access() (Kristian)

Signed-off-by: Samuel Iglesias Gonsalvez 
Reviewed-by: Jordan Justen 
---
 src/glsl/lower_ubo_reference.cpp | 111 ---
 1 file changed, 81 insertions(+), 30 deletions(-)

diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp
index 8694383..4aaa259 100644
--- a/src/glsl/lower_ubo_reference.cpp
+++ b/src/glsl/lower_ubo_reference.cpp
@@ -147,7 +147,8 @@ public:
 ir_rvalue **offset,
 unsigned *const_offset,
 bool *row_major,
-int *matrix_columns);
+int *matrix_columns,
+unsigned packing);
ir_expression *ubo_load(const struct glsl_type *type,
   ir_rvalue *offset);
ir_call *ssbo_load(const struct glsl_type *type,
@@ -164,7 +165,7 @@ public:
void emit_access(bool is_write, ir_dereference *deref,
 ir_variable *base_offset, unsigned int deref_offset,
 bool row_major, int matrix_columns,
-unsigned write_mask);
+unsigned packing, unsigned write_mask);
 
ir_visitor_status visit_enter(class ir_expression *);
ir_expression *calculate_ssbo_unsized_array_length(ir_expression *expr);
@@ -176,7 +177,8 @@ public:
 ir_variable *);
ir_expression *emit_ssbo_get_buffer_size();
 
-   unsigned calculate_unsized_array_stride(ir_dereference *deref);
+   unsigned calculate_unsized_array_stride(ir_dereference *deref,
+   unsigned packing);
 
void *mem_ctx;
struct gl_shader *shader;
@@ -257,7 +259,8 @@ 
lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
  ir_rvalue **offset,
  unsigned *const_offset,
  bool *row_major,
- int *matrix_columns)
+ int *matrix_columns,
+ unsigned packing)
 {
/* Determine the name of the interface block */
ir_rvalue *nonconst_block_index;
@@ -343,8 +346,15 @@ 
lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
 const bool array_row_major =
is_dereferenced_thing_row_major(deref_array);
 
-array_stride = deref_array->type->std140_size(array_row_major);
-array_stride = glsl_align(array_stride, 16);
+/* The array type will give the correct interface packing
+ * information
+ */
+if (packing == GLSL_INTERFACE_PACKING_STD430) {
+   array_stride = 
deref_array->type->std430_array_stride(array_row_major);
+} else {
+   array_stride = deref_array->type->std140_size(array_row_major);
+   array_stride = glsl_align(array_stride, 16);
+}
  }
 
  ir_rvalue *array_index = deref_array->array_index;
@@ -380,7 +390,12 @@ 
lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
 
 ralloc_free(field_deref);
 
-unsigned field_align = 
type->std140_base_alignment(field_row_major);
+unsigned field_align = 0;
+
+if (packing == GLSL_INTERFACE_PACKING_STD430)
+   field_align = type->std430_base_alignment(field_row_major);
+else
+   field_align = type->std140_base_alignment(field_row_major);
 
 intra_struct_offset = glsl_align(intra_struct_offset, field_align);
 
@@ -388,7 +403,10 @@ 
lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var,
deref_record->field) == 0)
break;
 
-intra_struct_offset += type->std140_size(field_row_major);
+if (packing == GLSL_INTERFACE_PACKING_STD430)
+   intra_struct_offset += type->std430_size(field_row_major);
+else
+   intra_struct_offset += type->std140_size(field_row_major);
 
 /* If the field just examined was itself a structure, apply rule
  * #9:
@@ -437,13 +455,15 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue 
**rvalue)
unsigned const_offset;
bool row_major;
   

  1   2   >