Re: [Mesa-dev] [PATCH 2/2] nir/lower_vec_to_movs: Don't emit unneeded movs
On Sep 22, 2015 10:01 PM, "Jason Ekstrand" wrote: > > It's possible that, if a vecN operation is involved in a phi node, that we > could end up moving from a register to itself. If swizzling is involved, > we need to emit the move but. However, if there is no swizzling, then the > mov is a no-op and we might as well not bother emitting it. > > Shader-db results on Haswell: > >total instructions in shared programs: 6262536 -> 6259558 (-0.05%) >instructions in affected programs: 184780 -> 181802 (-1.61%) >helped:838 >HURT: 0 By the way, I have absolutely no idea why this helps more than Matt's patch to delete these in register coalesce. > --- > src/glsl/nir/nir_lower_vec_to_movs.c | 19 ++- > 1 file changed, 18 insertions(+), 1 deletion(-) > > diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c > index 287f2bf..2039891 100644 > --- a/src/glsl/nir/nir_lower_vec_to_movs.c > +++ b/src/glsl/nir/nir_lower_vec_to_movs.c > @@ -83,7 +83,24 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader) >} > } > > - nir_instr_insert_before(&vec->instr, &mov->instr); > + /* In some situations (if the vecN is involved in a phi-web), we can end > +* up with a mov from a register to itself. Some of those channels may end > +* up doing nothing and there's no reason to have them as part of the mov. > +*/ > + if (src_matches_dest_reg(&mov->dest.dest, &mov->src[0].src) && > + !mov->src[0].abs && !mov->src[0].negate) { > + for (unsigned i = 0; i < 4; i++) { > + if (mov->src[0].swizzle[i] == i) > +mov->dest.write_mask &= ~(1 << i); > + } > + } > + > + /* Only emit the instruction if it actually does something */ > + if (mov->dest.write_mask) { > + nir_instr_insert_before(&vec->instr, &mov->instr); > + } else { > + ralloc_free(mov); > + } > > return mov->dest.write_mask; > } > -- > 2.5.0.400.gff86faf > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] nir/lower_vec_to_movs: Properly handle source modifiers on vecN ops
I don't know of any piglit tests that are currently broken. However, there is nothing stopping a vecN instruction from getting source modifiers and lower_vec_to_movs is run after we lower to source modifiers. --- src/glsl/nir/nir_lower_vec_to_movs.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c index 622e59c..287f2bf 100644 --- a/src/glsl/nir/nir_lower_vec_to_movs.c +++ b/src/glsl/nir/nir_lower_vec_to_movs.c @@ -68,12 +68,16 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader) mov->dest.write_mask = (1u << start_idx); mov->src[0].swizzle[start_idx] = vec->src[start_idx].swizzle[0]; + mov->src[0].negate = vec->src[start_idx].negate; + mov->src[0].abs = vec->src[start_idx].abs; for (unsigned i = start_idx + 1; i < 4; i++) { if (!(vec->dest.write_mask & (1 << i))) continue; - if (nir_srcs_equal(vec->src[i].src, vec->src[start_idx].src)) { + if (nir_srcs_equal(vec->src[i].src, vec->src[start_idx].src) && + vec->src[i].negate == vec->src[start_idx].negate && + vec->src[i].abs == vec->src[start_idx].abs) { mov->dest.write_mask |= (1 << i); mov->src[0].swizzle[i] = vec->src[i].swizzle[0]; } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/2] nir/lower_vec_to_movs: Don't emit unneeded movs
It's possible that, if a vecN operation is involved in a phi node, that we could end up moving from a register to itself. If swizzling is involved, we need to emit the move but. However, if there is no swizzling, then the mov is a no-op and we might as well not bother emitting it. Shader-db results on Haswell: total instructions in shared programs: 6262536 -> 6259558 (-0.05%) instructions in affected programs: 184780 -> 181802 (-1.61%) helped:838 HURT: 0 --- src/glsl/nir/nir_lower_vec_to_movs.c | 19 ++- 1 file changed, 18 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c index 287f2bf..2039891 100644 --- a/src/glsl/nir/nir_lower_vec_to_movs.c +++ b/src/glsl/nir/nir_lower_vec_to_movs.c @@ -83,7 +83,24 @@ insert_mov(nir_alu_instr *vec, unsigned start_idx, nir_shader *shader) } } - nir_instr_insert_before(&vec->instr, &mov->instr); + /* In some situations (if the vecN is involved in a phi-web), we can end +* up with a mov from a register to itself. Some of those channels may end +* up doing nothing and there's no reason to have them as part of the mov. +*/ + if (src_matches_dest_reg(&mov->dest.dest, &mov->src[0].src) && + !mov->src[0].abs && !mov->src[0].negate) { + for (unsigned i = 0; i < 4; i++) { + if (mov->src[0].swizzle[i] == i) +mov->dest.write_mask &= ~(1 << i); + } + } + + /* Only emit the instruction if it actually does something */ + if (mov->dest.write_mask) { + nir_instr_insert_before(&vec->instr, &mov->instr); + } else { + ralloc_free(mov); + } return mov->dest.write_mask; } -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/11] nir: Add unit tests for control flow graphs.
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke wrote: > Signed-off-by: Kenneth Graunke > --- > src/glsl/Makefile.am | 14 +++ > src/glsl/nir/tests/control_flow_tests.cpp | 155 > ++ > 2 files changed, 169 insertions(+) > create mode 100644 src/glsl/nir/tests/control_flow_tests.cpp > > For now, this only adds a single test. It was broken before this series, > and now it works. We certainly need to add more tests; I'm not sure whether > it makes sense to commit this now or later. > > Suggestions for mean things to test are welcome. Oh boy, I've thought of so many in the past... if you're not inspired, I'd suggest going through nir_control_flow.c and figuring out how each of the possible paths can be taken, especially in the lower-level routines. If it's not relatively obvious or likely to get hit (and I know for a fact these cases do exist), then we should probably have a unit test that exercises it. It's too late tonight, but I should probably do that as well. Also, it might also be a good idea to also test the bug fixed in patch 03. I don't think adding extra tests should hold back this getting committed, though -- leaving it around uncommitted will probably increase the chance that it becomes stale and make it harder for others (aka me and possibly Jason) to add tests. > > diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am > index 1aa9caa..3265391 100644 > --- a/src/glsl/Makefile.am > +++ b/src/glsl/Makefile.am > @@ -50,12 +50,14 @@ EXTRA_DIST = tests glcpp/tests README TODO glcpp/README > \ > nir/nir_opcodes_c.py\ > nir/nir_opcodes_h.py\ > nir/nir_opt_algebraic.py\ > + nir/tests \ > SConscript > > include Makefile.sources > > TESTS = glcpp/tests/glcpp-test \ > glcpp/tests/glcpp-test-cr-lf\ > +nir/tests/control_flow_tests \ > tests/blob-test \ > tests/general-ir-test \ > tests/optimization-test \ > @@ -70,6 +72,7 @@ noinst_LTLIBRARIES = libnir.la libglsl.la libglcpp.la > check_PROGRAMS = \ > glcpp/glcpp \ > glsl_test \ > + nir/tests/control_flow_tests\ > tests/blob-test \ > tests/general-ir-test \ > tests/sampler-types-test\ > @@ -263,3 +266,14 @@ nir/nir_opcodes.c: nir/nir_opcodes.py > nir/nir_opcodes_c.py > nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py > $(MKDIR_GEN) > $(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@ > + > +nir_tests_control_flow_tests_SOURCES = \ > + nir/tests/control_flow_tests.cpp > +nir_tests_control_flow_tests_CFLAGS = \ > + $(PTHREAD_CFLAGS) > +nir_tests_control_flow_tests_LDADD = \ > + $(top_builddir)/src/gtest/libgtest.la \ > + $(top_builddir)/src/glsl/libnir.la \ > + $(top_builddir)/src/libglsl_util.la \ > + $(top_builddir)/src/util/libmesautil.la \ > + $(PTHREAD_LIBS) > diff --git a/src/glsl/nir/tests/control_flow_tests.cpp > b/src/glsl/nir/tests/control_flow_tests.cpp > new file mode 100644 > index 000..b9f90e6 > --- /dev/null > +++ b/src/glsl/nir/tests/control_flow_tests.cpp > @@ -0,0 +1,155 @@ > +/* > + * Copyright © 2015 Intel Corporation > + * > + * Permission is hereby granted, free of charge, to any person obtaining a > + * copy of this software and associated documentation files (the "Software"), > + * to deal in the Software without restriction, including without limitation > + * the rights to use, copy, modify, merge, publish, distribute, sublicense, > + * and/or sell copies of the Software, and to permit persons to whom the > + * Software is furnished to do so, subject to the following conditions: > + * > + * The above copyright notice and this permission notice (including the next > + * paragraph) shall be included in all copies or substantial portions of the > + * Software. > + * > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING > + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER > + * DEALINGS IN THE SOFTWARE. > + */ > +#include > +#include "nir.h" >
Re: [Mesa-dev] [PATCH 07/11] nir/cf: Fix dominance metadata in the dead control flow pass.
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke wrote: > The NIR control flow modification API churns the block structure, > splitting blocks, stitching them back together, and so on. Preserving > information about block dominance is hard (and probably not worthwhile). > > This patch makes nir_cf_extract() throw away all metadata, like we do > when adding/removing jumps. > > We then make the dead control flow pass compute dominance information > right before it uses it. This is necessary because earlier work by the > pass may have invalidated it. > > Signed-off-by: Kenneth Graunke Reviewed-by: Connor Abbott > --- > src/glsl/nir/nir_control_flow.c | 3 +++ > src/glsl/nir/nir_opt_dead_cf.c | 7 --- > 2 files changed, 7 insertions(+), 3 deletions(-) > > New patch! > > diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c > index e027766..8ec1252 100644 > --- a/src/glsl/nir/nir_control_flow.c > +++ b/src/glsl/nir/nir_control_flow.c > @@ -742,6 +742,9 @@ nir_cf_extract(nir_cf_list *extracted, nir_cursor begin, > nir_cursor end) > extracted->impl = nir_cf_node_get_function(&block_begin->cf_node); > exec_list_make_empty(&extracted->list); > > + /* Dominance and other block-related information is toast. */ > + nir_metadata_preserve(extracted->impl, nir_metadata_none); > + > nir_cf_node *cf_node = &block_begin->cf_node; > nir_cf_node *cf_node_end = &block_end->cf_node; > while (true) { > diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c > index 317bbc5..0d4819b 100644 > --- a/src/glsl/nir/nir_opt_dead_cf.c > +++ b/src/glsl/nir/nir_opt_dead_cf.c > @@ -203,6 +203,10 @@ loop_is_dead(nir_loop *loop) > NULL)) >return false; > > + nir_function_impl *impl = nir_cf_node_get_function(&loop->cf_node); > + nir_metadata_require(impl, nir_metadata_live_variables | > + nir_metadata_dominance); > + > for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) { >nir_foreach_instr(cur, instr) { > if (!nir_foreach_ssa_def(instr, def_not_live_out, after)) > @@ -332,9 +336,6 @@ dead_cf_list(struct exec_list *list, bool > *list_ends_in_jump) > static bool > opt_dead_cf_impl(nir_function_impl *impl) > { > - nir_metadata_require(impl, nir_metadata_live_variables | > - nir_metadata_dominance); > - > bool dummy; > bool progress = dead_cf_list(&impl->body, &dummy); > > -- > 2.5.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/11] nir/cf: Alter block successors before adding a fake link.
On Tue, Sep 22, 2015 at 11:01 PM, Kenneth Graunke wrote: > Consider the case of "while (...) { break }". Or in NIR: > > block block_0 (0x7ab640): > ... > /* succs: block_1 */ > loop { > block block_1: > /* preds: block_0 */ > break > /* succs: block_2 */ > } > block block_2: > > Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break. > Unfortunately, it would mangle the predecessors and successors. > > Here, block_2->predecessors->entries == 1, so we would create a fake > link, setting block_1->successors[1] = block_2, and adding block_1 to > block_2's predecessor set. This is illegal: a block cannot specify the > same successor twice. In particular, adding the predecessor would have > no effect, as it was already present in the set. > > We'd then call unlink_block_successors(), which would delete the fake > link and remove block_1 from block_2's predecessor set. It would then > delete successors[0], and attempt to remove block_1 from block_2's > predecessor set a second time...except that it wouldn't be present, > triggering an assertion failure. > > The fix appears to be simple: move the fake link creation after we > recreate the block's successors. Since we're inspecting "next" later, > If we've created a new infinite > loop, the code following the loop will be dead, and thus 0 predecessors. > So we create a new fake successor. > > simply unlink the block's successors and > recreate them to point at the correct blocks first. Then, add the fake > link. In the above example, removing the break would cause block_1 to > have itself as a successor (as it becomes an infinite loop), and then > > Signed-off-by: Kenneth Graunke > --- > src/glsl/nir/nir_control_flow.c | 30 ++ > 1 file changed, 14 insertions(+), 16 deletions(-) > > New patch! > > diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c > index 2b23f38..a59add5 100644 > --- a/src/glsl/nir/nir_control_flow.c > +++ b/src/glsl/nir/nir_control_flow.c > @@ -551,31 +551,29 @@ remove_phi_src(nir_block *block, nir_block *pred) > static void > unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors) > { > + nir_block *next = block->successors[0]; > + > if (block->successors[0]) >remove_phi_src(block->successors[0], block); > if (block->successors[1]) >remove_phi_src(block->successors[1], block); > > - if (type == nir_jump_break) { > - nir_block *next = block->successors[0]; > + unlink_block_successors(block); > + if (add_normal_successors) > + block_add_normal_succs(block); > > - if (next->predecessors->entries == 1) { > - nir_loop *loop = > -nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node)); > + if (type == nir_jump_break && next->predecessors->entries == 0) { > + nir_loop *loop = > + nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node)); > > - /* insert fake link */ > - nir_cf_node *last = nir_loop_last_cf_node(loop); > - assert(last->type == nir_cf_node_block); > - nir_block *last_block = nir_cf_node_as_block(last); > + /* insert fake link */ I was a little terse here with this comment, and based on how long it took to actually fix this, there seems to be a lot going on that one could easily miss from reading the code. The commit message has a pretty good summary of the problem, but perhaps we should at least add a comment here explaining that we can't do this until now since if the break is at the end of the loop then last_block == block and we'd be inserting the same successor twice and generally screwing things up. Other than that, this patch and the previous one are Reviewed-by: Connor Abbott > + nir_cf_node *last = nir_loop_last_cf_node(loop); > + assert(last->type == nir_cf_node_block); > + nir_block *last_block = nir_cf_node_as_block(last); > > - last_block->successors[1] = next; > - block_add_pred(next, last_block); > - } > + last_block->successors[1] = next; > + block_add_pred(next, last_block); > } > - > - unlink_block_successors(block); > - if (add_normal_successors) > - block_add_normal_succs(block); > } > > void > -- > 2.5.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/mesa: keep track of saturated writes when eliminating dead code
It doesn't matter whether a write is saturated or not, in another implementation it might even have been a separate opcode. This code was most likely copied from the copy-propagation pass (where one does have to distinguish saturation). Signed-off-by: Ilia Mirkin --- Haven't run this through piglit yet, but happened upon it when I was debugging the other issue. (Which turned out to be an odd dependence on this pass by the regular emitter code.) src/mesa/state_tracker/st_glsl_to_tgsi.cpp | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp index 6762566..ee0acb5 100644 --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp @@ -4228,8 +4228,7 @@ glsl_to_tgsi_visitor::eliminate_dead_code(void) */ for (unsigned i = 0; i < ARRAY_SIZE(inst->dst); i++) { if (inst->dst[i].file == PROGRAM_TEMPORARY && - !inst->dst[i].reladdr && - !inst->saturate) { + !inst->dst[i].reladdr) { for (int c = 0; c < 4; c++) { if (inst->dst[i].writemask & (1 << c)) { if (writes[4 * inst->dst[i].index + c]) { -- 2.4.9 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/11] nir/cf: Don't break outer-block successors in split_block_beginning().
First three are Reviewed-by: Jason Ekstrand I'm going to need more brain power for the rest. --Jason On Tue, Sep 22, 2015 at 8:01 PM, Kenneth Graunke wrote: > Consider the following NIR: > >block block_0; >/* succs: block_1 block_2 */ >if (...) { > block block_1; > ... >} else { > block block_2; >} > > Calling split_block_beginning() on block_1 would break block_0's > successors: link_block() sets both successors of a block, so calling > link_block(block_0, new_block, NULL) would throw away the second > successor, leaving only /* succ: new_block */. This is invalid: the > block before an if statement must have two successors. > > Changing the call to link_block(pred, new_block, pred->successors[0]) > would correctly leave both successors in place, but because unlink_block > may shift successor[1] to successor[0], it may not preserve the original > order. NIR maintains a convention that successor[0] must point to the > "then" block, while successor[1] points to the "else" block, so we need > to take care to preserve this ordering. > > This patch creates a new function that swaps out one successor for > another, preserving the ordering. It then uses this to fix the issue. > > Signed-off-by: Kenneth Graunke > Reviewed-by: Connor Abbott > --- > src/glsl/nir/nir_control_flow.c | 21 ++--- > 1 file changed, 18 insertions(+), 3 deletions(-) > > ***UNCHANGED SINCE THE FIRST SEND*** > > diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c > index 43e4e43..e2a151d 100644 > --- a/src/glsl/nir/nir_control_flow.c > +++ b/src/glsl/nir/nir_control_flow.c > @@ -200,6 +200,23 @@ link_block_to_non_block(nir_block *block, nir_cf_node > *node) > } > > /** > + * Replace a block's successor with a different one. > + */ > +static void > +replace_successor(nir_block *block, nir_block *old_succ, nir_block *new_succ) > +{ > + if (block->successors[0] == old_succ) { > + block->successors[0] = new_succ; > + } else { > + assert(block->successors[1] == old_succ); > + block->successors[1] = new_succ; > + } > + > + block_remove_pred(old_succ, block); > + block_add_pred(new_succ, block); > +} > + > +/** > * Takes a basic block and inserts a new empty basic block before it, making > its > * predecessors point to the new block. This essentially splits the block > into > * an empty header and a body so that another non-block CF node can be > inserted > @@ -217,9 +234,7 @@ split_block_beginning(nir_block *block) > struct set_entry *entry; > set_foreach(block->predecessors, entry) { >nir_block *pred = (nir_block *) entry->key; > - > - unlink_blocks(pred, block); > - link_blocks(pred, new_block, NULL); > + replace_successor(pred, block, new_block); > } > > /* Any phi nodes must stay part of the new block, or else their > -- > 2.5.3 > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: correctly detect inactive UBO arrays
On Wed, 2015-09-23 at 13:23 +1000, Timothy Arceri wrote: > Previously the code was trying to get the packing type from the array > not the > interface. > > Cc: Ian Romanick > Cc: Antia Puentes I meant to add that there is a piglit test here [1] and that this fixes the second bug mentioned in the bug report [2]. [1] http://lists.freedesktop.org/archives/piglit/2015-September/017247.html [2] https://bugs.freedesktop.org/show_bug.cgi?id=83508 > --- > src/glsl/link_uniform_block_active_visitor.cpp | 6 ++ > src/glsl/opt_dead_code.cpp | 7 ++- > 2 files changed, 4 insertions(+), 9 deletions(-) > > diff --git a/src/glsl/link_uniform_block_active_visitor.cpp > b/src/glsl/link_uniform_block_active_visitor.cpp > index 5102947..72e0782 100644 > --- a/src/glsl/link_uniform_block_active_visitor.cpp > +++ b/src/glsl/link_uniform_block_active_visitor.cpp > @@ -77,9 +77,6 @@ > link_uniform_block_active_visitor::visit(ir_variable *var) > if (!var->is_in_buffer_block()) >return visit_continue; > > - const glsl_type *const block_type = var->is_interface_instance() > - ? var->type : var->get_interface_type(); > - > /* Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec > says: > * > * "All members of a named uniform block declared with a > shared or > @@ -88,7 +85,8 @@ > link_uniform_block_active_visitor::visit(ir_variable *var) > * also considered active, even if no member of the block is > * referenced." > */ > - if (block_type->interface_packing == > GLSL_INTERFACE_PACKING_PACKED) > + if (var->get_interface_type()->interface_packing == > + GLSL_INTERFACE_PACKING_PACKED) >return visit_continue; > > /* Process the block. Bail if there was an error. > diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp > index e4bf874..2cb7f41 100644 > --- a/src/glsl/opt_dead_code.cpp > +++ b/src/glsl/opt_dead_code.cpp > @@ -119,11 +119,8 @@ do_dead_code(exec_list *instructions, bool > uniform_locations_assigned) > * layouts, do not eliminate it. > */ > if (entry->var->is_in_buffer_block()) { > - const glsl_type *const block_type = > - entry->var->is_interface_instance() > - ? entry->var->type : entry->var > ->get_interface_type(); > - > - if (block_type->interface_packing != > GLSL_INTERFACE_PACKING_PACKED) > + if (entry->var->get_interface_type() > ->interface_packing != > + GLSL_INTERFACE_PACKING_PACKED) >continue; > } > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction
On Tue, Sep 22, 2015 at 7:50 PM, Kenneth Graunke wrote: > On Tuesday, September 22, 2015 06:18:12 PM Jason Ekstrand wrote: >> We don't use any of the code after the switch and, as far as I can tell, it >> wouldn't work anyway. The only reason this wasn't causing us problems is >> that it's all dead and DCE cleans it up. > > Huh, really? Right after that switch statement I see: > >if (instr->dest.dest.ssa.num_components == 1) > return; Right. > and reductions should always produce a single component, so this change > seems like it should do nothing. It seems like a reasonable change > regardless...but... Sure. I'll change the commit message. That said, I think my confusion is, itself, enough justification for the change. >> --- >> src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c >> b/src/glsl/nir/nir_lower_alu_to_scalar.c >> index 5ef5ec2..84d4943 100644 >> --- a/src/glsl/nir/nir_lower_alu_to_scalar.c >> +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c >> @@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder >> *b) >> case name##3: \ >> case name##4: \ >>lower_reduction(instr, chan, merge, b); \ >> - break; >> + return; >> >> switch (instr->op) { >> case nir_op_vec4: >> ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: correctly detect inactive UBO arrays
Previously the code was trying to get the packing type from the array not the interface. Cc: Ian Romanick Cc: Antia Puentes --- src/glsl/link_uniform_block_active_visitor.cpp | 6 ++ src/glsl/opt_dead_code.cpp | 7 ++- 2 files changed, 4 insertions(+), 9 deletions(-) diff --git a/src/glsl/link_uniform_block_active_visitor.cpp b/src/glsl/link_uniform_block_active_visitor.cpp index 5102947..72e0782 100644 --- a/src/glsl/link_uniform_block_active_visitor.cpp +++ b/src/glsl/link_uniform_block_active_visitor.cpp @@ -77,9 +77,6 @@ link_uniform_block_active_visitor::visit(ir_variable *var) if (!var->is_in_buffer_block()) return visit_continue; - const glsl_type *const block_type = var->is_interface_instance() - ? var->type : var->get_interface_type(); - /* Section 2.11.6 (Uniform Variables) of the OpenGL ES 3.0.3 spec says: * * "All members of a named uniform block declared with a shared or @@ -88,7 +85,8 @@ link_uniform_block_active_visitor::visit(ir_variable *var) * also considered active, even if no member of the block is * referenced." */ - if (block_type->interface_packing == GLSL_INTERFACE_PACKING_PACKED) + if (var->get_interface_type()->interface_packing == + GLSL_INTERFACE_PACKING_PACKED) return visit_continue; /* Process the block. Bail if there was an error. diff --git a/src/glsl/opt_dead_code.cpp b/src/glsl/opt_dead_code.cpp index e4bf874..2cb7f41 100644 --- a/src/glsl/opt_dead_code.cpp +++ b/src/glsl/opt_dead_code.cpp @@ -119,11 +119,8 @@ do_dead_code(exec_list *instructions, bool uniform_locations_assigned) * layouts, do not eliminate it. */ if (entry->var->is_in_buffer_block()) { - const glsl_type *const block_type = - entry->var->is_interface_instance() - ? entry->var->type : entry->var->get_interface_type(); - - if (block_type->interface_packing != GLSL_INTERFACE_PACKING_PACKED) + if (entry->var->get_interface_type()->interface_packing != + GLSL_INTERFACE_PACKING_PACKED) continue; } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 04/11] nir/cf: Conditionally do block_add_normal_succs() in unlink_jump();
There is a bug where we mess up predecessors/successors due to the ordering of unlinking/recreating edges/adding fake edges. In order to fix that, I need everything in one routine. However, calling block_add_normal_succs() isn't safe from cleanup_cf_node() - it would crash trying to insert phi undefs. So unfortunately I need to add a parameter. Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_control_flow.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) New patch! diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index e2a151d..2b23f38 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -548,8 +548,8 @@ remove_phi_src(nir_block *block, nir_block *pred) * infinite loops. Note that the jump to be eliminated may be free-floating. */ -static -void unlink_jump(nir_block *block, nir_jump_type type) +static void +unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors) { if (block->successors[0]) remove_phi_src(block->successors[0], block); @@ -574,14 +574,14 @@ void unlink_jump(nir_block *block, nir_jump_type type) } unlink_block_successors(block); + if (add_normal_successors) + block_add_normal_succs(block); } void nir_handle_remove_jump(nir_block *block, nir_jump_type type) { - unlink_jump(block, type); - - block_add_normal_succs(block); + unlink_jump(block, type, true); nir_function_impl *impl = nir_cf_node_get_function(&block->cf_node); nir_metadata_preserve(impl, nir_metadata_none); @@ -689,7 +689,7 @@ cleanup_cf_node(nir_cf_node *node, nir_function_impl *impl) nir_foreach_instr_safe(block, instr) { if (instr->type == nir_instr_type_jump) { nir_jump_type jump_type = nir_instr_as_jump(instr)->type; -unlink_jump(block, jump_type); +unlink_jump(block, jump_type, false); } else { nir_foreach_ssa_def(instr, replace_ssa_def_uses, impl); nir_instr_remove(instr); -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 10/11] i965/gs: Use new NIR intrinsics.
By performing the vertex counting in NIR, we're able to elide a ton of useless safety checks around every EmitVertex() call: total instructions in shared programs: 3952 -> 3720 (-5.87%) instructions in affected programs: 3491 -> 3259 (-6.65%) helped:11 HURT: 0 Improves performance in Gl32GSCloth by 0.671742% +/- 0.142202% (n=621) on Haswell GT3e at 1024x768. This should also make it easier to implement Broadwell's "Static Vertex Count" feature someday. Signed-off-by: Kenneth Graunke Reviewed-by: Jason Ekstrand --- src/mesa/drivers/dri/i965/brw_nir.c | 5 src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp | 13 +-- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 28 --- src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 28 ++- 4 files changed, 48 insertions(+), 26 deletions(-) ***UNCHANGED SINCE THE FIRST SEND*** diff --git a/src/mesa/drivers/dri/i965/brw_nir.c b/src/mesa/drivers/dri/i965/brw_nir.c index b47b87e..1d4f6ab 100644 --- a/src/mesa/drivers/dri/i965/brw_nir.c +++ b/src/mesa/drivers/dri/i965/brw_nir.c @@ -96,6 +96,11 @@ brw_create_nir(struct brw_context *brw, } nir_validate_shader(nir); + if (stage == MESA_SHADER_GEOMETRY) { + nir_lower_gs_intrinsics(nir); + nir_validate_shader(nir); + } + nir_lower_global_vars_to_local(nir); nir_validate_shader(nir); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp index 8a8dd57..4f4e1e1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_nir.cpp @@ -92,16 +92,25 @@ vec4_gs_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) src_reg src; switch (instr->intrinsic) { - case nir_intrinsic_emit_vertex: { + case nir_intrinsic_emit_vertex_with_counter: { + this->vertex_count = + retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD); int stream_id = instr->const_index[0]; gs_emit_vertex(stream_id); break; } - case nir_intrinsic_end_primitive: + case nir_intrinsic_end_primitive_with_counter: + this->vertex_count = + retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD); gs_end_primitive(); break; + case nir_intrinsic_set_vertex_count: + this->vertex_count = + retype(get_nir_src(instr->src[0], 1), BRW_REGISTER_TYPE_UD); + break; + case nir_intrinsic_load_invocation_id: { src_reg invocation_id = src_reg(nir_system_values[SYSTEM_VALUE_INVOCATION_ID]); diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp index b9694f6..7a5b945 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp @@ -484,14 +484,6 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id) if (stream_id > 0 && shader_prog->TransformFeedback.NumVarying == 0) return; - /* To ensure that we don't output more vertices than the shader specified -* using max_vertices, do the logic inside a conditional of the form "if -* (vertex_count < MAX)" -*/ - unsigned num_output_vertices = c->gp->program.VerticesOut; - emit(CMP(dst_null_d(), this->vertex_count, -src_reg(num_output_vertices), BRW_CONDITIONAL_L)); - emit(IF(BRW_PREDICATE_NORMAL)); { /* If we're outputting 32 control data bits or less, then we can wait * until the shader is over to output them all. Otherwise we need to @@ -562,12 +554,7 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id) this->current_annotation = "emit vertex: Stream control data bits"; set_stream_control_data_bits(stream_id); } - - this->current_annotation = "emit vertex: increment vertex count"; - emit(ADD(dst_reg(this->vertex_count), this->vertex_count, - src_reg(1u))); } - emit(BRW_OPCODE_ENDIF); this->current_annotation = NULL; } @@ -575,7 +562,22 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id) void vec4_gs_visitor::visit(ir_emit_vertex *ir) { + /* To ensure that we don't output more vertices than the shader specified +* using max_vertices, do the logic inside a conditional of the form "if +* (vertex_count < MAX)" +*/ + unsigned num_output_vertices = c->gp->program.VerticesOut; + emit(CMP(dst_null_d(), this->vertex_count, +src_reg(num_output_vertices), BRW_CONDITIONAL_L)); + emit(IF(BRW_PREDICATE_NORMAL)); + gs_emit_vertex(ir->stream_id()); + + this->current_annotation = "emit vertex: increment vertex count"; + emit(ADD(dst_reg(this->vertex_count), this->vertex_count, +src_reg(1u))); + + emit(BRW_OPCODE_ENDIF); } void diff --git a/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp b/src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp index 68e443d..5cfff7b 100644
[Mesa-dev] [PATCH 02/11] nir/cf: Make a helper function for removing a predecessor.
I need to do this in a second place, and I'd rather make a helper function than cut and paste the code. Signed-off-by: Kenneth Graunke Reviewed-by: Connor Abbott --- src/glsl/nir/nir_control_flow.c | 16 +++- 1 file changed, 11 insertions(+), 5 deletions(-) ***UNCHANGED SINCE THE FIRST SEND*** diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index 768dfd2..43e4e43 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -60,6 +60,16 @@ block_add_pred(nir_block *block, nir_block *pred) _mesa_set_add(block->predecessors, pred); } +static inline void +block_remove_pred(nir_block *block, nir_block *pred) +{ + struct set_entry *entry = _mesa_set_search(block->predecessors, pred); + + assert(entry); + + _mesa_set_remove(block->predecessors, entry); +} + static void link_blocks(nir_block *pred, nir_block *succ1, nir_block *succ2) { @@ -83,11 +93,7 @@ unlink_blocks(nir_block *pred, nir_block *succ) pred->successors[1] = NULL; } - struct set_entry *entry = _mesa_set_search(succ->predecessors, pred); - - assert(entry); - - _mesa_set_remove(succ->predecessors, entry); + block_remove_pred(succ, pred); } static void -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 01/11] nir: Validate that a block doesn't have two identical successors.
This is invalid, and causes disasters if we try to unlink successors: removing the first will work, but removing the second copy will fail because the block isn't in the successor's predecessor set any longer. Signed-off-by: Kenneth Graunke Reviewed-by: Connor Abbott --- src/glsl/nir/nir_validate.c | 1 + 1 file changed, 1 insertion(+) ***UNCHANGED SINCE THE FIRST SEND*** I want to commit all the NIR control flow fixes together, as I haven't tested individual commits - only that the whole group collectively works and fixes bugs. diff --git a/src/glsl/nir/nir_validate.c b/src/glsl/nir/nir_validate.c index 9938c0e..1c9993a 100644 --- a/src/glsl/nir/nir_validate.c +++ b/src/glsl/nir/nir_validate.c @@ -586,6 +586,7 @@ validate_block(nir_block *block, validate_state *state) } assert(block->successors[0] != NULL); + assert(block->successors[0] != block->successors[1]); for (unsigned i = 0; i < 2; i++) { if (block->successors[i] != NULL) { -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 07/11] nir/cf: Fix dominance metadata in the dead control flow pass.
The NIR control flow modification API churns the block structure, splitting blocks, stitching them back together, and so on. Preserving information about block dominance is hard (and probably not worthwhile). This patch makes nir_cf_extract() throw away all metadata, like we do when adding/removing jumps. We then make the dead control flow pass compute dominance information right before it uses it. This is necessary because earlier work by the pass may have invalidated it. Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_control_flow.c | 3 +++ src/glsl/nir/nir_opt_dead_cf.c | 7 --- 2 files changed, 7 insertions(+), 3 deletions(-) New patch! diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index e027766..8ec1252 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -742,6 +742,9 @@ nir_cf_extract(nir_cf_list *extracted, nir_cursor begin, nir_cursor end) extracted->impl = nir_cf_node_get_function(&block_begin->cf_node); exec_list_make_empty(&extracted->list); + /* Dominance and other block-related information is toast. */ + nir_metadata_preserve(extracted->impl, nir_metadata_none); + nir_cf_node *cf_node = &block_begin->cf_node; nir_cf_node *cf_node_end = &block_end->cf_node; while (true) { diff --git a/src/glsl/nir/nir_opt_dead_cf.c b/src/glsl/nir/nir_opt_dead_cf.c index 317bbc5..0d4819b 100644 --- a/src/glsl/nir/nir_opt_dead_cf.c +++ b/src/glsl/nir/nir_opt_dead_cf.c @@ -203,6 +203,10 @@ loop_is_dead(nir_loop *loop) NULL)) return false; + nir_function_impl *impl = nir_cf_node_get_function(&loop->cf_node); + nir_metadata_require(impl, nir_metadata_live_variables | + nir_metadata_dominance); + for (nir_block *cur = after->imm_dom; cur != before; cur = cur->imm_dom) { nir_foreach_instr(cur, instr) { if (!nir_foreach_ssa_def(instr, def_not_live_out, after)) @@ -332,9 +336,6 @@ dead_cf_list(struct exec_list *list, bool *list_ends_in_jump) static bool opt_dead_cf_impl(nir_function_impl *impl) { - nir_metadata_require(impl, nir_metadata_live_variables | - nir_metadata_dominance); - bool dummy; bool progress = dead_cf_list(&impl->body, &dummy); -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 05/11] nir/cf: Alter block successors before adding a fake link.
Consider the case of "while (...) { break }". Or in NIR: block block_0 (0x7ab640): ... /* succs: block_1 */ loop { block block_1: /* preds: block_0 */ break /* succs: block_2 */ } block block_2: Calling nir_handle_remove_jump(block_1, nir_jump_break) will remove the break. Unfortunately, it would mangle the predecessors and successors. Here, block_2->predecessors->entries == 1, so we would create a fake link, setting block_1->successors[1] = block_2, and adding block_1 to block_2's predecessor set. This is illegal: a block cannot specify the same successor twice. In particular, adding the predecessor would have no effect, as it was already present in the set. We'd then call unlink_block_successors(), which would delete the fake link and remove block_1 from block_2's predecessor set. It would then delete successors[0], and attempt to remove block_1 from block_2's predecessor set a second time...except that it wouldn't be present, triggering an assertion failure. The fix appears to be simple: move the fake link creation after we recreate the block's successors. Since we're inspecting "next" later, If we've created a new infinite loop, the code following the loop will be dead, and thus 0 predecessors. So we create a new fake successor. simply unlink the block's successors and recreate them to point at the correct blocks first. Then, add the fake link. In the above example, removing the break would cause block_1 to have itself as a successor (as it becomes an infinite loop), and then Signed-off-by: Kenneth Graunke --- src/glsl/nir/nir_control_flow.c | 30 ++ 1 file changed, 14 insertions(+), 16 deletions(-) New patch! diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index 2b23f38..a59add5 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -551,31 +551,29 @@ remove_phi_src(nir_block *block, nir_block *pred) static void unlink_jump(nir_block *block, nir_jump_type type, bool add_normal_successors) { + nir_block *next = block->successors[0]; + if (block->successors[0]) remove_phi_src(block->successors[0], block); if (block->successors[1]) remove_phi_src(block->successors[1], block); - if (type == nir_jump_break) { - nir_block *next = block->successors[0]; + unlink_block_successors(block); + if (add_normal_successors) + block_add_normal_succs(block); - if (next->predecessors->entries == 1) { - nir_loop *loop = -nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node)); + if (type == nir_jump_break && next->predecessors->entries == 0) { + nir_loop *loop = + nir_cf_node_as_loop(nir_cf_node_prev(&next->cf_node)); - /* insert fake link */ - nir_cf_node *last = nir_loop_last_cf_node(loop); - assert(last->type == nir_cf_node_block); - nir_block *last_block = nir_cf_node_as_block(last); + /* insert fake link */ + nir_cf_node *last = nir_loop_last_cf_node(loop); + assert(last->type == nir_cf_node_block); + nir_block *last_block = nir_cf_node_as_block(last); - last_block->successors[1] = next; - block_add_pred(next, last_block); - } + last_block->successors[1] = next; + block_add_pred(next, last_block); } - - unlink_block_successors(block); - if (add_normal_successors) - block_add_normal_succs(block); } void -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 03/11] nir/cf: Don't break outer-block successors in split_block_beginning().
Consider the following NIR: block block_0; /* succs: block_1 block_2 */ if (...) { block block_1; ... } else { block block_2; } Calling split_block_beginning() on block_1 would break block_0's successors: link_block() sets both successors of a block, so calling link_block(block_0, new_block, NULL) would throw away the second successor, leaving only /* succ: new_block */. This is invalid: the block before an if statement must have two successors. Changing the call to link_block(pred, new_block, pred->successors[0]) would correctly leave both successors in place, but because unlink_block may shift successor[1] to successor[0], it may not preserve the original order. NIR maintains a convention that successor[0] must point to the "then" block, while successor[1] points to the "else" block, so we need to take care to preserve this ordering. This patch creates a new function that swaps out one successor for another, preserving the ordering. It then uses this to fix the issue. Signed-off-by: Kenneth Graunke Reviewed-by: Connor Abbott --- src/glsl/nir/nir_control_flow.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) ***UNCHANGED SINCE THE FIRST SEND*** diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index 43e4e43..e2a151d 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -200,6 +200,23 @@ link_block_to_non_block(nir_block *block, nir_cf_node *node) } /** + * Replace a block's successor with a different one. + */ +static void +replace_successor(nir_block *block, nir_block *old_succ, nir_block *new_succ) +{ + if (block->successors[0] == old_succ) { + block->successors[0] = new_succ; + } else { + assert(block->successors[1] == old_succ); + block->successors[1] = new_succ; + } + + block_remove_pred(old_succ, block); + block_add_pred(new_succ, block); +} + +/** * Takes a basic block and inserts a new empty basic block before it, making its * predecessors point to the new block. This essentially splits the block into * an empty header and a body so that another non-block CF node can be inserted @@ -217,9 +234,7 @@ split_block_beginning(nir_block *block) struct set_entry *entry; set_foreach(block->predecessors, entry) { nir_block *pred = (nir_block *) entry->key; - - unlink_blocks(pred, block); - link_blocks(pred, new_block, NULL); + replace_successor(pred, block, new_block); } /* Any phi nodes must stay part of the new block, or else their -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 08/11] nir: Add unit tests for control flow graphs.
Signed-off-by: Kenneth Graunke --- src/glsl/Makefile.am | 14 +++ src/glsl/nir/tests/control_flow_tests.cpp | 155 ++ 2 files changed, 169 insertions(+) create mode 100644 src/glsl/nir/tests/control_flow_tests.cpp For now, this only adds a single test. It was broken before this series, and now it works. We certainly need to add more tests; I'm not sure whether it makes sense to commit this now or later. Suggestions for mean things to test are welcome. diff --git a/src/glsl/Makefile.am b/src/glsl/Makefile.am index 1aa9caa..3265391 100644 --- a/src/glsl/Makefile.am +++ b/src/glsl/Makefile.am @@ -50,12 +50,14 @@ EXTRA_DIST = tests glcpp/tests README TODO glcpp/README \ nir/nir_opcodes_c.py\ nir/nir_opcodes_h.py\ nir/nir_opt_algebraic.py\ + nir/tests \ SConscript include Makefile.sources TESTS = glcpp/tests/glcpp-test \ glcpp/tests/glcpp-test-cr-lf\ +nir/tests/control_flow_tests \ tests/blob-test \ tests/general-ir-test \ tests/optimization-test \ @@ -70,6 +72,7 @@ noinst_LTLIBRARIES = libnir.la libglsl.la libglcpp.la check_PROGRAMS = \ glcpp/glcpp \ glsl_test \ + nir/tests/control_flow_tests\ tests/blob-test \ tests/general-ir-test \ tests/sampler-types-test\ @@ -263,3 +266,14 @@ nir/nir_opcodes.c: nir/nir_opcodes.py nir/nir_opcodes_c.py nir/nir_opt_algebraic.c: nir/nir_opt_algebraic.py nir/nir_algebraic.py $(MKDIR_GEN) $(PYTHON_GEN) $(srcdir)/nir/nir_opt_algebraic.py > $@ + +nir_tests_control_flow_tests_SOURCES = \ + nir/tests/control_flow_tests.cpp +nir_tests_control_flow_tests_CFLAGS = \ + $(PTHREAD_CFLAGS) +nir_tests_control_flow_tests_LDADD = \ + $(top_builddir)/src/gtest/libgtest.la \ + $(top_builddir)/src/glsl/libnir.la \ + $(top_builddir)/src/libglsl_util.la \ + $(top_builddir)/src/util/libmesautil.la \ + $(PTHREAD_LIBS) diff --git a/src/glsl/nir/tests/control_flow_tests.cpp b/src/glsl/nir/tests/control_flow_tests.cpp new file mode 100644 index 000..b9f90e6 --- /dev/null +++ b/src/glsl/nir/tests/control_flow_tests.cpp @@ -0,0 +1,155 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER + * DEALINGS IN THE SOFTWARE. + */ +#include +#include "nir.h" +#include "nir_builder.h" + +class nir_cf_test : public ::testing::Test { +protected: + nir_cf_test(); + ~nir_cf_test(); + + nir_builder b; + nir_shader *shader; + nir_function_impl *impl; +}; + +nir_cf_test::nir_cf_test() +{ + static const nir_shader_compiler_options options = { }; + shader = nir_shader_create(NULL, MESA_SHADER_VERTEX, &options); + nir_function *func = nir_function_create(shader, "main"); + nir_function_overload *overload = nir_function_overload_create(func); + impl = nir_function_impl_create(overload); + + nir_builder_init(&b, impl); +} + +nir_cf_test::~nir_cf_test() +{ + ralloc_free(shader); +} + +TEST_F(nir_cf_test, delete_break_in_loop) +{ + /* Create IR: +* +* while (...) { break; } +*/ + nir_loop *loop = nir_loop_create(shader); + nir_cf_node_insert(nir_after_cf_list(&impl->body), &loop->cf_node); + + b.cursor = nir_after_cf_list(&loop->body); + + nir_jump_instr *jump = nir_jump_instr_create(shader, nir_jum
[Mesa-dev] [PATCH 06/11] nir/cf: Fix unlink_block_successors to actually unlink the second one.
Calling unlink_blocks(block, block->successors[0]) will successfully unlink the first successor, but then will shift block->successors[1] down to block->successor[0]. So the successors[1] != NULL check will always fail. Signed-off-by: Kenneth Graunke Reviewed-by: Connor Abbott --- src/glsl/nir/nir_control_flow.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) ***UNCHANGED SINCE THE FIRST SEND*** diff --git a/src/glsl/nir/nir_control_flow.c b/src/glsl/nir/nir_control_flow.c index a59add5..e027766 100644 --- a/src/glsl/nir/nir_control_flow.c +++ b/src/glsl/nir/nir_control_flow.c @@ -99,10 +99,10 @@ unlink_blocks(nir_block *pred, nir_block *succ) static void unlink_block_successors(nir_block *block) { - if (block->successors[0] != NULL) - unlink_blocks(block, block->successors[0]); if (block->successors[1] != NULL) unlink_blocks(block, block->successors[1]); + if (block->successors[0] != NULL) + unlink_blocks(block, block->successors[0]); } static void -- 2.5.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 09/11] nir: Add new GS intrinsics that maintain a count of emitted vertices.
This patch also introduces a lowering pass to convert the simple GS intrinsics to the new ones. See the comments above that for the rationale behind the new intrinsics. This should be useful for i965; it's a generic enough mechanism that I could see other drivers potentially using it as well, so I don't feel too bad about putting it in the generic code. v2: - Use nir_after_block_before_jump for the cursor (caught by Jason Ekstrand - I'd mistakenly used nir_after_block when rebasing this code onto the new NIR control flow API). - Remove the old emit_vertex intrinsic at the end, rather than in the middle (requested by Jason). - Use state->... directly rather than locals (requested by Jason). - Report progress from nir_lower_gs_intrinsics() (requested by me). - Remove "Authors:" section from file comment (requested by Michael Schellenberger Costa). Signed-off-by: Kenneth Graunke Reviewed-by: Jason Ekstrand [v1] --- src/glsl/Makefile.sources | 1 + src/glsl/nir/nir.h | 2 + src/glsl/nir/nir_intrinsics.h | 21 src/glsl/nir/nir_lower_gs_intrinsics.c | 218 + 4 files changed, 242 insertions(+) create mode 100644 src/glsl/nir/nir_lower_gs_intrinsics.c Revised patch. This GS series was blocked on fixing the NIR CF bugs, but I'm hoping that we'll be able to land it now. diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources index f7c69f4..a8f4994 100644 --- a/src/glsl/Makefile.sources +++ b/src/glsl/Makefile.sources @@ -37,6 +37,7 @@ NIR_FILES = \ nir/nir_lower_atomics.c \ nir/nir_lower_clip.c \ nir/nir_lower_global_vars_to_local.c \ + nir/nir_lower_gs_intrinsics.c \ nir/nir_lower_load_const_to_scalar.c \ nir/nir_lower_locals_to_regs.c \ nir/nir_lower_idiv.c \ diff --git a/src/glsl/nir/nir.h b/src/glsl/nir/nir.h index 619a363..4f45770 100644 --- a/src/glsl/nir/nir.h +++ b/src/glsl/nir/nir.h @@ -1883,6 +1883,8 @@ void nir_lower_two_sided_color(nir_shader *shader); void nir_lower_atomics(nir_shader *shader); void nir_lower_to_source_mods(nir_shader *shader); +bool nir_lower_gs_intrinsics(nir_shader *shader); + bool nir_normalize_cubemap_coords(nir_shader *shader); void nir_live_variables_impl(nir_function_impl *impl); diff --git a/src/glsl/nir/nir_intrinsics.h b/src/glsl/nir/nir_intrinsics.h index 70cae42..b21460d 100644 --- a/src/glsl/nir/nir_intrinsics.h +++ b/src/glsl/nir/nir_intrinsics.h @@ -79,9 +79,30 @@ BARRIER(memory_barrier) /** A conditional discard, with a single boolean source. */ INTRINSIC(discard_if, 1, ARR(1), false, 0, 0, 0, 0) +/** + * Basic Geometry Shader intrinsics. + * + * emit_vertex implements GLSL's EmitStreamVertex() built-in. It takes a single + * index, which is the stream ID to write to. + * + * end_primitive implements GLSL's EndPrimitive() built-in. + */ INTRINSIC(emit_vertex, 0, ARR(), false, 0, 0, 1, 0) INTRINSIC(end_primitive, 0, ARR(), false, 0, 0, 1, 0) +/** + * Geometry Shader intrinsics with a vertex count. + * + * Alternatively, drivers may implement these intrinsics, and use + * nir_lower_gs_intrinsics() to convert from the basic intrinsics. + * + * These maintain a count of the number of vertices emitted, as an additional + * unsigned integer source. + */ +INTRINSIC(emit_vertex_with_counter, 1, ARR(1), false, 0, 0, 1, 0) +INTRINSIC(end_primitive_with_counter, 1, ARR(1), false, 0, 0, 1, 0) +INTRINSIC(set_vertex_count, 1, ARR(1), false, 0, 0, 0, 0) + /* * Atomic counters * diff --git a/src/glsl/nir/nir_lower_gs_intrinsics.c b/src/glsl/nir/nir_lower_gs_intrinsics.c new file mode 100644 index 000..2ee4e5c --- /dev/null +++ b/src/glsl/nir/nir_lower_gs_intrinsics.c @@ -0,0 +1,218 @@ +/* + * Copyright © 2015 Intel Corporation + * + * Permission is hereby granted, free of charge, to any person obtaining a + * copy of this software and associated documentation files (the "Software"), + * to deal in the Software without restriction, including without limitation + * the rights to use, copy, modify, merge, publish, distribute, sublicense, + * and/or sell copies of the Software, and to permit persons to whom the + * Software is furnished to do so, subject to the following conditions: + * + * The above copyright notice and this permission notice (including the next + * paragraph) shall be included in all copies or substantial portions of the + * Software. + * + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS + * IN THE SOFTWARE. + */ + +#include "nir.h" +#include "nir_builder.h" + +/**
[Mesa-dev] [PATCH 11/11] i965/gs: Fix extra level of indentation left by the previous commit.
I left a bunch of code indented a level in the previous patch to make the diff easier to read. But now we should fix that. Signed-off-by: Kenneth Graunke Reviewed-by: Jason Ekstrand --- src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp | 124 +++--- src/mesa/drivers/dri/i965/gen6_gs_visitor.cpp | 102 +- 2 files changed, 111 insertions(+), 115 deletions(-) ***UNCHANGED SINCE THE FIRST SEND*** diff --git a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp index 7a5b945..3cb1b4c 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_gs_visitor.cpp @@ -484,76 +484,74 @@ vec4_gs_visitor::gs_emit_vertex(int stream_id) if (stream_id > 0 && shader_prog->TransformFeedback.NumVarying == 0) return; - { - /* If we're outputting 32 control data bits or less, then we can wait - * until the shader is over to output them all. Otherwise we need to - * output them as we go. Now is the time to do it, since we're about to - * output the vertex_count'th vertex, so it's guaranteed that the - * control data bits associated with the (vertex_count - 1)th vertex are - * correct. + /* If we're outputting 32 control data bits or less, then we can wait +* until the shader is over to output them all. Otherwise we need to +* output them as we go. Now is the time to do it, since we're about to +* output the vertex_count'th vertex, so it's guaranteed that the +* control data bits associated with the (vertex_count - 1)th vertex are +* correct. +*/ + if (c->control_data_header_size_bits > 32) { + this->current_annotation = "emit vertex: emit control data bits"; + /* Only emit control data bits if we've finished accumulating a batch + * of 32 bits. This is the case when: + * + * (vertex_count * bits_per_vertex) % 32 == 0 + * + * (in other words, when the last 5 bits of vertex_count * + * bits_per_vertex are 0). Assuming bits_per_vertex == 2^n for some + * integer n (which is always the case, since bits_per_vertex is + * always 1 or 2), this is equivalent to requiring that the last 5-n + * bits of vertex_count are 0: + * + * vertex_count & (2^(5-n) - 1) == 0 + * + * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is + * equivalent to: + * + * vertex_count & (32 / bits_per_vertex - 1) == 0 */ - if (c->control_data_header_size_bits > 32) { - this->current_annotation = "emit vertex: emit control data bits"; - /* Only emit control data bits if we've finished accumulating a batch - * of 32 bits. This is the case when: - * - * (vertex_count * bits_per_vertex) % 32 == 0 - * - * (in other words, when the last 5 bits of vertex_count * - * bits_per_vertex are 0). Assuming bits_per_vertex == 2^n for some - * integer n (which is always the case, since bits_per_vertex is - * always 1 or 2), this is equivalent to requiring that the last 5-n - * bits of vertex_count are 0: - * - * vertex_count & (2^(5-n) - 1) == 0 - * - * 2^(5-n) == 2^5 / 2^n == 32 / bits_per_vertex, so this is - * equivalent to: - * - * vertex_count & (32 / bits_per_vertex - 1) == 0 + vec4_instruction *inst = + emit(AND(dst_null_d(), this->vertex_count, + (uint32_t) (32 / c->control_data_bits_per_vertex - 1))); + inst->conditional_mod = BRW_CONDITIONAL_Z; + + emit(IF(BRW_PREDICATE_NORMAL)); + { + /* If vertex_count is 0, then no control data bits have been + * accumulated yet, so we skip emitting them. */ - vec4_instruction *inst = -emit(AND(dst_null_d(), this->vertex_count, - (uint32_t) (32 / c->control_data_bits_per_vertex - 1))); - inst->conditional_mod = BRW_CONDITIONAL_Z; - + emit(CMP(dst_null_d(), this->vertex_count, 0u, + BRW_CONDITIONAL_NEQ)); emit(IF(BRW_PREDICATE_NORMAL)); - { -/* If vertex_count is 0, then no control data bits have been - * accumulated yet, so we skip emitting them. - */ -emit(CMP(dst_null_d(), this->vertex_count, 0u, - BRW_CONDITIONAL_NEQ)); -emit(IF(BRW_PREDICATE_NORMAL)); -emit_control_data_bits(); -emit(BRW_OPCODE_ENDIF); - -/* Reset control_data_bits to 0 so we can start accumulating a new - * batch. - * - * Note: in the case where vertex_count == 0, this neutralizes the - * effect of any call to EndPrimitive() that the shader may have - * made before outputting its first vertex. -
Re: [Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction
On Tuesday, September 22, 2015 06:18:12 PM Jason Ekstrand wrote: > We don't use any of the code after the switch and, as far as I can tell, it > wouldn't work anyway. The only reason this wasn't causing us problems is > that it's all dead and DCE cleans it up. Huh, really? Right after that switch statement I see: if (instr->dest.dest.ssa.num_components == 1) return; and reductions should always produce a single component, so this change seems like it should do nothing. It seems like a reasonable change regardless...but... > --- > src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c > b/src/glsl/nir/nir_lower_alu_to_scalar.c > index 5ef5ec2..84d4943 100644 > --- a/src/glsl/nir/nir_lower_alu_to_scalar.c > +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c > @@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) > case name##3: \ > case name##4: \ >lower_reduction(instr, chan, merge, b); \ > - break; > + return; > > switch (instr->op) { > case nir_op_vec4: > signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph
On Tuesday, September 22, 2015 06:18:16 PM Jason Ekstrand wrote: > --- > src/mesa/program/prog_to_nir.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c > index ec61100..1bd735a 100644 > --- a/src/mesa/program/prog_to_nir.c > +++ b/src/mesa/program/prog_to_nir.c > @@ -527,8 +527,7 @@ ptn_dp4(nir_builder *b, nir_alu_dest dest, nir_ssa_def > **src) > static void > ptn_dph(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src) > { > - nir_ssa_def *dp3 = nir_fdot3(b, src[0], src[1]); > - ptn_move_dest(b, dest, nir_fadd(b, dp3, ptn_channel(b, src[1], W))); > + ptn_move_dest(b, dest, nir_fdph(b, src[0], src[1])); > } > > static void > I verified that both freedreno and vc4 use nir_lower_alu_to_scalar(), so they shouldn't need any new code to handle fdph. It might make sense to make tgsi_to_nir use fdph as well, but it's fairly moot since neither consumer will actually see dot products. Series is: Reviewed-by: Kenneth Graunke signature.asc Description: This is a digitally signed message part. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph
On Tue, Sep 22, 2015 at 6:18 PM, Jason Ekstrand wrote: > --- Thanks Jason! The whole series is Reviewed-by: Matt Turner Feel free to add these shader-db results to this commit message: instructions in affected programs: 72 -> 56 (-22.22%) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 5/6] nir/lower_alu_to_scalar: Add support for nir_op_fdph
--- src/glsl/nir/nir_lower_alu_to_scalar.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c b/src/glsl/nir/nir_lower_alu_to_scalar.c index 84d4943..9313fc0 100644 --- a/src/glsl/nir/nir_lower_alu_to_scalar.c +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c @@ -112,6 +112,24 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) */ return; + case nir_op_fdph: { + nir_ssa_def *sum[4]; + for (unsigned i = 0; i < 3; i++) { + sum[i] = nir_fmul(b, nir_channel(b, instr->src[0].src.ssa, + instr->src[0].swizzle[i]), + nir_channel(b, instr->src[1].src.ssa, + instr->src[1].swizzle[i])); + } + sum[3] = nir_channel(b, instr->src[1].src.ssa, instr->src[1].swizzle[3]); + + nir_ssa_def *val = nir_fadd(b, nir_fadd(b, sum[0], sum[1]), + nir_fadd(b, sum[2], sum[3])); + + nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(val)); + nir_instr_remove(&instr->instr); + return; + } + LOWER_REDUCTION(nir_op_fdot, nir_op_fmul, nir_op_fadd); LOWER_REDUCTION(nir_op_ball_fequal, nir_op_feq, nir_op_iand); LOWER_REDUCTION(nir_op_ball_iequal, nir_op_ieq, nir_op_iand); -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/6] i965/vec4: Add support for fdph_replicated
--- src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 5 + 1 file changed, 5 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp index 20c063d..c681ae4 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp @@ -1268,6 +1268,11 @@ vec4_visitor::nir_emit_alu(nir_alu_instr *instr) inst->saturate = instr->dest.saturate; break; + case nir_op_fdph_replicated: + inst = emit(BRW_OPCODE_DPH, dst, op[0], op[1]); + inst->saturate = instr->dest.saturate; + break; + case nir_op_bany2: case nir_op_bany3: case nir_op_bany4: { -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/6] nir: Add fdph and fdph_replicated opcodes
--- src/glsl/nir/nir_lower_vec_to_movs.c | 3 ++- src/glsl/nir/nir_opcodes.py | 5 + src/glsl/nir/nir_opt_algebraic.py| 1 + 3 files changed, 8 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_vec_to_movs.c b/src/glsl/nir/nir_lower_vec_to_movs.c index b7ee4e8..622e59c 100644 --- a/src/glsl/nir/nir_lower_vec_to_movs.c +++ b/src/glsl/nir/nir_lower_vec_to_movs.c @@ -89,7 +89,8 @@ has_replicated_dest(nir_alu_instr *alu) { return alu->op == nir_op_fdot_replicated2 || alu->op == nir_op_fdot_replicated3 || - alu->op == nir_op_fdot_replicated4; + alu->op == nir_op_fdot_replicated4 || + alu->op == nir_op_fdph_replicated; } /* Attempts to coalesce the "move" from the given source of the vec to the diff --git a/src/glsl/nir/nir_opcodes.py b/src/glsl/nir/nir_opcodes.py index 495d109..f2d584f 100644 --- a/src/glsl/nir/nir_opcodes.py +++ b/src/glsl/nir/nir_opcodes.py @@ -456,6 +456,11 @@ binop_reduce("fdot", 1, tfloat, tfloat, "{src0} * {src1}", "{src0} + {src1}", binop_reduce("fdot_replicated", 4, tfloat, tfloat, "{src0} * {src1}", "{src0} + {src1}", "{src}") +opcode("fdph", 1, tfloat, [3, 4], [tfloat, tfloat], "", + "src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w") +opcode("fdph_replicated", 4, tfloat, [3, 4], [tfloat, tfloat], "", + "src0.x * src1.x + src0.y * src1.y + src0.z * src1.z + src1.w") + binop("fmin", tfloat, "", "fminf(src0, src1)") binop("imin", tint, commutative + associative, "src1 > src0 ? src0 : src1") binop("umin", tunsigned, commutative + associative, "src1 > src0 ? src0 : src1") diff --git a/src/glsl/nir/nir_opt_algebraic.py b/src/glsl/nir/nir_opt_algebraic.py index 43558a5..585e5e0 100644 --- a/src/glsl/nir/nir_opt_algebraic.py +++ b/src/glsl/nir/nir_opt_algebraic.py @@ -244,6 +244,7 @@ late_optimizations = [ (('fdot2', a, b), ('fdot_replicated2', a, b), 'options->fdot_replicates'), (('fdot3', a, b), ('fdot_replicated3', a, b), 'options->fdot_replicates'), (('fdot4', a, b), ('fdot_replicated4', a, b), 'options->fdot_replicates'), + (('fdph', a, b), ('fdph_replicated', a, b), 'options->fdot_replicates'), ] print nir_algebraic.AlgebraicPass("nir_opt_algebraic", optimizations).render() -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 6/6] prog_to_nir: Use nir_op_dph
--- src/mesa/program/prog_to_nir.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/mesa/program/prog_to_nir.c b/src/mesa/program/prog_to_nir.c index ec61100..1bd735a 100644 --- a/src/mesa/program/prog_to_nir.c +++ b/src/mesa/program/prog_to_nir.c @@ -527,8 +527,7 @@ ptn_dp4(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src) static void ptn_dph(nir_builder *b, nir_alu_dest dest, nir_ssa_def **src) { - nir_ssa_def *dp3 = nir_fdot3(b, src[0], src[1]); - ptn_move_dest(b, dest, nir_fadd(b, dp3, ptn_channel(b, src[1], W))); + ptn_move_dest(b, dest, nir_fdph(b, src[0], src[1])); } static void -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/6] nir/lower_alu_to_scalar: Use the builder
--- src/glsl/nir/nir_lower_alu_to_scalar.c | 47 -- 1 file changed, 22 insertions(+), 25 deletions(-) diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c b/src/glsl/nir/nir_lower_alu_to_scalar.c index 710bb37..5ef5ec2 100644 --- a/src/glsl/nir/nir_lower_alu_to_scalar.c +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c @@ -22,6 +22,7 @@ */ #include "nir.h" +#include "nir_builder.h" /** @file nir_lower_alu_to_scalar.c * @@ -38,13 +39,13 @@ nir_alu_ssa_dest_init(nir_alu_instr *instr, unsigned num_components) static void lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op, -void *mem_ctx) +nir_builder *builder) { unsigned num_components = nir_op_infos[instr->op].input_sizes[0]; nir_ssa_def *last = NULL; for (unsigned i = 0; i < num_components; i++) { - nir_alu_instr *chan = nir_alu_instr_create(mem_ctx, chan_op); + nir_alu_instr *chan = nir_alu_instr_create(builder->shader, chan_op); nir_alu_ssa_dest_init(chan, 1); nir_alu_src_copy(&chan->src[0], &instr->src[0], chan); chan->src[0].swizzle[0] = chan->src[0].swizzle[i]; @@ -54,18 +55,13 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op, chan->src[1].swizzle[0] = chan->src[1].swizzle[i]; } - nir_instr_insert_before(&instr->instr, &chan->instr); + nir_builder_instr_insert(builder, &chan->instr); if (i == 0) { last = &chan->dest.dest.ssa; } else { - nir_alu_instr *merge = nir_alu_instr_create(mem_ctx, merge_op); - nir_alu_ssa_dest_init(merge, 1); - merge->dest.write_mask = 1; - merge->src[0].src = nir_src_for_ssa(last); - merge->src[1].src = nir_src_for_ssa(&chan->dest.dest.ssa); - nir_instr_insert_before(&instr->instr, &merge->instr); - last = &merge->dest.dest.ssa; + last = nir_build_alu(builder, merge_op, + last, &chan->dest.dest.ssa, NULL, NULL); } } @@ -75,7 +71,7 @@ lower_reduction(nir_alu_instr *instr, nir_op chan_op, nir_op merge_op, } static void -lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx) +lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) { unsigned num_src = nir_op_infos[instr->op].num_inputs; unsigned i, chan; @@ -83,11 +79,13 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx) assert(instr->dest.dest.is_ssa); assert(instr->dest.write_mask != 0); + b->cursor = nir_before_instr(&instr->instr); + #define LOWER_REDUCTION(name, chan, merge) \ case name##2: \ case name##3: \ case name##4: \ - lower_reduction(instr, chan, merge, mem_ctx); \ + lower_reduction(instr, chan, merge, b); \ break; switch (instr->op) { @@ -134,16 +132,13 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx) return; unsigned num_components = instr->dest.dest.ssa.num_components; - static const nir_op nir_op_map[] = {nir_op_vec2, nir_op_vec3, nir_op_vec4}; - nir_alu_instr *vec_instr = - nir_alu_instr_create(mem_ctx, nir_op_map[num_components - 2]); - nir_alu_ssa_dest_init(vec_instr, num_components); + nir_ssa_def *comps[] = { NULL, NULL, NULL, NULL }; for (chan = 0; chan < 4; chan++) { if (!(instr->dest.write_mask & (1 << chan))) continue; - nir_alu_instr *lower = nir_alu_instr_create(mem_ctx, instr->op); + nir_alu_instr *lower = nir_alu_instr_create(b->shader, instr->op); for (i = 0; i < num_src; i++) { /* We only handle same-size-as-dest (input_sizes[] == 0) or scalar * args (input_sizes[] == 1). @@ -159,25 +154,24 @@ lower_alu_instr_scalar(nir_alu_instr *instr, void *mem_ctx) nir_alu_ssa_dest_init(lower, 1); lower->dest.saturate = instr->dest.saturate; - vec_instr->src[chan].src = nir_src_for_ssa(&lower->dest.dest.ssa); + comps[chan] = &lower->dest.dest.ssa; - nir_instr_insert_before(&instr->instr, &lower->instr); + nir_builder_instr_insert(b, &lower->instr); } - nir_instr_insert_before(&instr->instr, &vec_instr->instr); + nir_ssa_def *vec = nir_vec(b, comps, num_components); - nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, -nir_src_for_ssa(&vec_instr->dest.dest.ssa)); + nir_ssa_def_rewrite_uses(&instr->dest.dest.ssa, nir_src_for_ssa(vec)); nir_instr_remove(&instr->instr); } static bool -lower_alu_to_scalar_block(nir_block *block, void *data) +lower_alu_to_scalar_block(nir_block *block, void *builder) { nir_foreach_instr_safe(block, instr) { if (instr->type == nir_instr_type_alu) - lower_alu_instr_scalar(nir_instr_as_alu(instr), data); + lower_alu_instr_scalar(nir_instr_as_alu(instr), builder); } return true; @@ -186,7 +180,10 @@ lower_alu_to_scalar_block(nir_block *block, void *data) static void nir_lower_alu_to_scalar_impl(nir_functi
[Mesa-dev] [PATCH 2/6] nir/lower_alu_to_scalar: Return after lower_reduction
We don't use any of the code after the switch and, as far as I can tell, it wouldn't work anyway. The only reason this wasn't causing us problems is that it's all dead and DCE cleans it up. --- src/glsl/nir/nir_lower_alu_to_scalar.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_lower_alu_to_scalar.c b/src/glsl/nir/nir_lower_alu_to_scalar.c index 5ef5ec2..84d4943 100644 --- a/src/glsl/nir/nir_lower_alu_to_scalar.c +++ b/src/glsl/nir/nir_lower_alu_to_scalar.c @@ -86,7 +86,7 @@ lower_alu_instr_scalar(nir_alu_instr *instr, nir_builder *b) case name##3: \ case name##4: \ lower_reduction(instr, chan, merge, b); \ - break; + return; switch (instr->op) { case nir_op_vec4: -- 2.5.0.400.gff86faf ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)
https://bugs.freedesktop.org/show_bug.cgi?id=92072 --- Comment #4 from Ilia Mirkin --- FYI Marek has a fix: http://patchwork.freedesktop.org/patch/60146/ I tested it with this trace, and it seems to fix it. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] st/mesa: fix front buffer regression after dropping st_validate_state in Blit
On Tue, Sep 22, 2015 at 8:40 PM, Marek Olšák wrote: > From: Marek Olšák > > Broken by: d082c5324914212f76e45be497229c7a0681f706 > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072 > > Cc: mesa-sta...@lists.freedesktop.org Tested-by: Ilia Mirkin Sorry, no idea what this defined business is, don't have time to investigate it right now. Can you put a Cc: "10.6 11.0" tag on there in case Emil should make another 10.6 release? Thanks for tracking this down! > --- > src/mesa/state_tracker/st_cb_blit.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/src/mesa/state_tracker/st_cb_blit.c > b/src/mesa/state_tracker/st_cb_blit.c > index 4fdef7f..59c4dcc 100644 > --- a/src/mesa/state_tracker/st_cb_blit.c > +++ b/src/mesa/state_tracker/st_cb_blit.c > @@ -229,6 +229,7 @@ st_BlitFramebuffer(struct gl_context *ctx, >st_adjust_blit_for_msaa_resolve(&blit); > >st->pipe->blit(st->pipe, &blit); > + dstRb->defined = true; > } > } > } > @@ -266,6 +267,7 @@ st_BlitFramebuffer(struct gl_context *ctx, >st_adjust_blit_for_msaa_resolve(&blit); > >st->pipe->blit(st->pipe, &blit); > + dstRb->defined = true; > } > } > } > -- > 2.1.4 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] st/mesa: fix front buffer regression after dropping st_validate_state in Blit
From: Marek Olšák Broken by: d082c5324914212f76e45be497229c7a0681f706 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92072 Cc: mesa-sta...@lists.freedesktop.org --- src/mesa/state_tracker/st_cb_blit.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/src/mesa/state_tracker/st_cb_blit.c b/src/mesa/state_tracker/st_cb_blit.c index 4fdef7f..59c4dcc 100644 --- a/src/mesa/state_tracker/st_cb_blit.c +++ b/src/mesa/state_tracker/st_cb_blit.c @@ -229,6 +229,7 @@ st_BlitFramebuffer(struct gl_context *ctx, st_adjust_blit_for_msaa_resolve(&blit); st->pipe->blit(st->pipe, &blit); + dstRb->defined = true; } } } @@ -266,6 +267,7 @@ st_BlitFramebuffer(struct gl_context *ctx, st_adjust_blit_for_msaa_resolve(&blit); st->pipe->blit(st->pipe, &blit); + dstRb->defined = true; } } } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v5 02/70] i965: Use 16-byte offset alignment for shader storage buffers
On Tue, Sep 15, 2015 at 11:37:22AM -0700, Kristian Høgsberg wrote: > On Thu, Sep 10, 2015 at 03:35:18PM +0200, Iago Toral Quiroga wrote: > > This is the same we do for other things like uniforms because it ensures > > optimal performance. > > > > Reviewed-by: Jordan Justen > > --- > > src/mesa/drivers/dri/i965/brw_context.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c > > b/src/mesa/drivers/dri/i965/brw_context.c > > index 907b2a0..c8e8a68 100644 > > --- a/src/mesa/drivers/dri/i965/brw_context.c > > +++ b/src/mesa/drivers/dri/i965/brw_context.c > > @@ -558,6 +558,7 @@ brw_initialize_context_constants(struct brw_context > > *brw) > > * However, unaligned accesses are slower, so enforce buffer alignment. > > */ > > ctx->Const.UniformBufferOffsetAlignment = 16; > > + ctx->Const.ShaderStorageBufferOffsetAlignment = 16; > > This should be a cacheline (64 bytes) so that we can safely have the > CPU and GPU writing the same SSBO on non-cachecoherent systems (our > Atom CPUs). With UBOs, the GPU never writes, so there's no > problem. For an SSBO, the GPU and the CPU can be updating disjoint > regions of the buffer simultaneously and that will break if the > regions overlap the same cacheline. > > With that change, > > Reviewed-by: Kristian Høgsberg > I know nothing about SSBOs... I agree with Kristian assuming something in the API disallows simultaneous access to the same region from CPU and GPU - in which case, cacheline alignment doesn't buy anything. Also, when you make the change, please add a comment about why its 64. > > ctx->Const.TextureBufferOffsetAlignment = 16; > > ctx->Const.MaxTextureBufferSize = 128 * 1024 * 1024; > > > > -- > > 1.9.1 > > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965/cs: Implement DispatchComputeIndirect support
On Sat, Sep 19, 2015 at 3:50 PM, Jordan Justen wrote: > Signed-off-by: Jordan Justen > --- > src/mesa/drivers/dri/i965/brw_compute.c | 57 > ++--- > src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ > src/mesa/drivers/dri/i965/intel_reg.h | 5 +++ > 3 files changed, 60 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c > b/src/mesa/drivers/dri/i965/brw_compute.c > index 5693ab5..5641823 100644 > --- a/src/mesa/drivers/dri/i965/brw_compute.c > +++ b/src/mesa/drivers/dri/i965/brw_compute.c > @@ -31,14 +31,46 @@ > #include "brw_draw.h" > #include "brw_state.h" > #include "intel_batchbuffer.h" > +#include "intel_buffer_objects.h" > #include "brw_defines.h" > > > static void > -brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) > +brw_emit_gpgpu_walker(struct brw_context *brw, > + const void *compute_param, > + bool indirect) > { > const struct brw_cs_prog_data *prog_data = brw->cs.prog_data; > > + const GLuint *num_groups; > + uint32_t indirect_flag; > + > + if (!indirect) { > + num_groups = (const GLuint *)compute_param; > + indirect_flag = 0; > + } else { > + GLintptr indirect_offset = *(GLintptr*)compute_param; "GLintptr *" (space between type and *) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 4/4] i965/cs: Implement DispatchComputeIndirect support
On Sat, Sep 19, 2015 at 03:50:49PM -0700, Jordan Justen wrote: > Signed-off-by: Jordan Justen > --- > src/mesa/drivers/dri/i965/brw_compute.c | 57 > ++--- > src/mesa/drivers/dri/i965/brw_defines.h | 2 ++ > src/mesa/drivers/dri/i965/intel_reg.h | 5 +++ > 3 files changed, 60 insertions(+), 4 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_compute.c > b/src/mesa/drivers/dri/i965/brw_compute.c > index 5693ab5..5641823 100644 > --- a/src/mesa/drivers/dri/i965/brw_compute.c > +++ b/src/mesa/drivers/dri/i965/brw_compute.c > @@ -31,14 +31,46 @@ > #include "brw_draw.h" > #include "brw_state.h" > #include "intel_batchbuffer.h" > +#include "intel_buffer_objects.h" > #include "brw_defines.h" > > > static void > -brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint *num_groups) > +brw_emit_gpgpu_walker(struct brw_context *brw, > + const void *compute_param, > + bool indirect) > { > const struct brw_cs_prog_data *prog_data = brw->cs.prog_data; > > + const GLuint *num_groups; > + uint32_t indirect_flag; > + > + if (!indirect) { > + num_groups = (const GLuint *)compute_param; > + indirect_flag = 0; > + } else { > + GLintptr indirect_offset = *(GLintptr*)compute_param; I would call this as brw_dispatch_compute_common(ctx, indirect, true); from brw_dispatch_compute_indirect() instead of passing the address of indirect and then just say GLintptr indirect_offset = (GLintptr)compute_param; here. GLintptr is sized so that that's guaranteed to work. With that, series Reviewed-by: Kristian Høgsberg > + static const GLuint indirect_group_counts[3] = { 0, 0, 0 }; > + num_groups = indirect_group_counts; > + > + struct gl_buffer_object *indirect_buffer = > brw->ctx.DispatchIndirectBuffer; > + drm_intel_bo *bo = intel_bufferobj_buffer(brw, > +intel_buffer_object(indirect_buffer), > +indirect_offset, 3 * sizeof(GLuint)); > + > + indirect_flag = GEN7_GPGPU_INDIRECT_PARAMETER_ENABLE; > + > + brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMX, bo, > +I915_GEM_DOMAIN_VERTEX, 0, > +indirect_offset + 0); > + brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMY, bo, > +I915_GEM_DOMAIN_VERTEX, 0, > +indirect_offset + 4); > + brw_load_register_mem(brw, GEN7_GPGPU_DISPATCHDIMZ, bo, > +I915_GEM_DOMAIN_VERTEX, 0, > +indirect_offset + 8); > + } > + > const unsigned simd_size = prog_data->simd_size; > unsigned group_size = prog_data->local_size[0] * >prog_data->local_size[1] * prog_data->local_size[2]; > @@ -52,7 +84,7 @@ brw_emit_gpgpu_walker(struct brw_context *brw, const GLuint > *num_groups) > > uint32_t dwords = brw->gen < 8 ? 11 : 15; > BEGIN_BATCH(dwords); > - OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2)); > + OUT_BATCH(GPGPU_WALKER << 16 | (dwords - 2) | indirect_flag); > OUT_BATCH(0); > if (brw->gen >= 8) { >OUT_BATCH(0); /* Indirect Data Length */ > @@ -83,7 +115,9 @@ brw_emit_gpgpu_walker(struct brw_context *brw, const > GLuint *num_groups) > > > static void > -brw_dispatch_compute(struct gl_context *ctx, const GLuint *num_groups) > +brw_dispatch_compute_common(struct gl_context *ctx, > +const void *compute_param, > +bool indirect) > { > struct brw_context *brw = brw_context(ctx); > int estimated_buffer_space_needed; > @@ -117,7 +151,7 @@ brw_dispatch_compute(struct gl_context *ctx, const GLuint > *num_groups) > brw->no_batch_wrap = true; > brw_upload_compute_state(brw); > > - brw_emit_gpgpu_walker(brw, num_groups); > + brw_emit_gpgpu_walker(brw, compute_param, indirect); > > brw->no_batch_wrap = false; > > @@ -155,9 +189,24 @@ brw_dispatch_compute(struct gl_context *ctx, const > GLuint *num_groups) > */ > } > > +static void > +brw_dispatch_compute(struct gl_context *ctx, const GLuint *num_groups) { > + brw_dispatch_compute_common(ctx, > + num_groups, > + false); > +} > + > +static void > +brw_dispatch_compute_indirect(struct gl_context *ctx, GLintptr indirect) > +{ > + brw_dispatch_compute_common(ctx, > + &indirect, > + true); > +} > > void > brw_init_compute_functions(struct dd_function_table *functions) > { > functions->DispatchCompute = brw_dispatch_compute; > + functions->DispatchComputeIndirect = brw_dispatch_compute_indirect; > } > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 8fc8ceb..2de51d0 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defi
Re: [Mesa-dev] [PATCH 12/12] i965/skl+: Enable support for 16x multisampling
On Thu, Sep 17, 2015 at 05:00:14PM +0100, Neil Roberts wrote: > --- > src/mesa/drivers/dri/i965/brw_context.c | 6 ++ > src/mesa/drivers/dri/i965/intel_screen.c | 5 - > 2 files changed, 10 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_context.c > b/src/mesa/drivers/dri/i965/brw_context.c > index 7c1c133..c05fb74 100644 > --- a/src/mesa/drivers/dri/i965/brw_context.c > +++ b/src/mesa/drivers/dri/i965/brw_context.c > @@ -83,6 +83,12 @@ brw_query_samples_for_format(struct gl_context *ctx, > GLenum target, > > switch (brw->gen) { > case 9: > + samples[0] = 16; > + samples[1] = 8; > + samples[2] = 4; > + samples[3] = 2; > + return 4; > + > case 8: >samples[0] = 8; >samples[1] = 4; > diff --git a/src/mesa/drivers/dri/i965/intel_screen.c > b/src/mesa/drivers/dri/i965/intel_screen.c > index 1783835..f971797 100644 > --- a/src/mesa/drivers/dri/i965/intel_screen.c > +++ b/src/mesa/drivers/dri/i965/intel_screen.c > @@ -1178,12 +1178,15 @@ intel_detect_timestamp(struct intel_screen *screen) > const int* > intel_supported_msaa_modes(const struct intel_screen *screen) > { > + static const int gen9_modes[] = {16, 8, 4, 2, 0, -1}; > static const int gen8_modes[] = {8, 4, 2, 0, -1}; > static const int gen7_modes[] = {8, 4, 0, -1}; > static const int gen6_modes[] = {4, 0, -1}; > static const int gen4_modes[] = {0, -1}; > > - if (screen->devinfo->gen >= 8) { > + if (screen->devinfo->gen >= 9) { > + return gen9_modes; > + } else if (screen->devinfo->gen >= 8) { >return gen8_modes; > } else if (screen->devinfo->gen >= 7) { >return gen7_modes; I think you also should add a case in get_fast_clear_rect() (even though we don't have fast clears turned on yet). It's in my fast clear branch I believe, but it makes more sense in this series. Reviewed-by: Ben Widawsky ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/12] meta: Support 16x MSAA in the multisample scaled blit shader
On Thu, Sep 17, 2015 at 05:00:13PM +0100, Neil Roberts wrote: > I'm not too sure about the expression used to index into sample_map in > the shader. It looks like if fract(coord.x) and fract(coord.y) are > close to 1.0 then it would index outside of the array. However the > code for 4 and 8 has the same problem and the results seems to look > reasonable. It might make more sense to change it to something like > this: > > sample_map[int(4 * fract(coord.x)) + 4 * int(fract(coord.y) * 4)] > --- > src/mesa/drivers/common/meta.h | 2 ++ > src/mesa/drivers/common/meta_blit.c| 8 ++-- > src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 ++ > src/mesa/main/mtypes.h | 15 ++- > 4 files changed, 36 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h > index fe43915..e42ddc8 100644 > --- a/src/mesa/drivers/common/meta.h > +++ b/src/mesa/drivers/common/meta.h > @@ -285,9 +285,11 @@ enum blit_msaa_shader { > BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > + BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > + BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_MSAA_SHADER_COUNT, > }; > > diff --git a/src/mesa/drivers/common/meta_blit.c > b/src/mesa/drivers/common/meta_blit.c > index a41fe42..b545c37 100644 > --- a/src/mesa/drivers/common/meta_blit.c > +++ b/src/mesa/drivers/common/meta_blit.c > @@ -86,8 +86,8 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx, > while (samples >> (shader_offset + 1)) { >shader_offset++; > } > - /* Update the assert if we plan to support more than 8X MSAA. */ > - assert(shader_offset > 0 && shader_offset < 4); > + /* Update the assert if we plan to support more than 16X MSAA. */ > + assert(shader_offset > 0 && shader_offset <= 4); > > assert(target == GL_TEXTURE_2D_MULTISAMPLE || >target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY); > @@ -132,6 +132,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context > *ctx, >sample_number = "sample_map[int(2 * fract(coord.x) + 8 * > fract(coord.y))]"; >sample_map = ctx->Const.SampleMap8x; >break; > + case 16: > + sample_number = "sample_map[int(4 * fract(coord.x) + 16 * > fract(coord.y))]"; > + sample_map = ctx->Const.SampleMap16x; > + break; > default: >sample_number = NULL; >sample_map = NULL; > diff --git a/src/mesa/drivers/dri/i965/gen6_multisample_state.c > b/src/mesa/drivers/dri/i965/gen6_multisample_state.c > index 49c6eba..8eb620d 100644 > --- a/src/mesa/drivers/dri/i965/gen6_multisample_state.c > +++ b/src/mesa/drivers/dri/i965/gen6_multisample_state.c > @@ -91,6 +91,17 @@ gen6_get_sample_position(struct gl_context *ctx, > * | 6 | 7 | | 7 | 1 | > * - - > * > + * 16X MSAA sample index layout 16x MSAA sample number layout > + * -- > + * | 0 | 1 | 2 | 3 ||15 |10 | 9 | 7 | > + * -- > + * | 4 | 5 | 6 | 7 || 4 | 1 | 3 |13 | > + * -- > + * | 8 | 9 |10 |11 ||12 | 2 | 0 | 6 | > + * -- > + * |12 |13 |14 |15 ||11 | 8 | 5 |14 | > + * -- > + * > * A sample map is used to map sample indices to sample numbers. > */ > void > @@ -99,10 +110,13 @@ gen6_set_sample_maps(struct gl_context *ctx) > uint8_t map_2x[2] = {0, 1}; > uint8_t map_4x[4] = {0, 1, 2, 3}; > uint8_t map_8x[8] = {5, 2, 4, 6, 0, 3, 7, 1}; > + uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13, > + 12, 2, 0, 6, 11, 8, 5, 14 }; > > memcpy(ctx->Const.SampleMap2x, map_2x, sizeof(map_2x)); > memcpy(ctx->Const.SampleMap4x, map_4x, sizeof(map_4x)); > memcpy(ctx->Const.SampleMap8x, map_8x, sizeof(map_8x)); > + memcpy(ctx->Const.SampleMap16x, map_16x, sizeof(map_16x)); > } > > /** > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index fac45aa..e0a06f4 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@ -3550,11 +3550,24 @@ struct gl_constants > * below: > *SampleMap8x = {a, b, c, d, e, f, g, h}; > * > -* Follow the logic for other sample counts. > +* Follow the logic for sample counts 2-8. > +* > +* For 16x the sample indices layout as a 4x4 grid
Re: [Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code
On Tue, Sep 22, 2015 at 4:29 PM, Matt Turner wrote: > On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand wrote: >> On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand wrote: >>> >>> On Sep 21, 2015 5:45 PM, "Matt Turner" wrote: On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand wrote: > At this point, piglit is the same as for GLSL and the shader-db numbers > are > looking pretty good. On SNB, GLSL vs. NIR for vec4 programs is: > >total instructions in shared programs: 2020573 -> 1822601 (-9.80%) >instructions in affected programs: 1883334 -> 1685362 (-10.51%) >helped:13328 >HURT: 3594 > > and there are patches on the list that improve this to > >total instructions in shared programs: 2020283 -> 1805487 (-10.63%) >instructions in affected programs: 1855759 -> 1640963 (-11.57%) >helped:14142 >HURT: 2346 Wow, that's great. I didn't realize we were that close. That said, I don't feel like we're /quite/ ready for this (especially with outstanding optimization patches on the list). I'm not sure what patches are pending. >>> >>> Only two: the one you sent today and Alejandro's patch to make copy >>> propagation less type-sensitive. >>> Some things I've seen in digging through hurt programs today: portal-2/high/5134 emits: vec1 ssa_53 = flog2 ssa_52 vec1 ssa_54 = flog2 ssa_52.y vec1 ssa_55 = flog2 ssa_52.z vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w vec3 ssa_57 = fmul ssa_56, ssa_3 vec1 ssa_58 = fexp2 ssa_57 vec1 ssa_59 = fexp2 ssa_57.y vec1 ssa_60 = fexp2 ssa_57.z vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w which we didn't transform into a vec3 pow with or without NIR but we really should. Why isn't NIR able to handle this? >> >> Ken and I were talking about this today. What it comes down to is >> that no one has written the pass yet. We haven't done that many >> vector optimizations to date. > > Yeah, I'm not concerned about this. We weren't doing it before either, > so it's not a regression. Yeah, writing a vectorizor is something that should probably be done but isn't any more urgent than any other "make vec4 better" thing. (also, why isn't ".x" printed when the use of an ssa value scalar, e.g., in the assignment of ssa_58 the RHS should use ssa_57.x). >> >> It doesn't print the identity swizzle. > > Patch sent. > We generate worse code for all_equal/any_nequal/any. >> >> Yes, we should fix that. Suggestions/patches welcome, I don't have >> any hot ideas at the moment. >> book-of-unwritten-tales/original/vp-33 (a vertex program) emits uses DPH and NIR doesn't have DPH. NIR should probably grow a DPH instruction even if we don't have an optimization to recognize open-coded DPH. >> >> We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we >> really wanted to. The long-term solution is probably to add swizzle >> support to nir_search but that's going to be a real bear. How would >> having the nir_op_dph instruction help if we can't recognize it? > > Because the ARB vertex program language and the Mesa IR that we > translate from both have DPH already. We're just throwing it away > because NIR doesn't have DPH. Right. Should be easy enough to add. I can do that if you'd like or you can; I don't care. Lots of things hurt because of lack of global copy/constant propagation. I think NIR often emits the constant loads in blocks earlier than their uses and the backend optimizations aren't able to cope. See team-fortress-2/2197 for example (search for 953267991D, the hex value for 0.0001F). >> >> Hrm... One option would be to copy-prop load_const in emit_alu. This >> should be easy enough to do if we detect that it's a 2-src and one >> source is an immediate. We could also do global copy-prop but I don't >> know how hard that is. >> I remember this issue from the FS/NIR backend as well, but dota-2/504 (and others) emit: mad(8) g16<1>.xF g11<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF mad(8) g19<1>.xF g10<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF mad(8) g22<1>.xF g9<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF mad(8) g25<1>.xF g8<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF mad(8) g28<1>.xF g7<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF mad(8) g31<1>.xF g6<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF where the multiplication is duplicated. I can't remember what we decided. >> >> If I remember correctly, it came down to "optimization is hard" and we >> said "good enough" about our current heuristics. > > I dug out the old threads, but all I found w
Re: [Mesa-dev] [PATCH 10/12] i965/meta: Support 16x MSAA in the meta stencil blit
On Thu, Sep 17, 2015 at 05:00:12PM +0100, Neil Roberts wrote: > The destination rectangle is now drawn at 4x4 the size and the shader > code to calculate the sample number is adjusted accordingly. > --- > src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c | 22 +- > 1 file changed, 17 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c > b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c > index cbbb919..4e9aa94 100644 > --- a/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c > +++ b/src/mesa/drivers/dri/i965/brw_meta_stencil_blit.c > @@ -163,6 +163,13 @@ static const char *fs_tmpl = > " txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n" > " txl_coords.y = ((Y & int(0xfffc)) >> 1) | (Y & int(0x1));\n" > " sample_index = (X & 0x4) | (Y & 0x2) | ((X & 0x2) >> 1);\n" > + " break;\n" > + " case 16:\n" > + " txl_coords.x = ((X & int(0xfff8)) >> 2) | (X & int(0x1));\n" > + " txl_coords.y = ((Y & int(0xfff8)) >> 2) | (Y & int(0x1));\n" > + " sample_index = (((Y & 0x4) << 1) | (X & 0x4) | (Y & 0x2) |\n" > + " ((X & 0x2) >> 1));\n" > + " break;\n" > " }\n" > "}\n" > "\n" > @@ -313,11 +320,16 @@ adjust_msaa(struct blit_dims *dims, int num_samples) >dims->dst_x0 *= 2; >dims->dst_x1 *= 2; > } else if (num_samples) { > - const int x_num_samples = num_samples / 2; > - dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0 * x_num_samples, > num_samples); > - dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0 * 2, 4); > - dims->dst_x1 = ALIGN(dims->dst_x1 * x_num_samples, num_samples); > - dims->dst_y1 = ALIGN(dims->dst_y1 * 2, 4); > + const int y_num_samples = num_samples >= 16 ? 4 : 2; > + const int x_num_samples = num_samples / y_num_samples; > + dims->dst_x0 = ROUND_DOWN_TO(dims->dst_x0 * x_num_samples, > + x_num_samples * 2); > + dims->dst_y0 = ROUND_DOWN_TO(dims->dst_y0 * y_num_samples, > + y_num_samples * 2); > + dims->dst_x1 = ALIGN(dims->dst_x1 * x_num_samples, > + x_num_samples * 2); > + dims->dst_y1 = ALIGN(dims->dst_y1 * y_num_samples, > + y_num_samples * 2); > } > } > I am in the middle of implementing stencil blits for gen9+ with stencil_export. I think this goes away if we do that, right (since only MSAA 16x is gen9+ only)? BTW, I don't actually care to understand what's going on here, hopefully my acked-by is good enough... ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/3] i965: Delete all of the non-NIR vec4 code
On Mon, Sep 21, 2015 at 7:22 PM, Jason Ekstrand wrote: > On Mon, Sep 21, 2015 at 6:15 PM, Jason Ekstrand wrote: >> >> On Sep 21, 2015 5:45 PM, "Matt Turner" wrote: >>> >>> On Mon, Sep 21, 2015 at 3:18 PM, Jason Ekstrand >>> wrote: >>> > At this point, piglit is the same as for GLSL and the shader-db numbers >>> > are >>> > looking pretty good. On SNB, GLSL vs. NIR for vec4 programs is: >>> > >>> >total instructions in shared programs: 2020573 -> 1822601 (-9.80%) >>> >instructions in affected programs: 1883334 -> 1685362 (-10.51%) >>> >helped:13328 >>> >HURT: 3594 >>> > >>> > and there are patches on the list that improve this to >>> > >>> >total instructions in shared programs: 2020283 -> 1805487 (-10.63%) >>> >instructions in affected programs: 1855759 -> 1640963 (-11.57%) >>> >helped:14142 >>> >HURT: 2346 >>> >>> Wow, that's great. I didn't realize we were that close. >>> >>> That said, I don't feel like we're /quite/ ready for this (especially >>> with outstanding optimization patches on the list). I'm not sure what >>> patches are pending. >> >> Only two: the one you sent today and Alejandro's patch to make copy >> propagation less type-sensitive. >> >>> Some things I've seen in digging through hurt programs today: >>> >>> portal-2/high/5134 emits: >>> >>> vec1 ssa_53 = flog2 ssa_52 >>> vec1 ssa_54 = flog2 ssa_52.y >>> vec1 ssa_55 = flog2 ssa_52.z >>> vec4 ssa_56 = vec4 ssa_53, ssa_54, ssa_55, ssa_42.w >>> vec3 ssa_57 = fmul ssa_56, ssa_3 >>> vec1 ssa_58 = fexp2 ssa_57 >>> vec1 ssa_59 = fexp2 ssa_57.y >>> vec1 ssa_60 = fexp2 ssa_57.z >>> vec4 ssa_61 = vec4 ssa_58, ssa_59, ssa_60, ssa_42.w >>> >>> which we didn't transform into a vec3 pow with or without NIR but we >>> really should. Why isn't NIR able to handle this? > > Ken and I were talking about this today. What it comes down to is > that no one has written the pass yet. We haven't done that many > vector optimizations to date. Yeah, I'm not concerned about this. We weren't doing it before either, so it's not a regression. >>> (also, why isn't >>> ".x" printed when the use of an ssa value scalar, e.g., in the >>> assignment of ssa_58 the RHS should use ssa_57.x). > > It doesn't print the identity swizzle. Patch sent. >>> We generate worse code for all_equal/any_nequal/any. > > Yes, we should fix that. Suggestions/patches welcome, I don't have > any hot ideas at the moment. > >>> book-of-unwritten-tales/original/vp-33 (a vertex program) emits uses >>> DPH and NIR doesn't have DPH. NIR should probably grow a DPH >>> instruction even if we don't have an optimization to recognize >>> open-coded DPH. > > We could detect fdot(vec4(a.x, a.y, a.z, 1)) in the backend if we > really wanted to. The long-term solution is probably to add swizzle > support to nir_search but that's going to be a real bear. How would > having the nir_op_dph instruction help if we can't recognize it? Because the ARB vertex program language and the Mesa IR that we translate from both have DPH already. We're just throwing it away because NIR doesn't have DPH. >>> Lots of things hurt because of lack of global copy/constant >>> propagation. I think NIR often emits the constant loads in blocks >>> earlier than their uses and the backend optimizations aren't able to >>> cope. See team-fortress-2/2197 for example (search for 953267991D, the >>> hex value for 0.0001F). > > Hrm... One option would be to copy-prop load_const in emit_alu. This > should be easy enough to do if we detect that it's a 2-src and one > source is an immediate. We could also do global copy-prop but I don't > know how hard that is. > >>> I remember this issue from the FS/NIR backend as well, but dota-2/504 >>> (and others) emit: >>> >>> mad(8) g16<1>.xF g11<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> mad(8) g19<1>.xF g10<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> mad(8) g22<1>.xF g9<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> mad(8) g25<1>.xF g8<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> mad(8) g28<1>.xF g7<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> mad(8) g31<1>.xF g6<4,4,1>.xF g12<4,4,1>.xF g2<4,4,1>.xF >>> >>> where the multiplication is duplicated. I can't remember what we decided. > > If I remember correctly, it came down to "optimization is hard" and we > said "good enough" about our current heuristics. I dug out the old threads, but all I found was a snarky reply. Patch sent. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
On Tue, Sep 22, 2015 at 4:04 PM, Matt Turner wrote: > On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev wrote: >> When both fadd and fmul instructions have at least one operand that is a >> constant and it is only used once, the total number of instructions can >> be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because >> the constants will be progagated as immediate operands of fmul and fadd. >> >> This patch modifies opt_peephole_ffma pass to detect this situation and >> bails-out fusing fmul+fadd into ffma. >> >> As shown in shader-db results below, it seems to help a good bunch. However, >> there are some caveats: >> >> * It seems i965 specific, so I'm not sure if modifying the NIR pass >> directly is desired, as opposed to moving this to the backend. >> >> * There are still a high number of HURTs, but these could be reduced by being >> more specific in the conditions to bailout. >> >> total instructions in shared programs: 1683959 -> 1677447 (-0.39%) >> instructions in affected programs: 604918 -> 598406 (-1.08%) >> helped:4633 >> HURT: 804 >> GAINED:0 >> LOST: 0 >> --- > > Interesting -- yeah, I've thought about doing this as well. It was > more difficult before because with GLSL IR (where I was trying to do > it) it wasn't possible to determine if the constant was used by > multiple 3-src instructions. Actually, your check might be able to be > more refined to consider only uses of 3-src instructions. > > But that's getting kind of hardware-specific. If we want to move this into the i965 driver we can. I think we're the only users. That would completely get rid of hardware-specificness issues. --Jason > Perhaps another approach would be to modify the > opt_combine_constants() pass to split MADs under some circumstances -- > e.g., it accounts for the only use of a constant we would otherwise > have to promote. But of course we don't have that pass for the vec4 > backend. > > In the mean time, I've sent a related patch that may be of interest: > "[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds." > > This patch, applied on top of mine gives these results on Haswell: > > Total: > total instructions in shared programs: 6595563 -> 6584885 (-0.16%) > instructions in affected programs: 1183608 -> 1172930 (-0.90%) > helped:8074 > HURT: 842 > GAINED:4 > > FS: > total instructions in shared programs: 4863484 -> 4859884 (-0.07%) > instructions in affected programs: 554042 -> 550442 (-0.65%) > helped:3072 > HURT: 38 > GAINED:4 > > VS: > total instructions in shared programs: 1729224 -> 1722146 (-0.41%) > instructions in affected programs: 629566 -> 622488 (-1.12%) > total loops in shared programs:221 -> 221 (0.00%) > helped:5002 > HURT: 804 > > Another thing to consider for the vec4 backend is that vec4 uniforms > have to be unpacked for use by 3-src instructions (see the > VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations, > but they often do account for increases in instruction counts. > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 08/12] i965: Support allocating the MCS buffer for 16x MSAA
On Thu, Sep 17, 2015 at 05:00:10PM +0100, Neil Roberts wrote: > When 16 samples are used the MCS buffer needs 64 bits per pixel. > --- > src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 ++ > 1 file changed, 6 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > index 0cb0632..9faafb4 100644 > --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c > @@ -1374,6 +1374,12 @@ intel_miptree_alloc_mcs(struct brw_context *brw, > */ >format = MESA_FORMAT_R_UINT32; >break; > + case 16: > + /* 64 bits/pixel are required for MCS data when using 16x MSAA (4 bits > + * for each sample). > + */ > + format = MESA_FORMAT_RG_UINT32; > + break; > default: >unreachable("Unrecognized sample count in intel_miptree_alloc_mcs"); > }; This and the previous are: Reviewed-by: Ben Widawsky ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 09/12] i965/fs/skl+: Fix calculating gl_SampleID for 16x MSAA
On Thu, Sep 17, 2015 at 05:00:11PM +0100, Neil Roberts wrote: > In order to accomodate 16x MSAA, the starting sample pair index is now > 3 bits rather than 2 on SKL+. Hooray for bad docs. "PS Thread Payload for Normal Dispatch": SSPI is 2 bits (R0.0.7:6) with 1 reserved for expansion. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 9 - > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 82f49b4..bd9bcdd 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -1300,9 +1300,16 @@ fs_visitor::emit_sampleid_setup() > * are sample 1 of subspan 0; the third group is sample 0 of > * subspan 1, and finally sample 1 of subspan 1. > */ > + > + /* SKL+ has an extra bit for the Starting Sample Pair Index to > + * accomodate 16x MSAA. > + */ > + unsigned sspi_bits = devinfo->gen >= 9 ? 3 : 2; > + unsigned sspi_mask = ((1 << sspi_bits) - 1) << 6; > + I think using the bits makes this unnecessarily hard to read. unsigned sspi_mask = devinfo->gen >= 9 ? 0x1c0 : 0xc0; >abld.exec_all() >.AND(t1, fs_reg(retype(brw_vec1_grf(0, 0), BRW_REGISTER_TYPE_UD)), > - fs_reg(0xc0)); > + fs_reg(sspi_mask)); >abld.exec_all().SHR(t1, t1, fs_reg(5)); > >/* This works for both SIMD8 and SIMD16 */ I'm really sketchy on the details of how this actually works. Mostly, I get what this code is doing, and I agree that the only difference for 16x should be the extra bit (because SSPI needs to go up to 7 for SIMD8). So take my: Reviewed-by: Ben Widawsky with a grain of salt. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [RFC] nir/opt_peephole_ffma: Bypass fusion if any operand of fadd and fmul is a const
On Fri, Sep 18, 2015 at 12:49 AM, Eduardo Lima Mitev wrote: > When both fadd and fmul instructions have at least one operand that is a > constant and it is only used once, the total number of instructions can > be reduced from 3 (1 ffma + 2 load_const) to 2 (1 fmul + 1 fadd); because > the constants will be progagated as immediate operands of fmul and fadd. > > This patch modifies opt_peephole_ffma pass to detect this situation and > bails-out fusing fmul+fadd into ffma. > > As shown in shader-db results below, it seems to help a good bunch. However, > there are some caveats: > > * It seems i965 specific, so I'm not sure if modifying the NIR pass > directly is desired, as opposed to moving this to the backend. > > * There are still a high number of HURTs, but these could be reduced by being > more specific in the conditions to bailout. > > total instructions in shared programs: 1683959 -> 1677447 (-0.39%) > instructions in affected programs: 604918 -> 598406 (-1.08%) > helped:4633 > HURT: 804 > GAINED:0 > LOST: 0 > --- Interesting -- yeah, I've thought about doing this as well. It was more difficult before because with GLSL IR (where I was trying to do it) it wasn't possible to determine if the constant was used by multiple 3-src instructions. Actually, your check might be able to be more refined to consider only uses of 3-src instructions. But that's getting kind of hardware-specific. Perhaps another approach would be to modify the opt_combine_constants() pass to split MADs under some circumstances -- e.g., it accounts for the only use of a constant we would otherwise have to promote. But of course we don't have that pass for the vec4 backend. In the mean time, I've sent a related patch that may be of interest: "[PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds." This patch, applied on top of mine gives these results on Haswell: Total: total instructions in shared programs: 6595563 -> 6584885 (-0.16%) instructions in affected programs: 1183608 -> 1172930 (-0.90%) helped:8074 HURT: 842 GAINED:4 FS: total instructions in shared programs: 4863484 -> 4859884 (-0.07%) instructions in affected programs: 554042 -> 550442 (-0.65%) helped:3072 HURT: 38 GAINED:4 VS: total instructions in shared programs: 1729224 -> 1722146 (-0.41%) instructions in affected programs: 629566 -> 622488 (-1.12%) total loops in shared programs:221 -> 221 (0.00%) helped:5002 HURT: 804 Another thing to consider for the vec4 backend is that vec4 uniforms have to be unpacked for use by 3-src instructions (see the VEC4_OPCODE_UNPACK_UNIFORM opcode). We CSE the unpacking operations, but they often do account for increases in instruction counts. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] mesa: Reduce libGL.so binary size by about 15%
On 09/17/2015 03:19 PM, Arlie Davis wrote: > Ok, here's v2 of the change, with the suggested edits. So... I think this code is fine, and I admire the effort. I have a couple concerns. 1. We have no way to test this, so it's quite possible something was broken. 2. This function is only used in the OSX builds. Jeremy is the maintainer for those builds, so I've added him to the CC list. For every non-OSX build, we should just stop linking src/mapi/glapi/glapi_gentable.c. I thought maybe the X sever used it, but I couldn't find any evidence of that. If this is still a viable route, I have a few suggestions of follow-on patches... I guess this patch is Reviewed-by: Ian Romanick but I really think we need to get Jeremy's approval before pushing it. > From 5f393faa058f453408dfc640eecae3fe6335dfed Mon Sep 17 00:00:00 2001 > From: Arlie Davis > Date: Tue, 15 Sep 2015 09:58:34 -0700 > Subject: [PATCH] This patch significantly reduces the size of the libGL.so > binary. It does not change the (externally visible) behavior of libGL.so at > all. > > gl_gentable.py generates a function, _glapi_create_table_from_handle. > This function allocates a large dispatch table, consisting of 1300 or so > function pointers, and fills this dispatch table by doing symbol lookups > on a given shared library. Previously, gl_gentable.py would generate a > single, very large _glapi_create_table_from_handle function, with a short > cluster of lines for each entry point (function). The idiom it generates > was a NULL check, a call to snprintf, a call to dlsym / GetProcAddress, > and then a store into the dispatch table. Since this function processes > a large number of entry points, this code is duplicated many times over. > > We can encode the same information much more compactly, by using a lookup > table. The previous total size of _glapi_create_table_from_handle on x64 > was 125848 bytes. By using a lookup table, the size of > _glapi_create_table_from_handle (and the related lookup tables) is reduced > to 10840 bytes. In other words, this enormous function is reduced by 91%. > The size of the entire libGL.so binary (measured when stripped) itself drops > by 15%. > > So the purpose of this change is to reduce the binary size, which frees up > disk space, memory, etc. > --- > src/mapi/glapi/gen/gl_gentable.py | 57 > --- > 1 file changed, 41 insertions(+), 16 deletions(-) > > diff --git a/src/mapi/glapi/gen/gl_gentable.py > b/src/mapi/glapi/gen/gl_gentable.py > index 1b3eb72..7cd475a 100644 > --- a/src/mapi/glapi/gen/gl_gentable.py > +++ b/src/mapi/glapi/gen/gl_gentable.py > @@ -113,6 +113,9 @@ __glapi_gentable_set_remaining_noop(struct _glapi_table > *disp) { > dispatch[i] = p.v; > } > > +""" > + > +footer = """ > struct _glapi_table * > _glapi_create_table_from_handle(void *handle, const char *symbol_prefix) { > struct _glapi_table *disp = calloc(_glapi_get_dispatch_table_size(), > sizeof(_glapi_proc)); > @@ -123,27 +126,28 @@ _glapi_create_table_from_handle(void *handle, const > char *symbol_prefix) { > > if(symbol_prefix == NULL) > symbol_prefix = ""; > -""" > > -footer = """ > -__glapi_gentable_set_remaining_noop(disp); > - > -return disp; > -} > -""" > +/* Note: This code relies on _glapi_table_func_names being sorted by the > + * entry point index of each function. > + */ > +for (int func_index = 0; func_index < GLAPI_TABLE_COUNT; ++func_index) { > +const char *name = _glapi_table_func_names[func_index]; > +void ** procp = &((void **)disp)[func_index]; > > -body_template = """ > -if(!disp->%(name)s) { > -void ** procp = (void **) &disp->%(name)s; > -snprintf(symboln, sizeof(symboln), "%%s%(entry_point)s", > symbol_prefix); > +snprintf(symboln, sizeof(symboln), \"%s%s\", symbol_prefix, name); > #ifdef _WIN32 > *procp = GetProcAddress(handle, symboln); > #else > *procp = dlsym(handle, symboln); > #endif > } > +__glapi_gentable_set_remaining_noop(disp); > + > +return disp; > +} > """ > > + > class PrintCode(gl_XML.gl_print_base): > > def __init__(self): > @@ -180,12 +184,33 @@ class PrintCode(gl_XML.gl_print_base): > > > def printBody(self, api): > -for f in api.functionIterateByOffset(): > -for entry_point in f.entry_points: > -vars = { 'entry_point' : entry_point, > - 'name' : f.name } > > -print body_template % vars > +# Determine how many functions have a defined offset. > +func_count = 0 > +for f in api.functions_by_name.itervalues(): > +if f.offset != -1: > +func_count += 1 > + > +# Build the mapping from offset to function name. > +funcnames = [None] * func_count > +for f in api.functions_by_name.itervalues(): > +if f.offset != -1: >
[Mesa-dev] [PATCH] nir: Don't fuse fmul into ffma if used by more than 4 fadds.
total instructions in shared programs: 6596689 -> 6595563 (-0.02%) instructions in affected programs: 103154 -> 102028 (-1.09%) helped:253 HURT: 217 It's kind of a wash in terms of programs helped/hurt, but of the programs helped 169 are by more than 10%. --- I tried values of 2-6, and 4 seemed to be the best. I can provide full shader-db result files if other people want to investigate. src/glsl/nir/nir_opt_peephole_ffma.c | 20 1 file changed, 12 insertions(+), 8 deletions(-) diff --git a/src/glsl/nir/nir_opt_peephole_ffma.c b/src/glsl/nir/nir_opt_peephole_ffma.c index 4f0f0da..3e8a34f 100644 --- a/src/glsl/nir/nir_opt_peephole_ffma.c +++ b/src/glsl/nir/nir_opt_peephole_ffma.c @@ -39,7 +39,7 @@ struct peephole_ffma_state { }; static inline bool -are_all_uses_fadd(nir_ssa_def *def) +are_all_uses_fadd(nir_ssa_def *def, unsigned *num_uses) { if (!list_empty(&def->if_uses)) return false; @@ -53,6 +53,7 @@ are_all_uses_fadd(nir_ssa_def *def) nir_alu_instr *use_alu = nir_instr_as_alu(use_instr); switch (use_alu->op) { case nir_op_fadd: + (*num_uses)++; break; /* This one's ok */ case nir_op_imov: @@ -60,7 +61,7 @@ are_all_uses_fadd(nir_ssa_def *def) case nir_op_fneg: case nir_op_fabs: assert(use_alu->dest.dest.is_ssa); - if (!are_all_uses_fadd(&use_alu->dest.dest.ssa)) + if (!are_all_uses_fadd(&use_alu->dest.dest.ssa, num_uses)) return false; break; @@ -101,15 +102,18 @@ get_mul_for_src(nir_alu_src *src, int num_components, *abs = true; break; - case nir_op_fmul: - /* Only absorb a fmul into a ffma if the fmul is is only used in fadd - * operations. This prevents us from being too aggressive with our - * fusing which can actually lead to more instructions. + case nir_op_fmul: { + /* Only fuse an fmul into an ffma if its result is used by not more than + * four fadd operations. This prevents us from too aggressively fusing + * operations which can actually lead to more instructions or many ffma + * operations performing the same multiply. */ - if (!are_all_uses_fadd(&alu->dest.dest.ssa)) + + unsigned num_uses = 0; + if (!are_all_uses_fadd(&alu->dest.dest.ssa, &num_uses) || num_uses > 4) return NULL; break; - + } default: return NULL; } -- 2.4.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH resend] r600g: update num_dw in scissor_enable workaround
"r600g: apply disable workaround on all scissors" forgot to update num_dw, fix it. Fixes: fbb423b433 "r600g: apply disable workaround on all scissors" Reported-and-tested-by: Markus Trippelsdorf Signed-off-by: Grazvydas Ignotas --- resending with s-o-b src/gallium/drivers/r600/r600_state_common.c | 1 + 1 file changed, 1 insertion(+) diff --git a/src/gallium/drivers/r600/r600_state_common.c b/src/gallium/drivers/r600/r600_state_common.c index 21c89dc..efce852 100644 --- a/src/gallium/drivers/r600/r600_state_common.c +++ b/src/gallium/drivers/r600/r600_state_common.c @@ -367,6 +367,7 @@ static void r600_bind_rs_state(struct pipe_context *ctx, void *state) rs->scissor_enable != rctx->scissor.enable) { rctx->scissor.enable = rs->scissor_enable; rctx->scissor.dirty_mask = (1 << R600_MAX_VIEWPORTS) - 1; + rctx->scissor.atom.num_dw = R600_MAX_VIEWPORTS * 4; r600_mark_atom_dirty(rctx, &rctx->scissor.atom); } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v5 00/70] ARB_shader_storage_buffer_object (mesa, i965)
On Mon, Sep 21, 2015 at 12:39:14PM +0200, Iago Toral wrote: > Hi Kristian, > > On Fri, 2015-09-18 at 16:56 -0700, Kristian Høgsberg wrote: > > On Thu, Sep 10, 2015 at 03:35:16PM +0200, Iago Toral Quiroga wrote: > > > Hi, > > > > > > this is the latest version of the ARB_shader_storage_buffer_object > > > implementation. A good part of the frontend bits for this are already in > > > master, but this adds some more missing pieces, specifically std430 and > > > memory qualifiers. Additionally, this provides the i965 implementation. > > > > I've gone through all patches in the series and I replied to patches > > where I had comments. Overall, the series look good and with the > > comments addressed, I'm ready to give my Reviewed-by for the series. > > I want to take a closer look at the atomics lowering in patches 49+, > > but I'm done for today. Base on the quick look-through I did, I don't > > expect to find any showstoppers there. > > great, thanks for reviewing this! We will send new versions of the > patches for which you had comments or reply to your comments otherwise. I had a look at the atomics lowering and that looks fine. Except for patch 40/70, which I hope we can clean up a little bit, all updated patches and explanation in 38/70 look good. Kristian > Iago > > > Here's a summary of what I found: > > > > [PATCH v5 01/70] mesa: set MAX_SHADER_STORAGE_BUFFERS to 15. > > > > Update limit to 16 and drop the comment > > > > [PATCH v5 02/70] i965: Use 16-byte offset alignment for shader storage > > buffers > > > > ctx->Const.ShaderStorageBufferOffsetAlignment should be 64 > > > > [PATCH v5 23/70] glsl: refactor parser processing of an interface block > > definition > > > > Clarify that the commit is just moving code. > > > > [PATCH v5 26/70] glsl: Add parser/compiler support for std430 interface > > packing qualifier > > > > Update the error to also mention shader storage blocks, not just ubos? > > > > [PATCH v5 28/70] glsl: add std430 interface packing support to ssbo related > > operations > > > > Why are we passing false for is_std430 here (emit_access in > > handle_rvalue)? We use handle_rvalue for both UBO and SSBO loads, > > right? Also, for consistency, I'd prefer if we could just pass > > 'packing' around instead of is_std430. > > > > [PATCH v5 29/70] glsl: a shader storage buffer must be smaller than the > > maximum size allowed > > > > Two chunks look like they should be their own patch ("Add unsized > > array support to glsl_type::std140_size" or such). > > > > [PATCH v5 38/70] i965/nir/vec4: Implement nir_intrinsic_store_ssbo > > > > Shouldn't this be 'skipped_channels += num_channels;' to handle write > > mask reg.yw? > > > > [PATCH v5 40/70] nir: Implement __intrinsic_load_ssbo > > > > Refactor handling of cmp instruction for converting to bool > > > > [PATCH v5 54/70] glsl: First argument to atomic functions must be a buffer > > variable > > > > Nitpick: move check that only looks at first element in list to after loop > > > > Also, I expect that before we land this series (thought that shouldn't > > be far off), we'll have deleted the vec4 GLSL IR visitor so we can > > drop these patches (I didn't review them): > > > > [PATCH v5 16/70] i965/vec4: Implement ir_unop_get_buffer_size > > [PATCH v5 39/70] i965/vec4: Implement __intrinsic_store_ssbo > > [PATCH v5 43/70] i965/vec4: Implement __intrinsic_load_ssbo > > [PATCH v5 53/70] i965/vec4: Implement lowered SSBO atomic intrinsics > > > > I wrote the initial prototype of the SSBO functionality, but I don't > > recall writing: > > > > [PATCH v5 45/70] glsl: atomic counters can be declared as buffer-qualified > > variables > > > > I don't think I did anything for atomics in my patch. Feel free to > > take ownership of that one and add my Reviewed-by. > > > > thanks, > > Kristian > > > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 1/8] radeonsi: Allocate buffers for DCC.
This looks good. Just some nits below. On Fri, Sep 4, 2015 at 9:47 PM, Bas Nieuwenhuizen wrote: > As the alignment requirements can be 32 KiB or more, also adding > an aligned buffer creation function. > > Signed-off-by: Bas Nieuwenhuizen > --- > src/gallium/drivers/radeon/r600_buffer_common.c | 20 +++ > src/gallium/drivers/radeon/r600_pipe_common.h | 6 > src/gallium/drivers/radeon/r600_texture.c | 18 ++ > src/gallium/drivers/radeon/radeon_winsys.h | 5 +++ > src/gallium/winsys/amdgpu/drm/amdgpu_surface.c | 48 > ++--- > 5 files changed, 92 insertions(+), 5 deletions(-) > > diff --git a/src/gallium/drivers/radeon/r600_buffer_common.c > b/src/gallium/drivers/radeon/r600_buffer_common.c > index cb9809f..d869856 100644 > --- a/src/gallium/drivers/radeon/r600_buffer_common.c > +++ b/src/gallium/drivers/radeon/r600_buffer_common.c > @@ -422,6 +422,26 @@ struct pipe_resource *r600_buffer_create(struct > pipe_screen *screen, > return &rbuffer->b.b; > } > > +struct pipe_resource * r600_aligned_buffer_create(struct pipe_screen *screen, > + unsigned bind, > + unsigned usage, > + unsigned size, > + unsigned alignment) > +{ > + struct pipe_resource buffer; Missing space after the declaration. > + memset(&buffer, 0, sizeof buffer); > + buffer.target = PIPE_BUFFER; > + buffer.format = PIPE_FORMAT_R8_UNORM; > + buffer.bind = bind; > + buffer.usage = usage; > + buffer.flags = 0; > + buffer.width0 = size; > + buffer.height0 = 1; > + buffer.depth0 = 1; > + buffer.array_size = 1; > + return r600_buffer_create(screen, &buffer, alignment); This should use tabs for indentation. > +} > + > struct pipe_resource * > r600_buffer_from_user_memory(struct pipe_screen *screen, > const struct pipe_resource *templ, > diff --git a/src/gallium/drivers/radeon/r600_pipe_common.h > b/src/gallium/drivers/radeon/r600_pipe_common.h > index d22c230..4bbecca 100644 > --- a/src/gallium/drivers/radeon/r600_pipe_common.h > +++ b/src/gallium/drivers/radeon/r600_pipe_common.h > @@ -212,6 +212,7 @@ struct r600_texture { > struct r600_fmask_info fmask; > struct r600_cmask_info cmask; > struct r600_resource*cmask_buffer; > + struct r600_resource*dcc_buffer; > unsignedcb_color_info; /* fast clear enable > bit */ > unsignedcolor_clear_value[2]; > > @@ -488,6 +489,11 @@ bool r600_init_resource(struct r600_common_screen > *rscreen, > struct pipe_resource *r600_buffer_create(struct pipe_screen *screen, > const struct pipe_resource *templ, > unsigned alignment); > +struct pipe_resource * r600_aligned_buffer_create(struct pipe_screen *screen, > + unsigned bind, > + unsigned usage, > + unsigned size, > + unsigned alignment); > struct pipe_resource * > r600_buffer_from_user_memory(struct pipe_screen *screen, > const struct pipe_resource *templ, > diff --git a/src/gallium/drivers/radeon/r600_texture.c > b/src/gallium/drivers/radeon/r600_texture.c > index 89f18fb..46e735e 100644 > --- a/src/gallium/drivers/radeon/r600_texture.c > +++ b/src/gallium/drivers/radeon/r600_texture.c > @@ -268,6 +268,7 @@ static void r600_texture_destroy(struct pipe_screen > *screen, > if (rtex->cmask_buffer != &rtex->resource) { > pipe_resource_reference((struct > pipe_resource**)&rtex->cmask_buffer, NULL); > } > + pipe_resource_reference((struct pipe_resource**)&rtex->dcc_buffer, > NULL); > pb_reference(&resource->buf, NULL); > FREE(rtex); > } > @@ -482,6 +483,20 @@ static void r600_texture_alloc_cmask_separate(struct > r600_common_screen *rscreen > rtex->cb_color_info |= EG_S_028C70_FAST_CLEAR(1); > } > > +static void vi_texture_alloc_dcc_separate(struct r600_common_screen *rscreen, > + struct r600_texture *rtex) > +{ > + rtex->dcc_buffer = (struct r600_resource *) > + r600_aligned_buffer_create(&rscreen->b, PIPE_BIND_CUSTOM, > + PIPE_USAGE_DEFAULT, > rtex->surface.dcc_size, rtex->surface.dcc_alignment); > + if (rtex->dcc_buffer == NULL) { > + return; > + } > + > + r600_screen_clear_buffer(rscreen, &rtex->dcc_buffer->b.b, 0, > rtex->surface.dcc_size, > +0x, true); > +} > + > static unsign
Re: [Mesa-dev] [PATCH v2] i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed
On Tue, Sep 22, 2015 at 9:53 AM, Antia Puentes wrote: > Gen6 MATH instructions can not execute in align16 mode, so swizzles or > writemasking are not allowed. > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 > --- > > I have tried to find an example where the writemask check is strictly needed > in order to avoid an incorrect register coalescing, but I have failed. > > If I have something like this in my shader: > > in vec4 attr_pos; > > void main() { > vec4 a; > a.xy = sin(attr_pos); > .. > } > > It is translated into the following vec4 instructions: > sin vgrf4.0:F, vgrf3.xyzw:F > mov vgrf0.0.xy:F, vgrf4.xyzw:F <-- (added by emit_math to fix the > writemasking issue) > > And converted by the opt_reduce_swizzle optimization into: > sin vgrf4.0:F, vgrf3.xyzw:F > mov vgrf0.0.xy:F, vgrf4.xyyy:F > > The swizzle in the MOV instruction is not the identity anymore, so the method > "can_reswizzle" > in opt_register_coalesce will return FALSE because of the swizzle-check that > my first > version of the patch added. > > Out of curiosity, I disabled the opt_reduce_swizzle optimization to see what > happens. > Still "can_reswizzle" returns FALSE preventing the incorrect register > coalescing, > because the math function sets more things than the MOV instruction; i.e. > the check "(dst.writemask & ~swizzle_mask) is positive. > > Although I have not found an example where the writemask-check is crucial, > that does not neccessarily > mean that it does not exists. I am sending the second version of the patch in > case you prefer to have > that check in place. Thanks Antia. This seems like a good thing to do. > src/mesa/drivers/dri/i965/brw_ir_vec4.h | 3 ++- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +-- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > index 966a410..a48bb68 100644 > --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h > +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h > @@ -175,7 +175,8 @@ public: > > bool is_send_from_grf(); > unsigned regs_read(unsigned arg) const; > - bool can_reswizzle(int dst_writemask, int swizzle, int swizzle_mask); > + bool can_reswizzle(const struct brw_device_info *devinfo, int > dst_writemask, > + int swizzle, int swizzle_mask); > void reswizzle(int dst_writemask, int swizzle); > bool can_do_source_mods(const struct brw_device_info *devinfo); > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index ed49cd3..ae695f1 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -941,10 +941,17 @@ vec4_visitor::opt_set_dependency_control() > } > > bool > -vec4_instruction::can_reswizzle(int dst_writemask, > +vec4_instruction::can_reswizzle(const struct brw_device_info *devinfo, > +int dst_writemask, > int swizzle, > int swizzle_mask) > { > + /* Gen6 MATH instructions can not execute in align16 mode, so swizzles > +* or writemasking are not allowed. */ */ goes on its own line in Mesa. With that, Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/12] i965/fs/skl+: Use lcd2dms_w instead of lcd2dms
On 09/17/2015 09:00 AM, Neil Roberts wrote: > In order to support 16x MSAA, skl+ has a wider version of lcd2dms that > takes two parameters for the MCS data. This patch makes it allocate a > register that is twice as big for the MCS data and then always use > the wider version. > --- > src/mesa/drivers/dri/i965/brw_defines.h| 4 > src/mesa/drivers/dri/i965/brw_disasm.c | 1 + > src/mesa/drivers/dri/i965/brw_fs.cpp | 27 > -- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 5 + > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 ++--- > src/mesa/drivers/dri/i965/brw_shader.cpp | 5 + > 6 files changed, 46 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 8fc8ceb..2d5b67d 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -969,6 +969,8 @@ enum opcode { > FS_OPCODE_TXB_LOGICAL, > SHADER_OPCODE_TXF_CMS, > SHADER_OPCODE_TXF_CMS_LOGICAL, > + SHADER_OPCODE_TXF_CMS_W, > + SHADER_OPCODE_TXF_CMS_W_LOGICAL, > SHADER_OPCODE_TXF_UMS, > SHADER_OPCODE_TXF_UMS_LOGICAL, > SHADER_OPCODE_TXF_MCS, > @@ -1517,10 +1519,12 @@ enum brw_message_target { > #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO 17 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18 > #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20 > +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS 29 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS 30 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS 31 > > + Spurious whitespace change. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 11/12] meta: Support 16x MSAA in the multisample scaled blit shader
On 09/17/2015 09:00 AM, Neil Roberts wrote: > I'm not too sure about the expression used to index into sample_map in > the shader. It looks like if fract(coord.x) and fract(coord.y) are > close to 1.0 then it would index outside of the array. However the > code for 4 and 8 has the same problem and the results seems to look > reasonable. It might make more sense to change it to something like > this: > > sample_map[int(4 * fract(coord.x)) + 4 * int(fract(coord.y) * 4)] Yeah... Jason also noticed some strange things in this area. I've got a patch out (http://lists.freedesktop.org/archives/mesa-dev/2015-September/095026.html) that also fixes a bug here. I'm going to dig a bit deeper if you want to leave it as-is for now. This patch is Reviewed-by: Ian Romanick > --- > src/mesa/drivers/common/meta.h | 2 ++ > src/mesa/drivers/common/meta_blit.c| 8 ++-- > src/mesa/drivers/dri/i965/gen6_multisample_state.c | 14 ++ > src/mesa/main/mtypes.h | 15 ++- > 4 files changed, 36 insertions(+), 3 deletions(-) > > diff --git a/src/mesa/drivers/common/meta.h b/src/mesa/drivers/common/meta.h > index fe43915..e42ddc8 100644 > --- a/src/mesa/drivers/common/meta.h > +++ b/src/mesa/drivers/common/meta.h > @@ -285,9 +285,11 @@ enum blit_msaa_shader { > BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > + BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE, > BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_4X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_8X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > + BLIT_16X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE, > BLIT_MSAA_SHADER_COUNT, > }; > > diff --git a/src/mesa/drivers/common/meta_blit.c > b/src/mesa/drivers/common/meta_blit.c > index a41fe42..b545c37 100644 > --- a/src/mesa/drivers/common/meta_blit.c > +++ b/src/mesa/drivers/common/meta_blit.c > @@ -86,8 +86,8 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx, > while (samples >> (shader_offset + 1)) { >shader_offset++; > } > - /* Update the assert if we plan to support more than 8X MSAA. */ > - assert(shader_offset > 0 && shader_offset < 4); > + /* Update the assert if we plan to support more than 16X MSAA. */ > + assert(shader_offset > 0 && shader_offset <= 4); > > assert(target == GL_TEXTURE_2D_MULTISAMPLE || >target == GL_TEXTURE_2D_MULTISAMPLE_ARRAY); > @@ -132,6 +132,10 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context > *ctx, >sample_number = "sample_map[int(2 * fract(coord.x) + 8 * > fract(coord.y))]"; >sample_map = ctx->Const.SampleMap8x; >break; > + case 16: > + sample_number = "sample_map[int(4 * fract(coord.x) + 16 * > fract(coord.y))]"; > + sample_map = ctx->Const.SampleMap16x; > + break; > default: >sample_number = NULL; >sample_map = NULL; > diff --git a/src/mesa/drivers/dri/i965/gen6_multisample_state.c > b/src/mesa/drivers/dri/i965/gen6_multisample_state.c > index 49c6eba..8eb620d 100644 > --- a/src/mesa/drivers/dri/i965/gen6_multisample_state.c > +++ b/src/mesa/drivers/dri/i965/gen6_multisample_state.c > @@ -91,6 +91,17 @@ gen6_get_sample_position(struct gl_context *ctx, > * | 6 | 7 | | 7 | 1 | > * - - > * > + * 16X MSAA sample index layout 16x MSAA sample number layout > + * -- > + * | 0 | 1 | 2 | 3 ||15 |10 | 9 | 7 | > + * -- > + * | 4 | 5 | 6 | 7 || 4 | 1 | 3 |13 | > + * -- > + * | 8 | 9 |10 |11 ||12 | 2 | 0 | 6 | > + * -- > + * |12 |13 |14 |15 ||11 | 8 | 5 |14 | > + * -- > + * > * A sample map is used to map sample indices to sample numbers. > */ > void > @@ -99,10 +110,13 @@ gen6_set_sample_maps(struct gl_context *ctx) > uint8_t map_2x[2] = {0, 1}; > uint8_t map_4x[4] = {0, 1, 2, 3}; > uint8_t map_8x[8] = {5, 2, 4, 6, 0, 3, 7, 1}; > + uint8_t map_16x[16] = { 15, 10, 9, 7, 4, 1, 3, 13, > + 12, 2, 0, 6, 11, 8, 5, 14 }; > > memcpy(ctx->Const.SampleMap2x, map_2x, sizeof(map_2x)); > memcpy(ctx->Const.SampleMap4x, map_4x, sizeof(map_4x)); > memcpy(ctx->Const.SampleMap8x, map_8x, sizeof(map_8x)); > + memcpy(ctx->Const.SampleMap16x, map_16x, sizeof(map_16x)); > } > > /** > diff --git a/src/mesa/main/mtypes.h b/src/mesa/main/mtypes.h > index fac45aa..e0a06f4 100644 > --- a/src/mesa/main/mtypes.h > +++ b/src/mesa/main/mtypes.h > @@
[Mesa-dev] [PATCH] meta: Handle array textures in scaled MSAA blits
From: Ian Romanick The old code had some significant problems with respect to sampler2DArray textures. The biggest problem was that some of the code would use vec3 for the texture coordinate type, and other parts of the code would use vec2. The resulting shader would not even compile. Since there were not tests for this path, nobody noticed. The input to the fragment shader is always treated as a vec3. If the source data is only vec2, the vertex puller will supply 0 for the .z component. The texture coordinate passed to the fragment shader is always a vec2 that comes from the .xy part of the vertex shader input. The layer, taken from the .z of the vertex shader input is passed separately as a flat integer. If the generated fragment shader does not use the layer integer, the GLSL linker will eliminate all the dead code in the vertex shader. Fixes the new piglit tests "blit-scaled samples=2 with gl_texture_2d_multisample_array", etc. on i965. Note for stable maintainer: This patch may depend on 46037237, and that patch should be safe for stable. Signed-off-by: Ian Romanick Cc: Anuj Phogat Cc: Topi Pohjolainen Cc: Jordan Justen Cc: "10.6 11.0" --- This is currently running through our Jenkins server. There were no regressions on my BDW system. src/mesa/drivers/common/meta_blit.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/src/mesa/drivers/common/meta_blit.c b/src/mesa/drivers/common/meta_blit.c index a41fe42..5972a5a 100644 --- a/src/mesa/drivers/common/meta_blit.c +++ b/src/mesa/drivers/common/meta_blit.c @@ -71,9 +71,7 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx, char *sample_map_str = rzalloc_size(mem_ctx, 1); char *sample_map_expr = rzalloc_size(mem_ctx, 1); char *texel_fetch_macro = rzalloc_size(mem_ctx, 1); - const char *vs_source; const char *sampler_array_suffix = ""; - const char *texcoord_type = "vec2"; float y_scale; enum blit_msaa_shader shader_index; @@ -99,7 +97,6 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx, shader_index += BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_ARRAY_SCALED_RESOLVE - BLIT_2X_MSAA_SHADER_2D_MULTISAMPLE_SCALED_RESOLVE; sampler_array_suffix = "Array"; - texcoord_type = "vec3"; } if (blit->msaa_shaders[shader_index]) { @@ -150,28 +147,37 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx, " const int sample_map[%d] = int[%d](%s);\n", samples, samples, sample_map_str); - ralloc_asprintf_append(&texel_fetch_macro, - "#define TEXEL_FETCH(coord) texelFetch(texSampler, i%s(coord), %s);\n", - texcoord_type, sample_number); + if (target == GL_TEXTURE_2D_MULTISAMPLE) { + ralloc_asprintf_append(&texel_fetch_macro, + "#define TEXEL_FETCH(coord) texelFetch(texSampler, ivec2(coord), %s);\n", + sample_number); + } else { + ralloc_asprintf_append(&texel_fetch_macro, + "#define TEXEL_FETCH(coord) texelFetch(texSampler, ivec3(coord, layer), %s);\n", + sample_number); + } - vs_source = ralloc_asprintf(mem_ctx, + static const char vs_source[] = "#version 130\n" "in vec2 position;\n" - "in %s textureCoords;\n" - "out %s texCoords;\n" + "in vec3 textureCoords;\n" + "out vec2 texCoords;\n" + "flat out int layer;\n" "void main()\n" "{\n" - " texCoords = textureCoords;\n" + " texCoords = textureCoords.xy;\n" + " layer = int(textureCoords.z);\n" " gl_Position = vec4(position, 0.0, 1.0);\n" - "}\n", - texcoord_type, - texcoord_type); + "}\n" + ; + fs_source = ralloc_asprintf(mem_ctx, "#version 130\n" "#extension GL_ARB_texture_multisample : enable\n" "uniform sampler2DMS%s texSampler;\n" "uniform float src_width, src_height;\n" - "in %s texCoords;\n" + "in vec2 texCoords;\n" + "flat in int layer;\n" "out vec4 out_color;\n" "\n" "void main()\n" @@ -212,7 +218,6 @@ setup_glsl_msaa_blit_scaled_shader(struct gl_context *ctx,
[Mesa-dev] [PATCH V3 8/8] i965: Rename intel_miptree_get_dimensions_for_image()
This function isn't specific to miptrees. So, drop the "miptree" from function name. V3: Add a comment explaining how the 1D Array texture height and depth is interpreted by Intel hardware. Cc: Chad Versace Signed-off-by: Anuj Phogat --- src/mesa/drivers/dri/i965/intel_fbo.c | 2 +- src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 12 +--- src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 4 ++-- src/mesa/drivers/dri/i965/intel_tex_image.c| 3 +-- src/mesa/drivers/dri/i965/intel_tex_validate.c | 3 +-- 5 files changed, 14 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/intel_fbo.c b/src/mesa/drivers/dri/i965/intel_fbo.c index b216055..6b2349e 100644 --- a/src/mesa/drivers/dri/i965/intel_fbo.c +++ b/src/mesa/drivers/dri/i965/intel_fbo.c @@ -1027,7 +1027,7 @@ intel_renderbuffer_move_to_temp(struct brw_context *brw, uint32_t layout_flags = MIPTREE_LAYOUT_ACCELERATED_UPLOAD | MIPTREE_LAYOUT_TILING_ANY; - intel_miptree_get_dimensions_for_image(rb->TexImage, &width, &height, &depth); + intel_get_image_dims(rb->TexImage, &width, &height, &depth); new_mt = intel_miptree_create(brw, rb->TexImage->TexObject->Target, intel_image->base.Base.TexFormat, diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c index 28d5031..0d1da33 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.c +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.c @@ -921,12 +921,18 @@ intel_miptree_release(struct intel_mipmap_tree **mt) *mt = NULL; } + void -intel_miptree_get_dimensions_for_image(struct gl_texture_image *image, - int *width, int *height, int *depth) +intel_get_image_dims(struct gl_texture_image *image, + int *width, int *height, int *depth) { switch (image->TexObject->Target) { case GL_TEXTURE_1D_ARRAY: + /* For a 1D Array texture the OpenGL API will treat the image height as + * the number of array slices. For Intel hardware, we treat the 1D array + * as a 2D Array with a height of 1. So, here we want to swap image + * height and depth. + */ *width = image->Width; *height = 1; *depth = image->Height; @@ -970,7 +976,7 @@ intel_miptree_match_image(struct intel_mipmap_tree *mt, if (image->TexFormat != mt_format) return false; - intel_miptree_get_dimensions_for_image(image, &width, &height, &depth); + intel_get_image_dims(image, &width, &height, &depth); if (mt->target == GL_TEXTURE_CUBE_MAP) depth = 6; diff --git a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h index bd77f95..7610d75 100644 --- a/src/mesa/drivers/dri/i965/intel_mipmap_tree.h +++ b/src/mesa/drivers/dri/i965/intel_mipmap_tree.h @@ -619,8 +619,8 @@ intel_miptree_get_image_offset(const struct intel_mipmap_tree *mt, GLuint *x, GLuint *y); void -intel_miptree_get_dimensions_for_image(struct gl_texture_image *image, - int *width, int *height, int *depth); +intel_get_image_dims(struct gl_texture_image *image, + int *width, int *height, int *depth); void intel_get_tile_masks(uint32_t tiling, uint32_t tr_mode, uint32_t cpp, diff --git a/src/mesa/drivers/dri/i965/intel_tex_image.c b/src/mesa/drivers/dri/i965/intel_tex_image.c index 93a8cde..34b91e8 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_image.c +++ b/src/mesa/drivers/dri/i965/intel_tex_image.c @@ -42,8 +42,7 @@ intel_miptree_create_for_teximage(struct brw_context *brw, int width, height, depth; GLuint i; - intel_miptree_get_dimensions_for_image(&intelImage->base.Base, - &width, &height, &depth); + intel_get_image_dims(&intelImage->base.Base, &width, &height, &depth); DBG("%s\n", __func__); diff --git a/src/mesa/drivers/dri/i965/intel_tex_validate.c b/src/mesa/drivers/dri/i965/intel_tex_validate.c index d3fb252..2cf9c13 100644 --- a/src/mesa/drivers/dri/i965/intel_tex_validate.c +++ b/src/mesa/drivers/dri/i965/intel_tex_validate.c @@ -128,8 +128,7 @@ intel_finalize_mipmap_tree(struct brw_context *brw, GLuint unit) /* May need to create a new tree: */ if (!intelObj->mt) { - intel_miptree_get_dimensions_for_image(&firstImage->base.Base, -&width, &height, &depth); + intel_get_image_dims(&firstImage->base.Base, &width, &height, &depth); perf_debug("Creating new %s %dx%dx%d %d-level miptree to handle " "finalized texture miptree.\n", -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH V2 8/8] i965: Rename intel_miptree_get_dimensions_for_image()
On Wed, Sep 16, 2015 at 4:30 PM, Chad Versace wrote: > On Wed 19 Aug 2015, Anuj Phogat wrote: >> This function isn't specific to miptrees. So, drop the "miptree" >> from function name. >> >> Signed-off-by: Anuj Phogat >> --- >> src/mesa/drivers/dri/i965/intel_fbo.c | 2 +- >> src/mesa/drivers/dri/i965/intel_mipmap_tree.c | 6 +++--- >> src/mesa/drivers/dri/i965/intel_mipmap_tree.h | 4 ++-- >> src/mesa/drivers/dri/i965/intel_tex_image.c| 3 +-- >> src/mesa/drivers/dri/i965/intel_tex_validate.c | 3 +-- >> 5 files changed, 8 insertions(+), 10 deletions(-) > > >> @@ -928,8 +928,8 @@ intel_miptree_release(struct intel_mipmap_tree **mt) >> } >> >> void >> -intel_miptree_get_dimensions_for_image(struct gl_texture_image *image, >> - int *width, int *height, int *depth) >> +intel_get_image_dims(struct gl_texture_image *image, >> + int *width, int *height, int *depth) >> { >> switch (image->TexObject->Target) { >> case GL_TEXTURE_1D_ARRAY: > > True, the function isn't specific to miptrees. But it *is* specific to > Intel's RENDER_SURFACE_STATE, as it translates the image's (width, > height, depth), from the perspective of the OpenGL API, to the needs of > Intel hardware. > > Now that 'miptree' is removed from the function name, the function name > looks like a mere getter. In that case, it's not clear why the caller > cannot just access image->width, image->height, and image->depth > directly. > > So that we all don't forget why this function exists next year, please copy > into the function the relevant portions of this comment from > intel_miptree_create_layout(): > > /* For a 1D Array texture the OpenGL API will treat the height0 >* parameter as the number of array slices. For Intel hardware, we treat >* the 1D array as a 2D Array with a height of 1. >* >* So, when we first come through this path to create a 1D Array >* texture, height0 stores the number of slices, and depth0 is 1. In >* this case, we want to swap height0 and depth0. >* >* Since some miptrees will be created based on the base miptree, we may >* come through this path and see height0 as 1 and depth0 being the >* number of slices. In this case we don't need to do the swap. >*/ > > With such a comment, I think this patch will be ok. > I'll send out V3 with added comment. > By the way, the height<->depth adjustment that intel_miptree_create_layout() > performs directly beneath that comment, that adjustment duplicates the > height<-> adjustment done by intel_get_image_dims(). That means you may be > able > to eliminate intel_get_image_dims() completely, and rely on > intel_miptree_create_layout() to do the adjustment for you. I say "may" > because > I haven't investigated it closely enough to be confident. After looking at the usage of intel_get_image_dims() at multiple places, I couldn't see an easy way to remove the function. So, I'll keep it unchanged at the moment. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)
https://bugs.freedesktop.org/show_bug.cgi?id=92072 --- Comment #3 from Samuel Pitoiset --- I can also reproduce this bug on GF119 (nvc0 driver) with master, and (obviously) reverting that commit fixes the issue, but the assert in ext_framebuffer_blit/blit-early is back. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 06/12] i965/fs: Add a sampler program key for whether the texture is 16x MSAA
On Thu, Sep 17, 2015 at 05:00:08PM +0100, Neil Roberts wrote: > When 16x MSAA is used for sampling with texelFetch the compiler needs > to use a different instruction which passes more arguments for the MCS > data. Previously on skl+ it was unconditionally using this new > instruction. However since 16x MSAA is probably going to be pretty > rare, it is probably worthwhile to avoid using this instruction for > the other sample counts. In order to do that this patch adds a new > member to brw_sampler_prog_key_data to track when a sampler refers to > a buffer with 16 samples. > > Note that this isn't done for the vec4 backend because it wouldn't > change how many registers it uses. Hmm. I was of the other opinion actually, I'd prefer to always use the new instruction to limit the variability between tests. I suppose if we're often passing the LOD parameter that is problematic, but we don't do that either, right? > --- > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 2 +- > src/mesa/drivers/dri/i965/brw_program.h | 7 +++ > src/mesa/drivers/dri/i965/brw_wm.c | 8 > 3 files changed, 16 insertions(+), 1 deletion(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > index ea7b3eb..7937c48 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs_visitor.cpp > @@ -287,7 +287,7 @@ fs_visitor::emit_texture(ir_texture_opcode op, >opcode = SHADER_OPCODE_TXF_LOGICAL; >break; > case ir_txf_ms: > - if (devinfo->gen >= 9) > + if ((key_tex->msaa_16 & (1 << sampler))) > opcode = SHADER_OPCODE_TXF_CMS_W_LOGICAL; >else > opcode = SHADER_OPCODE_TXF_CMS_LOGICAL; > diff --git a/src/mesa/drivers/dri/i965/brw_program.h > b/src/mesa/drivers/dri/i965/brw_program.h > index 00e8f3f..b27ab7f 100644 > --- a/src/mesa/drivers/dri/i965/brw_program.h > +++ b/src/mesa/drivers/dri/i965/brw_program.h > @@ -72,6 +72,13 @@ struct brw_sampler_prog_key_data { > uint32_t compressed_multisample_layout_mask; > > /** > +* Whether this sampler is using 16x multisampling. If so fetching from > +* this sampler will be handled with a different instruction, ld2ms_w > +* instead of ld2ms. > +*/ > + uint32_t msaa_16; > + > + /** > * For Sandybridge, which shader w/a we need for gather quirks. > */ > enum gen6_gather_sampler_wa gen6_gather_wa[MAX_SAMPLERS]; > diff --git a/src/mesa/drivers/dri/i965/brw_wm.c > b/src/mesa/drivers/dri/i965/brw_wm.c > index 35c0908..2d6f10d 100644 > --- a/src/mesa/drivers/dri/i965/brw_wm.c > +++ b/src/mesa/drivers/dri/i965/brw_wm.c > @@ -292,6 +292,9 @@ brw_debug_recompile_sampler_key(struct brw_context *brw, > found |= key_debug(brw, "compressed multisample layout", >old_key->compressed_multisample_layout_mask, >key->compressed_multisample_layout_mask); > + found |= key_debug(brw, "16x msaa", > + old_key->msaa_16, > + key->msaa_16); > > for (unsigned int i = 0; i < MAX_SAMPLERS; i++) { >found |= key_debug(brw, "textureGather workarounds", > @@ -451,6 +454,11 @@ brw_populate_sampler_prog_key_data(struct gl_context > *ctx, > if (brw->gen >= 7 && > intel_tex->mt->msaa_layout == INTEL_MSAA_LAYOUT_CMS) { > key->compressed_multisample_layout_mask |= 1 << s; > + > +if (intel_tex->mt->num_samples >= 16) { > + assert(brw->gen >= 9); > + key->msaa_16 |= 1 << s; > +} > } >} > } > -- > 1.9.3 > > ___ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 05/12] i965/vec4/skl+: Use lcd2dms_w instead of lcd2dms
On Thu, Sep 17, 2015 at 05:00:07PM +0100, Neil Roberts wrote: > In order to support 16x MSAA, skl+ has a wider version of lcd2dms that > takes two parameters for the MCS data. The MCS data in the response > still fits in a single register so we just need to ensure we copy both > values rather than just the lower one. > --- > src/mesa/drivers/dri/i965/brw_vec4.cpp | 1 + > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 5 + > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 14 -- > 3 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp > b/src/mesa/drivers/dri/i965/brw_vec4.cpp > index ed49cd3..22c8d01 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp > @@ -327,6 +327,7 @@ vec4_visitor::implied_mrf_writes(vec4_instruction *inst) > case SHADER_OPCODE_TXD: > case SHADER_OPCODE_TXF: > case SHADER_OPCODE_TXF_CMS: > + case SHADER_OPCODE_TXF_CMS_W: > case SHADER_OPCODE_TXF_MCS: > case SHADER_OPCODE_TXS: > case SHADER_OPCODE_TG4: > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > index 1950333..c4ddff9 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_generator.cpp > @@ -259,6 +259,10 @@ vec4_generator::generate_tex(vec4_instruction *inst, >case SHADER_OPCODE_TXF: >msg_type = GEN5_SAMPLER_MESSAGE_SAMPLE_LD; >break; > + case SHADER_OPCODE_TXF_CMS_W: > + assert(devinfo->gen >= 9); > + msg_type = GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W; > + break; >case SHADER_OPCODE_TXF_CMS: > if (devinfo->gen >= 7) > msg_type = GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS; > @@ -1372,6 +1376,7 @@ vec4_generator::generate_code(const cfg_t *cfg) >case SHADER_OPCODE_TXD: >case SHADER_OPCODE_TXF: >case SHADER_OPCODE_TXF_CMS: > + case SHADER_OPCODE_TXF_CMS_W: >case SHADER_OPCODE_TXF_MCS: >case SHADER_OPCODE_TXL: >case SHADER_OPCODE_TXS: > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > index 0465770..79f92d8 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp > @@ -2545,7 +2545,8 @@ vec4_visitor::emit_texture(ir_texture_opcode op, > case ir_txl: opcode = SHADER_OPCODE_TXL; break; > case ir_txd: opcode = SHADER_OPCODE_TXD; break; > case ir_txf: opcode = SHADER_OPCODE_TXF; break; > - case ir_txf_ms: opcode = SHADER_OPCODE_TXF_CMS; break; > + case ir_txf_ms: opcode = (devinfo->gen >= 9 ? SHADER_OPCODE_TXF_CMS_W : > + SHADER_OPCODE_TXF_CMS); break; > case ir_txs: opcode = SHADER_OPCODE_TXS; break; > case ir_tg4: opcode = offset_value.file != BAD_FILE > ? SHADER_OPCODE_TG4_OFFSET : SHADER_OPCODE_TG4; > break; > @@ -2637,7 +2638,16 @@ vec4_visitor::emit_texture(ir_texture_opcode op, >} else if (op == ir_txf_ms) { > emit(MOV(dst_reg(MRF, param_base + 1, sample_index.type, > WRITEMASK_X), >sample_index)); > - if (devinfo->gen >= 7) { > + if (opcode == SHADER_OPCODE_TXF_CMS_W) { > +/* MCS data is stored in the first two channels of ‘mcs’, but we > + * need to get it into the .y and .z channels of the second vec4 > + * of params. > + */ I don't understand what you mean by the "second vec4". Do you just mean the .y and .z channel? > +mcs.swizzle = BRW_SWIZZLE4(0, 0, 1, 1); I don't really understand how the swizzle stuff works. Quick explanation on why it's not BRW_SWIZZLE_XYXY would be greatly appreciated. > +emit(MOV(dst_reg(MRF, param_base + 1, > + glsl_type::uint_type, WRITEMASK_YZ), > + mcs)); > + } else if (devinfo->gen >= 7) { > /* MCS data is in the first channel of `mcs`, but we need to get > it into > * the .y channel of the second vec4 of params, so replicate .x > across > * the whole vec4 and then mask off everything except .y ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 04/12] i965/fs/skl+: Use lcd2dms_w instead of lcd2dms
On Thu, Sep 17, 2015 at 05:00:06PM +0100, Neil Roberts wrote: > In order to support 16x MSAA, skl+ has a wider version of lcd2dms that > takes two parameters for the MCS data. This patch makes it allocate a > register that is twice as big for the MCS data and then always use > the wider version. > --- > src/mesa/drivers/dri/i965/brw_defines.h| 4 > src/mesa/drivers/dri/i965/brw_disasm.c | 1 + > src/mesa/drivers/dri/i965/brw_fs.cpp | 27 > -- > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 5 + > src/mesa/drivers/dri/i965/brw_fs_visitor.cpp | 9 ++--- > src/mesa/drivers/dri/i965/brw_shader.cpp | 5 + > 6 files changed, 46 insertions(+), 5 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_defines.h > b/src/mesa/drivers/dri/i965/brw_defines.h > index 8fc8ceb..2d5b67d 100644 > --- a/src/mesa/drivers/dri/i965/brw_defines.h > +++ b/src/mesa/drivers/dri/i965/brw_defines.h > @@ -969,6 +969,8 @@ enum opcode { > FS_OPCODE_TXB_LOGICAL, > SHADER_OPCODE_TXF_CMS, > SHADER_OPCODE_TXF_CMS_LOGICAL, > + SHADER_OPCODE_TXF_CMS_W, > + SHADER_OPCODE_TXF_CMS_W_LOGICAL, > SHADER_OPCODE_TXF_UMS, > SHADER_OPCODE_TXF_UMS_LOGICAL, > SHADER_OPCODE_TXF_MCS, > @@ -1517,10 +1519,12 @@ enum brw_message_target { > #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO 17 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C 18 > #define HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE 20 > +#define GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W 28 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS 29 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS 30 > #define GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS 31 > > + > /* for GEN5 only */ > #define BRW_SAMPLER_SIMD_MODE_SIMD4X2 0 > #define BRW_SAMPLER_SIMD_MODE_SIMD8 1 > diff --git a/src/mesa/drivers/dri/i965/brw_disasm.c > b/src/mesa/drivers/dri/i965/brw_disasm.c > index db23a18..5aca728 100644 > --- a/src/mesa/drivers/dri/i965/brw_disasm.c > +++ b/src/mesa/drivers/dri/i965/brw_disasm.c > @@ -622,6 +622,7 @@ static const char *const gen5_sampler_msg_type[] = { > [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO] = "gather4_po", > [GEN7_SAMPLER_MESSAGE_SAMPLE_GATHER4_PO_C] = "gather4_po_c", > [HSW_SAMPLER_MESSAGE_SAMPLE_DERIV_COMPARE] = "sample_d_c", > + [GEN9_SAMPLER_MESSAGE_SAMPLE_LD2DMS_W] = "ld2dms_w", > [GEN7_SAMPLER_MESSAGE_SAMPLE_LD_MCS] = "ld_mcs", > [GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DMS] = "ld2dms", > [GEN7_SAMPLER_MESSAGE_SAMPLE_LD2DSS] = "ld2dss", > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 32a2adf..82f49b4 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -697,6 +697,7 @@ fs_inst::components_read(unsigned i) const > case SHADER_OPCODE_TXS_LOGICAL: > case FS_OPCODE_TXB_LOGICAL: > case SHADER_OPCODE_TXF_CMS_LOGICAL: > + case SHADER_OPCODE_TXF_CMS_W_LOGICAL: > case SHADER_OPCODE_TXF_UMS_LOGICAL: > case SHADER_OPCODE_TXF_MCS_LOGICAL: > case SHADER_OPCODE_LOD_LOGICAL: > @@ -712,6 +713,9 @@ fs_inst::components_read(unsigned i) const >/* Texture offset. */ >else if (i == 7) > return 2; > + /* MCS */ > + else if (i == 5 && opcode == SHADER_OPCODE_TXF_CMS_W_LOGICAL) > + return 2; >else > return 1; > > @@ -873,6 +877,7 @@ fs_visitor::implied_mrf_writes(fs_inst *inst) > case SHADER_OPCODE_TXD: > case SHADER_OPCODE_TXF: > case SHADER_OPCODE_TXF_CMS: > + case SHADER_OPCODE_TXF_CMS_W: > case SHADER_OPCODE_TXF_MCS: > case SHADER_OPCODE_TG4: > case SHADER_OPCODE_TG4_OFFSET: > @@ -3816,17 +3821,31 @@ lower_sampler_logical_send_gen7(const fs_builder > &bld, fs_inst *inst, opcode op, >coordinate_done = true; >break; > case SHADER_OPCODE_TXF_CMS: > + case SHADER_OPCODE_TXF_CMS_W: > case SHADER_OPCODE_TXF_UMS: > case SHADER_OPCODE_TXF_MCS: > - if (op == SHADER_OPCODE_TXF_UMS || op == SHADER_OPCODE_TXF_CMS) { > + if (op == SHADER_OPCODE_TXF_UMS || > + op == SHADER_OPCODE_TXF_CMS || > + op == SHADER_OPCODE_TXF_CMS_W) { > bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), > sample_index); > length++; >} > > - if (op == SHADER_OPCODE_TXF_CMS) { > + if (op == SHADER_OPCODE_TXF_CMS || op == SHADER_OPCODE_TXF_CMS_W) { > /* Data from the multisample control surface. */ > bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), mcs); > length++; > + > + /* On Gen9+ we'll use lcd2ms_w instead which has two registers for > + * the MCS data. > + */ > + if (op == SHADER_OPCODE_TXF_CMS_W) { > +bld.MOV(retype(sources[length], BRW_REGISTER_TYPE_UD), > +mcs.file == IMM ? > +mcs : > +
Re: [Mesa-dev] [PATCH v5 40/70] nir: Implement __intrinsic_load_ssbo
On Mon, Sep 21, 2015 at 11:03:41AM +0200, Samuel Iglesias Gonsálvez wrote: > > > On 19/09/15 00:30, Kristian Høgsberg wrote: > > On Thu, Sep 10, 2015 at 03:35:56PM +0200, Iago Toral Quiroga wrote: > >> v2: > >> - Fix ssbo loads with boolean variables. > >> > >> Reviewed-by: Connor Abbott > >> --- > >> src/glsl/nir/glsl_to_nir.cpp| 80 > >> - > >> src/glsl/nir/nir_intrinsics.h | 2 +- > >> src/glsl/nir/nir_lower_phis_to_scalar.c | 2 + > >> 3 files changed, 81 insertions(+), 3 deletions(-) > >> > >> diff --git a/src/glsl/nir/glsl_to_nir.cpp b/src/glsl/nir/glsl_to_nir.cpp > >> index 6f1e20a..cb7b196 100644 > >> --- a/src/glsl/nir/glsl_to_nir.cpp > >> +++ b/src/glsl/nir/glsl_to_nir.cpp > >> @@ -646,11 +646,14 @@ nir_visitor::visit(ir_call *ir) > >> op = nir_intrinsic_image_size; > >>} else if (strcmp(ir->callee_name(), "__intrinsic_store_ssbo") == > >> 0) { > >> op = nir_intrinsic_store_ssbo; > >> + } else if (strcmp(ir->callee_name(), "__intrinsic_load_ssbo") == 0) > >> { > >> + op = nir_intrinsic_load_ssbo; > >>} else { > >> unreachable("not reached"); > >>} > >> > >>nir_intrinsic_instr *instr = nir_intrinsic_instr_create(shader, op); > >> + nir_alu_instr *load_ssbo_compare; > >> > >>switch (op) { > >>case nir_intrinsic_atomic_counter_read_var: > >> @@ -776,11 +779,75 @@ nir_visitor::visit(ir_call *ir) > >> instr->src[1] = evaluate_rvalue(block); > >> break; > >>} > >> + case nir_intrinsic_load_ssbo: { > >> + exec_node *param = ir->actual_parameters.get_head(); > >> + ir_rvalue *block = ((ir_instruction *)param)->as_rvalue(); > >> + > >> + param = param->get_next(); > >> + ir_rvalue *offset = ((ir_instruction *)param)->as_rvalue(); > >> + > >> + /* Check if we need the indirect version */ > >> + ir_constant *const_offset = offset->as_constant(); > >> + if (!const_offset) { > >> +op = nir_intrinsic_load_ssbo_indirect; > >> +ralloc_free(instr); > >> +instr = nir_intrinsic_instr_create(shader, op); > >> +instr->src[1] = evaluate_rvalue(offset); > >> +instr->const_index[0] = 0; > >> + } else { > >> +instr->const_index[0] = const_offset->value.u[0]; > >> + } > >> + > >> + instr->src[0] = evaluate_rvalue(block); > >> + > >> + const glsl_type *type = ir->return_deref->var->type; > >> + instr->num_components = type->vector_elements; > >> + > >> + /* Setup destination register */ > >> + nir_ssa_dest_init(&instr->instr, &instr->dest, > >> + type->vector_elements, NULL); > >> + > >> + /* Insert the created nir instruction now since in the case of > >> boolean > >> + * result we will need to emit another instruction after it > >> + */ > >> + nir_instr_insert_after_cf_list(this->cf_node_list, > >> &instr->instr); > > > > I'd prefer moving this insert into the GLSL_TYPE_BOOL condition below... > > > >> + /* > >> + * In SSBO/UBO's, a true boolean value is any non-zero value, > >> but we > >> + * consider a true boolean to be ~0. Fix this up with a != 0 > >> + * comparison. > >> + */ > >> + if (type->base_type == GLSL_TYPE_BOOL) { > > > > ... that is, here... > > > >> +nir_load_const_instr *const_zero = > >> + nir_load_const_instr_create(shader, 1); > >> +const_zero->value.u[0] = 0; > >> +nir_instr_insert_after_cf_list(this->cf_node_list, > >> + &const_zero->instr); > >> + > >> +load_ssbo_compare = nir_alu_instr_create(shader, nir_op_ine); > > > > and then, insteed of introducing and using 'load_ssbo_compare' here, > > re-use 'instr' for the compare instruction... > > > > I see why you suggest these changes but I don't like them: > > instr is a pointer to nir_instrinsic_instr, while the compare > instruction is a nir_alu_instr*, so although we are lucky because some > fields are at the same offsets for both struct base addresses(*), this > could change in the future. Right, they're different types... it's still unfortunate that the ssbo special case spills out of the nir_intrinsic_load_ssbo case in the switch statement. Instead of jumping through hoops to share the one nir_instr_insert_after_cf_list call between the switch cases, maybe it's more straightforward to move that call into each case. That way the code can explicitly decide how order the intrinsic call and other helper instructions. Then set a nir_dest pointer for the ir->return_deref handling after the switch. We can initialize the nir_dest pointers to &instr->dest before the switch for the default case. Kristian > Even if I introduce load_ssbo_compare and, af
Re: [Mesa-dev] [PATCH v5 38/70] i965/nir/vec4: Implement nir_intrinsic_store_ssbo
On Mon, Sep 21, 2015 at 12:13:27PM +0200, Iago Toral wrote: > On Fri, 2015-09-18 at 13:02 -0700, Kristian Høgsberg wrote: > > On Thu, Sep 10, 2015 at 03:35:54PM +0200, Iago Toral Quiroga wrote: > > > --- > > > src/mesa/drivers/dri/i965/brw_vec4_nir.cpp | 148 > > > + > > > 1 file changed, 148 insertions(+) > > > > > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > index f47b029..450441d 100644 > > > --- a/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > +++ b/src/mesa/drivers/dri/i965/brw_vec4_nir.cpp > > > @@ -23,8 +23,13 @@ > > > > > > #include "brw_nir.h" > > > #include "brw_vec4.h" > > > +#include "brw_vec4_builder.h" > > > +#include "brw_vec4_surface_builder.h" > > > #include "glsl/ir_uniform.h" > > > > > > +using namespace brw; > > > +using namespace brw::surface_access; > > > + > > > namespace brw { > > > > > > void > > > @@ -556,6 +561,149 @@ > > > vec4_visitor::nir_emit_intrinsic(nir_intrinsic_instr *instr) > > >break; > > > } > > > > > > + case nir_intrinsic_store_ssbo_indirect: > > > + has_indirect = true; > > > + /* fallthrough */ > > > + case nir_intrinsic_store_ssbo: { > > > + assert(devinfo->gen >= 7); > > > + > > > + /* Block index */ > > > + src_reg surf_index; > > > + nir_const_value *const_uniform_block = > > > + nir_src_as_const_value(instr->src[1]); > > > + if (const_uniform_block) { > > > + unsigned index = prog_data->base.binding_table.ubo_start + > > > + const_uniform_block->u[0]; > > > + surf_index = src_reg(index); > > > + brw_mark_surface_used(&prog_data->base, index); > > > + } else { > > > + surf_index = src_reg(this, glsl_type::uint_type); > > > + emit(ADD(dst_reg(surf_index), get_nir_src(instr->src[1], 1), > > > + src_reg(prog_data->base.binding_table.ubo_start))); > > > + surf_index = emit_uniformize(surf_index); > > > + > > > + brw_mark_surface_used(&prog_data->base, > > > + prog_data->base.binding_table.ubo_start + > > > + shader_prog->NumUniformBlocks - 1); > > > + } > > > + > > > + /* Offset */ > > > + src_reg offset_reg = src_reg(this, glsl_type::uint_type); > > > + unsigned const_offset_bytes = 0; > > > + if (has_indirect) { > > > + emit(MOV(dst_reg(offset_reg), get_nir_src(instr->src[2], 1))); > > > + } else { > > > + const_offset_bytes = instr->const_index[0]; > > > + emit(MOV(dst_reg(offset_reg), src_reg(const_offset_bytes))); > > > + } > > > + > > > + /* Value */ > > > + src_reg val_reg = get_nir_src(instr->src[0], 4); > > > + > > > + /* Writemask */ > > > + unsigned write_mask = instr->const_index[1]; > > > + > > > + /* IvyBridge does not have a native SIMD4x2 untyped write message > > > so untyped > > > + * writes will use SIMD8 mode. In order to hide this and keep > > > symmetry across > > > + * typed and untyped messages and across hardware platforms, the > > > + * current implementation of the untyped messages will > > > transparently convert > > > + * the SIMD4x2 payload into an equivalent SIMD8 payload by > > > transposing it > > > + * and enabling only channel X on the SEND instruction. > > > + * > > > + * The above, works well for full vector writes, but not for > > > partial writes > > > + * where we want to write some channels and not others, like when > > > we have > > > + * code such as v.xyw = vec3(1,2,4). Because the untyped write > > > messages are > > > + * quite restrictive with regards to the channel enables we can > > > configure in > > > + * the message descriptor (not all combinations are allowed) we > > > cannot simply > > > + * implement these scenarios with a single message while keeping > > > the > > > + * aforementioned symmetry in the implementation. For now we de > > > decided that > > > + * it is better to keep the symmetry to reduce complexity, so in > > > situations > > > + * such as the one described we end up emitting two untyped write > > > messages > > > + * (one for xy and another for w). > > > + * > > > + * The code below packs consecutive channels into a single write > > > message, > > > + * detects gaps in the vector write and if needed, sends a second > > > message > > > + * with the remaining channels. If in the future we decide that we > > > want to > > > + * emit a single message at the expense of losing the symmetry in > > > the > > > + * implementation we can: > > > + * > > > + * 1) For IvyBridge: Only use the red channel of the untyped write > > > SIMD8 > > > + *message payload. In this mode we can write up to 8 offsets > > > and dwords >
[Mesa-dev] [Bug 92072] Wine breakage since d082c5324 (st/mesa: don't call st_validate_state in BlitFramebuffer)
https://bugs.freedesktop.org/show_bug.cgi?id=92072 Ilia Mirkin changed: What|Removed |Added Summary|Wine breackage since |Wine breakage since |504903b827604f1a630a335d142 |d082c5324 (st/mesa: don't |31f88c2cf36be (st/mesa: |call st_validate_state in |don't call |BlitFramebuffer) |st_validate_state in| |BlitFramebuffer)| --- Comment #2 from Ilia Mirkin --- (Updating commit id in subject to upstream commit, not stable branch) -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [Bug 92072] Wine breackage since 504903b827604f1a630a335d14231f88c2cf36be (st/mesa: don't call st_validate_state in BlitFramebuffer)
https://bugs.freedesktop.org/show_bug.cgi?id=92072 Ilia Mirkin changed: What|Removed |Added Component|Drivers/DRI/nouveau |Mesa core Assignee|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop. |rg |org QA Contact|nouveau@lists.freedesktop.o |mesa-dev@lists.freedesktop. |rg |org --- Comment #1 from Ilia Mirkin --- This appears to affect all gallium drivers -- Glenn confirmed this was also showing black on r600. i965 is unaffected. -- You are receiving this mail because: You are the QA Contact for the bug. You are the assignee for the bug. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.
On 22/09/15 19:27, Jason Ekstrand wrote: > On Tue, Sep 22, 2015 at 9:17 AM, Alejandro Piñeiro > wrote: >> >> On 22/09/15 18:09, Jason Ekstrand wrote: >>> On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro >>> wrote: Now it is more similar to brw_fs_copy_propagation, with three clear stages: 1) Build up the value we are propagating as if it were the source of a single MOV: 2) Check that we can propagate that value 3) Build the final value Previously everything was somewhat messed up, making the implementation on some specific cases, like knowing if you can propagate from a previous instruction even with type mismatches even messier (for example, with the need of maintaining more of one has_source_modifiers). The refactoring clears stuff, and gives support to this mentioned use case without doing anything extra (for example, only one has_source_modifiers is used). Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1683842 -> 1669037 (-0.88%) instructions in affected programs: 739837 -> 725032 (-2.00%) helped:6237 HURT: 0 v2: using 'arg' index to get the from inst was wrong, as pointed by Jason Ekstrand v3: rebased against last change on the previous patch of the series v4: don't need to track instructions on struct copy_entry, as we only set the source on a direct copy, as pointed by Jason v5: change the approach for a refactoring, as pointed by Jason --- Just the refactoring suggested at v4 review is enough to get the use case was dealing with at the beginning solved. It would be hard to split this patch in two (refactoring+solve issue). Tested with piglit without any regression. Needed to update shader-db numbers after last Matt Turner's improvements. I think that the fact of going from 30 HURT (v4) to 0 (v5) is Matt's merit. I added comments to clearly mark the different stages mentioned at v4 review (1, 2 and 3), to make the review easier, but if the patch gets approved, they can go away. >>> I'd like to keep them only without the "1)", "2)", etc. >> Ok. >> The change itself doesn't explain clearly how it got solved, but the resulting code is clearer. Thanks for the thoroughly review. .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 1522eea..8a0bd72 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, vec4_instruction *inst, int arg, struct copy_entry *entry) { + /* 1) Build up the value we are propagating as if it were the source of a +* single MOV +*/ /* For constant propagation, we only handle the same constant * across all 4 channels. Some day, we should handle the 8-bit * float vector format, which would let us constant propagate >>> Can we please kill this comment while we're here? It seems to have >>> gotten copied+pasted from constant propagation and makes no sense in >>> this context. >> Ok. >> >>> With that, and the updated comments, >>> >>> Reviewed-by: Jason Ekstrand >>> >>> I'm also running it through our CI system as I type. I'll let you >>> know how that goes when I get to my office in about an hour. >> Ok. I will wait before pushing. I should be still around. >> >>> Do you have push access yet? If not, I can make the trivial changes >>> and push this for you. >>> --Jason >> Yes, I have push access, so I can make the changes myself. As mentioned >> I will wait for the CI system outcome. > CI gives the green light. Push it! Thanks for all your hard work and > patience! > --Jason Pushed! Thanks! -- Alejandro Piñeiro (apinhe...@igalia.com) ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] nir: Always print non-identity swizzles.
Previously we would not print a swizzle on ssa_52 when only its .x component is used (as seen in the definition of ssa_53): vec3 ssa_52 = fadd ssa_51, ssa_51 vec1 ssa_53 = flog2 ssa_52 vec1 ssa_54 = flog2 ssa_52.y vec1 ssa_55 = flog2 ssa_52.z But this makes the interpretation of the RHS of the definition difficult to understand and dependent on the size of the LHS. Just print swizzles when they are not the identity swizzle, so the previous example is now printed as: vec3 ssa_52 = fadd ssa_51.xyz, ssa_51.xyz vec1 ssa_53 = flog2 ssa_52.x vec1 ssa_54 = flog2 ssa_52.y vec1 ssa_55 = flog2 ssa_52.z --- src/glsl/nir/nir_print.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/src/glsl/nir/nir_print.c b/src/glsl/nir/nir_print.c index a19aa8b..af78b0a 100644 --- a/src/glsl/nir/nir_print.c +++ b/src/glsl/nir/nir_print.c @@ -158,17 +158,25 @@ print_alu_src(nir_alu_instr *instr, unsigned src, print_state *state) print_src(&instr->src[src].src, state); bool print_swizzle = false; + unsigned used_channels = 0; + for (unsigned i = 0; i < 4; i++) { if (!nir_alu_instr_channel_used(instr, src, i)) continue; + used_channels++; + if (instr->src[src].swizzle[i] != i) { print_swizzle = true; break; } } - if (print_swizzle) { + unsigned live_channels = instr->src[src].src.is_ssa + ? instr->src[src].src.ssa->num_components + : instr->src[src].src.reg.reg->num_components; + + if (print_swizzle || used_channels != live_channels) { fprintf(fp, "."); for (unsigned i = 0; i < 4; i++) { if (!nir_alu_instr_channel_used(instr, src, i)) -- 2.4.6 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.
On Tue, Sep 22, 2015 at 9:17 AM, Alejandro Piñeiro wrote: > > > On 22/09/15 18:09, Jason Ekstrand wrote: >> On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro >> wrote: >>> Now it is more similar to brw_fs_copy_propagation, with three >>> clear stages: >>> >>> 1) Build up the value we are propagating as if it were the source of a >>> single MOV: >>> 2) Check that we can propagate that value >>> 3) Build the final value >>> >>> Previously everything was somewhat messed up, making the >>> implementation on some specific cases, like knowing if you can >>> propagate from a previous instruction even with type mismatches even >>> messier (for example, with the need of maintaining more of one >>> has_source_modifiers). The refactoring clears stuff, and gives >>> support to this mentioned use case without doing anything extra >>> (for example, only one has_source_modifiers is used). >>> >>> Shader-db results for vec4 programs on Haswell: >>> total instructions in shared programs: 1683842 -> 1669037 (-0.88%) >>> instructions in affected programs: 739837 -> 725032 (-2.00%) >>> helped:6237 >>> HURT: 0 >>> >>> v2: using 'arg' index to get the from inst was wrong, as pointed >>> by Jason Ekstrand >>> v3: rebased against last change on the previous patch of the series >>> v4: don't need to track instructions on struct copy_entry, as we >>> only set the source on a direct copy, as pointed by Jason >>> v5: change the approach for a refactoring, as pointed by Jason >>> --- >>> >>> Just the refactoring suggested at v4 review is enough to get the use >>> case was dealing with at the beginning solved. It would be hard >>> to split this patch in two (refactoring+solve issue). >>> >>> Tested with piglit without any regression. Needed to update shader-db >>> numbers after last Matt Turner's improvements. I think that the fact >>> of going from 30 HURT (v4) to 0 (v5) is Matt's merit. >>> >>> I added comments to clearly mark the different stages mentioned at >>> v4 review (1, 2 and 3), to make the review easier, but if the patch >>> gets approved, they can go away. >> I'd like to keep them only without the "1)", "2)", etc. > > Ok. > >>> The change itself doesn't explain clearly how it got solved, but the >>> resulting code is clearer. Thanks for the thoroughly review. >>> >>> .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 >>> ++ >>> 1 file changed, 18 insertions(+), 10 deletions(-) >>> >>> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >>> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >>> index 1522eea..8a0bd72 100644 >>> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >>> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >>> @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info >>> *devinfo, >>> vec4_instruction *inst, >>> int arg, struct copy_entry *entry) >>> { >>> + /* 1) Build up the value we are propagating as if it were the source of >>> a >>> +* single MOV >>> +*/ >>> /* For constant propagation, we only handle the same constant >>> * across all 4 channels. Some day, we should handle the 8-bit >>> * float vector format, which would let us constant propagate >> Can we please kill this comment while we're here? It seems to have >> gotten copied+pasted from constant propagation and makes no sense in >> this context. > > Ok. > >> >> With that, and the updated comments, >> >> Reviewed-by: Jason Ekstrand >> >> I'm also running it through our CI system as I type. I'll let you >> know how that goes when I get to my office in about an hour. > > Ok. I will wait before pushing. I should be still around. > >> Do you have push access yet? If not, I can make the trivial changes >> and push this for you. >> --Jason > > Yes, I have push access, so I can make the changes myself. As mentioned > I will wait for the CI system outcome. CI gives the green light. Push it! Thanks for all your hard work and patience! --Jason >> >>> @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info >>> *devinfo, >>> for (int i = 0; i < 4; i++) { >>>s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i); >>> } >>> - value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, >>> - BRW_SWIZZLE4(s[0], s[1], s[2], >>> s[3])); >>> + value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]); >>> >>> + /* 2) Check that we can propagate that value */ >>> if (value.file != UNIFORM && >>> value.file != GRF && >>> value.file != ATTR) >>> @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info >>> *devinfo, >>>return false; >>> } >>> >>> - if (inst->src[arg].abs) { >>> - value.negate = false; >>> - value.abs = true; >>> - } >>> - if (inst->src[arg].negate) >>> - value.negate = !value.ne
[Mesa-dev] [PATCH v2] i965/vec4: Don't coalesce regs in Gen6 MATH ops if reswizzle/writemask needed
Gen6 MATH instructions can not execute in align16 mode, so swizzles or writemasking are not allowed. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92033 --- I have tried to find an example where the writemask check is strictly needed in order to avoid an incorrect register coalescing, but I have failed. If I have something like this in my shader: in vec4 attr_pos; void main() { vec4 a; a.xy = sin(attr_pos); .. } It is translated into the following vec4 instructions: sin vgrf4.0:F, vgrf3.xyzw:F mov vgrf0.0.xy:F, vgrf4.xyzw:F <-- (added by emit_math to fix the writemasking issue) And converted by the opt_reduce_swizzle optimization into: sin vgrf4.0:F, vgrf3.xyzw:F mov vgrf0.0.xy:F, vgrf4.xyyy:F The swizzle in the MOV instruction is not the identity anymore, so the method "can_reswizzle" in opt_register_coalesce will return FALSE because of the swizzle-check that my first version of the patch added. Out of curiosity, I disabled the opt_reduce_swizzle optimization to see what happens. Still "can_reswizzle" returns FALSE preventing the incorrect register coalescing, because the math function sets more things than the MOV instruction; i.e. the check "(dst.writemask & ~swizzle_mask) is positive. Although I have not found an example where the writemask-check is crucial, that does not neccessarily mean that it does not exists. I am sending the second version of the patch in case you prefer to have that check in place. src/mesa/drivers/dri/i965/brw_ir_vec4.h | 3 ++- src/mesa/drivers/dri/i965/brw_vec4.cpp | 11 +-- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_ir_vec4.h b/src/mesa/drivers/dri/i965/brw_ir_vec4.h index 966a410..a48bb68 100644 --- a/src/mesa/drivers/dri/i965/brw_ir_vec4.h +++ b/src/mesa/drivers/dri/i965/brw_ir_vec4.h @@ -175,7 +175,8 @@ public: bool is_send_from_grf(); unsigned regs_read(unsigned arg) const; - bool can_reswizzle(int dst_writemask, int swizzle, int swizzle_mask); + bool can_reswizzle(const struct brw_device_info *devinfo, int dst_writemask, + int swizzle, int swizzle_mask); void reswizzle(int dst_writemask, int swizzle); bool can_do_source_mods(const struct brw_device_info *devinfo); diff --git a/src/mesa/drivers/dri/i965/brw_vec4.cpp b/src/mesa/drivers/dri/i965/brw_vec4.cpp index ed49cd3..ae695f1 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4.cpp @@ -941,10 +941,17 @@ vec4_visitor::opt_set_dependency_control() } bool -vec4_instruction::can_reswizzle(int dst_writemask, +vec4_instruction::can_reswizzle(const struct brw_device_info *devinfo, +int dst_writemask, int swizzle, int swizzle_mask) { + /* Gen6 MATH instructions can not execute in align16 mode, so swizzles +* or writemasking are not allowed. */ + if (devinfo->gen == 6 && is_math() && + (swizzle != BRW_SWIZZLE_XYZW || dst_writemask != WRITEMASK_XYZW)) + return false; + /* If this instruction sets anything not referenced by swizzle, then we'd * totally break it when we reswizzle. */ @@ -1077,7 +1084,7 @@ vec4_visitor::opt_register_coalesce() break; /* If we can't handle the swizzle, bail. */ -if (!scan_inst->can_reswizzle(inst->dst.writemask, +if (!scan_inst->can_reswizzle(devinfo, inst->dst.writemask, inst->src[0].swizzle, chans_needed)) { break; -- 2.1.0 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.
On 22/09/15 18:09, Jason Ekstrand wrote: > On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro > wrote: >> Now it is more similar to brw_fs_copy_propagation, with three >> clear stages: >> >> 1) Build up the value we are propagating as if it were the source of a >> single MOV: >> 2) Check that we can propagate that value >> 3) Build the final value >> >> Previously everything was somewhat messed up, making the >> implementation on some specific cases, like knowing if you can >> propagate from a previous instruction even with type mismatches even >> messier (for example, with the need of maintaining more of one >> has_source_modifiers). The refactoring clears stuff, and gives >> support to this mentioned use case without doing anything extra >> (for example, only one has_source_modifiers is used). >> >> Shader-db results for vec4 programs on Haswell: >> total instructions in shared programs: 1683842 -> 1669037 (-0.88%) >> instructions in affected programs: 739837 -> 725032 (-2.00%) >> helped:6237 >> HURT: 0 >> >> v2: using 'arg' index to get the from inst was wrong, as pointed >> by Jason Ekstrand >> v3: rebased against last change on the previous patch of the series >> v4: don't need to track instructions on struct copy_entry, as we >> only set the source on a direct copy, as pointed by Jason >> v5: change the approach for a refactoring, as pointed by Jason >> --- >> >> Just the refactoring suggested at v4 review is enough to get the use >> case was dealing with at the beginning solved. It would be hard >> to split this patch in two (refactoring+solve issue). >> >> Tested with piglit without any regression. Needed to update shader-db >> numbers after last Matt Turner's improvements. I think that the fact >> of going from 30 HURT (v4) to 0 (v5) is Matt's merit. >> >> I added comments to clearly mark the different stages mentioned at >> v4 review (1, 2 and 3), to make the review easier, but if the patch >> gets approved, they can go away. > I'd like to keep them only without the "1)", "2)", etc. Ok. >> The change itself doesn't explain clearly how it got solved, but the >> resulting code is clearer. Thanks for the thoroughly review. >> >> .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 >> ++ >> 1 file changed, 18 insertions(+), 10 deletions(-) >> >> diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >> b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >> index 1522eea..8a0bd72 100644 >> --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >> +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp >> @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, >> vec4_instruction *inst, >> int arg, struct copy_entry *entry) >> { >> + /* 1) Build up the value we are propagating as if it were the source of a >> +* single MOV >> +*/ >> /* For constant propagation, we only handle the same constant >> * across all 4 channels. Some day, we should handle the 8-bit >> * float vector format, which would let us constant propagate > Can we please kill this comment while we're here? It seems to have > gotten copied+pasted from constant propagation and makes no sense in > this context. Ok. > > With that, and the updated comments, > > Reviewed-by: Jason Ekstrand > > I'm also running it through our CI system as I type. I'll let you > know how that goes when I get to my office in about an hour. Ok. I will wait before pushing. I should be still around. > Do you have push access yet? If not, I can make the trivial changes > and push this for you. > --Jason Yes, I have push access, so I can make the changes myself. As mentioned I will wait for the CI system outcome. > >> @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, >> for (int i = 0; i < 4; i++) { >>s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i); >> } >> - value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, >> - BRW_SWIZZLE4(s[0], s[1], s[2], >> s[3])); >> + value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]); >> >> + /* 2) Check that we can propagate that value */ >> if (value.file != UNIFORM && >> value.file != GRF && >> value.file != ATTR) >> @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info >> *devinfo, >>return false; >> } >> >> - if (inst->src[arg].abs) { >> - value.negate = false; >> - value.abs = true; >> - } >> - if (inst->src[arg].negate) >> - value.negate = !value.negate; >> - >> bool has_source_modifiers = value.negate || value.abs; >> >> /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on >> @@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info >> *devinfo, >>} >> } >> >> + /* 3) Build the final v
Re: [Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.
On Tue, Sep 22, 2015 at 8:06 AM, Alejandro Piñeiro wrote: > Now it is more similar to brw_fs_copy_propagation, with three > clear stages: > > 1) Build up the value we are propagating as if it were the source of a > single MOV: > 2) Check that we can propagate that value > 3) Build the final value > > Previously everything was somewhat messed up, making the > implementation on some specific cases, like knowing if you can > propagate from a previous instruction even with type mismatches even > messier (for example, with the need of maintaining more of one > has_source_modifiers). The refactoring clears stuff, and gives > support to this mentioned use case without doing anything extra > (for example, only one has_source_modifiers is used). > > Shader-db results for vec4 programs on Haswell: > total instructions in shared programs: 1683842 -> 1669037 (-0.88%) > instructions in affected programs: 739837 -> 725032 (-2.00%) > helped:6237 > HURT: 0 > > v2: using 'arg' index to get the from inst was wrong, as pointed > by Jason Ekstrand > v3: rebased against last change on the previous patch of the series > v4: don't need to track instructions on struct copy_entry, as we > only set the source on a direct copy, as pointed by Jason > v5: change the approach for a refactoring, as pointed by Jason > --- > > Just the refactoring suggested at v4 review is enough to get the use > case was dealing with at the beginning solved. It would be hard > to split this patch in two (refactoring+solve issue). > > Tested with piglit without any regression. Needed to update shader-db > numbers after last Matt Turner's improvements. I think that the fact > of going from 30 HURT (v4) to 0 (v5) is Matt's merit. > > I added comments to clearly mark the different stages mentioned at > v4 review (1, 2 and 3), to make the review easier, but if the patch > gets approved, they can go away. I'd like to keep them only without the "1)", "2)", etc. > The change itself doesn't explain clearly how it got solved, but the > resulting code is clearer. Thanks for the thoroughly review. > > .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 > ++ > 1 file changed, 18 insertions(+), 10 deletions(-) > > diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > index 1522eea..8a0bd72 100644 > --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp > @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, > vec4_instruction *inst, > int arg, struct copy_entry *entry) > { > + /* 1) Build up the value we are propagating as if it were the source of a > +* single MOV > +*/ > /* For constant propagation, we only handle the same constant > * across all 4 channels. Some day, we should handle the 8-bit > * float vector format, which would let us constant propagate Can we please kill this comment while we're here? It seems to have gotten copied+pasted from constant propagation and makes no sense in this context. With that, and the updated comments, Reviewed-by: Jason Ekstrand I'm also running it through our CI system as I type. I'll let you know how that goes when I get to my office in about an hour. Do you have push access yet? If not, I can make the trivial changes and push this for you. --Jason > @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, > for (int i = 0; i < 4; i++) { >s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i); > } > - value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, > - BRW_SWIZZLE4(s[0], s[1], s[2], s[3])); > + value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]); > > + /* 2) Check that we can propagate that value */ > if (value.file != UNIFORM && > value.file != GRF && > value.file != ATTR) > @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info *devinfo, >return false; > } > > - if (inst->src[arg].abs) { > - value.negate = false; > - value.abs = true; > - } > - if (inst->src[arg].negate) > - value.negate = !value.negate; > - > bool has_source_modifiers = value.negate || value.abs; > > /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on > @@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info *devinfo, >} > } > > + /* 3) Build the final value */ > + if (inst->src[arg].abs) { > + value.negate = false; > + value.abs = true; > + } > + if (inst->src[arg].negate) > + value.negate = !value.negate; > + > + value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, > + value.swizzle); > if (has_source_modifiers && > value.type
Re: [Mesa-dev] [PATCH 1/4] mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter
The series is Reviewed-by: Matt Turner ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 4/4] st/mesa: remove st_bind_framebuffer()
The function was a no-op and if the ctx->Driver.BindFramebuffer pointer is null, Mesa won't try to use it. --- src/mesa/state_tracker/st_cb_fbo.c | 12 1 file changed, 12 deletions(-) diff --git a/src/mesa/state_tracker/st_cb_fbo.c b/src/mesa/state_tracker/st_cb_fbo.c index 5707590..9d06a23 100644 --- a/src/mesa/state_tracker/st_cb_fbo.c +++ b/src/mesa/state_tracker/st_cb_fbo.c @@ -388,17 +388,6 @@ st_new_renderbuffer_fb(enum pipe_format format, int samples, boolean sw) /** - * Called via ctx->Driver.BindFramebufferEXT(). - */ -static void -st_bind_framebuffer(struct gl_context *ctx, GLenum target, -struct gl_framebuffer *fb, struct gl_framebuffer *fbread) -{ - /* no-op */ -} - - -/** * Create or update the pipe_surface of a FBO renderbuffer. * This is usually called after st_finalize_texture. */ @@ -839,7 +828,6 @@ void st_init_fbo_functions(struct dd_function_table *functions) { functions->NewFramebuffer = st_new_framebuffer; functions->NewRenderbuffer = st_new_renderbuffer; - functions->BindFramebuffer = st_bind_framebuffer; functions->FramebufferRenderbuffer = _mesa_FramebufferRenderbuffer_sw; functions->RenderTexture = st_render_texture; functions->FinishRenderTexture = st_finish_render_texture; -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/4] mesa: const-qualify _mesa_base_tex_format() ctx param
--- src/mesa/main/teximage.c | 2 +- src/mesa/main/teximage.h | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/teximage.c b/src/mesa/main/teximage.c index 8913a72..9bc176a 100644 --- a/src/mesa/main/teximage.c +++ b/src/mesa/main/teximage.c @@ -130,7 +130,7 @@ adjust_for_oes_float_texture(GLenum format, GLenum type) * texture format and env mode determine the arithmetic used. */ GLint -_mesa_base_tex_format( struct gl_context *ctx, GLint internalFormat ) +_mesa_base_tex_format(const struct gl_context *ctx, GLint internalFormat) { switch (internalFormat) { case GL_ALPHA: diff --git a/src/mesa/main/teximage.h b/src/mesa/main/teximage.h index a4736b5..a434720 100644 --- a/src/mesa/main/teximage.h +++ b/src/mesa/main/teximage.h @@ -60,7 +60,7 @@ _mesa_is_zero_size_texture(const struct gl_texture_image *texImage) /*@{*/ extern GLint -_mesa_base_tex_format( struct gl_context *ctx, GLint internalFormat ); +_mesa_base_tex_format(const struct gl_context *ctx, GLint internalFormat); extern GLboolean -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/4] mesa: const-qualify buffer_object_subdata_range_good() bufObj parameter
--- src/mesa/main/bufferobj.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/main/bufferobj.c b/src/mesa/main/bufferobj.c index 23da83e..f87cea9 100644 --- a/src/mesa/main/bufferobj.c +++ b/src/mesa/main/bufferobj.c @@ -246,7 +246,7 @@ bufferobj_range_mapped(const struct gl_buffer_object *obj, */ static bool buffer_object_subdata_range_good(struct gl_context *ctx, - struct gl_buffer_object *bufObj, + const struct gl_buffer_object *bufObj, GLintptr offset, GLsizeiptr size, bool mappedRange, const char *caller) { -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 3/4] mesa: const-qualify _mesa_is_legal_tex_storage_format ctx param
--- src/mesa/main/texstorage.c | 3 ++- src/mesa/main/texstorage.h | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) diff --git a/src/mesa/main/texstorage.c b/src/mesa/main/texstorage.c index a29175e..9fd969f 100644 --- a/src/mesa/main/texstorage.c +++ b/src/mesa/main/texstorage.c @@ -202,7 +202,8 @@ update_fbo_texture(struct gl_context *ctx, struct gl_texture_object *texObj) GLboolean -_mesa_is_legal_tex_storage_format(struct gl_context *ctx, GLenum internalformat) +_mesa_is_legal_tex_storage_format(const struct gl_context *ctx, + GLenum internalformat) { /* check internal format - note that only sized formats are allowed */ switch (internalformat) { diff --git a/src/mesa/main/texstorage.h b/src/mesa/main/texstorage.h index 033ecb7..e80a9ff 100644 --- a/src/mesa/main/texstorage.h +++ b/src/mesa/main/texstorage.h @@ -111,7 +111,8 @@ _mesa_TextureStorage3DEXT(GLuint texture, GLenum target, GLsizei levels, GLsizei width, GLsizei height, GLsizei depth); extern GLboolean -_mesa_is_legal_tex_storage_format(struct gl_context *ctx, GLenum internalformat); +_mesa_is_legal_tex_storage_format(const struct gl_context *ctx, + GLenum internalformat); extern GLboolean _mesa_AllocTextureStorage_sw(struct gl_context *ctx, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH v5] i965/vec4: refactor brw_vec4_copy_propagation.
Now it is more similar to brw_fs_copy_propagation, with three clear stages: 1) Build up the value we are propagating as if it were the source of a single MOV: 2) Check that we can propagate that value 3) Build the final value Previously everything was somewhat messed up, making the implementation on some specific cases, like knowing if you can propagate from a previous instruction even with type mismatches even messier (for example, with the need of maintaining more of one has_source_modifiers). The refactoring clears stuff, and gives support to this mentioned use case without doing anything extra (for example, only one has_source_modifiers is used). Shader-db results for vec4 programs on Haswell: total instructions in shared programs: 1683842 -> 1669037 (-0.88%) instructions in affected programs: 739837 -> 725032 (-2.00%) helped:6237 HURT: 0 v2: using 'arg' index to get the from inst was wrong, as pointed by Jason Ekstrand v3: rebased against last change on the previous patch of the series v4: don't need to track instructions on struct copy_entry, as we only set the source on a direct copy, as pointed by Jason v5: change the approach for a refactoring, as pointed by Jason --- Just the refactoring suggested at v4 review is enough to get the use case was dealing with at the beginning solved. It would be hard to split this patch in two (refactoring+solve issue). Tested with piglit without any regression. Needed to update shader-db numbers after last Matt Turner's improvements. I think that the fact of going from 30 HURT (v4) to 0 (v5) is Matt's merit. I added comments to clearly mark the different stages mentioned at v4 review (1, 2 and 3), to make the review easier, but if the patch gets approved, they can go away. The change itself doesn't explain clearly how it got solved, but the resulting code is clearer. Thanks for the thoroughly review. .../drivers/dri/i965/brw_vec4_copy_propagation.cpp | 28 ++ 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp index 1522eea..8a0bd72 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_copy_propagation.cpp @@ -265,6 +265,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, vec4_instruction *inst, int arg, struct copy_entry *entry) { + /* 1) Build up the value we are propagating as if it were the source of a +* single MOV +*/ /* For constant propagation, we only handle the same constant * across all 4 channels. Some day, we should handle the 8-bit * float vector format, which would let us constant propagate @@ -291,9 +294,9 @@ try_copy_propagate(const struct brw_device_info *devinfo, for (int i = 0; i < 4; i++) { s[i] = BRW_GET_SWZ(entry->value[i]->swizzle, i); } - value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, - BRW_SWIZZLE4(s[0], s[1], s[2], s[3])); + value.swizzle = BRW_SWIZZLE4(s[0], s[1], s[2], s[3]); + /* 2) Check that we can propagate that value */ if (value.file != UNIFORM && value.file != GRF && value.file != ATTR) @@ -304,13 +307,6 @@ try_copy_propagate(const struct brw_device_info *devinfo, return false; } - if (inst->src[arg].abs) { - value.negate = false; - value.abs = true; - } - if (inst->src[arg].negate) - value.negate = !value.negate; - bool has_source_modifiers = value.negate || value.abs; /* gen6 math and gen7+ SENDs from GRFs ignore source modifiers on @@ -376,6 +372,16 @@ try_copy_propagate(const struct brw_device_info *devinfo, } } + /* 3) Build the final value */ + if (inst->src[arg].abs) { + value.negate = false; + value.abs = true; + } + if (inst->src[arg].negate) + value.negate = !value.negate; + + value.swizzle = brw_compose_swizzle(inst->src[arg].swizzle, + value.swizzle); if (has_source_modifiers && value.type != inst->src[arg].type) { /* We are propagating source modifiers from a MOV with a different @@ -387,8 +393,10 @@ try_copy_propagate(const struct brw_device_info *devinfo, inst->src[i].type = value.type; } inst->dst.type = value.type; - } else + } else { value.type = inst->src[arg].type; + } + inst->src[arg] = value; return true; } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h
On 22 September 2015 at 13:43, Predut, Marius wrote: > In the upstream code , 'nr' variable is used intensively in many places for > things like : > for (j = 0; j + 2 < count; j += nr ) { > nr = MIN2( currentsz, count - j ); > > But first time when it is used it isn't initialized. > I was unable to find a place (a macro or something) where "nr" is > initialized. > I suppose it is done somewhere because if NOT, THIS will be very strange how > this code works. > (Every time "nr" became "currentsz" because all the time nr is a big stack > value, so this is the reason for why code works?) > While I cannot quote the C spec, I'm pretty sure that j += nr, will be executed after nr = MIN2( currentsz, count - j ). The following simple test confirms it. int main(void) { int j, nr; for (j = 0; j < 10; printf("nr %d\n", nr), j += nr) nr = 1, printf("hello world %d\n", j); } hello world 0 nr 1 hello world 1 ... Cheers, Emil ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h
In the upstream code , 'nr' variable is used intensively in many places for things like : for (j = 0; j + 2 < count; j += nr ) { nr = MIN2( currentsz, count - j ); But first time when it is used it isn't initialized. I was unable to find a place (a macro or something) where "nr" is initialized. I suppose it is done somewhere because if NOT, THIS will be very strange how this code works. (Every time "nr" became "currentsz" because all the time nr is a big stack value, so this is the reason for why code works?) -Original Message- From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of Ian Romanick Sent: Tuesday, September 15, 2015 5:04 PM To: Brian Paul; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h On 09/14/2015 07:04 PM, Brian Paul wrote: > On 09/14/2015 07:01 PM, Ian Romanick wrote: >> I looked at t_dd_dmatmp.h after the previous discussions about fixing >> the "count" problem. It was a mess. The first 5 patches fix the bug. >> The remaining 19 patches delete dead code and make the file adhere to >> Mesa's coding standards. >> >> t_dd_triemit.h and t_dd_dmatmp2.h could probably use some similar >> clean up. >> >> I don't have easy access to i915, radeon, or r200 hardware right now, >> so I am only able to compile test this series. I won't be able to >> test it on hardware until after XDC (next week). > > I have a minor comment on patch 10 which could be addressed in a follow-on. > > Patch 19 does more than formatting changes (the second to last hunk is > a code transformation, but looks correct). I was putting that (and the changes from if-statements to ?: in other patches) under the loose umbrella of "formatting fixes". :) > Anyway, the series looks good to me (though I can't test it either). Marius noticed a problem in one of the patches, so I'm definitely going to wait until I can test both i915 and radeon before pushing. > Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 03/24] glsl: Add support for linking uniform arrays of arrays
Reviewed-by: Samuel Iglesias Gonsálvez On 17/09/15 09:02, Timothy Arceri wrote: > V2: Handle arrays of arrays in the same way structures are handled > > The ARB_arrays_of_arrays spec doesn't give very many details on how > AoA uniforms are intended to be implemented. However in the > ARB_program_interface_query spec there are details that show AoA are > intended to be handled in a similar way to structs. > > Issues 7 from the ARB_program_interface_query spec: > > We define rules consistent with our enumeration rules for > other complex types. For existing one-dimensional arrays, we enumerate > a single entry if the array is an array of basic types, or separate > entries for each array element if the array is an array of structures. > We follow similar rules here. For a uniform array such as: > >uniform vec4 a[5][4][3]; > > we enumerate twenty different entries ("a[0][0][0]" through > "a[4][3][0]"), each of which is treated as an array with three elements. > This is morally equivalent to what you'd get if you worked around the > limitation in current GLSL via: > > struct ArrayBottom { vec4 c[3]; }; > struct ArrayMid{ ArrayBottom b[3]; }; > uniform ArrayMid a[5]; > > which would enumerate "a[0].b[0].c[0]" through "a[4].b[3].c[0]". > --- > src/glsl/link_uniform_initializers.cpp | 4 +++- > src/glsl/link_uniforms.cpp | 13 + > 2 files changed, 12 insertions(+), 5 deletions(-) > > diff --git a/src/glsl/link_uniform_initializers.cpp > b/src/glsl/link_uniform_initializers.cpp > index 05000fc..f64ba1b 100644 > --- a/src/glsl/link_uniform_initializers.cpp > +++ b/src/glsl/link_uniform_initializers.cpp > @@ -179,6 +179,7 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program > *prog, > const char *name, const glsl_type *type, > ir_constant *val, unsigned int boolean_true) > { > + const glsl_type *t_without_array = type->without_array(); > if (type->is_record()) { >ir_constant *field_constant; > > @@ -193,7 +194,8 @@ set_uniform_initializer(void *mem_ctx, gl_shader_program > *prog, >field_constant = (ir_constant *)field_constant->next; >} >return; > - } else if (type->is_array() && type->fields.array->is_record()) { > + } else if (t_without_array->is_record() || > + (type->is_array() && type->fields.array->is_array())) { >const glsl_type *const element_type = type->fields.array; > >for (unsigned int i = 0; i < type->length; i++) { > diff --git a/src/glsl/link_uniforms.cpp b/src/glsl/link_uniforms.cpp > index 6108270..bdc28cc 100644 > --- a/src/glsl/link_uniforms.cpp > +++ b/src/glsl/link_uniforms.cpp > @@ -143,7 +143,8 @@ program_resource_visitor::process(ir_variable *var) >recursion(var->type, &name, strlen(name), row_major, NULL, false, > record_array_count); >ralloc_free(name); > - } else if (t->without_array()->is_record()) { > + } else if (t_without_array->is_record() || > + (t->is_array() && t->fields.array->is_array())) { >char *name = ralloc_strdup(NULL, var->name); >recursion(var->type, &name, strlen(name), row_major, NULL, false, > record_array_count); > @@ -223,7 +224,8 @@ program_resource_visitor::recursion(const glsl_type *t, > char **name, > this->leave_record(t, *name, row_major); >} > } else if (t->without_array()->is_record() || > - t->without_array()->is_interface()) { > + t->without_array()->is_interface() || > + (t->is_array() && t->fields.array->is_array())) { >if (record_type == NULL && t->fields.array->is_record()) > record_type = t->fields.array; > > @@ -366,6 +368,7 @@ private: > { >assert(!type->without_array()->is_record()); >assert(!type->without_array()->is_interface()); > + assert(!(type->is_array() && type->fields.array->is_array())); > >(void) row_major; > > @@ -690,6 +693,7 @@ private: > { >assert(!type->without_array()->is_record()); >assert(!type->without_array()->is_interface()); > + assert(!(type->is_array() && type->fields.array->is_array())); > >unsigned id; >bool found = this->map->get(id, name); > @@ -769,7 +773,7 @@ private: > >if (type->is_array()) { > this->uniforms[id].array_stride = > -glsl_align(type->fields.array->std140_size(row_major), 16); > +glsl_align(type->without_array()->std140_size(row_major), 16); >} else { > this->uniforms[id].array_stride = 0; >} > @@ -923,7 +927,8 @@ link_update_uniform_buffer_variables(struct gl_shader > *shader) > >if (var->type->is_record()) { > sentinel = '.'; > - } else if (var->type->without_array()->is_record()) { > + } else if (var->type->is_array() && > (var->type->fields.array->is_array() > +
Re: [Mesa-dev] [PATCH 23/24] i965: enable ARB_arrays_of_arrays
Reviewed-by: Samuel Iglesias Gonsálvez On 17/09/15 09:03, Timothy Arceri wrote: > --- > src/mesa/drivers/dri/i965/intel_extensions.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/src/mesa/drivers/dri/i965/intel_extensions.c > b/src/mesa/drivers/dri/i965/intel_extensions.c > index e6d39e0..9f9c6ed 100644 > --- a/src/mesa/drivers/dri/i965/intel_extensions.c > +++ b/src/mesa/drivers/dri/i965/intel_extensions.c > @@ -174,6 +174,7 @@ intelInitExtensions(struct gl_context *ctx) > > assert(brw->gen >= 4); > > + ctx->Extensions.ARB_arrays_of_arrays = true; > ctx->Extensions.ARB_buffer_storage = true; > ctx->Extensions.ARB_clear_texture = true; > ctx->Extensions.ARB_clip_control = true; > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 25/24] glsl: add std140 layout support for AoA
Reviewed-by: Samuel Iglesias Gonsálvez On 20/09/15 14:07, Timothy Arceri wrote: > --- > I noticed this problem after adding AoA support [1] to Ian's random UBO test > script [2]. > > [1] http://patchwork.freedesktop.org/patch/59956/ > [2] http://cgit.freedesktop.org/~idr/piglit/log/?h=ubo-lolz > > src/glsl/glsl_types.cpp | 13 +++-- > 1 file changed, 7 insertions(+), 6 deletions(-) > > diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp > index 86f0ea5..952bd0a 100644 > --- a/src/glsl/glsl_types.cpp > +++ b/src/glsl/glsl_types.cpp > @@ -1310,8 +1310,8 @@ glsl_type::std140_size(bool row_major) const >unsigned int array_len; > >if (this->is_array()) { > - element_type = this->fields.array; > - array_len = this->length; > + element_type = this->without_array(); > + array_len = this->arrays_of_arrays_size(); >} else { >element_type = this; >array_len = 1; > @@ -1344,12 +1344,13 @@ glsl_type::std140_size(bool row_major) const > * the array are laid out in order, according to rule (9). > */ > if (this->is_array()) { > - if (this->fields.array->is_record()) { > - return this->length * this->fields.array->std140_size(row_major); > + if (this->without_array()->is_record()) { > + return this->arrays_of_arrays_size() * > +this->without_array()->std140_size(row_major); >} else { >unsigned element_base_align = > - this->fields.array->std140_base_alignment(row_major); > - return this->length * MAX2(element_base_align, 16); > + this->without_array()->std140_base_alignment(row_major); > + return this->arrays_of_arrays_size() * MAX2(element_base_align, 16); >} > } > > ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h
In the upstream code , 'nr' variable is used intensively in many places for things like : for (j = 0; j + 2 < count; j += nr ) { nr = MIN2( currentsz, count - j ); But first time when it is used it isn't initialized. I was unable to find a place (a macro or something) where "nr" is initialized. I suppose it is done somewhere because if NOT, THIS will be very strange how this code works. (Every time "nr" became "currentsz" because all the time nr is a big stack value, so this is the reason for why code works?) -Original Message- From: mesa-dev [mailto:mesa-dev-boun...@lists.freedesktop.org] On Behalf Of Ian Romanick Sent: Tuesday, September 15, 2015 5:04 PM To: Brian Paul; mesa-dev@lists.freedesktop.org Subject: Re: [Mesa-dev] [PATCH 00/24] Massive clean up in t_dd_dmatmp.h On 09/14/2015 07:04 PM, Brian Paul wrote: > On 09/14/2015 07:01 PM, Ian Romanick wrote: >> I looked at t_dd_dmatmp.h after the previous discussions about fixing >> the "count" problem. It was a mess. The first 5 patches fix the bug. >> The remaining 19 patches delete dead code and make the file adhere to >> Mesa's coding standards. >> >> t_dd_triemit.h and t_dd_dmatmp2.h could probably use some similar >> clean up. >> >> I don't have easy access to i915, radeon, or r200 hardware right now, >> so I am only able to compile test this series. I won't be able to >> test it on hardware until after XDC (next week). > > I have a minor comment on patch 10 which could be addressed in a follow-on. > > Patch 19 does more than formatting changes (the second to last hunk is > a code transformation, but looks correct). I was putting that (and the changes from if-statements to ?: in other patches) under the loose umbrella of "formatting fixes". :) > Anyway, the series looks good to me (though I can't test it either). Marius noticed a problem in one of the patches, so I'm definitely going to wait until I can test both i915 and radeon before pushing. > Reviewed-by: Brian Paul ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] glsl: bail out early in _mesa_ShaderSource if no shaderobj
On Tue, 2015-09-22 at 14:34 +0300, Tapani Pälli wrote: > Patch fixes a crash in conformance test that tries out different > invalid arguments for glShaderSource and glGetShaderSource: > >ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle > > This is a regression from commit: >04e201d0c02cd30ace5c6fe80e9f021ebb733682 > > Additions in v2 also fix following failing deqp test: >dEQP-GLES[2|3].functional.negative_api.shader.shader_source Nice! I wasn't expecting that :) Reviewed-by: Iago Toral Quiroga > v2: cleanup function, do check earlier (Iago Toral) > > Signed-off-by: Tapani Pälli > --- > src/mesa/main/shaderapi.c | 20 +--- > 1 file changed, 9 insertions(+), 11 deletions(-) > > diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c > index f31980b..edc23bc 100644 > --- a/src/mesa/main/shaderapi.c > +++ b/src/mesa/main/shaderapi.c > @@ -931,13 +931,9 @@ get_shader_source(struct gl_context *ctx, GLuint shader, > GLsizei maxLength, > * glShaderSource[ARB]. > */ > static void > -shader_source(struct gl_context *ctx, GLuint shader, const GLchar *source) > +shader_source(struct gl_shader *sh, const GLchar *source) > { > - struct gl_shader *sh; > - > - sh = _mesa_lookup_shader_err(ctx, shader, "glShaderSource"); > - if (!sh) > - return; > + assert(sh); > > /* free old shader source string and install new one */ > free((void *)sh->Source); > @@ -1639,13 +1635,17 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei > count, > GLint *offsets; > GLsizei i, totalLength; > GLcharARB *source; > + struct gl_shader *sh; > > #if defined(HAVE_SHA1) > GLcharARB *replacement; > - struct gl_shader *sh; > #endif /* HAVE_SHA1 */ > > - if (!shaderObj || string == NULL) { > + sh = _mesa_lookup_shader_err(ctx, shaderObj, "glShaderSourceARB"); > + if (!sh) > + return; > + > + if (string == NULL) { >_mesa_error(ctx, GL_INVALID_VALUE, "glShaderSourceARB"); >return; > } > @@ -1697,8 +1697,6 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count, > source[totalLength - 2] = '\0'; > > #if defined(HAVE_SHA1) > - sh = _mesa_lookup_shader(ctx, shaderObj); > - > /* Dump original shader source to MESA_SHADER_DUMP_PATH and replace > * if corresponding entry found from MESA_SHADER_READ_PATH. > */ > @@ -1711,7 +1709,7 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count, > } > #endif /* HAVE_SHA1 */ > > - shader_source(ctx, shaderObj, source); > + shader_source(sh, source); > > free(offsets); > } ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH 0/5] Enable up to 24 MRF registers in gen6
On Tue, 2015-09-22 at 08:10 +0200, Iago Toral wrote: > Hi Mark, > > On Mon, 2015-09-21 at 17:45 -0700, Mark Janes wrote: > > Hi Iago, > > > > According to my tests, this patch series fixes the gles2/gles3 > > "functional.uniform_api.random.23" tests in dEQP, on sandybridge. > > > > Do you see the same results? Should this patch series be applied to the > > stable branch? > > I can try to verify this, but I would not be surprised if that was the > case. This seems to fix a couple of bugs in bugzilla and it fixes a lot > of piglit tests in SNB when we force spilling, so if that test is > triggering spilling, there is a good chance that it had the same problem > and this patch fixed it. I'll try to confirm this today and let you > know. Yes, I see the same result. Also, I verified that the test fails without my patches and that the reason it failed before is the same as all the other cases that my patches fix: if fails to link because it can't use m14 for spilling. Iago > As for whether this should go to stable, I suppose it depends on whether > we are confident that this won't break anything else... if you have not > seen any more issues with piglit/deqp on any other platform I guess > that's a good sign, but I'll let Ken make that call. > > Iago > > > thanks, > > > > Mark > > > > Iago Toral Quiroga writes: > > > > > There are some bug reports about shaders failing to compile in gen6 > > > because MRF 14 is used when we need to spill. For example: > > > https://bugs.freedesktop.org/show_bug.cgi?id=86469 > > > https://bugs.freedesktop.org/show_bug.cgi?id=90631 > > > > > > Discussion in bugzilla pointed to the fact that gen6 might actually have > > > 24 MRF registers available instead of 16, so we could use other MRF > > > registers and avoid these conflicts (we still need to investigate why > > > some shaders need up to MRF 14 anyway, since this is not expected). > > > > > > Notice that the hardware docs are not clear about this fact: > > > > > > SNB PRM Vol4 Part2's "Table 5-4. MRF Registers Available in Device > > > Hardware" says "Number per Thread" - "24 registers" > > > > > > However, SNB PRM Vol4 Part1, 1.6.1 Message Register File (MRF) says: > > > > > > "Normal threads should construct their messages in m1..m15. (...) > > > Regardless of actual hardware implementation, the thread should > > > not assume th at MRF addresses above m15 wrap to legal MRF registers." > > > > > > Therefore experimentation was necessary to evaluate if we had these extra > > > MRF registers available or not. This was tested in gen6 using MRF > > > registers 21..23 for spilling and doing a full piglit run (all.py) forcing > > > spilling of everything on the FS backend. It was also tested by doing > > > spilling of everything on both the FS and the VS backends with a piglit > > > run > > > of shader.py. In both cases no regressions were observed. In fact, many of > > > these tests where helped in the cases where we forced spilling, since that > > > triggered the same underlying problem described in the bug reports. Here > > > are > > > some results using INTEL_DEBUG=spill_fs,spill_vec4 for a shader.py run on > > > gen6 hardware: > > > > > > Using MRFs 13..15 for spilling: > > > crash: 2, fail: 113, pass: 6621, skip: 5461 > > > > > > Using MRFs 21..23 for spilling: > > > crash: 2, fail: 12, pass: 6722, skip: 5461 > > > > > > We might want to test this further with other instances of gen6 hardware > > > though... I am not sure that we can safely conclude that all > > > implementations > > > of gen6 hardware have 24 MRF registers from my tests on just one > > > particular > > > SandyBridge laptop. > > > > > > Iago Toral Quiroga (5): > > > i965: Move MRF register asserts out of brw_reg.h > > > i965: Turn BRW_MAX_MRF into a macro that accepts a hardware generation > > > i965/fs: Use MRF registers 21-23 for spilling in gen6 > > > i965/vec4: Use MRF registers 21-23 for spilling in gen6 > > > i965: Maximum allowed size of SEND messages is 15 (4 bits) > > > > > > src/mesa/drivers/dri/i965/brw_eu_emit.c| 11 + > > > src/mesa/drivers/dri/i965/brw_fs.cpp | 4 ++-- > > > src/mesa/drivers/dri/i965/brw_fs_generator.cpp | 15 > > > src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 27 > > > -- > > > src/mesa/drivers/dri/i965/brw_inst.h | 3 +++ > > > src/mesa/drivers/dri/i965/brw_ir_vec4.h| 2 +- > > > src/mesa/drivers/dri/i965/brw_reg.h| 9 > > > .../drivers/dri/i965/brw_schedule_instructions.cpp | 4 ++-- > > > src/mesa/drivers/dri/i965/brw_vec4_generator.cpp | 10 +--- > > > src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 15 +++- > > > 10 files changed, 61 insertions(+), 39 deletions(-) > > > > > > -- > > > 1.9.1 > > > > > > ___ > > > mesa-dev mailing list > > > mesa-dev@lists.freedesktop.org > > > http://lists.free
[Mesa-dev] [PATCH 3/3] i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place
That should make tracking where we do spills and pull loads a bit easier. --- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 -- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 -- src/mesa/drivers/dri/i965/brw_inst.h | 6 ++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 3 --- 4 files changed, 6 insertions(+), 7 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index e42cc94..2d146ac 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -50,8 +50,6 @@ #include "glsl/glsl_types.h" #include "program/sampler.h" -#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13) - using namespace brw; void diff --git a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp index 6900cee..c3a037b 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp @@ -30,8 +30,6 @@ #include "glsl/glsl_types.h" #include "glsl/ir_optimization.h" -#define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13) - using namespace brw; static void diff --git a/src/mesa/drivers/dri/i965/brw_inst.h b/src/mesa/drivers/dri/i965/brw_inst.h index c5132ba..ab37b70 100644 --- a/src/mesa/drivers/dri/i965/brw_inst.h +++ b/src/mesa/drivers/dri/i965/brw_inst.h @@ -42,6 +42,12 @@ extern "C" { /** Maximum SEND message length */ #define BRW_MAX_MSG_LENGTH 15 +/** First MRF register used by pull loads */ +#define FIRST_SPILL_MRF(gen) ((gen) == 6 ? 21 : 13) + +/** First MRF register used by spills */ +#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13) + /* brw_context.h has a forward declaration of brw_inst, so name the struct. */ typedef struct brw_inst { uint64_t data[2]; diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 17516f9..4cd2aed 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -26,9 +26,6 @@ #include "glsl/ir_uniform.h" #include "program/sampler.h" -#define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13) -#define FIRST_PULL_LOAD_MRF(gen) (gen == 6 ? 16 : 13) - namespace brw { vec4_instruction::vec4_instruction(enum opcode opcode, const dst_reg &dst, -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 0/3] Make pull constant loads in gen6 start at MRFs 16/17
Originally, these could conflict with our spills, but now that we moved the latter to MRFs 21..23, that is no longer the case. Still, in gen6 we now use MRFs 1..15 for URB writes, so we probably want to make our pull constant loads out of that MRF space (currently, they start at MRFs 13/14). Even if we do not want to do this for some reason, I still think we should at least apply the first patch, since that plugs a hardcoded array size of 16 MRF registers. For some reason this only became a problem when I tried to move pull constant loads to MRFs > 15 and not when I did the same for spills, but it looks like the right thing to do in any case. I tested this on SandyBridge and IvyBridge and did not observe any regressions in piglit. Iago Toral Quiroga (3): i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6 i965: make pull constant loads in gen6 start at MRFs 16/17 i965: Define FIRST_SPILL_MRF and FIRST_PULL_LOAD_MRF only once and in one place src/mesa/drivers/dri/i965/brw_fs.cpp | 6 +++--- src/mesa/drivers/dri/i965/brw_fs_reg_allocate.cpp | 2 -- src/mesa/drivers/dri/i965/brw_inst.h | 6 ++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 4 +--- 4 files changed, 10 insertions(+), 8 deletions(-) -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 2/3] i965: make pull constant loads in gen6 start at MRFs 16/17
So they do not conflict with our (un)spills (MRF 21..23) or our URB writes (MRF 1..15) --- src/mesa/drivers/dri/i965/brw_fs.cpp | 6 -- src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp | 3 ++- 2 files changed, 6 insertions(+), 3 deletions(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 81fe7f5..e42cc94 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -50,6 +50,8 @@ #include "glsl/glsl_types.h" #include "program/sampler.h" +#define FIRST_PULL_LOAD_MRF(gen) ((gen) == 6 ? 16 : 13) + using namespace brw; void @@ -210,7 +212,7 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder &bld, inst->regs_written = regs_written; if (devinfo->gen < 7) { - inst->base_mrf = 13; + inst->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen); inst->header_size = 1; if (devinfo->gen == 4) inst->mlen = 3; @@ -3008,7 +3010,7 @@ fs_visitor::lower_uniform_pull_constant_loads() * else does except for register spill/unspill, which generates and * uses its MRF within a single IR instruction. */ - inst->base_mrf = 14; + inst->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen) + 1; inst->mlen = 1; } } diff --git a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp index 514ccd6..17516f9 100644 --- a/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp +++ b/src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp @@ -27,6 +27,7 @@ #include "program/sampler.h" #define FIRST_SPILL_MRF(gen) (gen == 6 ? 21 : 13) +#define FIRST_PULL_LOAD_MRF(gen) (gen == 6 ? 16 : 13) namespace brw { @@ -1386,7 +1387,7 @@ vec4_visitor::emit_pull_constant_load_reg(dst_reg dst, dst, surf_index, offset_reg); - pull->base_mrf = FIRST_SPILL_MRF(devinfo->gen) + 1; + pull->base_mrf = FIRST_PULL_LOAD_MRF(devinfo->gen) + 1; pull->mlen = 1; } -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/3] i965: Fix remove_duplicate_mrf_writes so it can handle 24 MRFs in gen6
--- src/mesa/drivers/dri/i965/brw_fs.cpp | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp b/src/mesa/drivers/dri/i965/brw_fs.cpp index 225a312..81fe7f5 100644 --- a/src/mesa/drivers/dri/i965/brw_fs.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp @@ -2687,7 +2687,7 @@ fs_visitor::emit_repclear_shader() bool fs_visitor::remove_duplicate_mrf_writes() { - fs_inst *last_mrf_move[16]; + fs_inst *last_mrf_move[BRW_MAX_MRF(devinfo->gen)]; bool progress = false; /* Need to update the MRF tracking for compressed instructions. */ -- 1.9.1 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: bail out early in _mesa_ShaderSource if no shaderobj
Patch fixes a crash in conformance test that tries out different invalid arguments for glShaderSource and glGetShaderSource: ES2-CTS.gtf.GL.glGetShaderSource.getshadersource_programhandle This is a regression from commit: 04e201d0c02cd30ace5c6fe80e9f021ebb733682 Additions in v2 also fix following failing deqp test: dEQP-GLES[2|3].functional.negative_api.shader.shader_source v2: cleanup function, do check earlier (Iago Toral) Signed-off-by: Tapani Pälli --- src/mesa/main/shaderapi.c | 20 +--- 1 file changed, 9 insertions(+), 11 deletions(-) diff --git a/src/mesa/main/shaderapi.c b/src/mesa/main/shaderapi.c index f31980b..edc23bc 100644 --- a/src/mesa/main/shaderapi.c +++ b/src/mesa/main/shaderapi.c @@ -931,13 +931,9 @@ get_shader_source(struct gl_context *ctx, GLuint shader, GLsizei maxLength, * glShaderSource[ARB]. */ static void -shader_source(struct gl_context *ctx, GLuint shader, const GLchar *source) +shader_source(struct gl_shader *sh, const GLchar *source) { - struct gl_shader *sh; - - sh = _mesa_lookup_shader_err(ctx, shader, "glShaderSource"); - if (!sh) - return; + assert(sh); /* free old shader source string and install new one */ free((void *)sh->Source); @@ -1639,13 +1635,17 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count, GLint *offsets; GLsizei i, totalLength; GLcharARB *source; + struct gl_shader *sh; #if defined(HAVE_SHA1) GLcharARB *replacement; - struct gl_shader *sh; #endif /* HAVE_SHA1 */ - if (!shaderObj || string == NULL) { + sh = _mesa_lookup_shader_err(ctx, shaderObj, "glShaderSourceARB"); + if (!sh) + return; + + if (string == NULL) { _mesa_error(ctx, GL_INVALID_VALUE, "glShaderSourceARB"); return; } @@ -1697,8 +1697,6 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count, source[totalLength - 2] = '\0'; #if defined(HAVE_SHA1) - sh = _mesa_lookup_shader(ctx, shaderObj); - /* Dump original shader source to MESA_SHADER_DUMP_PATH and replace * if corresponding entry found from MESA_SHADER_READ_PATH. */ @@ -1711,7 +1709,7 @@ _mesa_ShaderSource(GLhandleARB shaderObj, GLsizei count, } #endif /* HAVE_SHA1 */ - shader_source(ctx, shaderObj, source); + shader_source(sh, source); free(offsets); } -- 2.4.3 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] Apologies for forgetting to add Reviewed by Tapani
Hi all, I apologize for forgetting to add Tapani as the reviewer for the patches I just merged: commit 6c3de8996fbe9447e092cc75ccdd6f720fabaf4d commit cf293e518ebd847cb28e03d4378679c47548206d commit 419210005a84f1f26da353b945b3f783d53fa56a The patches was reviewed here: http://lists.freedesktop.org/archives/mesa-dev/2015-September/093432.html BR, Marta --- Intel Sweden AB Registered Office: Knarrarnasgatan 15, 164 40 Kista, Stockholm, Sweden Registration Number: 556189-6027 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
Re: [Mesa-dev] [PATCH] i965/vec4: Don't coalesce registers in gen6 math ops if reswizzling needed
On mar, 2015-09-22 at 11:22 +0200, Antía Puentes wrote: > I realized that if writemasking is also unsupported I should probably > update the patch to avoid coalescing if the dst_writemask parameter is > different from XYWZ. The condition in "vec4_instruction::can_reswizzle" > would be something like: > > /* Gen6 MATH instructions can not execute in align16 mode, so swizzles >* or writemasking are not allowed. */ >if (devinfo->gen == 6 && is_math() && >swizzle != BRW_SWIZZLE_XYZW && >dst_writemask != WRITEMASK_XYWZ) > return false; Ups, I meant: (swizzle != BRW_SWIZZLE_XYZW || dst_writemask != WRITEMASK_XYWZ) Sorry for the noise. ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH 1/2] glsl: fix indention in glsl_types.cpp
No functional changes. Signed-off-by: Samuel Iglesias Gonsalvez --- src/glsl/glsl_types.cpp | 140 1 file changed, 70 insertions(+), 70 deletions(-) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index 97c79fa..b4525eb 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -45,8 +45,8 @@ glsl_type::init_ralloc_type_ctx(void) } glsl_type::glsl_type(GLenum gl_type, -glsl_base_type base_type, unsigned vector_elements, -unsigned matrix_columns, const char *name) : + glsl_base_type base_type, unsigned vector_elements, + unsigned matrix_columns, const char *name) : gl_type(gl_type), base_type(base_type), sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), @@ -69,8 +69,8 @@ glsl_type::glsl_type(GLenum gl_type, } glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type, -enum glsl_sampler_dim dim, bool shadow, bool array, -unsigned type, const char *name) : + enum glsl_sampler_dim dim, bool shadow, bool array, + unsigned type, const char *name) : gl_type(gl_type), base_type(base_type), sampler_dimensionality(dim), sampler_shadow(shadow), @@ -96,7 +96,7 @@ glsl_type::glsl_type(GLenum gl_type, glsl_base_type base_type, } glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, -const char *name) : + const char *name) : gl_type(0), base_type(GLSL_TYPE_STRUCT), sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), @@ -112,12 +112,12 @@ glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, assert(name != NULL); this->name = ralloc_strdup(this->mem_ctx, name); this->fields.structure = ralloc_array(this->mem_ctx, -glsl_struct_field, length); + glsl_struct_field, length); for (i = 0; i < length; i++) { this->fields.structure[i].type = fields[i].type; this->fields.structure[i].name = ralloc_strdup(this->fields.structure, -fields[i].name); + fields[i].name); this->fields.structure[i].location = fields[i].location; this->fields.structure[i].interpolation = fields[i].interpolation; this->fields.structure[i].centroid = fields[i].centroid; @@ -130,7 +130,7 @@ glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, } glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, -enum glsl_interface_packing packing, const char *name) : + enum glsl_interface_packing packing, const char *name) : gl_type(0), base_type(GLSL_TYPE_INTERFACE), sampler_dimensionality(0), sampler_shadow(0), sampler_array(0), @@ -146,11 +146,11 @@ glsl_type::glsl_type(const glsl_struct_field *fields, unsigned num_fields, assert(name != NULL); this->name = ralloc_strdup(this->mem_ctx, name); this->fields.structure = ralloc_array(this->mem_ctx, -glsl_struct_field, length); + glsl_struct_field, length); for (i = 0; i < length; i++) { this->fields.structure[i].type = fields[i].type; this->fields.structure[i].name = ralloc_strdup(this->fields.structure, -fields[i].name); + fields[i].name); this->fields.structure[i].location = fields[i].location; this->fields.structure[i].interpolation = fields[i].interpolation; this->fields.structure[i].centroid = fields[i].centroid; @@ -186,8 +186,8 @@ glsl_type::contains_sampler() const return this->fields.array->contains_sampler(); } else if (this->is_record()) { for (unsigned int i = 0; i < this->length; i++) { -if (this->fields.structure[i].type->contains_sampler()) - return true; + if (this->fields.structure[i].type->contains_sampler()) +return true; } return false; } else { @@ -203,8 +203,8 @@ glsl_type::contains_integer() const return this->fields.array->contains_integer(); } else if (this->is_record()) { for (unsigned int i = 0; i < this->length; i++) { -if (this->fields.structure[i].type->contains_integer()) - return true; + if (this->fields.structure[i].type->contains_integer()) +return true; } return false; } else { @@ -219,8 +219,8 @@ glsl_type::contains_double() const return this->fields.array->contains_double(); } else if (this->is_record()) { for (unsigned int i = 0; i < this->length; i++) { -if (th
[Mesa-dev] [PATCH 2/2] glsl: Add unsized array support to glsl_type::std140_size()
--- src/glsl/glsl_types.cpp | 7 ++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index b4525eb..07d7248 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -1351,7 +1351,7 @@ glsl_type::std140_size(bool row_major) const * rounded up to the next multiple of the base alignment of the * structure. */ - if (this->is_record()) { + if (this->is_record() || this->is_interface()) { unsigned size = 0; unsigned max_align = 0; @@ -1367,6 +1367,11 @@ glsl_type::std140_size(bool row_major) const const struct glsl_type *field_type = this->fields.structure[i].type; unsigned align = field_type->std140_base_alignment(field_row_major); + + /* Ignore unsized arrays when calculating size */ + if (field_type->is_unsized_array()) +continue; + size = glsl_align(size, align); size += field_type->std140_size(field_row_major); -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Add parser/compiler support for std430 interface packing qualifier
v2: - Fix a missing check in has_layout() v3: - Mention shader storage block in error message for layout qualifiers (Kristian). Signed-off-by: Samuel Iglesias Gonsalvez Reviewed-by: Jordan Justen --- src/glsl/ast.h | 1 + src/glsl/ast_to_hir.cpp | 21 + src/glsl/ast_type.cpp| 2 ++ src/glsl/glsl_parser.yy | 2 ++ src/glsl/glsl_types.h| 3 ++- src/glsl/link_uniform_blocks.cpp | 15 --- src/mesa/main/mtypes.h | 3 ++- 7 files changed, 38 insertions(+), 9 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index cca32b3..4c31436 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -491,6 +491,7 @@ struct ast_type_qualifier { /** \name Layout qualifiers for GL_ARB_uniform_buffer_object */ /** \{ */ unsigned std140:1; + unsigned std430:1; unsigned shared:1; unsigned packed:1; unsigned column_major:1; diff --git a/src/glsl/ast_to_hir.cpp b/src/glsl/ast_to_hir.cpp index 92038a6..d6071ef 100644 --- a/src/glsl/ast_to_hir.cpp +++ b/src/glsl/ast_to_hir.cpp @@ -2920,11 +2920,13 @@ apply_type_qualifier_to_variable(const struct ast_type_qualifier *qual, var->data.depth_layout = ir_depth_layout_none; if (qual->flags.q.std140 || + qual->flags.q.std430 || qual->flags.q.packed || qual->flags.q.shared) { _mesa_glsl_error(loc, state, - "uniform block layout qualifiers std140, packed, and " - "shared can only be applied to uniform blocks, not " + "uniform and shader storage block layout qualifiers " + "std140, std430, packed, and shared can only be " + "applied to uniform or shader storage blocks, not " "members"); } @@ -5691,12 +5693,14 @@ ast_process_structure_or_interface_block(exec_list *instructions, const struct ast_type_qualifier *const qual = & decl_list->type->qualifier; if (qual->flags.q.std140 || + qual->flags.q.std430 || qual->flags.q.packed || qual->flags.q.shared) { _mesa_glsl_error(&loc, state, "uniform/shader storage block layout qualifiers " - "std140, packed, and shared can only be applied " - "to uniform/shader storage blocks, not members"); + "std140, std430, packed, and shared can only be " + "applied to uniform/shader storage blocks, not " + "members"); } if (qual->flags.q.constant) { @@ -5908,6 +5912,13 @@ ast_interface_block::hir(exec_list *instructions, this->block_name); } + if (!this->layout.flags.q.buffer && + this->layout.flags.q.std430) { + _mesa_glsl_error(&loc, state, + "std430 storage block layout qualifier is supported " + "only for shader storage blocks"); + } + /* The ast_interface_block has a list of ast_declarator_lists. We * need to turn those into ir_variables with an association * with this uniform block. @@ -5917,6 +5928,8 @@ ast_interface_block::hir(exec_list *instructions, packing = GLSL_INTERFACE_PACKING_SHARED; } else if (this->layout.flags.q.packed) { packing = GLSL_INTERFACE_PACKING_PACKED; + } else if (this->layout.flags.q.std430) { + packing = GLSL_INTERFACE_PACKING_STD430; } else { /* The default layout is std140. */ diff --git a/src/glsl/ast_type.cpp b/src/glsl/ast_type.cpp index a4671e2..08a4504 100644 --- a/src/glsl/ast_type.cpp +++ b/src/glsl/ast_type.cpp @@ -65,6 +65,7 @@ ast_type_qualifier::has_layout() const || this->flags.q.depth_less || this->flags.q.depth_unchanged || this->flags.q.std140 + || this->flags.q.std430 || this->flags.q.shared || this->flags.q.column_major || this->flags.q.row_major @@ -123,6 +124,7 @@ ast_type_qualifier::merge_qualifier(YYLTYPE *loc, ubo_layout_mask.flags.q.std140 = 1; ubo_layout_mask.flags.q.packed = 1; ubo_layout_mask.flags.q.shared = 1; + ubo_layout_mask.flags.q.std430 = 1; ast_type_qualifier ubo_binding_mask; ubo_binding_mask.flags.i = 0; diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index 028974e..4cb018a 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -1199,6 +1199,8 @@ layout_qualifier_id: $$.flags.q.std140 = 1; } else if (match_layout_qualifier($1, "shared", state) == 0) { $$.flags.q.shared = 1; + } else if (match_layout_qualifier($1, "std430", state) == 0) { +$$.flags.q.std430 = 1; } else if (match_layout_qualifier($1, "column_major", state) == 0)
[Mesa-dev] [PATCH] i965/fs/nir: implement nir_intrinsic_get_buffer_size
v2: - Remove inst->regs_written assignment as the instruction only writes to one register. Signed-off-by: Samuel Iglesias Gonsalvez --- src/mesa/drivers/dri/i965/brw_fs_nir.cpp | 24 1 file changed, 24 insertions(+) diff --git a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp index e4ddadc..97aef61 100644 --- a/src/mesa/drivers/dri/i965/brw_fs_nir.cpp +++ b/src/mesa/drivers/dri/i965/brw_fs_nir.cpp @@ -1734,6 +1734,30 @@ fs_visitor::nir_emit_intrinsic(const fs_builder &bld, nir_intrinsic_instr *instr break; } + case nir_intrinsic_get_buffer_size: { + nir_const_value *const_uniform_block = nir_src_as_const_value(instr->src[0]); + unsigned ubo_index = const_uniform_block ? const_uniform_block->u[0] : 0; + int reg_width = dispatch_width / 8; + + assert(shader->base.UniformBlocks[ubo_index].IsShaderStorage); + + /* Set LOD = 0 */ + fs_reg source = fs_reg(0); + + int mlen = 1 * reg_width; + fs_reg src_payload = fs_reg(GRF, alloc.allocate(mlen), + BRW_REGISTER_TYPE_UD); + bld.LOAD_PAYLOAD(src_payload, &source, 1, 0); + + fs_reg surf_index = fs_reg(prog_data->binding_table.ubo_start + ubo_index); + fs_inst *inst = bld.emit(FS_OPCODE_GET_BUFFER_SIZE, dest, + src_payload, surf_index); + inst->header_size = 0; + inst->mlen = mlen; + bld.emit(inst); + break; + } + default: unreachable("unknown intrinsic"); } -- 2.1.4 ___ mesa-dev mailing list mesa-dev@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/mesa-dev
[Mesa-dev] [PATCH] glsl: Move interface block processing to glsl_parser_extras.cpp
No functional changes. --- src/glsl/ast.h | 5 ++ src/glsl/glsl_parser.yy | 127 +--- src/glsl/glsl_parser_extras.cpp | 122 ++ 3 files changed, 128 insertions(+), 126 deletions(-) diff --git a/src/glsl/ast.h b/src/glsl/ast.h index 335f426..cca32b3 100644 --- a/src/glsl/ast.h +++ b/src/glsl/ast.h @@ -1172,4 +1172,9 @@ extern void check_builtin_array_max_size(const char *name, unsigned size, YYLTYPE loc, struct _mesa_glsl_parse_state *state); +extern void _mesa_ast_process_interface_block(YYLTYPE *locp, + _mesa_glsl_parse_state *state, + ast_interface_block *const block, + const struct ast_type_qualifier q); + #endif /* AST_H */ diff --git a/src/glsl/glsl_parser.yy b/src/glsl/glsl_parser.yy index 42108a3..7f00929 100644 --- a/src/glsl/glsl_parser.yy +++ b/src/glsl/glsl_parser.yy @@ -2634,132 +2634,7 @@ basic_interface_block: block->block_name = $2; block->declarations.push_degenerate_list_at_head(& $4->link); - if ($1.flags.q.buffer) { - if (!state->has_shader_storage_buffer_objects()) { -_mesa_glsl_error(& @1, state, - "#version 430 / GL_ARB_shader_storage_buffer_object " - "required for defining shader storage blocks"); - } else if (state->ARB_shader_storage_buffer_object_warn) { -_mesa_glsl_warning(& @1, state, - "#version 430 / GL_ARB_shader_storage_buffer_object " - "required for defining shader storage blocks"); - } - } else if ($1.flags.q.uniform) { - if (!state->has_uniform_buffer_objects()) { -_mesa_glsl_error(& @1, state, - "#version 140 / GL_ARB_uniform_buffer_object " - "required for defining uniform blocks"); - } else if (state->ARB_uniform_buffer_object_warn) { -_mesa_glsl_warning(& @1, state, - "#version 140 / GL_ARB_uniform_buffer_object " - "required for defining uniform blocks"); - } - } else { - if (state->es_shader || state->language_version < 150) { -_mesa_glsl_error(& @1, state, - "#version 150 required for using " - "interface blocks"); - } - } - - /* From the GLSL 1.50.11 spec, section 4.3.7 ("Interface Blocks"): - * "It is illegal to have an input block in a vertex shader - * or an output block in a fragment shader" - */ - if ((state->stage == MESA_SHADER_VERTEX) && $1.flags.q.in) { - _mesa_glsl_error(& @1, state, - "`in' interface block is not allowed for " - "a vertex shader"); - } else if ((state->stage == MESA_SHADER_FRAGMENT) && $1.flags.q.out) { - _mesa_glsl_error(& @1, state, - "`out' interface block is not allowed for " - "a fragment shader"); - } - - /* Since block arrays require names, and both features are added in - * the same language versions, we don't have to explicitly - * version-check both things. - */ - if (block->instance_name != NULL) { - state->check_version(150, 300, & @1, "interface blocks with " - "an instance name are not allowed"); - } - - uint64_t interface_type_mask; - struct ast_type_qualifier temp_type_qualifier; - - /* Get a bitmask containing only the in/out/uniform/buffer - * flags, allowing us to ignore other irrelevant flags like - * interpolation qualifiers. - */ - temp_type_qualifier.flags.i = 0; - temp_type_qualifier.flags.q.uniform = true; - temp_type_qualifier.flags.q.buffer = true; - temp_type_qualifier.flags.q.in = true; - temp_type_qualifier.flags.q.out = true; - interface_type_mask = temp_type_qualifier.flags.i; - - /* Get the block's interface qualifier. The interface_qualifier - * production rule guarantees that only one bit will be set (and - * it will be in/out/uniform). - */ - uint64_t block_interface_qualifier = $1.flags.i; - - block->layout.flags.i |= block_interface_qualifier; - - if (state->stage == MESA_SHADER_GEOMETRY && - state->has_explicit_attrib_stream()) { - /* Assign global layout's stream value. */ - block->layout.flags.q.stream = 1; - block->layout.flags.q.explicit_stream = 0; - block->layout.stream = state->out_qualifier->stream; - } - - foreach_list_typed (ast_declarator_list, member, link, &block->declarations) { - ast_type_qual
[Mesa-dev] [PATCH] glsl: Add std430 related member functions to glsl_type class
They are used to calculate size, base alignment and array stride values for a glsl_type following std430 rules. v2: - Paste OpenGL 4.3 spec wording as it mentions stride of array. (Jordan) Signed-off-by: Samuel Iglesias Gonsalvez Reviewed-by: Jordan Justen --- src/glsl/glsl_types.cpp | 207 src/glsl/glsl_types.h | 19 + 2 files changed, 226 insertions(+) diff --git a/src/glsl/glsl_types.cpp b/src/glsl/glsl_types.cpp index 07d7248..93034a6 100644 --- a/src/glsl/glsl_types.cpp +++ b/src/glsl/glsl_types.cpp @@ -1388,6 +1388,213 @@ glsl_type::std140_size(bool row_major) const return -1; } +unsigned +glsl_type::std430_base_alignment(bool row_major) const +{ + + unsigned N = is_double() ? 8 : 4; + + /* (1) If the member is a scalar consuming basic machine units, the +* base alignment is . +* +* (2) If the member is a two- or four-component vector with components +* consuming basic machine units, the base alignment is 2 or +* 4, respectively. +* +* (3) If the member is a three-component vector with components consuming +* basic machine units, the base alignment is 4. +*/ + if (this->is_scalar() || this->is_vector()) { + switch (this->vector_elements) { + case 1: + return N; + case 2: + return 2 * N; + case 3: + case 4: + return 4 * N; + } + } + + /* OpenGL 4.30 spec, section 7.6.2.2 "Standard Uniform Block Layout": +* +* "When using the std430 storage layout, shader storage blocks will be +* laid out in buffer storage identically to uniform and shader storage +* blocks using the std140 layout, except that the base alignment and +* stride of arrays of scalars and vectors in rule 4 and of structures +* in rule 9 are not rounded up a multiple of the base alignment of a vec4. +*/ + + /* (1) If the member is a scalar consuming basic machine units, the +* base alignment is . +* +* (2) If the member is a two- or four-component vector with components +* consuming basic machine units, the base alignment is 2 or +* 4, respectively. +* +* (3) If the member is a three-component vector with components consuming +* basic machine units, the base alignment is 4. +*/ + if (this->is_array()) + return this->fields.array->std430_base_alignment(row_major); + + /* (5) If the member is a column-major matrix with columns and +* rows, the matrix is stored identically to an array of +* column vectors with components each, according to +* rule (4). +* +* (7) If the member is a row-major matrix with columns and +* rows, the matrix is stored identically to an array of +* row vectors with components each, according to rule (4). +*/ + if (this->is_matrix()) { + const struct glsl_type *vec_type, *array_type; + int c = this->matrix_columns; + int r = this->vector_elements; + + if (row_major) { + vec_type = get_instance(base_type, c, 1); + array_type = glsl_type::get_array_instance(vec_type, r); + } else { + vec_type = get_instance(base_type, r, 1); + array_type = glsl_type::get_array_instance(vec_type, c); + } + + return array_type->std430_base_alignment(false); + } + + /* (9) If the member is a structure, the base alignment of the +* structure is , where is the largest base alignment +* value of any of its members, and rounded up to the base +* alignment of a vec4. The individual members of this +* sub-structure are then assigned offsets by applying this set +* of rules recursively, where the base offset of the first +* member of the sub-structure is equal to the aligned offset +* of the structure. The structure may have padding at the end; +* the base offset of the member following the sub-structure is +* rounded up to the next multiple of the base alignment of the +* structure. +*/ + if (this->is_record()) { + unsigned base_alignment = 0; + for (unsigned i = 0; i < this->length; i++) { + bool field_row_major = row_major; + const enum glsl_matrix_layout matrix_layout = +glsl_matrix_layout(this->fields.structure[i].matrix_layout); + if (matrix_layout == GLSL_MATRIX_LAYOUT_ROW_MAJOR) { +field_row_major = true; + } else if (matrix_layout == GLSL_MATRIX_LAYOUT_COLUMN_MAJOR) { +field_row_major = false; + } + + const struct glsl_type *field_type = this->fields.structure[i].type; + base_alignment = MAX2(base_alignment, + field_type->std430_base_alignment(field_row_major)); + } + return base_alignment; + } + assert(!"not reached"); + return -1; +} + +unsigned +glsl_type::std430_array_stride(bool row_major) con
[Mesa-dev] [PATCH] glsl: Add parser/compiler support for unsized array's length()
The unsized array length is computed with the following formula: array.length() = max((buffer_object_size - offset_of_array) / stride_of_array, 0) Of these, only the buffer size needs to be provided by the backends, the frontend already knows the values of the two other variables. This patch identifies the cases where we need to get the length of an unsized array, injecting ir_unop_ssbo_unsized_array_length expressions that will be lowered (in a later patch) to inject the formula mentioned above. It also adds the ir_unop_get_buffer_size expression that drivers will implement to provide the buffer length. v2: - Do not define a triop that will force backends to implement the entire formula, they should only need to provide the buffer size since the other values are known by the frontend (Curro). v3: - Call state->has_shader_storage_buffer_objects() in ast_function.cpp instead of using state->ARB_shader_storage_buffer_object_enable (Tapani). Signed-off-by: Samuel Iglesias Gonsalvez --- src/glsl/ast_function.cpp | 13 + src/glsl/ir.cpp | 7 +++ src/glsl/ir.h | 19 ++- src/glsl/ir_validate.cpp | 11 +++ src/glsl/link_uniforms.cpp| 10 -- .../drivers/dri/i965/brw_fs_channel_expressions.cpp | 2 ++ src/mesa/drivers/dri/i965/brw_vec4_visitor.cpp| 8 src/mesa/program/ir_to_mesa.cpp | 2 ++ src/mesa/state_tracker/st_glsl_to_tgsi.cpp| 5 + 9 files changed, 70 insertions(+), 7 deletions(-) diff --git a/src/glsl/ast_function.cpp b/src/glsl/ast_function.cpp index 803edf5..ff5ecb9 100644 --- a/src/glsl/ast_function.cpp +++ b/src/glsl/ast_function.cpp @@ -1593,11 +1593,16 @@ ast_function_expression::handle_method(exec_list *instructions, if (op->type->is_array()) { if (op->type->is_unsized_array()) { -_mesa_glsl_error(&loc, state, "length called on unsized array"); -goto fail; +if (!state->has_shader_storage_buffer_objects()) { + _mesa_glsl_error(&loc, state, "length called on unsized array" + " only available with " + "ARB_shader_storage_buffer_object"); +} +/* Calculate length of an unsized array in run-time */ +result = new(ctx) ir_expression(ir_unop_ssbo_unsized_array_length, op); + } else { +result = new(ctx) ir_constant(op->type->array_size()); } - - result = new(ctx) ir_constant(op->type->array_size()); } else if (op->type->is_vector()) { if (state->ARB_shading_language_420pack_enable) { /* .length() returns int. */ diff --git a/src/glsl/ir.cpp b/src/glsl/ir.cpp index b9df976..2c45b9e 100644 --- a/src/glsl/ir.cpp +++ b/src/glsl/ir.cpp @@ -342,6 +342,11 @@ ir_expression::ir_expression(int op, ir_rvalue *op0) op0->type->vector_elements, 1); break; + case ir_unop_get_buffer_size: + case ir_unop_ssbo_unsized_array_length: + this->type = glsl_type::int_type; + break; + default: assert(!"not reached: missing automatic type setup for ir_expression"); this->type = op0->type; @@ -571,6 +576,8 @@ static const char *const operator_strs[] = { "noise", "subroutine_to_int", "interpolate_at_centroid", + "get_buffer_size", + "ssbo_unsized_array_length", "+", "-", "*", diff --git a/src/glsl/ir.h b/src/glsl/ir.h index 48b6795..43a2bf0 100644 --- a/src/glsl/ir.h +++ b/src/glsl/ir.h @@ -1425,9 +1425,26 @@ enum ir_expression_operation { ir_unop_interpolate_at_centroid, /** +* Ask the driver for the total size of a buffer block. +* +* operand0 is the ir_constant buffer block index in the linked shader. +*/ + ir_unop_get_buffer_size, + + /** +* Calculate length of an unsized array inside a buffer block. +* This opcode is going to be replaced in a lowering pass inside +* the linker. +* +* operand0 is the unsized array's ir_value for the calculation +* of its length. +*/ + ir_unop_ssbo_unsized_array_length, + + /** * A sentinel marking the last of the unary operations. */ - ir_last_unop = ir_unop_interpolate_at_centroid, + ir_last_unop = ir_unop_ssbo_unsized_array_length, ir_binop_add, ir_binop_sub, diff --git a/src/glsl/ir_validate.cpp b/src/glsl/ir_validate.cpp index 3f0dea7..935571a 100644 --- a/src/glsl/ir_validate.cpp +++ b/src/glsl/ir_validate.cpp @@ -409,6 +409,17 @@ ir_validate::visit_leave(ir_expression *ir) assert(ir->operands[0]->type->is_float()); break; + case ir_unop_get_buffer_size: + assert(ir->type == glsl_type::int_type); + assert(ir->operands[0]->type ==
[Mesa-dev] [PATCH] glsl: add std430 interface packing support to ssbo related operations
v2: - Get interface packing information from interface's type, not the variable type. - Simplify is_std430 condition in emit_access() for readability (Jordan) - Add a commment explaing why array of three-component vector case is different in std430 than the rest of cases. - Add calls to std430_array_stride(). v3: - Simplify size_mul change for std430's case (Jordan) - Fix commit log lines length (Jordan) - Pass 'packing' instead of 'is_std430' to emit_access() (Kristian) Signed-off-by: Samuel Iglesias Gonsalvez Reviewed-by: Jordan Justen --- src/glsl/lower_ubo_reference.cpp | 111 --- 1 file changed, 81 insertions(+), 30 deletions(-) diff --git a/src/glsl/lower_ubo_reference.cpp b/src/glsl/lower_ubo_reference.cpp index 8694383..4aaa259 100644 --- a/src/glsl/lower_ubo_reference.cpp +++ b/src/glsl/lower_ubo_reference.cpp @@ -147,7 +147,8 @@ public: ir_rvalue **offset, unsigned *const_offset, bool *row_major, -int *matrix_columns); +int *matrix_columns, +unsigned packing); ir_expression *ubo_load(const struct glsl_type *type, ir_rvalue *offset); ir_call *ssbo_load(const struct glsl_type *type, @@ -164,7 +165,7 @@ public: void emit_access(bool is_write, ir_dereference *deref, ir_variable *base_offset, unsigned int deref_offset, bool row_major, int matrix_columns, -unsigned write_mask); +unsigned packing, unsigned write_mask); ir_visitor_status visit_enter(class ir_expression *); ir_expression *calculate_ssbo_unsized_array_length(ir_expression *expr); @@ -176,7 +177,8 @@ public: ir_variable *); ir_expression *emit_ssbo_get_buffer_size(); - unsigned calculate_unsized_array_stride(ir_dereference *deref); + unsigned calculate_unsized_array_stride(ir_dereference *deref, + unsigned packing); void *mem_ctx; struct gl_shader *shader; @@ -257,7 +259,8 @@ lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var, ir_rvalue **offset, unsigned *const_offset, bool *row_major, - int *matrix_columns) + int *matrix_columns, + unsigned packing) { /* Determine the name of the interface block */ ir_rvalue *nonconst_block_index; @@ -343,8 +346,15 @@ lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var, const bool array_row_major = is_dereferenced_thing_row_major(deref_array); -array_stride = deref_array->type->std140_size(array_row_major); -array_stride = glsl_align(array_stride, 16); +/* The array type will give the correct interface packing + * information + */ +if (packing == GLSL_INTERFACE_PACKING_STD430) { + array_stride = deref_array->type->std430_array_stride(array_row_major); +} else { + array_stride = deref_array->type->std140_size(array_row_major); + array_stride = glsl_align(array_stride, 16); +} } ir_rvalue *array_index = deref_array->array_index; @@ -380,7 +390,12 @@ lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var, ralloc_free(field_deref); -unsigned field_align = type->std140_base_alignment(field_row_major); +unsigned field_align = 0; + +if (packing == GLSL_INTERFACE_PACKING_STD430) + field_align = type->std430_base_alignment(field_row_major); +else + field_align = type->std140_base_alignment(field_row_major); intra_struct_offset = glsl_align(intra_struct_offset, field_align); @@ -388,7 +403,10 @@ lower_ubo_reference_visitor::setup_for_load_or_store(ir_variable *var, deref_record->field) == 0) break; -intra_struct_offset += type->std140_size(field_row_major); +if (packing == GLSL_INTERFACE_PACKING_STD430) + intra_struct_offset += type->std430_size(field_row_major); +else + intra_struct_offset += type->std140_size(field_row_major); /* If the field just examined was itself a structure, apply rule * #9: @@ -437,13 +455,15 @@ lower_ubo_reference_visitor::handle_rvalue(ir_rvalue **rvalue) unsigned const_offset; bool row_major;