On 12/31/2011 03:52 PM, Vincent Lejeune wrote:
> Current glsl_to_tgsi::remove_output_read pass did not work properly when
> indirect addressing was involved ; this commit replaces it with
> a lowering pass that occurs before glsl_to_tgsi visitor is called.
> This patch fix varying-array related piglit test.
> ---
>  src/glsl/Makefile.sources                  |    1 +
>  src/glsl/lower_remove_output_read.cpp      |   97
> ++++++++++++++++++++++++++++
>  src/glsl/lower_remove_output_read.h        |   62 ++++++++++++++++++
>  src/mesa/state_tracker/st_glsl_to_tgsi.cpp |   20 ++++--
>  4 files changed, 172 insertions(+), 8 deletions(-)
>  create mode 100644 src/glsl/lower_remove_output_read.cpp
>  create mode 100644 src/glsl/lower_remove_output_read.h

Vincent,

I like this! I have a couple of comments below. To save some trouble,
I've actually gone ahead and made the changes, and will send out a
proposed v2 of this patch shortly.

> diff --git a/src/glsl/Makefile.sources b/src/glsl/Makefile.sources
> index c65bfe4..6c80089 100644
> --- a/src/glsl/Makefile.sources
> +++ b/src/glsl/Makefile.sources
> @@ -60,6 +60,7 @@ LIBGLSL_CXX_SOURCES := \
>     lower_vec_index_to_cond_assign.cpp \
>     lower_vec_index_to_swizzle.cpp \
>     lower_vector.cpp \
> +   lower_remove_output_read.cpp \
>     opt_algebraic.cpp \
>     opt_constant_folding.cpp \
>     opt_constant_propagation.cpp \
> diff --git a/src/glsl/lower_remove_output_read.cpp
> b/src/glsl/lower_remove_output_read.cpp
> new file mode 100644
> index 0000000..8150580
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.cpp
> @@ -0,0 +1,97 @@
> +/*
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2012 Vincent Lejeune
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
> + * IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#include "lower_remove_output_read.h"
> +#include "ir.h"
> +
> +
> +void
> +output_read_remover::add_replacement_pair(ir_variable *output, ir_variable
> *temp)
> +{
> +   if (replacements_count == 0) {
> +      replacements_array = (struct replacement_pair *)
> ralloc_array_size(mem_ctx, sizeof(struct replacement_pair),1);
> +   }
> +   else {
> +      replacements_array = (struct replacement_pair *)
> reralloc_array_size(mem_ctx,replacements_array, sizeof(struct
> replacement_pair), replacements_count + 1);
> +   }

reralloc_array_size actually does an initial allocation if the pointer
you pass is NULL, so you don't need a special case for zero. Plus, you
could use the handy "reralloc" macro which does the typecast and sizeof
for you. This could just be:

   replacements_array = reralloc(mem_ctx, replacements_array,
                                 struct replacement_pair,
                                 replacements_count + 1);

However, you probably ought to avoid reallocating the array each time
you encounter a new shader output. (It's probably not _critical_ since
there aren't typically many outputs, but still worth fixing.) The
typical solution is to maintain the array size as a second counter and
double the size of the array each time.

> +   hash_table_insert(replacements,temp,output);
> +   replacements_array[replacements_count].output = output;
> +   replacements_array[replacements_count].temp = temp;
> +   replacements_count++;
> +}
> +
> +output_read_remover::output_read_remover():ir_hierarchical_visitor(),
> replacements_count(0)

No need to call the parent class constructor explicitly; C++ just does
that for you.
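
The doubling scheme mentioned above for add_replacement_pair could look
something like the following. This is only a standalone sketch: it uses
plain realloc instead of ralloc so that it compiles on its own, and
pair_list, capacity and pair_list_add are made-up names for
illustration, not anything that exists in the tree or in v2.

```cpp
#include <cstdlib>

struct replacement_pair {
   const void *output;
   const void *temp;
};

// Hypothetical stand-in for the visitor's bookkeeping; "capacity" is the
// second counter that tracks the allocated size of the array.
struct pair_list {
   replacement_pair *pairs = nullptr;
   unsigned count = 0;
   unsigned capacity = 0;
};

// Append a pair, growing the allocation only when the array is full.
// Returns false if the allocation fails (the list is left unchanged).
static bool
pair_list_add(pair_list *list, const void *output, const void *temp)
{
   if (list->count == list->capacity) {
      // Start at 4 slots (arbitrary choice), then double.
      unsigned new_capacity = list->capacity ? list->capacity * 2 : 4;
      // realloc(NULL, n) behaves like malloc(n), so the first allocation
      // needs no special case.
      void *p = std::realloc(list->pairs,
                             new_capacity * sizeof(replacement_pair));
      if (!p)
         return false;
      list->pairs = static_cast<replacement_pair *>(p);
      list->capacity = new_capacity;
   }
   list->pairs[list->count].output = output;
   list->pairs[list->count].temp = temp;
   list->count++;
   return true;
}
```

With this scheme, nine insertions cause three allocations (4, 8, then 16
slots) rather than nine.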

> +{
> +   replacements =
> hash_table_ctor(0,hash_table_pointer_hash,hash_table_pointer_compare);

Probably want to initialize replacements_array here.

> +   mem_ctx = ralloc_context(NULL);
> +}
> +
> +output_read_remover::~output_read_remover()
> +{
> +   hash_table_dtor(replacements);
> +   ralloc_free(mem_ctx);
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit(ir_dereference_variable *ir)
> +{
> +   ir_variable* temp = (ir_variable*) hash_table_find(replacements, ir->var);
> +   if (temp) {
> +      ir->var = temp;
> +   }
> +   else if(ir->var->mode == ir_var_out) {

I'd suggest checking for ir_var_out first. That way, you don't have to
even bother with the hash table lookup for most variables. Faster. :)

A more serious issue, however, is that ir_var_out has a double meaning:
- Shader output variables
- Function "out" parameters

I think you're safe here since you're calling this pass at codegen time,
presumably after all the functions have been inlined. Calling this on a
shader that still had functions with out-params could break it horribly
(since, unless it has a return statement, you'd never copy the temps
back to the real out params.)

That said, Eric, Ian, and I all agree that using ir_var_out to mean two
things is stupid, and I believe Ian has patches floating around to fix
that. Once those land, you won't have to worry about this at all. I'll
ask Ian what the status on those is.

> +      ir_variable* temp = new (ir->var)
> ir_variable(ir->var->type,ir->var->name,ir_var_temporary);

Hmm. I guess using ir->var as the context works, since the shader output
isn't going to get removed. I'd still feel a bit safer if we used the
same allocation context as the original variable, though:
ralloc_parent(ir->var).

> +      add_replacement_pair(ir->var,temp);
> +      ir->var = temp;
> +   }
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_enter(ir_return *ir)

I'd use visit_leave here just to be safe. Again, I think you're safe due
to inlining, but...paranoia.
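
To illustrate the "check for ir_var_out first" ordering suggested above
for visit(ir_dereference_variable *), here is a rough standalone sketch.
It uses mock types and std::unordered_map in place of ir_variable and
Mesa's hash_table, so variable, deref, read_remover and the lookups
counter are all hypothetical names that exist only to make the example
compile and show the effect.

```cpp
#include <unordered_map>

// Minimal mock types; the real ir_variable lives in ir.h.
enum var_mode { var_temporary, var_out, var_uniform };

struct variable {
   var_mode mode;
};

struct deref {
   variable *var;
};

struct read_remover {
   // Maps each shader output to the temporary that replaces it.
   std::unordered_map<variable *, variable *> replacements;
   unsigned lookups = 0; // instrumentation for this example only

   void visit(deref *d)
   {
      // Cheap test first: anything that is not an output is skipped
      // without touching the hash table at all.
      if (d->var->mode != var_out)
         return;

      lookups++;
      auto it = replacements.find(d->var);
      if (it == replacements.end()) {
         // First time this output is seen: create its temporary.
         variable *temp = new variable{var_temporary};
         it = replacements.emplace(d->var, temp).first;
      }
      d->var = it->second;
   }

   ~read_remover()
   {
      for (auto &entry : replacements)
         delete entry.second;
   }
};
```

Only dereferences of outputs ever reach the table; a dereference that
has already been redirected points at a temporary, so it is skipped by
the mode test as well.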

> +{
> +   for (unsigned i = 0; i < replacements_count; i++) {
> +      ir_dereference_variable *lhs = new (ir)
> ir_dereference_variable(replacements_array[i].output);
> +      ir_dereference_variable *rhs = new (ir)
> ir_dereference_variable(replacements_array[i].temp);
> +      ir_assignment* assign = new (ir) ir_assignment(lhs, rhs);
> +      ir->insert_before(assign);
> +   }
> +   return visit_continue;
> +}
> +
> +ir_visitor_status
> +output_read_remover::visit_leave(ir_function *f)
> +{
> +   if (strcmp(f->name,"main") != 0)
> +      return visit_continue;
> +   exec_list empty;
> +   ir_function_signature* sig = f->matching_signature(&empty);

Blargh :( I guess this works, but I'm not a fan of creating a blank
list of function parameters and pattern matching on signatures.

You can change this to visit_leave(ir_function_signature *) and just
check sig->function_name() against "main". Easier.

> +   for (unsigned i = 0; i < replacements_count; i++) {
> +      ir_dereference_variable *lhs = new (f)
> ir_dereference_variable(replacements_array[i].output);
> +      ir_dereference_variable *rhs = new (f)
> ir_dereference_variable(replacements_array[i].temp);
> +      ir_assignment* assign = new (f) ir_assignment(lhs, rhs);
> +      sig->body.push_tail(assign);
> +   }
> +   return visit_continue;
> +}
> diff --git a/src/glsl/lower_remove_output_read.h
> b/src/glsl/lower_remove_output_read.h
> new file mode 100644
> index 0000000..825047d
> --- /dev/null
> +++ b/src/glsl/lower_remove_output_read.h
> @@ -0,0 +1,62 @@
> +/*
> + * Copyright (C) 2005-2007 Brian Paul All Rights Reserved.
> + * Copyright (C) 2008 VMware, Inc. All Rights Reserved.
> + * Copyright © 2010 Intel Corporation
> + * Copyright © 2011 Bryan Cain
> + * Copyright © 2012 Vincent Lejeune

I'm pretty sure that your new header file doesn't contain any code by
Brian Paul, VMware, or Bryan Cain. :) It doesn't matter though; in the
v2 I'm about to send out, I removed this file.
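
For context on why a separate header may not be needed at all: if the
visitor class is defined inside the .cpp file and only a free function
such as lower_output_reads(exec_list *) is exposed, callers never see
the class. The sketch below shows that general pattern with stand-in
types. instruction and instruction_list are hypothetical placeholders
for ir_instruction and exec_list, and the visitor body is a dummy; it
is not the actual v2 code.

```cpp
#include <vector>

// Stand-ins for ir_instruction / exec_list, for illustration only.
struct instruction {
   bool reads_output;
};
typedef std::vector<instruction> instruction_list;

// The only declaration a caller needs to see.
void lower_output_reads(instruction_list *instructions);

// Everything below stays private to the .cpp file.
namespace {

class output_read_remover {
public:
   void visit(instruction *inst)
   {
      // The real pass would redirect the read to a temporary; this
      // placeholder just clears the flag to show the visitor ran.
      inst->reads_output = false;
   }
};

} // anonymous namespace

void
lower_output_reads(instruction_list *instructions)
{
   output_read_remover v;
   for (instruction &inst : *instructions)
      v.visit(&inst);
}
```

The caller then needs a single call, lower_output_reads(&list), and the
class name never leaks outside the translation unit.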

> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef LOWER_REMOVE_OUTPUT_READ_H
> +#define LOWER_REMOVE_OUTPUT_READ_H
> +
> +#include "ir_hierarchical_visitor.h"
> +#include "program/hash_table.h"
> +
> +/**
> + * In GLSL shaders, varying vars can be read and written.
> + * On some hardware, trying to read an output register causes trouble.
> + * This pass replaces every output access with a temporary variable.
> + * It then adds required assignement to fill outputs.
> + *
> + */
> +
> +class output_read_remover : public ir_hierarchical_visitor {
> +protected:
> +   hash_table* replacements;
> +   struct replacement_pair {
> +      ir_variable *output;
> +      ir_variable *temp;
> +   };
> +   struct replacement_pair *replacements_array;
> +   unsigned replacements_count;
> +
> +   void add_replacement_pair(class ir_variable *, class ir_variable *);
> +   void *mem_ctx;
> +public:
> +   output_read_remover();
> +   ~output_read_remover();
> +   virtual ir_visitor_status visit(class ir_dereference_variable *);
> +   virtual ir_visitor_status visit_leave(class ir_function *);
> +   virtual ir_visitor_status visit_enter(class ir_return *);
> +};
> +
> +#endif // LOWER_REMOVE_OUTPUT_READ_H
> diff --git a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> index 28b8c2a..c3df807 100644
> --- a/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> +++ b/src/mesa/state_tracker/st_glsl_to_tgsi.cpp
> @@ -41,6 +41,7 @@
>  #include "../glsl/program.h"
>  #include "ir_optimization.h"
>  #include "ast.h"
> +#include "lower_remove_output_read.h"
>
>  #include "main/mtypes.h"
>  #include "main/shaderobj.h"
> @@ -5023,6 +5024,17 @@ get_mesa_program(struct gl_context *ctx,
>     _mesa_generate_parameters_list_for_uniforms(shader_program, shader,
>                                                 prog->Parameters);
>
> +   if (!screen->get_shader_param(screen, pipe_shader_type,
> +                                 PIPE_SHADER_CAP_OUTPUT_READ)) {
> +      /* Remove reads to output registers, and to varyings in vertex
> shaders. */
> +      output_read_remover orr_v;
> +      foreach_list(node, shader->ir) {
> +         ir_instruction *inst = (ir_instruction *) node;
> +         inst->accept(&orr_v);
> +      }
> +   }

You can actually just use visit_list_elements(&orr_v, shader->ir).

Also, in most other places, we just provide a wrapper function
(lower_output_reads(exec_list *)) that does the lowering for you. It's
simpler to use, and also hides the visitor class completely, so you
don't need to put it in a public header file.

> +
>     /* Emit intermediate IR for main().
>      */
>     visit_exec_list(shader->ir, v);
>
> @@ -5069,14 +5081,6 @@ get_mesa_program(struct gl_context *ctx,
>     }
>  #endif
>
> -   if (!screen->get_shader_param(screen, pipe_shader_type,
> -                                 PIPE_SHADER_CAP_OUTPUT_READ)) {
> -      /* Remove reads to output registers, and to varyings in vertex
> shaders. */
> -      v->remove_output_reads(PROGRAM_OUTPUT);
> -      if (target == GL_VERTEX_PROGRAM_ARB)
> -         v->remove_output_reads(PROGRAM_VARYING);
> -   }

I'm pretty sure that glsl_to_tgsi_visitor::remove_output_reads is dead
after this change, so you probably want to delete it.

Also, I'd split the patch up a bit: (1) add the new pass, (2) switch to
the new pass, (3) delete the old pass.

>     /* Perform optimizations on the instructions in the glsl_to_tgsi_visitor.
> */
>     v->simplify_cmp();
>     v->copy_propagate();

_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/mesa-dev