On Sat, Sep 15, 2018 at 12:46 AM Caio Marcelo de Oliveira Filho <caio.olive...@intel.com> wrote:

> Extend the pass to propagate copy information along the control
> flow graph.  It performs two walks: first it collects the vars
> that were written inside each node, then it walks again applying
> copy propagation using the list of copies previously available.
> At each node the list is invalidated according to the results
> from the first walk.
>
> This approach is simpler than a full data-flow analysis, but covers
> many cases.  Once derefs are used to operate on more memory
> resources (e.g. SSBOs), the difference from the block-local pass is
> expected to be more visible -- as the SSA copy propagation pass
> won't apply to those.
>
> A full data-flow analysis would handle more scenarios, such as
> conditional breaks in the control flow, and could merge equivalent
> effects from multiple branches (e.g. using a phi node to merge the
> source for writes to the same deref).  However, as previous
> commentary in the code stated, its complexity would 'rapidly get
> out of hand'.  The current patch is a good intermediate step
> towards a more complex analysis.
>
> The 'copies' linked list was modified to use util_dynarray to make it
> more convenient to clone it (to handle ifs/loops).
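
If I'm reading the patch right, the overall shape is now (paraphrasing
the two calls from nir_copy_prop_vars_impl() below -- the comments here
are mine, not from the patch):

   /* Walk 1: record, for each if/loop, which modes and derefs its
    * body writes (kept in state->vars_written_map).
    */
   gather_vars_written(&state, NULL, &impl->cf_node);

   /* Walk 2: walk the control flow propagating copies forward,
    * cloning the copies array for ifs/loops and invalidating entries
    * using the data from walk 1.
    */
   copy_prop_vars_node(&state, NULL, &impl->cf_node);
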
>
> Annotated shader-db results for Skylake:
>
>     total instructions in shared programs: 15105796 -> 15105451 (<.01%)
>     instructions in affected programs: 152293 -> 151948 (-0.23%)
>     helped: 96
>     HURT: 17
>
>         All the HURTs and many of the HELPs are a single instruction.
>         Looking at pass-by-pass outputs, the copy prop kicks in and
>         correctly removes a bunch of loads, which ends up altering
>         which other optimizations kick in.  In those cases the copies
>         would have been propagated anyway after lowering to SSA.
>
>         In a few HELPs we are actually doing more than was previously
>         possible, e.g. consolidating load_uniforms from different
>         blocks.  Most of those are from
>         shaders/dolphin/ubershaders/.
>
>     total cycles in shared programs: 566048861 -> 565954876 (-0.02%)
>     cycles in affected programs: 151461830 -> 151367845 (-0.06%)
>     helped: 2933
>     HURT: 2950
>
>         A lot of noise on both sides.
>
>     total loops in shared programs: 4603 -> 4603 (0.00%)
>     loops in affected programs: 0 -> 0
>     helped: 0
>     HURT: 0
>
>     total spills in shared programs: 11085 -> 11073 (-0.11%)
>     spills in affected programs: 23 -> 11 (-52.17%)
>     helped: 1
>     HURT: 0
>
>         In shaders/dolphin/ubershaders/12.shader_test, the pass was
>         able to pull a couple of loads out of if statements and
>         reuse them.
>
>     total fills in shared programs: 23143 -> 23089 (-0.23%)
>     fills in affected programs: 2718 -> 2664 (-1.99%)
>     helped: 27
>     HURT: 0
>
>         All from shaders/dolphin/ubershaders/.
>
>     LOST:   0
>     GAINED: 0
>
> The other generations follow the same overall shape.  The spills and
> fills HURTs are all from the same game.
>
> shader-db results for Broadwell:
>
>     total instructions in shared programs: 15402037 -> 15401841 (<.01%)
>     instructions in affected programs: 144386 -> 144190 (-0.14%)
>     helped: 86
>     HURT: 9
>
>     total cycles in shared programs: 600912755 -> 600902486 (<.01%)
>     cycles in affected programs: 185662820 -> 185652551 (<.01%)
>     helped: 2598
>     HURT: 3053
>
>     total loops in shared programs: 4579 -> 4579 (0.00%)
>     loops in affected programs: 0 -> 0
>     helped: 0
>     HURT: 0
>
>     total spills in shared programs: 80929 -> 80924 (<.01%)
>     spills in affected programs: 720 -> 715 (-0.69%)
>     helped: 1
>     HURT: 5
>
>     total fills in shared programs: 93057 -> 93013 (-0.05%)
>     fills in affected programs: 3398 -> 3354 (-1.29%)
>     helped: 27
>     HURT: 5
>
>     LOST:   0
>     GAINED: 2
>
> shader-db results for Haswell:
>
>     total instructions in shared programs: 9231975 -> 9230357 (-0.02%)
>     instructions in affected programs: 44992 -> 43374 (-3.60%)
>     helped: 27
>     HURT: 69
>
>     total cycles in shared programs: 87760587 -> 87727502 (-0.04%)
>     cycles in affected programs: 7720673 -> 7687588 (-0.43%)
>     helped: 1609
>     HURT: 1416
>
>     total loops in shared programs: 1830 -> 1830 (0.00%)
>     loops in affected programs: 0 -> 0
>     helped: 0
>     HURT: 0
>
>     total spills in shared programs: 1988 -> 1692 (-14.89%)
>     spills in affected programs: 296 -> 0
>     helped: 1
>     HURT: 0
>
>     total fills in shared programs: 2103 -> 1668 (-20.68%)
>     fills in affected programs: 438 -> 3 (-99.32%)
>     helped: 4
>     HURT: 0
>
>     LOST:   0
>     GAINED: 1
> ---
>  src/compiler/nir/nir_opt_copy_prop_vars.c | 394 +++++++++++++++++-----
>  1 file changed, 317 insertions(+), 77 deletions(-)
>
> diff --git a/src/compiler/nir/nir_opt_copy_prop_vars.c b/src/compiler/nir/nir_opt_copy_prop_vars.c
> index f58abfbb69f..966ccbdec53 100644
> --- a/src/compiler/nir/nir_opt_copy_prop_vars.c
> +++ b/src/compiler/nir/nir_opt_copy_prop_vars.c
> @@ -26,6 +26,7 @@
>  #include "nir_deref.h"
>
>  #include "util/bitscan.h"
> +#include "util/u_dynarray.h"
>
>  /**
>   * Variable-based copy propagation
> @@ -42,16 +43,21 @@
>   *     to do this because it isn't aware of variable writes that may alias the
>   *     value and make the former load invalid.
>   *
> - * Unfortunately, properly handling all of those cases makes this path rather
> - * complex.  In order to avoid additional complexity, this pass is entirely
> - * block-local.  If we tried to make it global, the data-flow analysis would
> - * rapidly get out of hand.  Fortunately, for anything that is only ever
> - * accessed directly, we get SSA based copy-propagation which is extremely
> - * powerful so this isn't that great a loss.
> + * This pass uses an intermediate solution between being local / "per-block"
> + * and a complete data-flow analysis.  It follows the control flow graph, and
> + * propagates the available copy information forward, invalidating data at each
> + * cf_node.
>   *
>   * Removal of dead writes to variables is handled by another pass.
>   */
>
> +struct vars_written {
> +   nir_variable_mode modes;
> +
> +   /* Key is the deref and value is the write mask stored as a uintptr_t. */
> +   struct hash_table *derefs;
> +};
> +
>  struct value {
>     bool is_ssa;
>     union {
> @@ -61,61 +67,170 @@ struct value {
>  };
>
>  struct copy_entry {
> -   struct list_head link;
> -
>     struct value src;
>
>     nir_deref_instr *dst;
>  };
>
>  struct copy_prop_var_state {
> -   nir_shader *shader;
> +   nir_function_impl *impl;
>
>     void *mem_ctx;
> +   void *lin_ctx;
>
> -   struct list_head copies;
> -
> -   /* We're going to be allocating and deleting a lot of copy entries so we'll
> -    * keep a free list to avoid thrashing malloc too badly.
> +   /* Maps nodes to vars_written.  Used to invalidate copy entries when
> +    * visiting each node.
>      */
> -   struct list_head copy_free_list;
> +   struct hash_table *vars_written_map;
>
>     bool progress;
>  };
>
> -static struct copy_entry *
> -copy_entry_create(struct copy_prop_var_state *state,
> -                  nir_deref_instr *dst_deref)
> +static struct vars_written *
> +create_vars_written(struct copy_prop_var_state *state)
>  {
> -   struct copy_entry *entry;
> -   if (!list_empty(&state->copy_free_list)) {
> -      struct list_head *item = state->copy_free_list.next;
> -      list_del(item);
> -      entry = LIST_ENTRY(struct copy_entry, item, link);
> -      memset(entry, 0, sizeof(*entry));
> -   } else {
> -      entry = rzalloc(state->mem_ctx, struct copy_entry);
> +   struct vars_written *written =
> +      linear_zalloc_child(state->lin_ctx, sizeof(struct vars_written));
> +   written->derefs = _mesa_hash_table_create(state->mem_ctx, _mesa_hash_pointer,
> +                                             _mesa_key_pointer_equal);
> +   return written;
> +}
> +
> +static void
> +gather_vars_written(struct copy_prop_var_state *state,
> +                    struct vars_written *written,
> +                    nir_cf_node *cf_node)
> +{
> +   struct vars_written *new_written = NULL;
> +
> +   switch (cf_node->type) {
> +   case nir_cf_node_function: {
> +      nir_function_impl *impl = nir_cf_node_as_function(cf_node);
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &impl->body)
> +         gather_vars_written(state, NULL, cf_node);
> +      break;
>     }
>
> -   entry->dst = dst_deref;
> -   list_add(&entry->link, &state->copies);
> +   case nir_cf_node_block: {
> +      if (!written)
> +         break;
>
> -   return entry;
> +      nir_block *block = nir_cf_node_as_block(cf_node);
> +      nir_foreach_instr(instr, block) {
> +         if (instr->type == nir_instr_type_call) {
> +            written->modes |= nir_var_shader_out |
> +                              nir_var_global |
> +                              nir_var_shader_storage |
> +                              nir_var_shared;
> +            continue;
> +         }
> +
> +         if (instr->type != nir_instr_type_intrinsic)
> +            continue;
> +
> +         nir_intrinsic_instr *intrin = nir_instr_as_intrinsic(instr);
> +         switch (intrin->intrinsic) {
> +         case nir_intrinsic_barrier:
> +         case nir_intrinsic_memory_barrier:
> +            written->modes |= nir_var_shader_out |
> +                              nir_var_shader_storage |
> +                              nir_var_shared;
> +            break;
> +
> +         case nir_intrinsic_emit_vertex:
> +         case nir_intrinsic_emit_vertex_with_counter:
> +            written->modes = nir_var_shader_out;
> +            break;
> +
> +         case nir_intrinsic_store_deref:
> +         case nir_intrinsic_copy_deref: {
> +            /* Destination in _both_ store_deref and copy_deref is src[0]. */
> +            nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
> +
> +            uintptr_t mask = intrin->intrinsic == nir_intrinsic_store_deref ?
> +               nir_intrinsic_write_mask(intrin) : (1 << glsl_get_vector_elements(dst->type)) - 1;
> +
> +            struct hash_entry *ht_entry = _mesa_hash_table_search(written->derefs, dst);
> +            if (ht_entry)
> +               ht_entry->data = (void *)(mask | (uintptr_t)ht_entry->data);
> +            else
> +               _mesa_hash_table_insert(written->derefs, dst, (void *)mask);
> +
> +            break;
> +         }
> +
> +         default:
> +            break;
> +         }
> +      }
> +
> +      break;
> +   }
> +
> +   case nir_cf_node_if: {
> +      nir_if *if_stmt = nir_cf_node_as_if(cf_node);
> +
> +      new_written = create_vars_written(state);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->then_list)
> +         gather_vars_written(state, new_written, cf_node);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->else_list)
> +         gather_vars_written(state, new_written, cf_node);
> +
> +      break;
> +   }
> +
> +   case nir_cf_node_loop: {
> +      nir_loop *loop = nir_cf_node_as_loop(cf_node);
> +
> +      new_written = create_vars_written(state);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &loop->body)
> +         gather_vars_written(state, new_written, cf_node);
> +
> +      break;
> +   }
>

default: unreachable() ?
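
Something like this (just a sketch, assuming the usual util/macros.h
helper) would also catch any future CF node types:

   default:
      unreachable("Invalid CF node type");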


> +   }
> +
> +   if (new_written) {
> +      /* Merge new information to the parent control flow node. */
> +      if (written) {
> +         written->modes |= new_written->modes;
> +         struct hash_entry *ht_entry;
> +         hash_table_foreach(new_written->derefs, ht_entry) {
> +            _mesa_hash_table_insert_pre_hashed(written->derefs, ht_entry->hash,
> +                                               ht_entry->key, ht_entry->data);
>

Do you want to somehow OR masks together?  This is just picking one of the
two masks.
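
Maybe something like this (untested sketch, using the pre-hashed search
from util/hash_table.h) would OR them together:

   struct hash_entry *old_entry =
      _mesa_hash_table_search_pre_hashed(written->derefs,
                                         ht_entry->hash, ht_entry->key);
   if (old_entry) {
      /* Deref already recorded in the parent: merge the write masks. */
      old_entry->data =
         (void *)((uintptr_t)old_entry->data | (uintptr_t)ht_entry->data);
   } else {
      _mesa_hash_table_insert_pre_hashed(written->derefs, ht_entry->hash,
                                         ht_entry->key, ht_entry->data);
   }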


> +         }
> +      }
> +      _mesa_hash_table_insert(state->vars_written_map, cf_node, new_written);
> +   }
> +}
> +
> +static struct copy_entry *
> +copy_entry_create(struct util_dynarray *copies,
> +                  nir_deref_instr *dst_deref)
> +{
> +   struct copy_entry new_entry = {
> +      .dst = dst_deref,
> +   };
> +   util_dynarray_append(copies, struct copy_entry, new_entry);
> +   return util_dynarray_top_ptr(copies, struct copy_entry);
>  }
>
>  static void
> -copy_entry_remove(struct copy_prop_var_state *state, struct copy_entry *entry)
> +copy_entry_remove(struct util_dynarray *copies,
> +                  struct copy_entry *entry)
>  {
> -   list_del(&entry->link);
> -   list_add(&entry->link, &state->copy_free_list);
> +   *entry = util_dynarray_pop(copies, struct copy_entry);
>

It might be worth a quick comment to justify that this works.  It took me a
minute to figure out that you were re-ordering the array in the process.
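
Maybe something like this above the assignment (sketch):

   /* Replace the removed entry with the last entry of the array and
    * shrink the array: the set of copies stays the same, only the
    * order of the entries changes.
    */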


>  }
>
>  static struct copy_entry *
> -lookup_entry_for_deref(struct copy_prop_var_state *state,
> +lookup_entry_for_deref(struct util_dynarray *copies,
>                         nir_deref_instr *deref,
>                         nir_deref_compare_result allowed_comparisons)
>  {
> -   list_for_each_entry(struct copy_entry, iter, &state->copies, link) {
> +   util_dynarray_foreach(copies, struct copy_entry, iter) {
>        if (nir_compare_derefs(iter->dst, deref) & allowed_comparisons)
>           return iter;
>     }
> @@ -124,16 +239,16 @@ lookup_entry_for_deref(struct copy_prop_var_state *state,
>  }
>
>  static struct copy_entry *
> -get_entry_and_kill_aliases(struct copy_prop_var_state *state,
> -                           nir_deref_instr *deref,
> -                           unsigned write_mask)
> +lookup_entry_and_kill_aliases(struct util_dynarray *copies,
> +                              nir_deref_instr *deref,
> +                              unsigned write_mask)
>  {
>     struct copy_entry *entry = NULL;
> -   list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link) {
> +   util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
>

Also might be worth commenting why it's safe to remove elements while
walking the array.
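
For instance (sketch):

   /* Walking backwards is what makes removal safe here: removing an
    * entry moves the (already visited) last element into the current
    * slot, so no entry is skipped or visited twice.
    */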


>        if (!iter->src.is_ssa) {
>           /* If this write aliases the source of some entry, get rid of it */
>           if (nir_compare_derefs(iter->src.deref, deref) & nir_derefs_may_alias_bit) {
> -            copy_entry_remove(state, iter);
> +            copy_entry_remove(copies, iter);
>              continue;
>           }
>        }
> @@ -144,28 +259,50 @@ get_entry_and_kill_aliases(struct copy_prop_var_state *state,
>           assert(entry == NULL);
>           entry = iter;
>        } else if (comp & nir_derefs_may_alias_bit) {
> -         copy_entry_remove(state, iter);
> +         copy_entry_remove(copies, iter);
>        }
>     }
>
> +   return entry;
> +}
> +
> +static void
> +kill_aliases(struct util_dynarray *copies,
> +             nir_deref_instr *deref,
> +             unsigned write_mask)
> +{
> +   struct copy_entry *entry =
> +      lookup_entry_and_kill_aliases(copies, deref, write_mask);
> +   if (entry)
> +      copy_entry_remove(copies, entry);
> +}
> +
> +static struct copy_entry *
> +get_entry_and_kill_aliases(struct util_dynarray *copies,
> +                           nir_deref_instr *deref,
> +                           unsigned write_mask)
> +{
> +   struct copy_entry *entry =
> +      lookup_entry_and_kill_aliases(copies, deref, write_mask);
> +
>     if (entry == NULL)
> -      entry = copy_entry_create(state, deref);
> +      entry = copy_entry_create(copies, deref);
>
>     return entry;
>  }
>
>  static void
> -apply_barrier_for_modes(struct copy_prop_var_state *state,
> +apply_barrier_for_modes(struct util_dynarray *copies,
>                          nir_variable_mode modes)
>  {
> -   list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link) {
> +   util_dynarray_foreach_reverse(copies, struct copy_entry, iter) {
>        nir_variable *dst_var = nir_deref_instr_get_variable(iter->dst);
>        nir_variable *src_var = iter->src.is_ssa ? NULL :
>           nir_deref_instr_get_variable(iter->src.deref);
>
>        if ((dst_var->data.mode & modes) ||
>            (src_var && (src_var->data.mode & modes)))
> -         copy_entry_remove(state, iter);
> +         copy_entry_remove(copies, iter);
>     }
>  }
>
> @@ -396,13 +533,34 @@ try_load_from_entry(struct copy_prop_var_state *state, struct copy_entry *entry,
>  }
>
>  static void
> -copy_prop_vars_block(struct copy_prop_var_state *state,
> -                     nir_builder *b, nir_block *block)
> +invalidate_copies_for_node(struct copy_prop_var_state *state,
> +                           struct util_dynarray *copies,
> +                           nir_cf_node *cf_node)
>  {
> -   /* Start each block with a blank slate */
> -   list_for_each_entry_safe(struct copy_entry, iter, &state->copies, link)
> -      copy_entry_remove(state, iter);
> +   struct hash_entry *ht_entry = _mesa_hash_table_search(state->vars_written_map, cf_node);
> +   assert(ht_entry);
> +
> +   struct vars_written *written = ht_entry->data;
> +   if (written->modes) {
> +      util_dynarray_foreach_reverse(copies, struct copy_entry, entry) {
> +         nir_variable *var = nir_deref_instr_get_variable(entry->dst);
> +         if (var->data.mode & written->modes)
>

This can just be entry->dst->mode & written->modes
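
i.e. (sketch):

   if (entry->dst->mode & written->modes)
      copy_entry_remove(copies, entry);

which also drops the nir_deref_instr_get_variable() call.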


> +            copy_entry_remove(copies, entry);
> +      }
> +   }
>
> +   struct hash_entry *entry;
> +   hash_table_foreach (written->derefs, entry) {
> +      nir_deref_instr *deref_written = (nir_deref_instr *)entry->key;
> +      kill_aliases(copies, deref_written, (uintptr_t)entry->data);
> +   }
> +}
> +
> +static void
> +copy_prop_vars_block(struct copy_prop_var_state *state,
> +                     nir_builder *b, nir_block *block,
> +                     struct util_dynarray *copies)
> +{
>     nir_foreach_instr_safe(instr, block) {
>        if (instr->type == nir_instr_type_call) {
>           apply_barrier_for_modes(copies, nir_var_shader_out |
> @@ -426,14 +584,14 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>
>        case nir_intrinsic_emit_vertex:
>        case nir_intrinsic_emit_vertex_with_counter:
> -         apply_barrier_for_modes(state, nir_var_shader_out);
> +         apply_barrier_for_modes(copies, nir_var_shader_out);
>           break;
>
>        case nir_intrinsic_load_deref: {
>           nir_deref_instr *src = nir_src_as_deref(intrin->src[0]);
>
>           struct copy_entry *src_entry =
> -            lookup_entry_for_deref(state, src, nir_derefs_a_contains_b_bit);
> +            lookup_entry_for_deref(copies, src, nir_derefs_a_contains_b_bit);
>           struct value value;
>           if (try_load_from_entry(state, src_entry, b, intrin, src, &value)) {
>              if (value.is_ssa) {
> @@ -478,9 +636,9 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>            * contains what we're looking for.
>            */
>           struct copy_entry *store_entry =
> -            lookup_entry_for_deref(state, src, nir_derefs_equal_bit);
> +            lookup_entry_for_deref(copies, src, nir_derefs_equal_bit);
>           if (!store_entry)
> -            store_entry = copy_entry_create(state, src);
> +            store_entry = copy_entry_create(copies, src);
>
>           /* Set up a store to this entry with the value of the load.  This way
>            * we can potentially remove subsequent loads.  However, we use a
> @@ -503,7 +661,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>           nir_deref_instr *dst = nir_src_as_deref(intrin->src[0]);
>           unsigned wrmask = nir_intrinsic_write_mask(intrin);
>           struct copy_entry *entry =
> -            get_entry_and_kill_aliases(state, dst, wrmask);
> +            get_entry_and_kill_aliases(copies, dst, wrmask);
>           store_to_entry(state, entry, &value, wrmask);
>           break;
>        }
> @@ -519,7 +677,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>           }
>
>           struct copy_entry *src_entry =
> -            lookup_entry_for_deref(state, src, nir_derefs_a_contains_b_bit);
> +            lookup_entry_for_deref(copies, src, nir_derefs_a_contains_b_bit);
>           struct value value;
>           if (try_load_from_entry(state, src_entry, b, intrin, src, &value)) {
>              if (value.is_ssa) {
> @@ -546,7 +704,7 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>           }
>
>           struct copy_entry *dst_entry =
> -            get_entry_and_kill_aliases(state, dst, 0xf);
> +            get_entry_and_kill_aliases(copies, dst, 0xf);
>           store_to_entry(state, dst_entry, &value, 0xf);
>           break;
>        }
> @@ -557,36 +715,118 @@ copy_prop_vars_block(struct copy_prop_var_state *state,
>     }
>  }
>
> -bool
> -nir_opt_copy_prop_vars(nir_shader *shader)
> +static void
> +copy_prop_vars_node(struct copy_prop_var_state *state,
> +                    struct util_dynarray *copies,
> +                    nir_cf_node *cf_node)
>  {
> -   struct copy_prop_var_state state;
> +   switch (cf_node->type) {
> +   case nir_cf_node_function: {
> +      nir_function_impl *impl = nir_cf_node_as_function(cf_node);
>
> -   state.shader = shader;
> -   state.mem_ctx = ralloc_context(NULL);
> -   list_inithead(&state.copies);
> -   list_inithead(&state.copy_free_list);
> +      struct util_dynarray impl_copies;
> +      util_dynarray_init(&impl_copies, state->mem_ctx);
>
> -   bool global_progress = false;
> -   nir_foreach_function(function, shader) {
> -      if (!function->impl)
> -         continue;
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &impl->body)
> +         copy_prop_vars_node(state, &impl_copies, cf_node);
> +
> +      break;
> +   }
>
> +   case nir_cf_node_block: {
> +      nir_block *block = nir_cf_node_as_block(cf_node);
>        nir_builder b;
> -      nir_builder_init(&b, function->impl);
> +      nir_builder_init(&b, state->impl);
> +      copy_prop_vars_block(state, &b, block, copies);
> +      break;
> +   }
>
> -      state.progress = false;
> -      nir_foreach_block(block, function->impl)
> -         copy_prop_vars_block(&state, &b, block);
> +   case nir_cf_node_if: {
> +      nir_if *if_stmt = nir_cf_node_as_if(cf_node);
>
> -      if (state.progress) {
> -         nir_metadata_preserve(function->impl, nir_metadata_block_index |
> -                                               nir_metadata_dominance);
> -         global_progress = true;
> -      }
> +      /* Clone the copies for each branch of the if statement.  The idea is
> +       * that they both see the same state of available copies, but do not
> +       * interfere with each other.
> +       */
> +
> +      struct util_dynarray then_copies;
> +      util_dynarray_clone(&then_copies, state->mem_ctx, copies);
> +
> +      struct util_dynarray else_copies;
> +      util_dynarray_clone(&else_copies, state->mem_ctx, copies);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->then_list)
> +         copy_prop_vars_node(state, &then_copies, cf_node);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &if_stmt->else_list)
> +         copy_prop_vars_node(state, &else_copies, cf_node);
> +
> +      /* Both branches' copies can be ignored, since the effect of running both
> +       * branches was captured in the first pass that collects vars_written.
> +       */
> +
> +      invalidate_copies_for_node(state, copies, cf_node);
> +
> +      break;
>     }
>
> -   ralloc_free(state.mem_ctx);
> +   case nir_cf_node_loop: {
> +      nir_loop *loop = nir_cf_node_as_loop(cf_node);
> +
> +      /* Invalidate before cloning the copies for the loop, since the loop
> +       * body can be executed more than once.
> +       */
> +
> +      invalidate_copies_for_node(state, copies, cf_node);
> +
> +      struct util_dynarray loop_copies;
> +      util_dynarray_clone(&loop_copies, state->mem_ctx, copies);
> +
> +      foreach_list_typed_safe(nir_cf_node, cf_node, node, &loop->body)
> +         copy_prop_vars_node(state, &loop_copies, cf_node);
> +
> +      break;
> +   }
> +   }
> +}
> +
> +static bool
> +nir_copy_prop_vars_impl(nir_function_impl *impl)
> +{
> +   void *mem_ctx = ralloc_context(NULL);
> +
> +   struct copy_prop_var_state state = {
> +      .impl = impl,
> +      .mem_ctx = mem_ctx,
> +      .lin_ctx = linear_zalloc_parent(mem_ctx, 0),
> +
> +      .vars_written_map = _mesa_hash_table_create(mem_ctx, _mesa_hash_pointer,
> +                                                  _mesa_key_pointer_equal),
> +   };
> +
> +   gather_vars_written(&state, NULL, &impl->cf_node);
> +
> +   copy_prop_vars_node(&state, NULL, &impl->cf_node);
> +
> +   if (state.progress) {
> +      nir_metadata_preserve(impl, nir_metadata_block_index |
> +                                  nir_metadata_dominance);
> +   }
> +
> +   ralloc_free(mem_ctx);
> +   return state.progress;
> +}
> +
> +bool
> +nir_opt_copy_prop_vars(nir_shader *shader)
> +{
> +   bool progress = false;
> +
> +   nir_foreach_function(function, shader) {
> +      if (!function->impl)
> +         continue;
> +      progress |= nir_copy_prop_vars_impl(function->impl);
> +   }
>
> -   return global_progress;
> +   return progress;
>  }
> --
> 2.19.0
>