On Thu, Oct 29, 2015 at 8:18 PM, Alan Lawrence <alan.lawre...@arm.com> wrote:
> This has changed quite a bit since the previous revision
> (https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01484.html), mostly due to Ada
> and specifically Ada on ARM.
>
> I didn't find a good alternative to scanning for constant-pool accesses "as we
> go" through the function, and although I didn't find any of these accesses
> being disqualified after we'd found them in C, some awkward constant-pool
> expressions in Ada forced me to disqualify them for a new reason (inability to
> constant-propagate). Hence, I introduced a new bitmap of
> disqualified_constants, rather than an additional pass through the function.
>
> Next, the approach of using a replacement_expr (with the constant value) in
> place of the old replacement_decl also failed (the decl could be used in an
> expression where substituting in the expr produced ill-formed gimple). Hence,
> constant-pool entries now use replacement_decls just like everything else, for
> which I generate initializers in initialize_constant_pool_replacements.
>
> However, I was able to avoid generating the replacement constants early,
> doing so instead late in initialize_constant_pool_replacements; the main
> trick here was to use fold_array_ctor_reference, kindly pointed out by
> Richie :), to fold the MEM_REFs built in analyze_access_subtree.
>
> Finally, I found completely_scalarize was still failing to fold all the
> constant expressions, because Ada was putting VIEW_CONVERT_EXPRs in the
> constant pool, and fold-const.c did not deal with ARRAY_REFs of these.
> However, requiring completely_scalarize to be able to fold everything seemed
> fragile, so instead I decoupled from fold-const by allowing it to fail.
>
> With these changes, ssa-dom-cse-2.c is fixed on all platforms with appropriate
> --param.

Hum.  I still wonder why we need all this complication ...  I would
expect that if
we simply decided to completely scalarize a constant pool aggregate load then
we can always do that.  SRA should then simply emit the _same_ IL as it does
for other complete scalarization element accesses.  The only difference is
that the result should be simplifiable to omit the element load from
the constant pool.

1) The FRE pass following SRA should do that for you.

2) You should be able to use fold_ctor_reference directly in place of all
your code, in case offset and size are readily available (I don't remember
exactly how complete scalarization "walks" elements).  Alternatively use
fold_const_aggregate_ref.

3) You can simplify the stmt SRA generated by simply calling fold_stmt on it,
that will do a bit more (wasted) work compared to 2) but may be easier.

I wouldn't bother with the case where we for some reason do not simplify
the constant pool load.

That is, I'd like to see the patch greatly simplified to just consider
constant pool
complete scalarization as if it were a regular variable.

Richard.

>
> There are still a few remaining test failures, on AArch64:
> gcc.dg/guality/pr54970.c   -O1  line 15 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 20 a[0] == 1
> gcc.dg/guality/pr54970.c   -O1  line 25 a[0] == 1
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os).
> ...which I'm working on. I also tested with the hack at 
> https://gcc.gnu.org/ml/gcc-patches/2015-08/msg01483.html . This revealed one 
> new failure in advsimd-intrinsics/vldX.c on AArch64 (fixed by 
> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg02777.html), and fixed a bunch 
> of guality failures on ARM:
> gcc.dg/guality/pr54970.c   -O1  line 31 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 36 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 a[0] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 p[-2] == 4
> gcc.dg/guality/pr54970.c   -O1  line 45 q[-1] == 4
> (also at -O2, -O2 -flto with/without plugin, -O3 -g and -Os)
>
> --Alan
>
> gcc/ChangeLog:
>
>         * tree-sra.c (disqualified_constants): New.
>         (sra_initialize): Initialize it.
>         (sra_deinitialize): Deallocate it.
>         (disqualify_candidate): Set bit in disqualified_constants.
>         (subst_constant_pool_initial): New.
>         (create_access): Scan for constant-pool entries as we go.
>         (scalarizable_type_p): Disallow types containing placeholders.
>         (completely_scalarize): Return bool to allow failure.
>         (scalarize_elem): Likewise; check we can generate constant
>         replacements.
>         (maybe_add_sra_candidate): Allow constant-pool entries.
>         (analyze_access_subtree): Checking-assert that we can fold any refs
>         built for constant-pool entries.
>         (analyze_all_variable_accesses): Deal with completely_scalarize
>         failing.
>         (initialize_constant_pool_replacements): New.
>         (sra_modify_function_body): Call previous.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Add sra-max-scalarization-size
>         param, remove xfail.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |   7 +-
>  gcc/tree-sra.c                                | 221 ++++++++++++++++++++++++--
>  2 files changed, 208 insertions(+), 20 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> index 9eccdc9..b13d583 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized" } */
> +/* { dg-options "-O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param sra-max-scalarization-size-Ospeed=32" } */
>
>  int
>  foo ()
> @@ -17,7 +17,4 @@ foo ()
>  /* After late unrolling the above loop completely DOM should be
>     able to optimize this to return 28.  */
>
> -/* See PR63679 and PR64159, if the target forces the initializer to memory then
> -   DOM is not able to perform this optimization.  */
> -
> -/* { dg-final { scan-tree-dump "return 28;" "optimized" { xfail aarch64*-*-* alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
> +/* { dg-final { scan-tree-dump "return 28;" "optimized" } } */
> diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
> index 358db79..1920265 100644
> --- a/gcc/tree-sra.c
> +++ b/gcc/tree-sra.c
> @@ -90,6 +90,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "gimple-iterator.h"
>  #include "gimplify-me.h"
>  #include "gimple-walk.h"
> +#include "gimple-fold.h"
>  #include "tree-cfg.h"
>  #include "flags.h"
>  #include "insn-config.h"
> @@ -336,6 +337,10 @@ candidate (unsigned uid)
>     those which cannot be (because they are and need be used as a whole).  */
>  static bitmap should_scalarize_away_bitmap, cannot_scalarize_away_bitmap;
>
> +/* Bitmap of candidates in the constant pool, which cannot be scalarized
> +   because this would produce non-constant expressions (e.g. Ada).  */
> +static bitmap disqualified_constants;
> +
>  /* Obstack for creation of fancy names.  */
>  static struct obstack name_obstack;
>
> @@ -658,6 +663,7 @@ sra_initialize (void)
>      (vec_safe_length (cfun->local_decls) / 2);
>    should_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
>    cannot_scalarize_away_bitmap = BITMAP_ALLOC (NULL);
> +  disqualified_constants = BITMAP_ALLOC (NULL);
>    gcc_obstack_init (&name_obstack);
>    base_access_vec = new hash_map<tree, auto_vec<access_p> >;
>    memset (&sra_stats, 0, sizeof (sra_stats));
> @@ -676,6 +682,7 @@ sra_deinitialize (void)
>    candidates = NULL;
>    BITMAP_FREE (should_scalarize_away_bitmap);
>    BITMAP_FREE (cannot_scalarize_away_bitmap);
> +  BITMAP_FREE (disqualified_constants);
>    access_pool.release ();
>    assign_link_pool.release ();
>    obstack_free (&name_obstack, NULL);
> @@ -690,6 +697,8 @@ disqualify_candidate (tree decl, const char *reason)
>  {
>    if (bitmap_clear_bit (candidate_bitmap, DECL_UID (decl)))
>      candidates->remove_elt_with_hash (decl, DECL_UID (decl));
> +  if (TREE_CODE (decl) == VAR_DECL && DECL_IN_CONSTANT_POOL (decl))
> +    bitmap_set_bit (disqualified_constants, DECL_UID (decl));
>
>    if (dump_file && (dump_flags & TDF_DETAILS))
>      {
> @@ -841,8 +850,76 @@ create_access_1 (tree base, HOST_WIDE_INT offset, HOST_WIDE_INT size)
>    return access;
>  }
>
> +/* Given EXPR a chain of COMPONENT_REFs and/or ARRAY_REFs built around a
> +   constant-pool declaration VAR, builds a tree like EXPR but with VAR replaced
> +   by its DECL_INITIAL, and tries to fold it to a constant.  If folding succeeds
> +   then return the folded tree, else NULL_TREE.  */
> +
> +static tree
> +subst_constant_pool_initial (tree expr, tree var)
> +{
> +  if (TREE_CODE (expr) == VAR_DECL)
> +    {
> +      gcc_assert (DECL_IN_CONSTANT_POOL (expr));
> +      gcc_assert (expr == var);
> +      return DECL_INITIAL (expr);
> +    }
> +  if (TREE_CODE (expr) == COMPONENT_REF)
> +    {
> +      tree field = TREE_OPERAND (expr, 1);
> +      gcc_assert (TREE_CODE (field) == FIELD_DECL);
> +      gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
> +      tree record = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!record)
> +       return NULL_TREE;
> +      tree ref = build3 (COMPONENT_REF, TREE_TYPE (expr),
> +                        record, field, NULL_TREE);
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == ARRAY_REF)
> +    {
> +      tree array = subst_constant_pool_initial (TREE_OPERAND (expr, 0), var);
> +      if (!array)
> +       return NULL_TREE;
> +      tree ref = build4 (ARRAY_REF, TREE_TYPE (expr),
> +                        array,
> +                        TREE_OPERAND (expr, 1),
> +                        TREE_OPERAND (expr, 2),
> +                        TREE_OPERAND (expr, 3));
> +      /* Ada (at least) builds ARRAY_REFs around strings.  */
> +      if (tree folded = fold_read_from_constant_string (ref))
> +       return folded;
> +      tree res = fold (ref);
> +      if (res != ref)
> +       return res;
> +    }
> +  else if (TREE_CODE (expr) == MEM_REF)
> +    {
> +      gcc_assert (TREE_CODE (TREE_OPERAND (expr, 0)) == ADDR_EXPR);
> +      tree type = TREE_TYPE (expr);
> +      tree sz = TYPE_SIZE (TREE_TYPE (expr));
> +      tree val = TREE_OPERAND (TREE_OPERAND (expr, 0), 0);
> +      val = subst_constant_pool_initial (val, var);
> +      if (TREE_CODE (val) != CONSTRUCTOR
> +         || !tree_fits_uhwi_p (TREE_OPERAND (expr, 1))
> +         || !tree_fits_uhwi_p (sz))
> +       return NULL_TREE;
> +      unsigned HOST_WIDE_INT offset =
> +         tree_to_uhwi (TREE_OPERAND (expr, 1)) * BITS_PER_UNIT;
> +      return fold_ctor_reference (type, val, offset, tree_to_uhwi (sz), var);
> +    }
> +
> +  /* Unknown expression, or could not constant-fold.  */
> +  return NULL_TREE;
> +}
> +
> +static bool maybe_add_sra_candidate (tree var);
> +
>  /* Create and insert access for EXPR. Return created access, or NULL if it is
> -   not possible.  */
> +   not possible.  Also scan for uses of constant pool as we go along and add
> +   to candidates.  */
>
>  static struct access *
>  create_access (tree expr, gimple *stmt, bool write)
> @@ -865,6 +942,32 @@ create_access (tree expr, gimple *stmt, bool write)
>    else
>      ptr = false;
>
> +  /* For constant-pool entries, check we can substitute the constant value.  */
> +  if (TREE_CODE (base) == VAR_DECL && DECL_IN_CONSTANT_POOL (base)
> +      && (sra_mode == SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA))
> +    {
> +      gcc_assert (!write);
> +      if (bitmap_bit_p (disqualified_constants, DECL_UID (base)))
> +       return NULL;
> +      tree repl = subst_constant_pool_initial (expr, base);
> +      if (dump_file && (dump_flags & TDF_DETAILS))
> +      {
> +       fprintf (dump_file, "Trying to scalarize constant-pool ");
> +       print_generic_expr (dump_file, base, 0);
> +       fprintf (dump_file, "\n In expression: ");
> +       print_generic_expr (dump_file, expr, 0);
> +       fprintf (dump_file, "\n Constant replacement: ");
> +       print_generic_expr (dump_file, repl, 0);
> +       fprintf (dump_file, "\n");
> +      }
> +      /* If we can't resolve the expression to a constant, we risk creating
> +         a malformed expression (e.g. Ada testsuite, ACATS chapter C3).  */
> +      if (!repl || !TREE_CONSTANT (repl))
> +       disqualify_candidate (base, "Cannot constant-propagate out of pool.");
> +      else
> +       maybe_add_sra_candidate (base);
> +    }
> +
>    if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
>      return NULL;
>
> @@ -923,6 +1026,8 @@ static bool
>  scalarizable_type_p (tree type)
>  {
>    gcc_assert (!is_gimple_reg_type (type));
> +  if (type_contains_placeholder_p (type))
> +    return false;
>
>    switch (TREE_CODE (type))
>    {
> @@ -970,15 +1075,19 @@ scalarizable_type_p (tree type)
>    }
>  }
>
> -static void scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
> +static bool scalarize_elem (tree, HOST_WIDE_INT, HOST_WIDE_INT, tree, tree);
>
>  /* Create total_scalarization accesses for all scalar fields of a member
>     of type DECL_TYPE conforming to scalarizable_type_p.  BASE
>     must be the top-most VAR_DECL representing the variable; within that,
>     OFFSET locates the member and REF must be the memory reference expression for
> -   the member.  */
> +   the member.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  {
>    switch (TREE_CODE (decl_type))
> @@ -991,10 +1100,11 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>             tree ft = TREE_TYPE (fld);
>             tree nref = build3 (COMPONENT_REF, ft, ref, fld, NULL_TREE);
>
> -           scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)), nref,
> -                           ft);
> +           if (!scalarize_elem (base, pos, tree_to_uhwi (DECL_SIZE (fld)),
> +                                nref, ft))
> +             return false;
>           }
> -      break;
> +      return true;
>      case ARRAY_TYPE:
>        {
>         tree elemtype = TREE_TYPE (decl_type);
> @@ -1026,12 +1136,13 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>                                     ref, size_int (idx + min),
>                                     NULL_TREE, NULL_TREE);
>                 int el_off = offset + idx * el_size;
> -               scalarize_elem (base, el_off, el_size, nref, elemtype);
> +               if (!scalarize_elem (base, el_off, el_size, nref, elemtype))
> +                 return false;
>               }
>             while (++idx <= lenlessone);
>           }
>        }
> -      break;
> +      return true;
>      default:
>        gcc_unreachable ();
>      }
> @@ -1040,22 +1151,32 @@ completely_scalarize (tree base, tree decl_type, HOST_WIDE_INT offset, tree ref)
>  /* Create total_scalarization accesses for a member of type TYPE, which must
>     satisfy either is_gimple_reg_type or scalarizable_type_p.  BASE must be the
>     top-most VAR_DECL representing the variable; within that, POS and SIZE locate
> -   the member and REF must be the reference expression for it.  */
> +   the member and REF must be the reference expression for it.
>
> -static void
> +   Return TRUE if successful; may fail in the case of constant-pool entry BASE
> +   whose DECL_INITIAL contains values to which we cannot fold ARRAY_REF or
> +   COMPONENT_REF expressions.  */
> +
> +static bool
>  scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
>                  tree ref, tree type)
>  {
>    if (is_gimple_reg_type (type))
>    {
> +    if (TREE_CODE (base) == VAR_DECL
> +       && DECL_IN_CONSTANT_POOL (base)
> +       && subst_constant_pool_initial (ref, base) == NULL_TREE)
> +      return false;
>      struct access *access = create_access_1 (base, pos, size);
>      access->expr = ref;
>      access->type = type;
>      access->grp_total_scalarization = 1;
>      /* Accesses for intraprocedural SRA can have their stmt NULL.  */
> +
> +    return true;
>    }
> -  else
> -    completely_scalarize (base, type, pos, ref);
> +
> +  return completely_scalarize (base, type, pos, ref);
>  }
>
>  /* Create a total_scalarization access for VAR as a whole.  VAR must be of a
> @@ -1843,7 +1964,12 @@ maybe_add_sra_candidate (tree var)
>        reject (var, "not aggregate");
>        return false;
>      }
> -  if (needs_to_live_in_memory (var))
> +  /* Allow constant-pool entries (that "need to live in memory")
> +     unless we are doing IPA SRA.  */
> +  if (needs_to_live_in_memory (var)
> +      && (sra_mode == SRA_MODE_EARLY_IPA
> +         || TREE_CODE (var) != VAR_DECL
> +         || !DECL_IN_CONSTANT_POOL (var)))
>      {
>        reject (var, "needs to live in memory");
>        return false;
> @@ -2343,6 +2469,10 @@ analyze_access_subtree (struct access *root, struct access *parent,
>                        (unsigned) root->offset, (unsigned) root->size);
>               fprintf (dump_file, " to an integer.\n");
>             }
> +         gcc_checking_assert (TREE_CODE (root->base) != VAR_DECL
> +                              || !DECL_IN_CONSTANT_POOL (root->base)
> +                              || subst_constant_pool_initial (root->expr,
> +                                                              root->base));
>         }
>
>        root->grp_to_be_replaced = 1;
> @@ -2604,8 +2734,17 @@ analyze_all_variable_accesses (void)
>             if (tree_to_uhwi (TYPE_SIZE (TREE_TYPE (var)))
>                 <= max_scalarization_size)
>               {
> +               vec<access_p> &accesses = base_access_vec->get_or_insert (var);
> +               unsigned orig_count = accesses.length ();
> +               if (!completely_scalarize (var, TREE_TYPE (var), 0, var))
> +                 {
> +                   /* Failed.  Discard any accesses created during attempt.  */
> +                   for (unsigned i = orig_count; i < accesses.length (); i++)
> +                     access_pool.remove (accesses[i]);
> +                   accesses.truncate (orig_count);
> +                   continue;
> +                 }
>                 create_total_scalarization_access (var);
> -               completely_scalarize (var, TREE_TYPE (var), 0, var);
>                 if (dump_file && (dump_flags & TDF_DETAILS))
>                   {
>                     fprintf (dump_file, "Will attempt to totally scalarize ");
> @@ -3480,6 +3619,56 @@ sra_modify_assign (gimple *stmt, gimple_stmt_iterator *gsi)
>      }
>  }
>
> +static void
> +initialize_constant_pool_replacements (void)
> +{
> +  gimple_seq seq = NULL;
> +  gimple_stmt_iterator gsi = gsi_start (seq);
> +  bitmap_iterator bi;
> +  unsigned i;
> +
> +  EXECUTE_IF_SET_IN_BITMAP (candidate_bitmap, 0, i, bi)
> +    if (bitmap_bit_p (should_scalarize_away_bitmap, i)
> +       && !bitmap_bit_p (cannot_scalarize_away_bitmap, i))
> +      {
> +       tree var = candidate (i);
> +       if (TREE_CODE (var) != VAR_DECL || !DECL_IN_CONSTANT_POOL (var))
> +         continue;
> +       vec<access_p> *access_vec = get_base_access_vector (var);
> +       if (!access_vec)
> +         continue;
> +       for (unsigned i = 0; i < access_vec->length (); i++)
> +         {
> +           struct access *access = (*access_vec)[i];
> +           tree val = subst_constant_pool_initial (access->expr, access->base);
> +           if (dump_file && (dump_flags & TDF_DETAILS))
> +             {
> +               fprintf (dump_file, "Expression ");
> +               print_generic_expr (dump_file, access->expr, 0);
> +               fprintf (dump_file, " replaced with ");
> +               print_generic_expr (dump_file, access->replacement_decl, 0);
> +               fprintf (dump_file, " initialized to ");
> +               print_generic_expr (dump_file, val, 0);
> +               fprintf (dump_file, "\n");
> +               if (!access->replacement_decl)
> +                 dump_access (dump_file, access, true);
> +             }
> +           if (!access->replacement_decl)
> +             continue;
> +           gcc_assert (val);
> +           gassign *stmt = gimple_build_assign (
> +             get_access_replacement (access), val);
> +           gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
> +           update_stmt (stmt);
> +         }
> +      }
> +
> +  seq = gsi_seq (gsi);
> +  if (seq)
> +    gsi_insert_seq_on_edge_immediate (
> +      single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun)), seq);
> +}
> +
>  /* Traverse the function body and all modifications as decided in
>     analyze_all_variable_accesses.  Return true iff the CFG has been
>     changed.  */
> @@ -3490,6 +3679,8 @@ sra_modify_function_body (void)
>    bool cfg_changed = false;
>    basic_block bb;
>
> +  initialize_constant_pool_replacements ();
> +
>    FOR_EACH_BB_FN (bb, cfun)
>      {
>        gimple_stmt_iterator gsi = gsi_start_bb (bb);
> --
> 1.9.1
>
