Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Richard Biener
On Tue, Aug 25, 2015 at 10:13 PM, Jeff Law l...@redhat.com wrote:
 On 08/25/2015 05:06 AM, Alan Lawrence wrote:

 This makes SRA replace loads of records/arrays from constant pool entries,
 with elementwise assignments of the constant values, hence, overcoming the
 fundamental problem in PR/63679.

 As a first pass, the approach I took was to look for constant-pool loads
 as
 we scanned through other accesses, and add them as candidates there; to
 build a
 constant replacement_decl for any such accesses in completely_scalarize;
 and to
 use any existing replacement_decl rather than creating a variable in
 create_access_replacement. (I did try using CONSTANT_CLASS_P in the
 latter, but
 that does not allow addresses of labels, which can still end up in the
 constant
 pool.)

 Feedback as to the approach or how it might be better structured / fitted
 into
 SRA, is solicited ;).

 Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
 arm-none-linux-gnueabihf, including with the next patch (rfc), which
 greatly increases the number of testcases in which this code is exercised!

 Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes
 (using a stage 1 compiler only, without execution) on alpha, hppa, powerpc,
 sparc, avr, and sh.

 gcc/ChangeLog:

 * tree-sra.c (create_access): Scan for uses of constant pool and
 add
 to candidates.
 (subst_initial): New.
 (scalarize_elem): Build replacement_decl using subst_initial.
 (create_access_replacement): Use replacement_decl if set.

 gcc/testsuite/ChangeLog:

 * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
 sra-max-scalarization-size-Ospeed.
 ---
   gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
   gcc/tree-sra.c| 56
 +--
   2 files changed, 55 insertions(+), 8 deletions(-)

 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index af35fcc..a3ff2df 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
 else
   ptr = false;

 +  /* FORNOW: scan for uses of constant pool as we go along.  */

 I'm not sure why you have this marked as FORNOW.  If I'm reading all this
 code correctly, you're lazily adding items from the constant pool into the
 candidates table when you find they're used.  That seems better than walking
 the entire constant pool adding them all to the candidates.

 I don't see this as fundamentally wrong or unclean.

 The question I have is why this differs from the effects of patch #5. That
 would seem to indicate that there's things we're not getting into the
 candidate tables with this approach?!?



 @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type,
 HOST_WIDE_INT offset, tree ref)
   }
   }

 +static tree
 +subst_initial (tree expr, tree var)

 Function comment.

 I think this patch is fine with the function comment added and removing the
 FORNOW part of the comment in create_access.  It may be worth noting in
 create_access's comment that it can add new items to the candidates tables
 for constant pool entries.

I'm happy seeing this code in SRA as I never liked that we already decide
at gimplification time which initializers to expand and which to init from
a constant pool entry.  So ... can we now remove gimplify_init_constructor
by _always_ emitting a constant pool entry and an assignment from it
(obviously only if the constructor can be put into the constant pool)?  Defering
the expansion decision to SRA makes it possible to better estimate whether
the code is hot/cold or whether the initialized variable can be replaced by
the constant pool entry completely (variable ends up readonly).

Oh, and we'd no longer create the awful split code at -O0 ...

So can you explore that a bit once this series is settled?  This is probably
also related to 5/5 as this makes all the target dependent decisions in SRA
now and thus the initial IL from gimplification should be the same for all
targets (that's always a nice thing to have IMHO).

Thanks,
Richard.


 Jeff


Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Martin Jambor
Hi,

On Tue, Aug 25, 2015 at 12:06:16PM +0100, Alan Lawrence wrote:
 This makes SRA replace loads of records/arrays from constant pool entries,
 with elementwise assignments of the constant values, hence, overcoming the
 fundamental problem in PR/63679.
 
 As a first pass, the approach I took was to look for constant-pool loads as
 we scanned through other accesses, and add them as candidates there; to build 
 a
 constant replacement_decl for any such accesses in completely_scalarize; and 
 to
 use any existing replacement_decl rather than creating a variable in
 create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, 
 but
 that does not allow addresses of labels, which can still end up in the 
 constant
 pool.)
 
 Feedback as to the approach or how it might be better structured / fitted into
 SRA, is solicited ;).

I'm not familiar with constant pools very much, but I'll try:

 
 Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
 arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
 increases the number of testcases in which this code is exercised!
 
 Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using 
 a stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, 
 avr, and sh.
 
 gcc/ChangeLog:
 
   * tree-sra.c (create_access): Scan for uses of constant pool and add
   to candidates.
   (subst_initial): New.
   (scalarize_elem): Build replacement_decl using subst_initial.
   (create_access_replacement): Use replacement_decl if set.
 
 gcc/testsuite/ChangeLog:
 
   * gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
   sra-max-scalarization-size-Ospeed.
 ---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
  gcc/tree-sra.c| 56 
 +--
  2 files changed, 55 insertions(+), 8 deletions(-)
 
 diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
 b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 index 9eccdc9..b13d583 100644
 --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
 @@ -1,5 +1,5 @@
  /* { dg-do compile } */
 -/* { dg-options -O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized } */
 +/* { dg-options -O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized 
 --param sra-max-scalarization-size-Ospeed=32 } */
  
  int
  foo ()
 @@ -17,7 +17,4 @@ foo ()
  /* After late unrolling the above loop completely DOM should be
 able to optimize this to return 28.  */
  
 -/* See PR63679 and PR64159, if the target forces the initializer to memory 
 then
 -   DOM is not able to perform this optimization.  */
 -
 -/* { dg-final { scan-tree-dump return 28; optimized { xfail aarch64*-*-* 
 alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
 +/* { dg-final { scan-tree-dump return 28; optimized } } */
 diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
 index af35fcc..a3ff2df 100644
 --- a/gcc/tree-sra.c
 +++ b/gcc/tree-sra.c
 @@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
else
  ptr = false;
  
 +  /* FORNOW: scan for uses of constant pool as we go along.  */
 +  if (TREE_CODE (base) == VAR_DECL  DECL_IN_CONSTANT_POOL (base)
 +   !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
 +{
 +  gcc_assert (!write);
 +  bitmap_set_bit (candidate_bitmap, DECL_UID (base));
 +  tree_node **slot = candidates-find_slot_with_hash (base, DECL_UID 
 (base),
 +   INSERT);
 +  *slot = base;
 +}
 +

I believe you only want to do this if (sra_mode ==
SRA_MODE_EARLY_INTRA || sra_mode == SRA_MODE_INTRA).

The idea of candidates is that we gather them in find_var_candidates
and ten we only eliminate them, this has the benefit of not worrying
about disqualifying a candidate and then erroneously re-adding it
later.  So if you could find a way to structure your code this way, I'd
much happier.  If it is impossible without traversing the whole
function just for that purpose, we may need some mechanism to prevent
us from making a disqualified decl a candidate again.  Or, if we come
to the conclusion that constant pool decls do not ever get
disqualified, a gcc_assert making sure it actually does not happen in
disqualify_candidate.

And of course at find_var_candidates time we check that all candidates
pass simple checks in maybe_add_sra_candidate.  I suppose many of them
do not make sense for constant pool decls but at least please have a
look whether that is the case for all of them or not.

if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
  return NULL;
  
 @@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
 HOST_WIDE_INT offset, tree ref)
  }
  }
  
 +static tree
 +subst_initial (tree expr, tree var)

This needs a comment and a better name.  A name that would make it
clear this is for constant 

Re: [RFC 4/5] Handle constant-pool entries

2015-08-26 Thread Alan Lawrence

Jeff Law wrote:


The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?


I'll answer this first, as I think (Richard and) Martin have identified enough 
other issues with this patch that will take longer to address but if you 
look at the context to the hunk in patch 5, it is iterating through the 
candidates (from patch 4), and then filtering out any candidates bigger than 
max-scalarization-size, which filtering patch 5 removes.


--Alan



[RFC 4/5] Handle constant-pool entries

2015-08-25 Thread Alan Lawrence
This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.

As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and add them as candidates there; to build a
constant replacement_decl for any such accesses in completely_scalarize; and to
use any existing replacement_decl rather than creating a variable in
create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
that does not allow addresses of labels, which can still end up in the constant
pool.)

Feedback as to the approach or how it might be better structured / fitted into
SRA, is solicited ;).

Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
increases the number of testcases in which this code is exercised!

Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a 
stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, 
and sh.

gcc/ChangeLog:

* tree-sra.c (create_access): Scan for uses of constant pool and add
to candidates.
(subst_initial): New.
(scalarize_elem): Build replacement_decl using subst_initial.
(create_access_replacement): Use replacement_decl if set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
sra-max-scalarization-size-Ospeed.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
 gcc/tree-sra.c| 56 +--
 2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
index 9eccdc9..b13d583 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options -O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized } */
+/* { dg-options -O3 -fno-tree-fre -fno-tree-pre -fdump-tree-optimized --param 
sra-max-scalarization-size-Ospeed=32 } */
 
 int
 foo ()
@@ -17,7 +17,4 @@ foo ()
 /* After late unrolling the above loop completely DOM should be
able to optimize this to return 28.  */
 
-/* See PR63679 and PR64159, if the target forces the initializer to memory then
-   DOM is not able to perform this optimization.  */
-
-/* { dg-final { scan-tree-dump return 28; optimized { xfail aarch64*-*-* 
alpha*-*-* hppa*-*-* powerpc*-*-* sparc*-*-* s390*-*-* } } } */
+/* { dg-final { scan-tree-dump return 28; optimized } } */
diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index af35fcc..a3ff2df 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
   else
 ptr = false;
 
+  /* FORNOW: scan for uses of constant pool as we go along.  */
+  if (TREE_CODE (base) == VAR_DECL  DECL_IN_CONSTANT_POOL (base)
+   !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
+{
+  gcc_assert (!write);
+  bitmap_set_bit (candidate_bitmap, DECL_UID (base));
+  tree_node **slot = candidates-find_slot_with_hash (base, DECL_UID 
(base),
+ INSERT);
+  *slot = base;
+}
+
   if (!DECL_P (base) || !bitmap_bit_p (candidate_bitmap, DECL_UID (base)))
 return NULL;
 
@@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
HOST_WIDE_INT offset, tree ref)
 }
 }
 
+static tree
+subst_initial (tree expr, tree var)
+{
+  if (TREE_CODE (expr) == VAR_DECL)
+{
+  gcc_assert (DECL_IN_CONSTANT_POOL (expr));
+  gcc_assert (expr == var);
+  return DECL_INITIAL (expr);
+}
+  if (TREE_CODE (expr) == COMPONENT_REF)
+{
+  gcc_assert (TREE_CODE (TREE_OPERAND (expr, 1)) == FIELD_DECL);
+  gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+  return fold_build3 (COMPONENT_REF, TREE_TYPE (expr),
+ subst_initial (TREE_OPERAND (expr, 0), var),
+ TREE_OPERAND (expr, 1),
+ NULL_TREE);
+}
+  if (TREE_CODE (expr) == ARRAY_REF)
+{
+  gcc_assert (TREE_OPERAND (expr, 2) == NULL_TREE);
+  gcc_assert (TREE_OPERAND (expr, 3) == NULL_TREE);
+  return fold (build4 (ARRAY_REF, TREE_TYPE (expr),
+  subst_initial (TREE_OPERAND (expr, 0), var),
+  TREE_OPERAND (expr, 1),
+  NULL_TREE,
+  NULL_TREE));
+}
+  gcc_unreachable ();
+}
+
 static void
 scalarize_elem (tree base, HOST_WIDE_INT pos, HOST_WIDE_INT size,
 tree ref, tree type)
@@ -1033,6 +1075,9 @@ scalarize_elem (tree base, HOST_WIDE_INT pos, 
HOST_WIDE_INT size,
   {
 struct access *access = create_access_1 (base, pos, size);
 access-expr = 

Re: [RFC 4/5] Handle constant-pool entries

2015-08-25 Thread Jeff Law

On 08/25/2015 05:06 AM, Alan Lawrence wrote:

This makes SRA replace loads of records/arrays from constant pool entries,
with elementwise assignments of the constant values, hence, overcoming the
fundamental problem in PR/63679.

As a first pass, the approach I took was to look for constant-pool loads as
we scanned through other accesses, and add them as candidates there; to build a
constant replacement_decl for any such accesses in completely_scalarize; and to
use any existing replacement_decl rather than creating a variable in
create_access_replacement. (I did try using CONSTANT_CLASS_P in the latter, but
that does not allow addresses of labels, which can still end up in the constant
pool.)

Feedback as to the approach or how it might be better structured / fitted into
SRA, is solicited ;).

Bootstrapped + check-gcc on x86-none-linux-gnu, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf, including with the next patch (rfc), which greatly 
increases the number of testcases in which this code is exercised!

Have also verified that the ssa-dom-cse-2.c scan-tree-dump test passes (using a 
stage 1 compiler only, without execution) on alpha, hppa, powerpc, sparc, avr, 
and sh.

gcc/ChangeLog:

* tree-sra.c (create_access): Scan for uses of constant pool and add
to candidates.
(subst_initial): New.
(scalarize_elem): Build replacement_decl using subst_initial.
(create_access_replacement): Use replacement_decl if set.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/ssa-dom-cse-2.c: Remove xfail, add --param
sra-max-scalarization-size-Ospeed.
---
  gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-cse-2.c |  7 +---
  gcc/tree-sra.c| 56 +--
  2 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-sra.c b/gcc/tree-sra.c
index af35fcc..a3ff2df 100644
--- a/gcc/tree-sra.c
+++ b/gcc/tree-sra.c
@@ -865,6 +865,17 @@ create_access (tree expr, gimple stmt, bool write)
else
  ptr = false;

+  /* FORNOW: scan for uses of constant pool as we go along.  */
I'm not sure why you have this marked as FORNOW.  If I'm reading all 
this code correctly, you're lazily adding items from the constant pool 
into the candidates table when you find they're used.  That seems better 
than walking the entire constant pool adding them all to the candidates.


I don't see this as fundamentally wrong or unclean.

The question I have is why this differs from the effects of patch #5. 
That would seem to indicate that there's things we're not getting into 
the candidate tables with this approach?!?





@@ -1025,6 +1036,37 @@ completely_scalarize (tree base, tree decl_type, 
HOST_WIDE_INT offset, tree ref)
  }
  }

+static tree
+subst_initial (tree expr, tree var)

Function comment.

I think this patch is fine with the function comment added and removing 
the FORNOW part of the comment in create_access.  It may be worth noting 
in create_access's comment that it can add new items to the candidates 
tables for constant pool entries.



Jeff