Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-22 Thread Tom de Vries

On 03/21/2018 04:43 PM, Richard Biener wrote:

On Wed, 21 Mar 2018, Tom de Vries wrote:


On 03/12/2018 01:14 PM, Richard Biener wrote:

On Thu, 22 Feb 2018, Tom de Vries wrote:


Hi,

this patch fixes an ICE in the parloops pass.

The ICE (when compiling the test-case in attached patch) follows from the
fact
that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
"base all the induction variables in LOOP on a single control one":
...
/* Base all the induction variables in LOOP on a single control one.*/
canonicalize_loop_ivs (loop, , true);
...

This is caused by the fact that for one of the induction variables,
simple_iv
no longer returns true (as was the case at the start of
gen_parallel_loop).
This seems to be triggered by the fact that during loop_version we call
scev_reset (although I'm not sure why when recalculating scev info we're
not
reaching the same conclusion as before).

I guess the real reason is that canonicalize_loop_ivs calls
create_iv first which in the parloop case with bump-in-latch true
and then incrementally re-writes PHIs, eventually wrecking other
PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
first one but the rewritten ones depend on the new IV already.

So the better fix (maybe not 100% enough) would be to re-organize
canonicalize_loop_ivs to first do analysis of all PHIs and then
rewrite them all.



Focusing on the re-organize first, is this sort of what you had in mind?


Yes, though not sure why you need to have a hash-map here, just pushing
to a vector of PHIs would work, no?  Or alternatively a bitmap
of SSA versions so you have a way to match the gphi iterator with
the set of IVs.  But the vector should be sorted in PHI order so
just traversing the PHIs looking for the "next" PHI in the vector
would work I guess.

Does it help with the original issue btw?



Not as such.

The original problem happens before canonicalize_loop_ivs.

Perhaps the easiest way to demonstrate the problem is this patch:
...
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 3a788cc..801fc46 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2254,6 +2254,38 @@ num_phis (basic_block bb, bool count_virtual_p)
   return nr_phis;
 }

+static unsigned
+get_nr_simple_ivs (struct loop *loop)
+{
+  unsigned cnt = 0;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+
+  for (unsigned i = 0; i < loop->num_nodes; i++)
+{
+  basic_block bb = bbs[i];
+  if (bb->loop_father != loop)
+   continue;
+
+  for (gphi_iterator psi = gsi_start_phis (bb); !gsi_end_p (psi);
+  gsi_next ())
+   {
+ gphi *phi = psi.phi ();
+ tree res = PHI_RESULT (phi);
+ if (virtual_operand_p (res))
+   continue;
+
+ affine_iv iv;
+ if (!simple_iv (loop, loop, res, , true))
+   continue;
+ cnt++;
+   }
+}
+
+  free (bbs);
+
+  return cnt;
+}
+
 /* Generates code to execute the iterations of LOOP in N_THREADS
threads in parallel, which can be 0 if that number is to be determined
later.
@@ -2378,11 +2410,15 @@ gen_parallel_loop (struct loop *loop,
   initialize_original_copy_tables ();

   /* We assume that the loop usually iterates a lot.  */
+  fprintf (stderr, "nr simple ivs: %u @ line %d\n",
+  get_nr_simple_ivs (loop), __LINE__);
   loop_version (loop, many_iterations_cond, NULL,
profile_probability::likely (),
profile_probability::unlikely (),
profile_probability::likely (),
profile_probability::unlikely (), true);
+  fprintf (stderr, "nr simple ivs: %u @ line %d\n",
+  get_nr_simple_ivs (loop), __LINE__);
   update_ssa (TODO_update_ssa);
   free_original_copy_tables ();
 }
...

and this output for pr83126.c:
...
nr simple ivs: 3 @ line 2414
nr simple ivs: 2 @ line 2421
...

I've written attached follow-up patch that works around that problem by 
analyzing the phis before calling loop_version. It works for the 
example, and bootstraps fine, but I'm not familiar enough with 
loop_version to known whether this is safe or not.


Thanks,
- Tom
[parloops] Fix canonicalize_loop_ivs in gen_parallel_loop

2018-03-21  Tom de Vries  

	PR tree-optimization/83126
	* tree-parloops.c (gen_parallel_loop): Inline canonicalize_loop_ivs.
	Assert that canonicalize_loop_ivs succeeds.
	* tree-ssa-loop-manip.c (get_all_phi_affine_ivs): Remove static.
	(struct phi_affine_iv): Move ...
	* tree-ssa-loop-manip.h (struct phi_affine_iv): ... here.
	(get_all_phi_affine_ivs, canonicalize_loop_ivs_1): Declare.

---
 gcc/tree-parloops.c   | 27 ++-
 gcc/tree-ssa-loop-manip.c |  8 +---
 gcc/tree-ssa-loop-manip.h | 10 ++
 3 files changed, 17 insertions(+), 28 deletions(-)

diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index 3a788cc..9c2d19a 100644
--- a/gcc/tree-parloops.c
+++ 

Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-22 Thread Tom de Vries

On 03/21/2018 04:43 PM, Richard Biener wrote:

On Wed, 21 Mar 2018, Tom de Vries wrote:


On 03/12/2018 01:14 PM, Richard Biener wrote:

On Thu, 22 Feb 2018, Tom de Vries wrote:


Hi,

this patch fixes an ICE in the parloops pass.

The ICE (when compiling the test-case in attached patch) follows from the
fact
that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
"base all the induction variables in LOOP on a single control one":
...
/* Base all the induction variables in LOOP on a single control one.*/
canonicalize_loop_ivs (loop, , true);
...

This is caused by the fact that for one of the induction variables,
simple_iv
no longer returns true (as was the case at the start of
gen_parallel_loop).
This seems to be triggered by the fact that during loop_version we call
scev_reset (although I'm not sure why when recalculating scev info we're
not
reaching the same conclusion as before).

I guess the real reason is that canonicalize_loop_ivs calls
create_iv first which in the parloop case with bump-in-latch true
and then incrementally re-writes PHIs, eventually wrecking other
PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
first one but the rewritten ones depend on the new IV already.

So the better fix (maybe not 100% enough) would be to re-organize
canonicalize_loop_ivs to first do analysis of all PHIs and then
rewrite them all.



Focusing on the re-organize first, is this sort of what you had in mind?


Yes, though not sure why you need to have a hash-map here, just pushing
to a vector of PHIs would work, no?  Or alternatively a bitmap
of SSA versions so you have a way to match the gphi iterator with
the set of IVs.  But the vector should be sorted in PHI order so
just traversing the PHIs looking for the "next" PHI in the vector
would work I guess.


The reason I chose a hash map was robustness: to allow the maximum 
amount of change between analysis and transformation.


I've now used a vector of this struct:
...
struct phi_affine_iv
{
  gphi *phi;
  affine_iv iv;
};
...
and rewritten the transformation part to iterate over the vector.

Bootstrapped and reg-tested on x86_64.

Thanks,
- Tom
Split canonicalize_loop_ivs in analyze and rewrite part

2018-03-21  Tom de Vries  

	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter.
	Add iv parameter.  Remove early-out tests.
	(struct phi_affine_iv): New struct.
	(get_all_phi_affine_ivs): New function.  Factor out of ...
	(rewrite_all_phi_nodes_with_iv): ... this.  Remove loop argument.  Add
	simple_ivs argument.  Rewrite to iterate over simple_ivs.
	(canonicalize_loop_ivs_1): Factor out of ...
	(canonicalize_loop_ivs): ... this. Call get_all_phi_affine_ivs and pass
	result as argument to canonicalize_loop_ivs_1.

---
 gcc/tree-ssa-loop-manip.c | 114 ++
 1 file changed, 76 insertions(+), 38 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index bf425af..ecda325 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1430,79 +1430,99 @@ tree_unroll_loop (struct loop *loop, unsigned factor,
induction variable MAIN_IV and insert the generated code at GSI.  */
 
 static void
-rewrite_phi_with_iv (loop_p loop,
-		 gphi_iterator *psi,
+rewrite_phi_with_iv (gphi_iterator *psi,
 		 gimple_stmt_iterator *gsi,
-		 tree main_iv)
+		 tree main_iv, affine_iv *iv)
 {
-  affine_iv iv;
   gassign *stmt;
   gphi *phi = psi->phi ();
   tree atype, mtype, val, res = PHI_RESULT (phi);
 
-  if (virtual_operand_p (res) || res == main_iv)
-{
-  gsi_next (psi);
-  return;
-}
-
-  if (!simple_iv (loop, loop, res, , true))
-{
-  gsi_next (psi);
-  return;
-}
-
   remove_phi_node (psi, false);
 
   atype = TREE_TYPE (res);
   mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
-  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
+  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step),
 		 fold_convert (mtype, main_iv));
   val = fold_build2 (POINTER_TYPE_P (atype)
 		 ? POINTER_PLUS_EXPR : PLUS_EXPR,
-		 atype, unshare_expr (iv.base), val);
+		 atype, unshare_expr (iv->base), val);
   val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true,
   GSI_SAME_STMT);
   stmt = gimple_build_assign (res, val);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 }
 
-/* Rewrite all the phi nodes of LOOP in function of the main induction
-   variable MAIN_IV.  */
+struct phi_affine_iv
+{
+  gphi *phi;
+  affine_iv iv;
+};
+
+/* Return map of phi node result to affine_iv, for all phis in LOOPS.  */
 
 static void
-rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
+get_all_phi_affine_ivs (struct loop *loop,
+			auto_vec *simple_ivs)
 {
-  unsigned i;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
-  gphi_iterator psi;
 
-  for (i = 0; i < loop->num_nodes; i++)
+  for (unsigned i = 0; i < loop->num_nodes; i++)
 

Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-21 Thread Richard Biener
On Wed, 21 Mar 2018, Tom de Vries wrote:

> On 03/12/2018 01:14 PM, Richard Biener wrote:
> > On Thu, 22 Feb 2018, Tom de Vries wrote:
> > 
> > > Hi,
> > > 
> > > this patch fixes an ICE in the parloops pass.
> > > 
> > > The ICE (when compiling the test-case in attached patch) follows from the
> > > fact
> > > that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
> > > "base all the induction variables in LOOP on a single control one":
> > > ...
> > >/* Base all the induction variables in LOOP on a single control one.*/
> > >canonicalize_loop_ivs (loop, , true);
> > > ...
> > > 
> > > This is caused by the fact that for one of the induction variables,
> > > simple_iv
> > > no longer returns true (as was the case at the start of
> > > gen_parallel_loop).
> > > This seems to be triggered by the fact that during loop_version we call
> > > scev_reset (although I'm not sure why when recalculating scev info we're
> > > not
> > > reaching the same conclusion as before).
> > I guess the real reason is that canonicalize_loop_ivs calls
> > create_iv first which in the parloop case with bump-in-latch true
> > and then incrementally re-writes PHIs, eventually wrecking other
> > PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
> > first one but the rewritten ones depend on the new IV already.
> > 
> > So the better fix (maybe not 100% enough) would be to re-organize
> > canonicalize_loop_ivs to first do analysis of all PHIs and then
> > rewrite them all.
> > 
> 
> Focusing on the re-organize first, is this sort of what you had in mind?

Yes, though not sure why you need to have a hash-map here, just pushing
to a vector of PHIs would work, no?  Or alternatively a bitmap
of SSA versions so you have a way to match the gphi iterator with
the set of IVs.  But the vector should be sorted in PHI order so
just traversing the PHIs looking for the "next" PHI in the vector
would work I guess.

Does it help with the original issue btw?

Richard.

> Bootstrapped and reg-tested on x86_64.
> 
> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-21 Thread Bin.Cheng
On Wed, Mar 21, 2018 at 3:06 PM, Tom de Vries  wrote:
> On 03/12/2018 01:14 PM, Richard Biener wrote:
>>
>> On Thu, 22 Feb 2018, Tom de Vries wrote:
>>
>>> Hi,
>>>
>>> this patch fixes an ICE in the parloops pass.
>>>
>>> The ICE (when compiling the test-case in attached patch) follows from the
>>> fact
>>> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
>>> "base all the induction variables in LOOP on a single control one":
>>> ...
>>>/* Base all the induction variables in LOOP on a single control one.*/
>>>canonicalize_loop_ivs (loop, , true);
>>> ...
>>>
>>> This is caused by the fact that for one of the induction variables,
>>> simple_iv
>>> no longer returns true (as was the case at the start of
>>> gen_parallel_loop).
>>> This seems to be triggered by the fact that during loop_version we call
>>> scev_reset (although I'm not sure why when recalculating scev info we're
>>> not
>>> reaching the same conclusion as before).
>>
>> I guess the real reason is that canonicalize_loop_ivs calls
>> create_iv first which in the parloop case with bump-in-latch true
>> and then incrementally re-writes PHIs, eventually wrecking other
>> PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
>> first one but the rewritten ones depend on the new IV already.
>>
>> So the better fix (maybe not 100% enough) would be to re-organize
>> canonicalize_loop_ivs to first do analysis of all PHIs and then
>> rewrite them all.
>>
>
> Focusing on the re-organize first, is this sort of what you had in mind?

FYI, we have multiple interfaces manipulating on IVs, sometimes in a similar
way.  An example would be create_canonical_iv in tree-ssa-loop-ivcanon.c.

One proposal for discussion is to refactor such interfaces and put them in a
single source file, like tree-ssa-loop-ivopts.  For example of
create_canonical_iv,
it can be removed by simply adding the corresponding candidate in IVOPTs.

Thanks,
bin
>
> Bootstrapped and reg-tested on x86_64.
>
> Thanks,
> - Tom


Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-21 Thread Tom de Vries

On 03/12/2018 01:14 PM, Richard Biener wrote:

On Thu, 22 Feb 2018, Tom de Vries wrote:


Hi,

this patch fixes an ICE in the parloops pass.

The ICE (when compiling the test-case in attached patch) follows from the fact
that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
"base all the induction variables in LOOP on a single control one":
...
   /* Base all the induction variables in LOOP on a single control one.*/
   canonicalize_loop_ivs (loop, , true);
...

This is caused by the fact that for one of the induction variables, simple_iv
no longer returns true (as was the case at the start of gen_parallel_loop).
This seems to be triggered by the fact that during loop_version we call
scev_reset (although I'm not sure why when recalculating scev info we're not
reaching the same conclusion as before).

I guess the real reason is that canonicalize_loop_ivs calls
create_iv first which in the parloop case with bump-in-latch true
and then incrementally re-writes PHIs, eventually wrecking other
PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
first one but the rewritten ones depend on the new IV already.

So the better fix (maybe not 100% enough) would be to re-organize
canonicalize_loop_ivs to first do analysis of all PHIs and then
rewrite them all.



Focusing on the re-organize first, is this sort of what you had in mind?

Bootstrapped and reg-tested on x86_64.

Thanks,
- Tom
Split canonicalize_loop_ivs in analyze and rewrite parts

2018-03-21  Tom de Vries  

	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Remove loop parameter.
	Add iv parameter.  Move early-out tests ...
	(rewrite_all_phi_nodes_with_iv): ... here.  Remove loop argument and add
	iv argument to rewrite_phi_with_iv call.
	(get_all_phi_affine_ivs): New function.
	(canonicalize_loop_ivs): Call get_all_phi_affine_ivs and pass result as
	argument to rewrite_all_phi_nodes_with_iv.

---
 gcc/tree-ssa-loop-manip.c | 82 +++
 1 file changed, 61 insertions(+), 21 deletions(-)

diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index bf425af..15a5795 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -43,6 +43,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-scalar-evolution.h"
 #include "params.h"
 #include "tree-inline.h"
+#include "hash-map.h"
 
 /* All bitmaps for rewriting into loop-closed SSA go on this obstack,
so that we can free them all at once.  */
@@ -1430,48 +1431,65 @@ tree_unroll_loop (struct loop *loop, unsigned factor,
induction variable MAIN_IV and insert the generated code at GSI.  */
 
 static void
-rewrite_phi_with_iv (loop_p loop,
-		 gphi_iterator *psi,
+rewrite_phi_with_iv (gphi_iterator *psi,
 		 gimple_stmt_iterator *gsi,
-		 tree main_iv)
+		 tree main_iv, affine_iv *iv)
 {
-  affine_iv iv;
   gassign *stmt;
   gphi *phi = psi->phi ();
   tree atype, mtype, val, res = PHI_RESULT (phi);
 
-  if (virtual_operand_p (res) || res == main_iv)
-{
-  gsi_next (psi);
-  return;
-}
-
-  if (!simple_iv (loop, loop, res, , true))
-{
-  gsi_next (psi);
-  return;
-}
-
   remove_phi_node (psi, false);
 
   atype = TREE_TYPE (res);
   mtype = POINTER_TYPE_P (atype) ? sizetype : atype;
-  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv.step),
+  val = fold_build2 (MULT_EXPR, mtype, unshare_expr (iv->step),
 		 fold_convert (mtype, main_iv));
   val = fold_build2 (POINTER_TYPE_P (atype)
 		 ? POINTER_PLUS_EXPR : PLUS_EXPR,
-		 atype, unshare_expr (iv.base), val);
+		 atype, unshare_expr (iv->base), val);
   val = force_gimple_operand_gsi (gsi, val, false, NULL_TREE, true,
   GSI_SAME_STMT);
   stmt = gimple_build_assign (res, val);
   gsi_insert_before (gsi, stmt, GSI_SAME_STMT);
 }
 
+/* Return map of phi node result to affine_iv, for all phis in LOOPS.  */
+
+static void
+get_all_phi_affine_ivs (loop_p loop, hash_map *simple_ivs)
+{
+  unsigned i;
+  basic_block *bbs = get_loop_body_in_dom_order (loop);
+  gphi_iterator psi;
+
+  for (i = 0; i < loop->num_nodes; i++)
+{
+  basic_block bb = bbs[i];
+
+  if (bb->loop_father != loop)
+	continue;
+
+  for (psi = gsi_start_phis (bb); !gsi_end_p (psi);
+	   gsi_next ())
+	{
+	  gphi *phi = psi.phi ();
+	  affine_iv iv;
+	  tree res = PHI_RESULT (phi);
+	  if (simple_iv (loop, loop, res, , true))
+	simple_ivs->put (res, iv);
+	}
+}
+
+  free (bbs);
+}
+
 /* Rewrite all the phi nodes of LOOP in function of the main induction
variable MAIN_IV.  */
 
 static void
-rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
+rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv,
+			   hash_map *simple_ivs)
 {
   unsigned i;
   basic_block *bbs = get_loop_body_in_dom_order (loop);
@@ -1486,7 +1504,25 @@ rewrite_all_phi_nodes_with_iv (loop_p loop, tree main_iv)
 	continue;
 
   for (psi = 

Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-21 Thread Richard Biener
On Wed, 21 Mar 2018, Tom de Vries wrote:

> On 03/12/2018 01:14 PM, Richard Biener wrote:
> > On Thu, 22 Feb 2018, Tom de Vries wrote:
> > > I can
> > > rework the bit of the patch that adds an assert after
> > > canonicalize_loop_ivs
> > > into a patch that aborts the optimization instead of ICE-ing.
> > 
> > I think that's something reasonable anyway.
> > 
> 
> Patch attached below implements that approach.
> 
> The difference between before parloops, and after ompexpssa2 is this (showing
> the effects of canonicalize_loop_ivs):
> ...
> $ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2
>...
>  ew (short unsigned int c9, int stuff)
>  {
>int * fd;
> @@ -9,53 +103,53 @@
>int _3;
>short unsigned int _4;
>unsigned int _5;
> +  unsigned int ivtmp_8;
>unsigned int _22;
>unsigned int ivtmp_24;
>unsigned int ivtmp_26;
> +  unsigned int ivtmp_28;
> 
> [local count: 11811]:
> 
> [local count: 118111601]:
> -  # c9_1 = PHI 
> +  # c9_1 = PHI 
>_2 = (long int) c9_1;
>fd_13 = (int *) _2;
>_3 = *fd_13;
> 
> [local count: 1073741825]:
>if (_3 != 0)
> -goto ; [11.00%]
> +goto ; [11.00%]
>else
>  goto ; [89.00%]
> 
> [local count: 955630225]:
>goto ; [100.00%]
> 
> -   [local count: 118111600]:
> +   [local count: 94489281]:
> 
> -   [local count: 118111600]:
> -
> -   [local count: 236258638]:
> -  # _22 = PHI <0(10), _5(8)>
> -  # c9_23 = PHI 
> -  # e1_27 = PHI <0(10), e1_17(8)>
> -  # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
> +   [local count: 189006911]:
> +  # c9_23 = PHI 
> +  # e1_27 = PHI 
> +  # ivtmp_8 = PHI 
> +  _22 = ivtmp_8;
> +  ivtmp_26 = 2 - ivtmp_8;
>_4 = (short unsigned int) e1_27;
>c9_14 = _4 * c9_23;
>_5 = _22 + 1;
>e1_17 = (int) _5;
>ivtmp_24 = ivtmp_26 - 1;
> -  if (ivtmp_24 != 0)
> +  if (ivtmp_8 < 1)
>  goto ; [66.67%]
>else
>  goto ; [33.33%]
> 
> -   [local count: 157513634]:
> +   [local count: 126010908]:
> +  ivtmp_28 = ivtmp_8 + 1;
>goto ; [100.00%]
> 
> [local count: 78745004]:
># c9_21 = PHI 
> -
> -   [local count: 78745004]:
>goto ; [100.00%]
> 
>  }
> ...
> 
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage4 trunk?

OK.

Thanks,
Richard.


Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-21 Thread Tom de Vries

On 03/12/2018 01:14 PM, Richard Biener wrote:

On Thu, 22 Feb 2018, Tom de Vries wrote:

I can
rework the bit of the patch that adds an assert after canonicalize_loop_ivs
into a patch that aborts the optimization instead of ICE-ing.


I think that's something reasonable anyway.



Patch attached below implements that approach.

The difference between before parloops, and after ompexpssa2 is this 
(showing the effects of canonicalize_loop_ivs):

...
$ diff -u pr83126.c.156t.dce5 pr83126.c.158t.ompexpssa2
   ...
 ew (short unsigned int c9, int stuff)
 {
   int * fd;
@@ -9,53 +103,53 @@
   int _3;
   short unsigned int _4;
   unsigned int _5;
+  unsigned int ivtmp_8;
   unsigned int _22;
   unsigned int ivtmp_24;
   unsigned int ivtmp_26;
+  unsigned int ivtmp_28;

[local count: 11811]:

[local count: 118111601]:
-  # c9_1 = PHI 
+  # c9_1 = PHI 
   _2 = (long int) c9_1;
   fd_13 = (int *) _2;
   _3 = *fd_13;

[local count: 1073741825]:
   if (_3 != 0)
-goto ; [11.00%]
+goto ; [11.00%]
   else
 goto ; [89.00%]

[local count: 955630225]:
   goto ; [100.00%]

-   [local count: 118111600]:
+   [local count: 94489281]:

-   [local count: 118111600]:
-
-   [local count: 236258638]:
-  # _22 = PHI <0(10), _5(8)>
-  # c9_23 = PHI 
-  # e1_27 = PHI <0(10), e1_17(8)>
-  # ivtmp_26 = PHI <2(10), ivtmp_24(8)>
+   [local count: 189006911]:
+  # c9_23 = PHI 
+  # e1_27 = PHI 
+  # ivtmp_8 = PHI 
+  _22 = ivtmp_8;
+  ivtmp_26 = 2 - ivtmp_8;
   _4 = (short unsigned int) e1_27;
   c9_14 = _4 * c9_23;
   _5 = _22 + 1;
   e1_17 = (int) _5;
   ivtmp_24 = ivtmp_26 - 1;
-  if (ivtmp_24 != 0)
+  if (ivtmp_8 < 1)
 goto ; [66.67%]
   else
 goto ; [33.33%]

-   [local count: 157513634]:
+   [local count: 126010908]:
+  ivtmp_28 = ivtmp_8 + 1;
   goto ; [100.00%]

[local count: 78745004]:
   # c9_21 = PHI 
-
-   [local count: 78745004]:
   goto ; [100.00%]

 }
...


Bootstrapped and reg-tested on x86_64.

OK for stage4 trunk?

Thanks,
- Tom
[parloops] Handle canonicalize_loop_ivs failure

2018-03-19  Tom de Vries  

	PR tree-optimization/83126
	* tree-parloops.c (num_phis): New function.
	(gen_parallel_loop): Detect and handle canonicalize_loop_ivs failure.

	* gcc.dg/graphite/pr83126.c: New test.

---
 gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 
 gcc/tree-parloops.c | 39 +
 2 files changed, 58 insertions(+)

diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c
new file mode 100644
index 000..663d059
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" }  */
+
+void
+ew (unsigned short int c9, int stuff)
+{
+  int e1;
+
+  for (;;)
+{
+  unsigned int *by = 
+  int *fd = 
+
+  *fd = c9;
+  fd = *fd;
+  if (*fd != 0)
+	for (*by = 0; *by < 2; ++*by)
+	  c9 *= e1;
+}
+}
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e44ad5e..3a788cc 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2235,6 +2235,25 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   calculate_dominance_info (CDI_DOMINATORS);
 }
 
+/* Return number of phis in bb.  If COUNT_VIRTUAL_P is false, don't count the
+   virtual phi.  */
+
+static unsigned int
+num_phis (basic_block bb, bool count_virtual_p)
+{
+  unsigned int nr_phis = 0;
+  gphi_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
+{
+  if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi (
+	continue;
+
+  nr_phis++;
+}
+
+  return nr_phis;
+}
+
 /* Generates code to execute the iterations of LOOP in N_THREADS
threads in parallel, which can be 0 if that number is to be determined
later.
@@ -2370,6 +2389,26 @@ gen_parallel_loop (struct loop *loop,
 
   /* Base all the induction variables in LOOP on a single control one.  */
   canonicalize_loop_ivs (loop, , true);
+  if (num_phis (loop->header, false) != reduction_list->elements () + 1)
+{
+  /* The call to canonicalize_loop_ivs above failed to "base all the
+	 induction variables in LOOP on a single control one".  Do damage
+	 control.  */
+  basic_block preheader = loop_preheader_edge (loop)->src;
+  basic_block cond_bb = single_pred (preheader);
+  gcond *cond = as_a  (gsi_stmt (gsi_last_bb (cond_bb)));
+  gimple_cond_make_true (cond);
+  update_stmt (cond);
+  /* We've gotten rid of the duplicate loop created by loop_version, but
+	 we can't undo whatever canonicalize_loop_ivs has done.
+	 TODO: Fix this properly by ensuring that the call to
+	 canonicalize_loop_ivs succeeds.  */
+  if (dump_file
+	  && (dump_flags & TDF_DETAILS))
+	fprintf (dump_file, 

Re: [parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-03-12 Thread Richard Biener
On Thu, 22 Feb 2018, Tom de Vries wrote:

> Hi,
> 
> this patch fixes an ICE in the parloops pass.
> 
> The ICE (when compiling the test-case in attached patch) follows from the fact
> that here in gen_parallel_loop the call to canonicalize_loop_ivs fails to
> "base all the induction variables in LOOP on a single control one":
> ...
>   /* Base all the induction variables in LOOP on a single control one.*/
>   canonicalize_loop_ivs (loop, , true);
> ...
> 
> This is caused by the fact that for one of the induction variables, simple_iv
> no longer returns true (as was the case at the start of gen_parallel_loop).
> This seems to be triggered by the fact that during loop_version we call
> scev_reset (although I'm not sure why when recalculating scev info we're not
> reaching the same conclusion as before).

I guess the real reason is that canonicalize_loop_ivs calls
create_iv first which in the parloop case with bump-in-latch true
and then incrementally re-writes PHIs, eventually wrecking other
PHIs SCEV.  In this case the 2nd PHIs evolution depends on the
first one but the rewritten ones depend on the new IV already.

So the better fix (maybe not 100% enough) would be to re-organize
canonicalize_loop_ivs to first do analysis of all PHIs and then
rewrite them all.

> 
> The patch fixes the ICE by caching the affine_ivs as calculated by simple_iv
> before calling loop_version, and reusing those in canonicalize_loop_ivs
> (instead of calling simple_iv again).
> 
> Bootstrapped and reg-tested on x86_64.
> 
> OK for stage1?

First of all please to not use std::map but instead our own hash_map,
the proper way of including  would be to define INCLUDE_MAP
before the system.h include.

But see above.

> I'm not sure if this is minimal/low-impact enough for stage4. If not, I can
> rework the bit of the patch that adds an assert after canonicalize_loop_ivs
> into a patch that aborts the optimization instead of ICE-ing.

I think that's something reasonable anyway.

Thanks,
Richard.

> Thanks,
> - Tom
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[parloops, PR83126], Use cached affine_ivs canonicalize_loop_ivs

2018-02-22 Thread Tom de Vries

Hi,

this patch fixes an ICE in the parloops pass.

The ICE (when compiling the test-case in attached patch) follows from 
the fact that here in gen_parallel_loop the call to 
canonicalize_loop_ivs fails to "base all the induction variables in LOOP 
on a single control one":

...
  /* Base all the induction variables in LOOP on a single control one.*/
  canonicalize_loop_ivs (loop, , true);
...

This is caused by the fact that for one of the induction variables, 
simple_iv no longer returns true (as was the case at the start of 
gen_parallel_loop). This seems to be triggered by the fact that during 
loop_version we call scev_reset (although I'm not sure why when 
recalculating scev info we're not reaching the same conclusion as before).


The patch fixes the ICE by caching the affine_ivs as calculated by 
simple_iv before calling loop_version, and reusing those in 
canonicalize_loop_ivs (instead of calling simple_iv again).


Bootstrapped and reg-tested on x86_64.

OK for stage1?

I'm not sure if this is minimal/low-impact enough for stage4. If not, I 
can rework the bit of the patch that adds an assert after 
canonicalize_loop_ivs into a patch that aborts the optimization instead 
of ICE-ing.


Thanks,
- Tom
[parloops] Use cached affine_ivs canonicalize_loop_ivs

2018-02-22  Tom de Vries  

	PR tree-optimization/83126
	* tree-ssa-loop-manip.c (rewrite_phi_with_iv): Add and handle piv
	parameter.
	(rewrite_all_phi_nodes_with_iv, canonicalize_loop_ivs): Add and handle
	simple_ivs parameter.
	* tree-ssa-loop-manip.h (canonicalize_loop_ivs): Declare simple_ivs
	parameter and default to NULL.
	* tree-parloops.c (num_phis): New function.
	(gen_parallel_loop): Add simple_ivs argument to canonicalize_loop_ivs
	call.

---
 gcc/testsuite/gcc.dg/graphite/pr83126.c | 19 +++
 gcc/tree-parloops.c | 43 +++--
 gcc/tree-ssa-loop-manip.c   | 41 ---
 gcc/tree-ssa-loop-manip.h   |  6 -
 4 files changed, 97 insertions(+), 12 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/graphite/pr83126.c b/gcc/testsuite/gcc.dg/graphite/pr83126.c
new file mode 100644
index 000..663d059
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/graphite/pr83126.c
@@ -0,0 +1,19 @@
+/* { dg-additional-options "-w -ftree-parallelize-loops=2 -floop-parallelize-all -O1" }  */
+
+void
+ew (unsigned short int c9, int stuff)
+{
+  int e1;
+
+  for (;;)
+{
+  unsigned int *by = 
+  int *fd = 
+
+  *fd = c9;
+  fd = *fd;
+  if (*fd != 0)
+	for (*by = 0; *by < 2; ++*by)
+	  c9 *= e1;
+}
+}
diff --git a/gcc/tree-parloops.c b/gcc/tree-parloops.c
index e44ad5e..5971680 100644
--- a/gcc/tree-parloops.c
+++ b/gcc/tree-parloops.c
@@ -2235,6 +2235,25 @@ create_parallel_loop (struct loop *loop, tree loop_fn, tree data,
   calculate_dominance_info (CDI_DOMINATORS);
 }
 
+/* Return number of phis in bb.  If COUNT_VIRTUAL_P is false, don't count the
+   virtual phi.  */
+
+static unsigned int
+num_phis (basic_block bb, bool count_virtual_p)
+{
+  unsigned int nr_phis = 0;
+  gphi_iterator gsi;
+  for (gsi = gsi_start_phis (bb); !gsi_end_p (gsi); gsi_next ())
+{
+  if (!count_virtual_p && virtual_operand_p (PHI_RESULT (gsi.phi (
+	continue;
+
+  nr_phis++;
+}
+
+  return nr_phis;
+}
+
 /* Generates code to execute the iterations of LOOP in N_THREADS
threads in parallel, which can be 0 if that number is to be determined
later.
@@ -2326,6 +2345,24 @@ gen_parallel_loop (struct loop *loop,
   if (stmts)
 gsi_insert_seq_on_edge_immediate (loop_preheader_edge (loop), stmts);
 
+  /* The loop_version call below calls gimple_duplicate_loop_to_header_edge,
+ which calls scev_reset.  So a loop header phi node that classifies as
+ simple_iv here, may not classify as such anymore after loop_version, which
+ will make canonicalize_loop_ivs fail to "base all the induction variables
+ in LOOP on a single control one".  We fix this by storing the affine_iv
+ result of the simple_iv call in a map from phi node result to affine_iv,
+ and passing that map as an argument to canonicalize_loop_ivs.  */
+  std::map simple_ivs;
+  gphi_iterator psi;
+  for (psi = gsi_start_phis (loop->header); !gsi_end_p (psi); gsi_next ())
+{
+  gphi *phi = psi.phi ();
+  affine_iv iv;
+  tree res = PHI_RESULT (phi);
+  if (simple_iv (loop, loop, res, , true))
+	simple_ivs.insert (std::pair (res, iv));
+}
+
   if (!oacc_kernels_p)
 {
   if (loop->inner)
@@ -2364,12 +2401,14 @@ gen_parallel_loop (struct loop *loop,
 		profile_probability::unlikely (),
 		profile_probability::likely (),
 		profile_probability::unlikely (), true);
-  update_ssa (TODO_update_ssa);
   free_original_copy_tables ();
 }
 
   /* Base all the induction variables in LOOP on a single control one.  */
-  canonicalize_loop_ivs (loop, ,