Re: Expand oacc kernels after pass_fre

2015-06-04 Thread Tom de Vries

On 22/04/15 09:36, Richard Biener wrote:

On Tue, 21 Apr 2015, Thomas Schwinge wrote:


Hi!

On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries  wrote:

On 24-11-14 11:56, Tom de Vries wrote:

On 15-11-14 18:19, Tom de Vries wrote:

On 15-11-14 13:14, Tom de Vries wrote:

I'm submitting a patch series with initial support for the oacc kernels
directive.

The patch series uses pass_parallelize_loops to implement parallelization of
loops in the oacc kernels region.

The patch series consists of these 8 patches:
...
  1  Expand oacc kernels after pass_build_ealias
  2  Add pass_oacc_kernels
  3  Add pass_ch_oacc_kernels to pass_oacc_kernels
  4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
  5  Add pass_loop_im to pass_oacc_kernels
  6  Add pass_ccp to pass_oacc_kernels
  7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
  8  Do simple omp lowering for no address taken var
...


This patch moves omp expansion of the oacc kernels directive to after
pass_build_ealias.

The rationale is that in order to use pass_parallelize_loops for analysis and
transformation of an oacc kernels region, we postpone omp expansion of that
region until the earliest point in the pass list where enough information is
availabe to run pass_parallelize_loops, in other words, after pass_build_ealias.

The patch postpones expansion in expand_omp, and ensures expansion by adding
pass_expand_omp_ssa:
- after pass_build_ealias, and
- after pass_all_early_optimizations for the case we're not optimizing.

In order to make sure the oacc kernels region arrives at pass_expand_omp_ssa,
the way it left expand_omp, the patch makes pass_ccp and pass_forwprop aware of
lowered omp code, to handle it conservatively.

The patch contains changes in expand_omp_target to deal with ssa-code, similar
to what is already present in expand_omp_taskreg.

Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to not be
static for oacc kernels. It does this to get some references to .omp_data_sizes
and .omp_data_kinds in the ssa code.  Without these references, the definitions
will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is not
enough to have them not removed. [ In vries/oacc-kernels, I used a BUILT_IN_USE
kludge for this purpose ].

Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
original function of which the definition has been removed (as in moved to the
split off function). TODO_remove_unused_locals takes care of some of them, but
not the anonymous ones. So the patch iterates over all SSA_NAMEs to find these
dangling SSA_NAMEs and releases them.



Reposting with small update: I've replaced the use of the rather generic
gimple_stmt_omp_lowering_p with the more specific gimple_stmt_omp_data_i_init_p.

Bootstrapped and reg-tested in the same way as before.



I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.

This allows fre to unify references to the same omp variable before entering
pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.

F.i. this reduction fragment:
...
# VUSE <.MEM_8>
# PT = { D.2282 }
_67 = .omp_data_i_59->sumD.2270;
# VUSE <.MEM_8>
_68 = *_67;

_70 = _66 + _68;

# VUSE <.MEM_8>
# PT = { D.2282 }
_69 = .omp_data_i_59->sumD.2270;
# .MEM_71 = VDEF <.MEM_8>
*_69 = _70;
...

is transformed by fre into:
...
# VUSE <.MEM_8>
# PT = { D.2282 }
_67 = .omp_data_i_59->sumD.2270;
# VUSE <.MEM_8>
_68 = *_67;

_70 = _66 + _68;

# .MEM_71 = VDEF <.MEM_8>
*_67 = _70;
...

In order for pass_fre to respect the kernels region boundaries, I've added a
change in tree-ssa-sccvn.c:visit_use to handle the .omp_data_i init 
conservatively.

Bootstrapped and reg-tested as before.

OK for trunk?


Committed to gomp-4_0-branch in r79:

commit 93557ac5e30c26ee1a3d1255e31265b287171a0d
Author: tschwinge 
Date:   Tue Apr 21 19:37:19 2015 +

 Expand oacc kernels after pass_fre

gcc/
* omp-low.c: Include gimple-pretty-print.h.
(release_first_vuse_in_edge_dest): New function.
(expand_omp_target): When not in ssa, don't split off oacc kernels
region, clear PROP_gimple_eomp in cfun->curr_properties to force later
expanssion, and add GOACC_kernels_internal call.
When in ssa, split off oacc kernels and convert GOACC_kernels_internal
into GOACC_kernels call.  Handle ssa-code.
(pass_data_expand_omp): Don't set PROP_gimple_eomp unconditionally in
properties_provided field.
(pass_expand_omp::execute): Set PROP_gimple_eomp in
cfun->curr_properties tentatively.
(pass_data_expand_omp_ssa): Add TODO_remove_unused_locals to
todo_flags_finish field.
(pass_expand_omp_ssa::execute): Release dangling SSA_NAMEs after calling
execute_expand_omp

Re: Expand oacc kernels after pass_fre

2015-06-08 Thread Richard Biener
On Thu, 4 Jun 2015, Tom de Vries wrote:

> > >   {
> > > gsi_next (&gsi);
> > > continue;
> > > diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
> > > index e417a15..449a615 100644
> > > --- gcc/tree-ssa-sccvn.c
> > > +++ gcc/tree-ssa-sccvn.c
> > > @@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
> > >   #include "ipa-ref.h"
> > >   #include "plugin-api.h"
> > >   #include "cgraph.h"
> > > +#include "omp-low.h"
> > > 
> > >   /* This algorithm is based on the SCC algorithm presented by Keith
> > >  Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
> > > @@ -3542,7 +3543,8 @@ visit_use (tree use)
> > >   {
> > > if (gimple_code (stmt) == GIMPLE_PHI)
> > >   changed = visit_phi (stmt);
> > > -  else if (gimple_has_volatile_ops (stmt))
> > > +  else if (gimple_has_volatile_ops (stmt)
> > > +|| gimple_stmt_omp_data_i_init_p (stmt))
> > 
> > No.
> > 
> > What is the intent of these changes?
> > 
> 
> These are changes to handle the kernels region conservatively, in order to not
> undo the omp-lowering before getting to the oacc-parloops pass.

Still it feels too much like the MPX mistake (maintainance cost and
compile-time cost).  How can any pass "undo" omp-lowering?

Richard.


Re: Expand oacc kernels after pass_fre

2015-06-19 Thread Tom de Vries

On 08/06/15 09:25, Richard Biener wrote:

On Thu, 4 Jun 2015, Tom de Vries wrote:


{
  gsi_next (&gsi);
  continue;
diff --git gcc/tree-ssa-sccvn.c gcc/tree-ssa-sccvn.c
index e417a15..449a615 100644
--- gcc/tree-ssa-sccvn.c
+++ gcc/tree-ssa-sccvn.c
@@ -85,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
   #include "ipa-ref.h"
   #include "plugin-api.h"
   #include "cgraph.h"
+#include "omp-low.h"

   /* This algorithm is based on the SCC algorithm presented by Keith
  Cooper and L. Taylor Simpson in "SCC-Based Value numbering"
@@ -3542,7 +3543,8 @@ visit_use (tree use)
   {
 if (gimple_code (stmt) == GIMPLE_PHI)
changed = visit_phi (stmt);
-  else if (gimple_has_volatile_ops (stmt))
+  else if (gimple_has_volatile_ops (stmt)
+  || gimple_stmt_omp_data_i_init_p (stmt))


No.

What is the intent of these changes?



These are changes to handle the kernels region conservatively, in order to not
undo the omp-lowering before getting to the oacc-parloops pass.


Still it feels too much like the MPX mistake (maintainance cost and
compile-time cost).  How can any pass "undo" omp-lowering?



I'm talking about the rewriting of the variables in terms of 
.omp_data_i. Passes like copy_prop and fre undo this rewriting, and 
propagate the variables from outside the kernels region back into the 
kernels region, eliminating .omp_data_i.


Thanks,
- Tom


Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)

2015-04-21 Thread Thomas Schwinge
Hi!

On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries  wrote:
> On 24-11-14 11:56, Tom de Vries wrote:
> > On 15-11-14 18:19, Tom de Vries wrote:
> >> On 15-11-14 13:14, Tom de Vries wrote:
> >>> I'm submitting a patch series with initial support for the oacc kernels
> >>> directive.
> >>>
> >>> The patch series uses pass_parallelize_loops to implement parallelization 
> >>> of
> >>> loops in the oacc kernels region.
> >>>
> >>> The patch series consists of these 8 patches:
> >>> ...
> >>>  1  Expand oacc kernels after pass_build_ealias
> >>>  2  Add pass_oacc_kernels
> >>>  3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> >>>  4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> >>>  5  Add pass_loop_im to pass_oacc_kernels
> >>>  6  Add pass_ccp to pass_oacc_kernels
> >>>  7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> >>>  8  Do simple omp lowering for no address taken var
> >>> ...
> >>
> >> This patch moves omp expansion of the oacc kernels directive to after
> >> pass_build_ealias.
> >>
> >> The rationale is that in order to use pass_parallelize_loops for analysis 
> >> and
> >> transformation of an oacc kernels region, we postpone omp expansion of that
> >> region until the earliest point in the pass list where enough information 
> >> is
> >> availabe to run pass_parallelize_loops, in other words, after 
> >> pass_build_ealias.
> >>
> >> The patch postpones expansion in expand_omp, and ensures expansion by 
> >> adding
> >> pass_expand_omp_ssa:
> >> - after pass_build_ealias, and
> >> - after pass_all_early_optimizations for the case we're not optimizing.
> >>
> >> In order to make sure the oacc kernels region arrives at 
> >> pass_expand_omp_ssa,
> >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop 
> >> aware of
> >> lowered omp code, to handle it conservatively.
> >>
> >> The patch contains changes in expand_omp_target to deal with ssa-code, 
> >> similar
> >> to what is already present in expand_omp_taskreg.
> >>
> >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to 
> >> not be
> >> static for oacc kernels. It does this to get some references to 
> >> .omp_data_sizes
> >> and .omp_data_kinds in the ssa code.  Without these references, the 
> >> definitions
> >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS is 
> >> not
> >> enough to have them not removed. [ In vries/oacc-kernels, I used a 
> >> BUILT_IN_USE
> >> kludge for this purpose ].
> >>
> >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in the
> >> original function of which the definition has been removed (as in moved to 
> >> the
> >> split off function). TODO_remove_unused_locals takes care of some of them, 
> >> but
> >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find 
> >> these
> >> dangling SSA_NAMEs and releases them.
> >>
> >
> > Reposting with small update: I've replaced the use of the rather generic
> > gimple_stmt_omp_lowering_p with the more specific 
> > gimple_stmt_omp_data_i_init_p.
> >
> > Bootstrapped and reg-tested in the same way as before.
> >
> 
> I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> 
> This allows fre to unify references to the same omp variable before entering 
> pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> 
> F.i. this reduction fragment:
> ...
># VUSE <.MEM_8>
># PT = { D.2282 }
>_67 = .omp_data_i_59->sumD.2270;
># VUSE <.MEM_8>
>_68 = *_67;
> 
>_70 = _66 + _68;
> 
># VUSE <.MEM_8>
># PT = { D.2282 }
>_69 = .omp_data_i_59->sumD.2270;
># .MEM_71 = VDEF <.MEM_8>
>*_69 = _70;
> ...
> 
> is transformed by fre into:
> ...
># VUSE <.MEM_8>
># PT = { D.2282 }
>_67 = .omp_data_i_59->sumD.2270;
># VUSE <.MEM_8>
>_68 = *_67;
> 
>_70 = _66 + _68;
> 
># .MEM_71 = VDEF <.MEM_8>
>*_67 = _70;
> ...
> 
> In order for pass_fre to respect the kernels region boundaries, I've added a 
> change in tree-ssa-sccvn.c:visit_use to hand

Re: Expand oacc kernels after pass_fre (was: [PATCH, 1/8] Expand oacc kernels after pass_build_ealias)

2015-04-22 Thread Richard Biener
On Tue, 21 Apr 2015, Thomas Schwinge wrote:

> Hi!
> 
> On Tue, 25 Nov 2014 12:22:02 +0100, Tom de Vries  
> wrote:
> > On 24-11-14 11:56, Tom de Vries wrote:
> > > On 15-11-14 18:19, Tom de Vries wrote:
> > >> On 15-11-14 13:14, Tom de Vries wrote:
> > >>> I'm submitting a patch series with initial support for the oacc kernels
> > >>> directive.
> > >>>
> > >>> The patch series uses pass_parallelize_loops to implement 
> > >>> parallelization of
> > >>> loops in the oacc kernels region.
> > >>>
> > >>> The patch series consists of these 8 patches:
> > >>> ...
> > >>>  1  Expand oacc kernels after pass_build_ealias
> > >>>  2  Add pass_oacc_kernels
> > >>>  3  Add pass_ch_oacc_kernels to pass_oacc_kernels
> > >>>  4  Add pass_tree_loop_{init,done} to pass_oacc_kernels
> > >>>  5  Add pass_loop_im to pass_oacc_kernels
> > >>>  6  Add pass_ccp to pass_oacc_kernels
> > >>>  7  Add pass_parloops_oacc_kernels to pass_oacc_kernels
> > >>>  8  Do simple omp lowering for no address taken var
> > >>> ...
> > >>
> > >> This patch moves omp expansion of the oacc kernels directive to after
> > >> pass_build_ealias.
> > >>
> > >> The rationale is that in order to use pass_parallelize_loops for 
> > >> analysis and
> > >> transformation of an oacc kernels region, we postpone omp expansion of 
> > >> that
> > >> region until the earliest point in the pass list where enough 
> > >> information is
> > >> availabe to run pass_parallelize_loops, in other words, after 
> > >> pass_build_ealias.
> > >>
> > >> The patch postpones expansion in expand_omp, and ensures expansion by 
> > >> adding
> > >> pass_expand_omp_ssa:
> > >> - after pass_build_ealias, and
> > >> - after pass_all_early_optimizations for the case we're not optimizing.
> > >>
> > >> In order to make sure the oacc kernels region arrives at 
> > >> pass_expand_omp_ssa,
> > >> the way it left expand_omp, the patch makes pass_ccp and pass_forwprop 
> > >> aware of
> > >> lowered omp code, to handle it conservatively.
> > >>
> > >> The patch contains changes in expand_omp_target to deal with ssa-code, 
> > >> similar
> > >> to what is already present in expand_omp_taskreg.
> > >>
> > >> Furthermore, the patch forces the .omp_data_sizes and .omp_data_kinds to 
> > >> not be
> > >> static for oacc kernels. It does this to get some references to 
> > >> .omp_data_sizes
> > >> and .omp_data_kinds in the ssa code.  Without these references, the 
> > >> definitions
> > >> will be removed. The reference of the variables in GIMPLE_OACC_KERNELS 
> > >> is not
> > >> enough to have them not removed. [ In vries/oacc-kernels, I used a 
> > >> BUILT_IN_USE
> > >> kludge for this purpose ].
> > >>
> > >> Finally, at the end of pass_expand_omp_ssa we're left with SSA_NAMEs in 
> > >> the
> > >> original function of which the definition has been removed (as in moved 
> > >> to the
> > >> split off function). TODO_remove_unused_locals takes care of some of 
> > >> them, but
> > >> not the anonymous ones. So the patch iterates over all SSA_NAMEs to find 
> > >> these
> > >> dangling SSA_NAMEs and releases them.
> > >>
> > >
> > > Reposting with small update: I've replaced the use of the rather generic
> > > gimple_stmt_omp_lowering_p with the more specific 
> > > gimple_stmt_omp_data_i_init_p.
> > >
> > > Bootstrapped and reg-tested in the same way as before.
> > >
> > 
> > I've moved pass_expand_omp_ssa one down in the pass list, past pass_fre.
> > 
> > This allows fre to unify references to the same omp variable before 
> > entering 
> > pass_oacc_kernels, which helps pass_lim in pass_oacc_kernels.
> > 
> > F.i. this reduction fragment:
> > ...
> ># VUSE <.MEM_8>
> ># PT = { D.2282 }
> >_67 = .omp_data_i_59->sumD.2270;
> ># VUSE <.MEM_8>
> >_68 = *_67;
> > 
> >_70 = _66 + _68;
> > 
> ># VUSE <.MEM_8>