Re: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk

2017-09-28 Thread Sebastian Pop
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener  wrote:
>  /* Maximal number of array references in a scop.  */
>
DEFPARAM (PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP,
  "graphite-max-arrays-per-scop",
  "maximum number of arrays per scop.",
  100, 0, 0)

Let's also remove this param as we now have max-isl-operations.

Thanks,
Sebastian


Re: [PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk

2017-09-28 Thread Sebastian Pop
On Wed, Sep 27, 2017 at 6:07 AM, Richard Biener  wrote:
>
> This removes another quadraticness from SCOP detection, gather_bbs
> domwalk.  This is done by enhancing domwalk to handle SEME regions
> via a special return value from before_dom_children.
>
> With this I'm now confident to remove the
> PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
> Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
> documented default value which enables 90 more loos to be processed
> in SPEC CPU 2006.  I've also made a value of zero magic in disabling
> the limit (a trick commonly used in GCC).
>
> Statistics I have gathered a few patches before for SPEC CPU 2006:
>
> 1255 multi-loop SESEs in SCOP processing
> max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
> max. drs per scop 869, 10 scops >= 100
> max. pbbs per scop 36, 12 scops >= 10
> 919 SCOPs fail in build_alias_sets
>
> which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
> is reasonable (if tuned to SPEC CPU 2006).
>
> I've also included the hunk that allows -fgraphite-identity
> to work ontop of -floop-nest-optimize and for -floop-nest-optimize
> -ftree-parallelize-all also make sure to code-gen loops that
> end up not transformed.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
> tested, applied to trunk.
>
> Richard.
>
> 2017-09-27  Richard Biener  
>
> * doc/invoke.texi (graphite-max-bbs-per-function): Remove.
> (graphite-max-nb-scop-params): Document special value zero.
> * domwalk.h (dom_walker::STOP): New symbolical constant.
> (dom_walker::dom_walker): Add optional parameter for bb to
> RPO mapping.
> (dom_walker::~dom_walker): Declare.
> (dom_walker::before_dom_children): Document STOP return value.
> (dom_walker::m_user_bb_to_rpo): New member.
> (dom_walker::m_bb_to_rpo): Likewise.
> * domwalk.c (dom_walker::dom_walker): Compute bb to RPO
> mapping here if not provided by the user.
> (dom_walker::~dom_walker): Free bb to RPO mapping if not
> provided by the user.
> (dom_walker::STOP): Define.
> (dom_walker::walk): Do not compute bb to RPO mapping here.
> Support STOP return value from before_dom_children to stop
> walking.
> * graphite-optimize-isl.c (optimize_isl): If the schedule
> is the same still generate code if -fgraphite-identity
> or -floop-parallelize-all are given.
> * graphite-scop-detection.c: Include cfganal.h.
> (gather_bbs::gather_bbs): Get and pass through bb to RPO
> mapping.
> (gather_bbs::before_dom_children): Return STOP for BBs
> not in the region.
> (build_scops): Compute bb to RPO mapping and pass it to
> the domwalk.  Treat --param graphite-max-nb-scop-params=0
> as not limiting the number of params.
> * graphite.c (graphite_initialize): Remove limit on the
> number of basic-blocks in a function.
> * params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
> (PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
> default value of 10.

The patch looks good.  Thanks!

>
> Index: gcc/doc/invoke.texi
> ===
> --- gcc/doc/invoke.texi (revision 253224)
> +++ gcc/doc/invoke.texi (working copy)
> @@ -10512,13 +10512,9 @@ sequence pairs.  This option only applie
>  @item graphite-max-nb-scop-params
>  To avoid exponential effects in the Graphite loop transforms, the
>  number of parameters in a Static Control Part (SCoP) is bounded.  The
> -default value is 10 parameters.

Now that we have "compute-out" functionality in all supported
versions of isl, let's remove this parameter.

We needed this in the past when isl was not able to stop an
exponential computation, and that happened when operating
on large dimension spaces.


[PATCH][GRAPHITE] Speedup SCOP detection some more, add region handling to domwalk

2017-09-27 Thread Richard Biener

This removes another quadraticness from SCOP detection, gather_bbs
domwalk.  This is done by enhancing domwalk to handle SEME regions
via a special return value from before_dom_children.

With this I'm now confident to remove the 
PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION parameter and its associated limit.
Being there I've adjusted PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS to its
documented default value which enables 90 more loos to be processed
in SPEC CPU 2006.  I've also made a value of zero magic in disabling
the limit (a trick commonly used in GCC).

Statistics I have gathered a few patches before for SPEC CPU 2006:

1255 multi-loop SESEs in SCOP processing
max. params 34, 3 scops >= 20, 15 scops >= 10, 33 scops >= 8
max. drs per scop 869, 10 scops >= 100
max. pbbs per scop 36, 12 scops >= 10
919 SCOPs fail in build_alias_sets

which shows the default for PARAM_GRAPHITE_MAX_ARRAYS_PER_SCOP
is reasonable (if tuned to SPEC CPU 2006).

I've also included the hunk that allows -fgraphite-identity
to work ontop of -floop-nest-optimize and for -floop-nest-optimize
-ftree-parallelize-all also make sure to code-gen loops that
end up not transformed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC CPU 2006
tested, applied to trunk.

Richard.

2017-09-27  Richard Biener  

* doc/invoke.texi (graphite-max-bbs-per-function): Remove.
(graphite-max-nb-scop-params): Document special value zero.
* domwalk.h (dom_walker::STOP): New symbolical constant.
(dom_walker::dom_walker): Add optional parameter for bb to
RPO mapping.
(dom_walker::~dom_walker): Declare.
(dom_walker::before_dom_children): Document STOP return value.
(dom_walker::m_user_bb_to_rpo): New member.
(dom_walker::m_bb_to_rpo): Likewise.
* domwalk.c (dom_walker::dom_walker): Compute bb to RPO
mapping here if not provided by the user.
(dom_walker::~dom_walker): Free bb to RPO mapping if not
provided by the user.
(dom_walker::STOP): Define.
(dom_walker::walk): Do not compute bb to RPO mapping here.
Support STOP return value from before_dom_children to stop
walking.
* graphite-optimize-isl.c (optimize_isl): If the schedule
is the same still generate code if -fgraphite-identity
or -floop-parallelize-all are given.
* graphite-scop-detection.c: Include cfganal.h.
(gather_bbs::gather_bbs): Get and pass through bb to RPO
mapping.
(gather_bbs::before_dom_children): Return STOP for BBs
not in the region.
(build_scops): Compute bb to RPO mapping and pass it to
the domwalk.  Treat --param graphite-max-nb-scop-params=0
as not limiting the number of params.
* graphite.c (graphite_initialize): Remove limit on the
number of basic-blocks in a function.
* params.def (PARAM_GRAPHITE_MAX_BBS_PER_FUNCTION): Remove.
(PARAM_GRAPHITE_MAX_NB_SCOP_PARAMS): Adjust to documented
default value of 10.

Index: gcc/doc/invoke.texi
===
--- gcc/doc/invoke.texi (revision 253224)
+++ gcc/doc/invoke.texi (working copy)
@@ -10512,13 +10512,9 @@ sequence pairs.  This option only applie
 @item graphite-max-nb-scop-params
 To avoid exponential effects in the Graphite loop transforms, the
 number of parameters in a Static Control Part (SCoP) is bounded.  The
-default value is 10 parameters.  A variable whose value is unknown at
-compilation time and defined outside a SCoP is a parameter of the SCoP.
-
-@item graphite-max-bbs-per-function
-To avoid exponential effects in the detection of SCoPs, the size of
-the functions analyzed by Graphite is bounded.  The default value is
-100 basic blocks.
+default value is 10 parameters, a value of zero can be used to lift
+the bound.  A variable whose value is unknown at compilation time and
+defined outside a SCoP is a parameter of the SCoP.
 
 @item loop-block-tile-size
 Loop blocking or strip mining transforms, enabled with
Index: gcc/domwalk.c
===
--- gcc/domwalk.c   (revision 253224)
+++ gcc/domwalk.c   (working copy)
@@ -174,13 +174,29 @@ sort_bbs_postorder (basic_block *bbs, in
If SKIP_UNREACHBLE_BLOCKS is true, then we need to set
EDGE_EXECUTABLE on every edge in the CFG. */
 dom_walker::dom_walker (cdi_direction direction,
-   bool skip_unreachable_blocks)
+   bool skip_unreachable_blocks,
+   int *bb_index_to_rpo)
   : m_dom_direction (direction),
 m_skip_unreachable_blocks (skip_unreachable_blocks),
-m_unreachable_dom (NULL)
+m_user_bb_to_rpo (bb_index_to_rpo != NULL),
+m_unreachable_dom (NULL),
+m_bb_to_rpo (bb_index_to_rpo)
 {
+  /* Compute the basic-block index to RPO mapping if not provided by
+ the user.  */
+  if (! m_bb_to_rpo && direction