Re: Graphite Patch Ping

2022-05-16 Thread Richard Biener via Gcc-patches
On Mon, 16 May 2022, Tobias Burnus wrote:

> Hi all,
> 
> I would like to ping the following patches from Frederik's
>  "[PATCH 00/40] OpenACC "kernels" Improvements" series
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586901.html
> patch set thread link:
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/thread.html#586901

Can we get them re-based to after the .c -> .cc renaming please?

Richard.

> (A) Simpler patches
> 
> [PATCH 15/40] graphite: Extend SCoP detection dump output
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586915.html
>   "Extend dump output to make understanding why Graphite rejects to
>   include a loop in a SCoP easier (for GCC developers)."
> 
> [PATCH 16/40] graphite: Rename isl_id_for_ssa_name
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586917.html
>   (consistency renaming of static function/var name)
> 
> [PATCH 17/40] graphite: Fix minor mistakes in comments
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586918.html
>   Comment-typo fixes
> 
> 
> (B) User-visible change (new flag) - prep + actual new-flag patch
> 
> [PATCH 18/40] Move compute_alias_check_pairs to tree-data-ref.c
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586919.html
>   "Move this function from tree-loop-distribution.c to tree-data-ref.c
>   and make it non-static to enable its use from other parts of GCC."
> 
> [PATCH 19/40] graphite: Add runtime alias checking
>   https://gcc.gnu.org/pipermail/gcc-patches/2021-December/586920.html
>   'Graphite rejects a SCoP if it contains a pair of data references for
>   which it cannot determine statically if they may alias. This happens
>   very often, for instance in C code which does not use explicit
>   "restrict".  This commit adds the possibility to analyze a SCoP
>   nevertheless and perform an alias check at runtime.'
> 
> Thanks,
> 
> Tobias
> 
> -
> Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634
> München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas
> Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht
> München, HRB 106955
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)
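
For readers unfamiliar with the idea behind [PATCH 19/40] quoted above, here
is a minimal sketch of what "perform an alias check at runtime" can look like
at the source level. It is not the Graphite implementation. The function names
(ranges_disjoint, add_one, add_one_noalias) are made up for illustration, and
the check is a plain address-range overlap test. The loop is versioned: a copy
that may be optimized under a no-alias assumption, and the original loop as a
fallback.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero when [a, a + n) and [b, b + n) do not
   overlap.  The addresses are compared as integers because relational
   comparison of pointers into different objects is undefined in C.  */
static int
ranges_disjoint (const int *a, const int *b, size_t n)
{
  uintptr_t lo_a = (uintptr_t) a, hi_a = (uintptr_t) (a + n);
  uintptr_t lo_b = (uintptr_t) b, hi_b = (uintptr_t) (b + n);
  return hi_a <= lo_b || hi_b <= lo_a;
}

/* Version taken when the check succeeds.  "restrict" states the no-alias
   assumption, so the compiler may reorder or vectorize the loop.  */
static void
add_one_noalias (int *restrict dst, const int *restrict src, size_t n)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
}

/* Versioned loop.  Returns 1 if the no-alias version ran, 0 if the
   fallback with the original iteration order ran.  */
static int
add_one (int *dst, const int *src, size_t n)
{
  if (ranges_disjoint (dst, src, n))
    {
      add_one_noalias (dst, src, n);
      return 1;
    }

  /* Possible overlap: keep the original sequential semantics.  */
  for (size_t i = 0; i < n; i++)
    dst[i] = src[i] + 1;
  return 0;
}
```

Calling add_one with two separate arrays takes the first branch, while
add_one (buf + 1, buf, n) on a single buffer falls back to the original loop.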


Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization

2014-11-15 Thread Mircea Namolaru
The close of stage 1 is getting close (very close). Even though there is not
much new code (basically, the new code computes the separation class option
for the AST build), I am not sure that the patch would qualify for stage 2.

Unroll-and-jam (strip mining) generates very nice code for small kernels, for
loops with both constant and non-constant bounds, and this is an argument for
the new isl-based code generator. Otherwise, I'm afraid that the generated
code looks very similar to the cloog-generated one: an inner loop with min/max
bounds that GCC doesn't optimize further, which prevents the perceived
advantages of strip mining (register reuse and scalar reduction, instruction
scheduling, etc.).
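
As a rough illustration of the transformation discussed above (hand-written,
not output of the patch), the sketch below applies unroll-and-jam with factor
2 to a small matrix-vector kernel. The function names and the factor are
assumptions chosen for the example. The outer loop is unrolled and the two
copies of the inner loop are fused ("jammed"), so x[j] is loaded once and
reused for two rows. A separate remainder loop handles the last row when n is
odd, which avoids min/max bounds in the main inner loop.

```c
#include <stddef.h>

/* Reference kernel: y[i] += sum over j of a[i][j] * x[j], with a stored
   row-major as an n x m array.  */
static void
matvec_plain (size_t n, size_t m, const long *a, const long *x, long *y)
{
  for (size_t i = 0; i < n; i++)
    for (size_t j = 0; j < m; j++)
      y[i] += a[i * m + j] * x[j];
}

/* Same kernel after unroll-and-jam by 2 on the outer loop.  */
static void
matvec_unroll_jam (size_t n, size_t m, const long *a, const long *x, long *y)
{
  size_t i = 0;

  /* Full tiles: two rows per outer iteration, one shared inner loop.  */
  for (; i + 1 < n; i += 2)
    for (size_t j = 0; j < m; j++)
      {
	long xj = x[j];		/* Reused for both rows.  */
	y[i] += a[i * m + j] * xj;
	y[i + 1] += a[(i + 1) * m + j] * xj;
      }

  /* Remainder: at most one leftover row.  */
  for (; i < n; i++)
    for (size_t j = 0; j < m; j++)
      y[i] += a[i * m + j] * x[j];
}
```

Both functions compute the same result. The split into a full-tile loop and a
remainder loop is the source-level counterpart of separating full tiles from
partial ones.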

ok for trunk ?

Thanks, Mircea

Index: gcc/graphite-poly.h
===
--- gcc/graphite-poly.h	(revision 217013)
+++ gcc/graphite-poly.h	(working copy)
@@ -349,6 +349,9 @@
   poly_scattering_p _saved;
   isl_map *saved;
 
+  /* For tiling, the map for computing the separating class.  */
+  isl_map *map_sepclass;
+
   /* True when this PBB contains only a reduction statement.  */
   bool is_reduction;
 };
Index: gcc/graphite.c
===
--- gcc/graphite.c	(revision 217013)
+++ gcc/graphite.c	(working copy)
@@ -383,7 +383,8 @@
       || flag_loop_strip_mine
       || flag_graphite_identity
       || flag_loop_parallelize_all
-      || flag_loop_optimize_isl)
+      || flag_loop_optimize_isl
+      || flag_loop_unroll_jam)
     flag_graphite = 1;
 
   return flag_graphite != 0;
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 217013)
+++ gcc/common.opt	(working copy)
@@ -1328,6 +1328,10 @@
 Common Report Var(flag_loop_block) Optimization
 Enable Loop Blocking transformation
 
+floop-unroll-and-jam
+Common Report Var(flag_loop_unroll_jam) Optimization
+Enable Loop Unroll Jam transformation
+ 
 fgnu-tm
 Common Report Var(flag_tm)
 Enable support for GNU transactional memory
Index: gcc/graphite-optimize-isl.c
===
--- gcc/graphite-optimize-isl.c	(revision 217013)
+++ gcc/graphite-optimize-isl.c	(working copy)
@@ -186,7 +186,7 @@
   PartialSchedule = isl_band_get_partial_schedule (Band);
   *Dimensions = isl_band_n_member (Band);
 
-  if (DisableTiling)
+  if (DisableTiling || flag_loop_unroll_jam)
     return PartialSchedule;
 
   /* It does not make any sense to tile a band with just one dimension.  */
@@ -241,7 +241,9 @@
    constant number of iterations, if the number of loop iterations at
    DimToVectorize can be devided by VectorWidth. The default VectorWidth is
    currently constant and not yet target specific. This function does not reason
-   about parallelism.  */
+   about parallelism.
+
+  */
 static isl_map *
 getPrevectorMap (isl_ctx *ctx, int DimToVectorize,
 		 int ScheduleDimensions,
@@ -305,8 +307,98 @@
   isl_constraint_set_constant_si (c, VectorWidth - 1);
   TilingMap = isl_map_add_constraint (TilingMap, c);
 
-  isl_map_dump (TilingMap);
+  return TilingMap;
+}
 
+/* Compute an auxiliary map to getPrevectorMap, for computing the separating
+   class defined by full tiles.  Used in graphite-isl-ast-to-gimple.c to set the
+   corresponding option for AST build.
+
+   The map (for VectorWidth=4):
+
+   [i,j] -> [it,j,ip] : it % 4 = 0 and it <= ip <= it + 3 and it + 3 = i and
+                        ip >= 0
+
+   The image of this map is the separation class.  The range of this map includes
+   all the i that are multiples of 4 in the domain except the greatest one.
+
+ */
+static isl_map *
+getPrevectorMap_full (isl_ctx *ctx, int DimToVectorize,
+		 int ScheduleDimensions,
+		 int VectorWidth)
+{
+  isl_space *Space;
+  isl_local_space *LocalSpace, *LocalSpaceRange;
+  isl_set *Modulo;
+  isl_map *TilingMap;
+  isl_constraint *c;
+  isl_aff *Aff;
+  int PointDimension; /* ip */
+  int TileDimension;  /* it */
+  isl_val *VectorWidthMP;
+  int i;
+
+  /* assert (0 <= DimToVectorize && DimToVectorize < ScheduleDimensions);*/
+
+  Space = isl_space_alloc (ctx, 0, ScheduleDimensions, ScheduleDimensions + 1);
+  TilingMap = isl_map_universe (isl_space_copy (Space));
+  LocalSpace = isl_local_space_from_space (Space);
+  PointDimension = ScheduleDimensions;
+  TileDimension = DimToVectorize;
+
+  /* Create an identity map for everything except DimToVectorize and the
+     point loop.  */
+  for (i = 0; i < ScheduleDimensions; i++)
+    {
+      if (i == DimToVectorize)
+        continue;
+
+      c = isl_equality_alloc (isl_local_space_copy (LocalSpace));
+
+      isl_constraint_set_coefficient_si (c, isl_dim_in, i, -1);
+      isl_constraint_set_coefficient_si (c, isl_dim_out, i, 1);
+
+      TilingMap = isl_map_add_constraint (TilingMap, c);
+    }
+
+  /* it % 'VectorWidth' = 0  */
+  LocalSpaceRange = isl_local_space_range (isl_local_space_copy (LocalSpace));
+  
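
The map in the comment of getPrevectorMap_full can be checked by brute force
for a one-dimensional domain 0 <= i < n. The sketch below is plain C without
isl and reflects one reading of that comment, not code from the patch. The
name full_tile_points and the restriction to a single dimension are
assumptions. It enumerates the pairs (it, ip) satisfying it % width = 0,
it <= ip <= it + width - 1, it + width - 1 = i and ip >= 0, that is, the
points of tiles whose last iteration still lies inside the domain.

```c
/* Count the (it, ip) points in the image of the map for the domain
   0 <= i < n, and report the origin of the last full tile through
   *last_full_tile (-1 if there is no full tile).  */
static int
full_tile_points (int n, int width, int *last_full_tile)
{
  int count = 0;

  *last_full_tile = -1;
  for (int i = 0; i < n; i++)
    {
      int it = i - (width - 1);	/* From it + width - 1 = i.  */

      if (it < 0 || it % width != 0)	/* it % width = 0, with ip >= 0.  */
	continue;

      for (int ip = it; ip <= it + width - 1; ip++)
	count++;
      *last_full_tile = it;
    }
  return count;
}
```

For n = 10 and width = 4, only the tiles starting at 0 and 4 are full, so the
image has 8 points, and iterations 8 and 9 fall outside the separation class.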

Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization

2014-11-17 Thread Richard Biener
On Sat, Nov 15, 2014 at 11:57 AM, Mircea Namolaru wrote:
> The close of stage 1 is getting close (very close). Even though there is not
> much new code (basically, the new code computes the separation class option
> for the AST build), I am not sure that the patch would qualify for stage 2.
>
> Unroll-and-jam (strip mining) generates very nice code for small kernels, for
> loops with both constant and non-constant bounds, and this is an argument for
> the new isl-based code generator. Otherwise, I'm afraid that the generated
> code looks very similar to the cloog-generated one: an inner loop with min/max
> bounds that GCC doesn't optimize further, which prevents the perceived
> advantages of strip mining (register reuse and scalar reduction, instruction
> scheduling, etc.).
>
> ok for trunk ?

New optimization flags and new params need documentation in
gcc/doc/invoke.texi.

The description of the --params suggests they provide fixed values - is
there no way to autodetect sensible values with a cost model?  I
highly doubt that you can find two fixed values that apply to a whole
program...

Richard.

> Thanks, Mircea
>


Re: [GRAPHITE, PATCH] Ping: Loop unroll and jam optimization

2014-11-17 Thread Mircea Namolaru
> New optimization flags and new params need documentation in
> gcc/doc/invoke.texi.
> 

Thanks. Added description in invoke.texi. The patch is in trunk.

> The description of the --params suggests they provide fixed values - is
> there no way to autodetect sensible values with a cost model?  I
> highly doubt that you can find two fixed values that apply to a whole
> program...

There are a lot of models/heuristics, but in the general case tile sizes remain
something of an open problem. For the time being, this option and its
parameters are intended mostly for compiler developers interested in bringing
the loop into a specific form that enables further optimizations.

Mircea.