[Bug tree-optimization/56145] [4.8/4.9 Regression] Use of too much optimizations -O2 -ffast-math -floop-parallelize-all

2015-02-24 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56145

--- Comment #16 from Mircea Namolaru  ---
Right, the NULL check fixed this for previous versions of GCC.

For the current version, it works without these NULL checks (the NULL paths are
not followed). The relevant scop fields are always initialized to empty maps,
so they are never NULL. The problematic call to compute_deps where the NULL
problem occurred came from the parallelization check code
(compute_loop_level_carries_dependencies), which is no longer in GCC.


[Bug tree-optimization/56145] [4.8/4.9/5 Regression] Use of too much optimizations -O2 -ffast-math -floop-parallelize-all

2015-02-24 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56145

Mircea Namolaru  changed:

   What|Removed |Added

 CC||mircea.namolaru at inria dot fr

--- Comment #14 from Mircea Namolaru  ---
I did not succeed in reproducing the problem on Linux.

The flag loop->can_be_parallel, set by Graphite for parallelization, is now
computed in a different manner. This is made possible by the new ISL-based code
generation, which annotates the AST with relevant information for loop nodes.

I think it is possible to close this bug.


[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs

2015-02-23 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

--- Comment #18 from Mircea Namolaru  ---
I've managed to explain why these casts are generated, and they seem correct.
Graphite introduces new induction variables with a larger type (than the type
of the original induction variable), to make sure that they can accommodate the
iteration count. Otherwise an overflow not present in the original program
could occur. On the other hand, so as not to hide overflows in computations
that use the original induction variables, you still need to use them. This
explains the casts to variables with smaller types. These casts will cause an
overflow (i.e. a value not in the range of the smaller type) only when an
overflow occurs in the original program. They were introduced to mimic the
behaviour of the original program in case of overflows.
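
As a rough illustration of the reasoning above, here is a hand-written C
sketch (not the code Graphite emits; the function names sum_original and
sum_wide_iv are made up for this example). The second loop uses a wider
induction variable and casts it back to the original type at the point of use.
For an int bound the wide counter stays in the range of int, so the cast
preserves the value on every iteration and both loops read the same elements.

```c
#include <assert.h>
#include <limits.h>

/* Original form: the induction variable has the source type (int). */
long long sum_original(const int *a, int n)
{
    long long s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Hypothetical Graphite-like form: a wider induction variable that is
   cast back to the original type where it is used as an index. */
long long sum_wide_iv(const int *a, int n)
{
    long long s = 0;
    for (long long iv = 0; iv < (long long) n; iv++) {
        /* iv stays in [0, n), so the narrowing cast keeps the value. */
        assert(iv <= INT_MAX);
        int i = (int) iv;
        s += a[i];
    }
    return s;
}
```

Calling both functions on the same array and bound should give the same
result; the cast only matters for values outside the range of int.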

So, my understanding is that there is nothing wrong with Graphite or scalar
evolution. The vectorization succeeded because no larger type was used, but
this was unsafe. To make it work again, you need some supplementary analysis to
determine that the casts are redundant. This is not a simple problem, but with
Richard's patch the vectorization works. Unfortunately, I don't see other,
simpler solutions.

It might be possible to detect in Graphite that the transformation is the
Graphite identity and that the original lower bounds of the loops are zero.
This would ensure that a larger type is not needed for the induction variable.
But this looks like a modification intended only to make this test work, and
nothing more.

Btw, the only potential problem I found may be in the vectorizer's code for
gather. After the attempt to use a vector load fails, the vectorizer detects an
opportunity for a gather instruction, but as it doesn't find a suitable one
(this depends on the architecture), vectorization fails. It seems to me that
the analysis for gather doesn't take into account the possibility of overflows.
For this test, I could modify the code to use a vector load as the gather
instruction (even though this was found not to be safe). The vectorization
would then succeed. But I'm not entirely sure about this ...


[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs

2015-02-19 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

--- Comment #16 from Mircea Namolaru  ---
Yes, but it seems to me that the cast (not in the original code) should not
be generated at all if it cannot be guaranteed that the cast-to type is large
enough to accommodate the value. Otherwise you introduce a cast from a longer
signed type to a shorter signed one, whose behaviour is undefined by the C
standard and which was not in the original code.

So the cast in the following code is problematic (when
graphite_IV, a signed long, is not in the range of a signed int).

   _56 = (intD.6) graphite_IV.5_53;
   _55 = aD.1830[_56];

The solution is to make Graphite not generate casts like this. An alternative
is to infer the range of graphite_IV, as you do, and remove the cast (but this
seems more complicated and risky, as the analysis may not succeed, in which
case the problematic cast is not removed).
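
The condition under which the cast keeps the value can be written down
directly. The following is only a sketch in plain C: fits_in_int and
load_checked are hypothetical helpers invented for this example and are not
part of Graphite or GCC. The wide value is modelled as long long so that the
example does not depend on the width of long.

```c
#include <assert.h>
#include <limits.h>

/* Returns 1 when v can be converted to int without changing its value. */
int fits_in_int(long long v)
{
    return v >= INT_MIN && v <= INT_MAX;
}

/* Hypothetical guarded access: the narrowed value is used as an index
   only when the conversion is known to preserve it. */
int load_checked(const int *a, long long graphite_iv)
{
    assert(fits_in_int(graphite_iv));
    int idx = (int) graphite_iv;
    return a[idx];
}
```

A range analysis that proves fits_in_int for every value of the induction
variable is what would allow the cast to be dropped.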


[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs

2015-02-18 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

--- Comment #14 from Mircea Namolaru  ---
It seems to me that scalar evolution succeeds in determining
the number of iterations for the case of signed longs. Looking
at the vectorization dump, first a symbolic expression for the number of
iterations of a loop is found, and then vect_analyze_refs is entered.
The problem is that the code expects the offset of a load to be an induction
variable, but in our case the offset is only a cast of an induction variable,
as below:

 _56 = (intD.6) graphite_IV.5_53;
 _55 = aD.1830[_56];

The offset is found not to be an affine expression, and vectorization doesn't
succeed. But as the offset is a cast of an induction variable, it has the same
behaviour as an induction variable even if formally it is not one. It seems to
me that extending the code to support casts of induction variables
would solve this problem.


[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs

2015-02-18 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

--- Comment #10 from Mircea Namolaru  ---
On my Intel x86-64 platform I changed in graphite-isl-ast-to-gimple.c:

- static int graphite_expression_type_precision = 128 <= max_mode_int_precision ?
-  128 : max_mode_int_precision;
+ static int graphite_expression_type_precision = 32;

The computations are done on INT, and you get this code for basic block 4 (and
vectorization is performed):

  _28 = MIN_EXPR ;
  _29 = _28 > 0;
  if (_29 != 0)
goto ;
  else
goto ;

In my opinion the casts that are introduced cause problems for scalar evolution
(and you are right, MIN/MAX are not the problem).

I will look in two directions, and choose the quickest one that fixes the
regression:
1) not generating casts in Graphite, if correctness is not affected (as in
this case). But determining when the use of a longer signed type is required is
not so simple.
2) modifying the handling of casts in scalar evolution. But I am not familiar
with this code.

- Original Message -
> From: "rguenth at gcc dot gnu.org" 
> To: "mircea namolaru" 
> Sent: Wednesday, February 18, 2015 12:22:55 PM
> Subject: [Bug tree-optimization/62630] [5 regression] 
> gcc.dg/graphite/vect-pr43423.c FAILs
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630
> 
> --- Comment #8 from Richard Biener  ---
> (In reply to Mircea Namolaru from comment #7)
> > Graphite generates MAX/MIN expressions.
> > 
> > I've modified Graphite to use the original types of "n" and "mid" in MIN
> > and
> > MAX, and to not generate the casts of "n" and "mid" to a longer signed INT
> > before MIN/MAX, and the vectorization succeeded.
> > 
> > It seems that it is not a Graphite problem but a scalar evolution one.
> > Scalar evolution is not able to handle MIN/MAX expressions in the presence
> > of casts. Beside vectorization also further unrolling is prevented.
> 
> Can you share a patch with that modification?  I'd like to look at the
> differences it makes with respect to SCEV / niter analysis.  Note that
> neither SCEV nor niter analysis handle MIN/MAX_EXPRs explicitly.  It might
> be that
> 
>   :
>   if (n_5(D) > 0)
> goto ;
> ...
>   :
>   _28 = (signed long) n_5(D);
>   _29 = (signed long) mid_6(D);
>   _30 = MIN_EXPR <_28, _29>;
>   _31 = _30 > 0;
>   if (_31 != 0)
> 
> can be simplified to mid_6(D) > 0 by expansion/folding in some way though
> if there are no casts in the way.  Not sure.  I suppose ISL doesn't get to
> know that n > 0 if the loop enters (and doesn't exploit that knowledge)?
> 
> --
> You are receiving this mail because:
> You are on the CC list for the bug.
>
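
The simplification suggested in the quoted comment can be checked with a small
C sketch. This is only an illustration under the stated precondition n > 0;
guard_with_casts and guard_simplified are names invented here, and the widened
type is modelled as long long rather than the type Graphite actually picks.

```c
#include <assert.h>

/* Guard in the shape of the quoted dump: both operands are widened
   before taking the minimum. */
int guard_with_casts(int n, int mid)
{
    long long wn = (long long) n;
    long long wmid = (long long) mid;
    long long m = wn < wmid ? wn : wmid;
    return m > 0;
}

/* Simplified guard, valid only under the precondition n > 0:
   MIN (n, mid) > 0 holds exactly when mid > 0. */
int guard_simplified(int n, int mid)
{
    assert(n > 0);
    return mid > 0;
}
```

For any n > 0 the two functions agree, because the minimum is positive only if
both operands are positive and n already is.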


[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs

2015-02-17 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

Mircea Namolaru  changed:

   What|Removed |Added

 CC||mircea.namolaru at inria dot fr

--- Comment #7 from Mircea Namolaru  ---
Graphite generates MAX/MIN expressions.

I've modified Graphite to use the original types of "n" and "mid" in MIN and
MAX, and to not generate the casts of "n" and "mid" to a longer signed INT
before MIN/MAX, and the vectorization succeeded.

It seems that it is not a Graphite problem but a scalar evolution one. Scalar
evolution is not able to handle MIN/MAX expressions in the presence of casts.
Beside vectorization also further unrolling is prevented.


[Bug tree-optimization/64098] ICE isl_ctx.c:172: isl_ctx freed, but some objects still referenced

2014-11-29 Thread mircea.namolaru at inria dot fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64098

--- Comment #1 from Mircea Namolaru  ---
Bug confirmed. The error message points to a problem in the way in which the
unroll-and-jam code manages the isl objects (the space is not freed properly).

I pinned down the function causing the problem, but work is still needed. I
will send a patch in the coming days. The patch will also add some test
cases (including this one) for loop-unroll-and-jam.


[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index

2014-04-30 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000

--- Comment #4 from Mircea Namolaru  ---
Right, C arrays expressed as pointers suffer from the same problem.
But for C at least there is a way to avoid this.

Many thanks for your suggestion of how to de-linearize arrays in the
middle-end; it seems that it may be simpler than I thought. I hope to find time
to write a patch based on your idea for GCC 4.10.

Mircea

- Original Message -
> From: "rguenth at gcc dot gnu.org" 
> To: "mircea namolaru" 
> Sent: Wednesday, April 30, 2014 1:02:10 PM
> Subject: [Bug tree-optimization/61000] No loop interchange for inner loop 
> along the slow index
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000
> 
> --- Comment #3 from Richard Biener  ---
> (In reply to Mircea Namolaru from comment #2)
> > Again, the problem is due to representation of arrays in Fortran as array
> > with a single dimension (for similar code in C profitability check work as
> > expected). It is a recurring problem that may lead to compilation time
> > increase (sometimes dramatically) or missed opportunities optimizations due
> > to too conservative dependence analysis or as on this case the
> > profitability
> > check failure. The solution is to de-linearize array accesses in Fortran as
> > in C.
> 
> Note that C doesn't always have de-linearized arrays (once you access the
> array via a pointer).
> 
> For Fortran de-linearizing is "easy" via simple casting to a
> multi-dimensional
> (variable-bounds) array type.  For the middle-end side, that is.
> 


[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index

2014-04-30 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000

--- Comment #2 from Mircea Namolaru  ---
Again, the problem is due to the representation of arrays in Fortran as arrays
with a single dimension (for similar code in C the profitability check works as
expected). It is a recurring problem that may lead to compilation time
increases (sometimes dramatic) or missed optimization opportunities, due to too
conservative dependence analysis or, as in this case, failure of the
profitability check. The solution is to de-linearize array accesses in Fortran
as in C.


[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index

2014-04-29 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000

--- Comment #1 from Mircea Namolaru  ---
The built-in heuristics assess that loop interchange is not profitable. Indeed
there is a problem: I would have expected the second loop to be found
profitable. I need to look at this in more depth.


[Bug tree-optimization/60997] -fopenmp conflicts with -floop-interchange

2014-04-29 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60997

--- Comment #3 from Mircea Namolaru  ---
It is not that -floop-interchange is disabled, but the code received by
Graphite is different if the option -fopenmp is enabled. In this case the check
for data dependencies required by loop interchange fails. I will check in more
depth whether the data dependencies are right in this case or there is a
problem with them (but probably not).

I guess that the problem is the same for vectorization (but there the data
dependencies for vectorization are not checked by graphite).


[Bug tree-optimization/55022] [4.8 Regression] air.f90 is miscompliled with -m64 -O2 -fgraphite-identity after revision 190619

2014-04-23 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022

--- Comment #27 from Mircea Namolaru  ---
Hi,

Many thanks. 

I've gone over the meta-bug you opened for Graphite,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59859 and it seems to me
that many of the problems have already been solved (some by you) or at least,
as in the case of some compile-time/memory usage/missed optimization issues,
we know how to solve them - some form of delinearization of array accesses,
which for sure was not for GCC 4.9.

Btw, there is also another Graphite bug in this list,
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59586, that was solved but didn't
make it into GCC 4.9.0.

Mircea


- Original Message -
> From: "rguenth at gcc dot gnu.org" 
> To: "mircea namolaru" 
> Sent: Tuesday, April 22, 2014 3:34:09 PM
> Subject: [Bug tree-optimization/55022] [4.8 Regression] air.f90 is 
> miscompliled with -m64 -O2 -fgraphite-identity
> after revision 190619
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022
> 
> --- Comment #26 from Richard Biener  ---
> Author: rguenth
> Date: Tue Apr 22 13:33:37 2014
> New Revision: 209633
> 
> URL: http://gcc.gnu.org/viewcvs?rev=209633&root=gcc&view=rev
> Log:
> 2014-04-22  Richard Biener  
> 
> Backport from mainline
> 2014-04-14  Richard Biener  
> 
> PR middle-end/55022
> * fold-const.c (negate_expr_p): Don't negate directional rounding
> division.
> (fold_negate_expr): Likewise.
> 
> * gcc.dg/graphite/pr55022.c: New testcase.
> 
> Added:
> branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/graphite/pr55022.c
> Modified:
> branches/gcc-4_9-branch/gcc/ChangeLog
> branches/gcc-4_9-branch/gcc/fold-const.c
> branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
> 


[Bug tree-optimization/59586] [4.8/4.9/4.10 Regression] [graphite] Segmentation fault with -Ofast -floop-parallelize-all -ftree-parallelize-loops

2014-04-14 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59586

Mircea Namolaru  changed:

   What|Removed |Added

 CC||mircea.namolaru at inria dot fr

--- Comment #3 from Mircea Namolaru  ---
Roman Gareev and Tobias Grosser solved this one. There is a proposed patch on
gcc-patches (see http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01134.html),
posted after Tobias and I reviewed it.


[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all

2014-04-10 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121

--- Comment #19 from Mircea Namolaru  ---

The problem for many of these simple cases is with Graphite's formulation of
memory access constraints. For Fortran, or C (if arrays are declared as
pointers), a memory access is not constrained enough (basically it is expressed
as a function of a single induction variable). This may dramatically increase
the number of constraint solutions. The computation time for them could
become prohibitive as well. But even worse, even if the computation finishes,
part of the solutions found are false dependencies that restrict possible legal
transformations.

The solution is to constrain the memory access more - as is done for C arrays
(similar code in C to this Fortran example doesn't create any problem).
A possible solution is to express the access in terms of the basic induction
variables as in the original source code - maybe by modifying the front end.
But other solutions are possible. For sure this is not work for GCC 4.9.


[Bug tree-optimization/55022] [4.8/4.9 Regression] air.f90 is miscompliled with -m64 -O2 -fgraphite-identity after revision 190619

2014-04-10 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022

Mircea Namolaru  changed:

   What|Removed |Added

 CC||mircea.namolaru at inria dot fr

--- Comment #17 from Mircea Namolaru  ---
Vladimir Kargov, Tobias Grosser and I found that the problem is caused by
incorrect folding of the floord operator. As a result Cloog translates the
expression
-4294967296*floord(_19-i_17,4294967296)
to the tree-SSA expression
4294967296*floord(_19-i_17,-4294967296)

This is wrong: in the first case floord is 0 and in the second it is -1.
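
The difference can be reproduced with a small stand-alone C sketch. The floord
below is a re-implementation written for this example (floor division, i.e.
rounding toward negative infinity), not the code used by Cloog or GCC, and the
value 5 used for _19 - i_17 in the note after the code is an assumption chosen
only to show the effect.

```c
#include <assert.h>

/* Floor division: rounds the quotient toward negative infinity
   (C's '/' truncates toward zero instead). */
long long floord(long long n, long long d)
{
    assert(d != 0);
    long long q = n / d;
    /* Step down by one when the division is inexact and the operands
       have different signs. */
    if ((n % d != 0) && ((n < 0) != (d < 0)))
        q -= 1;
    return q;
}
```

With a numerator of 5, floord(5, 4294967296) is 0, so the first expression
evaluates to 0, while floord(5, -4294967296) is -1, so the second evaluates to
-4294967296. Moving the minus sign into the divisor is therefore not a valid
rewrite for floor division.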


[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all

2014-03-25 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121

--- Comment #17 from Mircea Namolaru  ---
Yes, data dependence computation is expensive in the polyhedral model
and it could take considerable time - but it is worrying that in too many
cases it fails to provide an answer for very simple programs (after being left
running for a few hours, at which point I stop it). I will check with the isl
people whether this is the expected behaviour of isl_union_map_compute_flow
(this is the function where the data dependency computation is stuck) and how
(and if) they could help us.

Many Fortran programs with loops having non-constant bounds and n-dimensional
arrays (n>=3) may be affected by this problem and may work only for small
dimensions of the arrays. The problem especially affects Fortran, which uses
linearized accesses to multi-dimensional arrays - this creates patterns
leading to this problem (in this example we have an array acc of dimension
55,48,49 and the array access acc(j,k,l) is transformed to acc(j + 55*k +
2640*l)).
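
The linearization mentioned above can be sketched in C as follows. This is an
illustrative example only: linear_index and multidim_offset are names made up
here, indices are zero-based, and the C array is declared with the dimensions
reversed so that j varies fastest, as in the Fortran layout.

```c
#include <assert.h>
#include <stddef.h>

enum { NJ = 55, NK = 48, NL = 49 };

/* Linearized offset of element (j, k, l), zero-based, with j varying
   fastest: j + 55*k + 2640*l. */
size_t linear_index(size_t j, size_t k, size_t l)
{
    assert(j < NJ && k < NK && l < NL);
    return j + NJ * k + (size_t) NJ * NK * l;
}

/* Offset (in elements) of the same element in a multi-dimensional C
   array, where the three subscripts stay separate. */
size_t multidim_offset(size_t j, size_t k, size_t l)
{
    static int acc[NL][NK][NJ];
    assert(j < NJ && k < NK && l < NL);
    return (size_t) ((char *) &acc[l][k][j] - (char *) acc) / sizeof(int);
}
```

Both functions name the same memory location, but only the second form keeps
the information that j, k and l are bounded independently; the linearized form
exposes a single subscript expression to the dependence analysis.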

I've checked the constraints passed to isl_union_map_compute - I see that
wrapping is performed. But wrapping requires a modulo operation, expressed by
constraints with an existential quantifier that may be harder to solve. By
disabling the wrapping, some simple examples that were previously stuck in data
dependency computation finish immediately. To what extent is wrapping
necessary? As a side effect it may increase compilation time (which may
already be considerable).
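
To make the remark about the existential quantifier concrete, here is a small
C sketch. The helper wrap_mod is hypothetical and only illustrates the
arithmetic; it is unrelated to how isl represents such constraints. Saying
that r equals x modulo m means that some integer q exists with x = q*m + r and
0 <= r < m, and that extra variable q is what the constraint system has to
carry.

```c
#include <assert.h>

/* Wraps x into [0, m).  Returns the remainder r and stores in *q the
   witness for the relation x = q * m + r with 0 <= r < m. */
long long wrap_mod(long long x, long long m, long long *q)
{
    assert(m > 0);
    long long r = x % m;
    if (r < 0)
        r += m;          /* C's % can yield a negative remainder */
    *q = (x - r) / m;
    return r;
}
```

Without wrapping, the access is described by x alone and no such q is needed,
which fits the observation that the examples finish immediately in that case.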


[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all

2014-03-10 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121

--- Comment #14 from Mircea Namolaru  ---
Confirmed.

I have started looking at it. This test also enters an endless loop with the
options -fgraphite-identity -floop-nest-optimize -O2 -c.


[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all

2014-02-03 Thread mircea.namolaru at inria dot fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121

Mircea Namolaru  changed:

   What|Removed |Added

 CC||mircea.namolaru at inria dot fr

--- Comment #13 from Mircea Namolaru  ---
(In reply to Jeffrey A. Law from comment #12)
> The problem Richard is nobody is maintaining the code.  What makes this any
> different than a port which has become unmaintained and thus isn't being
> fixed in a timely manner?  I'm not in a position to own the code and unless
> someone steps in to own it/maintain it, I'll formally call for its removal
> after 4.9 is released.  You called the code out as unmaintained two years
> ago.  I called it out again in the 2013 Cauldron.  At some point we have to
> face reality and take appropriate action.
> 

I just joined INRIA - and at least for the next year I will maintain the
Graphite code. I have started to look at the P1 issue:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58028