[Bug tree-optimization/56145] [4.8/4.9 Regression] Use of too much optimizations -O2 -ffast-math -floop-parallelize-all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56145 --- Comment #16 from Mircea Namolaru --- Right, the NULL check fixed this for previous versions of GCC. The current version works without these NULL checks (the NULL paths are never taken): the relevant scop fields are always initialized to empty maps, so they are never NULL. The problematic call to compute_deps, where the NULL problem occurred, came from the parallelization check code (compute_loop_level_carries_dependencies), which is no longer in GCC.
[Bug tree-optimization/56145] [4.8/4.9/5 Regression] Use of too much optimizations -O2 -ffast-math -floop-parallelize-all
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=56145

Mircea Namolaru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mircea.namolaru at inria dot fr

--- Comment #14 from Mircea Namolaru --- I did not succeed in reproducing the problem on Linux. The flag loop->can_be_parallel, which Graphite sets for parallelization, is now computed in a different manner. This is made possible by the new ISL-based code generation, which annotates the AST with the relevant information for loop nodes. I think it is possible to close this bug.
[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630 --- Comment #18 from Mircea Namolaru --- I have managed to explain why these casts are generated, and they seem correct.

Graphite introduces new induction variables with a larger type than that of the original induction variable, to make sure the type can accommodate the iteration count. Otherwise an overflow that is not in the original program could occur. On the other hand, so as not to hide overflows in computations that use the original induction variables, you still need to use them. This explains the casts to variables with smaller types. These casts cause an overflow (i.e. a value not in the range of the smaller type) only when an overflow occurs in the original program. They were introduced to mimic the behaviour of the original program in case of overflows.

So my understanding is that there is nothing wrong with Graphite or scalar evolution. The vectorization used to succeed because no larger type was used, but this was unsafe. To make it work again, you need some supplementary analysis to determine that the casts are redundant. This is not a simple problem, but with Richard's patch the vectorization works. Unfortunately, I don't see other, simpler solutions.

It might be possible to detect in Graphite that the transformation is the graphite identity and that the original lower bounds of the loops are zero. This would ensure that a larger type is not needed for the induction variable. But it looks like a modification intended only to make this test work, and nothing more.

Btw, the only potential problem I found may be in the code for gather in vectorization. After the attempt to use a vector load fails, the vectorizer detects an opportunity for a gather instruction, but as it doesn't find a suitable one (this depends on the architecture), vectorization fails. It seems to me that the analysis for gather doesn't take the possibility of overflows into account. For this test, I could modify the code to use a vector load as the gather instruction (even though this was found not to be safe). The vectorization would succeed. But I am not entirely sure about this ...
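The interplay between the wide induction variable and the narrowing cast can be sketched in C. This is only an illustration under assumptions: `sum_with_wide_iv` and `fits_in_int` are hypothetical names, `long long` stands in for the wider Graphite type, and the real code is GIMPLE generated by Graphite, not source-level C. The cast back to `int` leaves the value unchanged whenever the wide variable stays within the range of `int`, which is the condition a range analysis would have to prove in order to drop the cast.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when the wide induction variable can be
   narrowed to int without changing its value. */
static bool fits_in_int(long long wide_iv)
{
    return wide_iv >= INT_MIN && wide_iv <= INT_MAX;
}

/* Sum a[0..n-1] using a wide induction variable, casting it back to
   the original (narrower) type for the array access, in the spirit of
       _56 = (int) graphite_IV;  _55 = a[_56];
   Since 0 <= wide_iv < n <= INT_MAX here, the cast never changes the
   value. */
static long long sum_with_wide_iv(const int *a, int n)
{
    long long total = 0;
    for (long long wide_iv = 0; wide_iv < (long long) n; wide_iv++) {
        int i = (int) wide_iv;  /* narrowing cast to the original type */
        total += a[i];
    }
    return total;
}
```

In this sketch the loop bound itself guarantees that the cast is redundant; the difficulty described above is getting the compiler to derive that fact automatically.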
[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630 --- Comment #16 from Mircea Namolaru --- Yes, but it seems to me that the cast (which is not in the original code) should not be generated at all if it cannot be guaranteed that the casted-to type is large enough to accommodate the value. Otherwise you introduce a cast from a longer signed type to a shorter signed one, whose behaviour is undefined by the C standard and which was not in the original code. So the cast in the following code is problematic (when graphite_IV, a signed long, is not in the range of a signed int):

  _56 = (intD.6) graphite_IV.5_53;
  _55 = aD.1830[_56];

The way to fix this is to make Graphite not generate casts like this. An alternative is to infer the range of graphite_IV, as you do, and remove the cast (but this seems more complicated and risky, as the analysis may not succeed, in which case the problematic cast is not removed).
[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630 --- Comment #14 from Mircea Namolaru --- It seems to me that scalar evolution succeeds in determining the number of iterations for the case of signed longs. Looking at the vectorization dump, first a symbolic expression for the number of iterations of a loop is found, and then vect_analyze_refs is entered. The problem is that the code expects the offset of a load to be an induction variable, but in our case the offset is only a cast of an induction variable, as below:

  _56 = (intD.6) graphite_IV.5_53;
  _55 = aD.1830[_56];

The offset is found not to be an affine expression, and vectorization does not succeed. But as the offset is a cast of an induction variable, it behaves like an induction variable even if formally it is not one. It seems to me that extending the code to support casts of induction variables would solve this problem.
[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630 --- Comment #10 from Mircea Namolaru --- On my Intel x86-64 platform I changed, in graphite-isl-ast-to-gimple.c:

  - static int graphite_expression_type_precision = 128 <= max_mode_int_precision ?
  -  128 : max_mode_int_precision;
  + static int graphite_expression_type_precision = 32;

The computations are done on INT, and you get this code for basic block 4 (with vectorization performed):

  _28 = MIN_EXPR ;
  _29 = _28 > 0;
  if (_29 != 0) goto ; else goto ;

In my opinion the casts that are introduced cause problems for scalar evolution (and you are right, MIN/MAX are not the problem). I will look in two directions and choose whichever fixes the regression most quickly:

1) Do not generate casts in Graphite if correctness is not affected (as in this case). But determining when the use of a longer signed type is required is not so simple.

2) Modify the handling of casts in scalar evolution. But I am not familiar with this code.

----- Original Message -----
> From: "rguenth at gcc dot gnu.org"
> To: "mircea namolaru"
> Sent: Wednesday, February 18, 2015 12:22:55 PM
> Subject: [Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630
>
> --- Comment #8 from Richard Biener ---
> (In reply to Mircea Namolaru from comment #7)
> > Graphite generates MAX/MIN expressions.
> >
> > I've modified Graphite to use the original types of "n" and "mid" in MIN and
> > MAX, and to not generate the casts of "n" and "mid" to a longer signed INT
> > before MIN/MAX, and the vectorization succeeded.
> >
> > It seems that it is not a Graphite problem but a scalar evolution one.
> > Scalar evolution is not able to handle MIN/MAX expressions in the presence
> > of casts. Besides vectorization, further unrolling is also prevented.
>
> Can you share a patch with that modification? I'd like to look at the
> differences it makes with respect to SCEV / niter analysis. Note that
> neither SCEV nor niter analysis handles MIN/MAX_EXPRs explicitly. It might
> be that
>
> :
> if (n_5(D) > 0)
> goto ;
> ...
> :
> _28 = (signed long) n_5(D);
> _29 = (signed long) mid_6(D);
> _30 = MIN_EXPR <_28, _29>;
> _31 = _30 > 0;
> if (_31 != 0)
>
> can be simplified to mid_6(D) > 0 by expansion/folding in some way though
> if there are no casts in the way. Not sure. I suppose ISL doesn't get to
> know that n > 0 if the loop enters (and doesn't exploit that knowledge)?
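The simplification suggested in the quoted comment can be checked with a small C sketch. It is an illustration only, with hypothetical names (`guard_widened`, `guard_simplified`, `min_long`) and `long` standing in for the `signed long` temporaries in the GIMPLE above; it does not show how SCEV or the folder would actually perform the rewrite. Because widening a signed value preserves order, `MIN((long) n, (long) mid) > 0` holds exactly when `mid > 0`, provided `n > 0` is already known from the enclosing guard.

```c
#include <stdbool.h>

/* Minimum of two signed longs, standing in for MIN_EXPR. */
static long min_long(long a, long b)
{
    return a < b ? a : b;
}

/* The guard as Graphite emits it: both operands are widened first.
     _28 = (signed long) n;  _29 = (signed long) mid;
     _30 = MIN_EXPR <_28, _29>;  _31 = _30 > 0;  */
static bool guard_widened(int n, int mid)
{
    long wide_n = (long) n;
    long wide_mid = (long) mid;
    return min_long(wide_n, wide_mid) > 0;
}

/* The simplified guard, valid only under the assumption n > 0. */
static bool guard_simplified(int mid)
{
    return mid > 0;
}
```

The two guards agree for every `mid` as long as the caller only reaches them with `n > 0`; without that precondition the simplified form is not equivalent.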
[Bug tree-optimization/62630] [5 regression] gcc.dg/graphite/vect-pr43423.c FAILs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62630

Mircea Namolaru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mircea.namolaru at inria dot fr

--- Comment #7 from Mircea Namolaru --- Graphite generates MAX/MIN expressions. I've modified Graphite to use the original types of "n" and "mid" in MIN and MAX, and to not generate the casts of "n" and "mid" to a longer signed INT before MIN/MAX, and the vectorization succeeded. It seems that it is not a Graphite problem but a scalar evolution one. Scalar evolution is not able to handle MIN/MAX expressions in the presence of casts. Besides vectorization, further unrolling is also prevented.
[Bug tree-optimization/64098] ICE isl_ctx.c:172: isl_ctx freed, but some objects still referenced
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64098 --- Comment #1 from Mircea Namolaru --- Bug confirmed. The error message points to a problem in the way the unroll-and-jam code manages the isl objects (the space is not freed properly). I have pinned down the function causing the problem, but more work is needed. I will send a patch in the coming days. The patch will also add some test cases (including this one) for loop-unroll-and-jam.
[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000 --- Comment #4 from Mircea Namolaru --- Right, C arrays expressed as pointers suffer from the same problem. But for C at least there is a way to avoid this. Many thanks for your suggestion of how to de-linearize arrays in the middle-end; it seems that it may be simpler than I thought. I hope to find time to write a patch based on your idea for GCC 4.10.

Mircea

----- Original Message -----
> From: "rguenth at gcc dot gnu.org"
> To: "mircea namolaru"
> Sent: Wednesday, April 30, 2014 1:02:10 PM
> Subject: [Bug tree-optimization/61000] No loop interchange for inner loop along the slow index
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000
>
> --- Comment #3 from Richard Biener ---
> (In reply to Mircea Namolaru from comment #2)
> > Again, the problem is due to the representation of arrays in Fortran as
> > arrays with a single dimension (for similar code in C the profitability
> > check works as expected). It is a recurring problem that may lead to
> > compilation time increases (sometimes dramatic) or missed optimization
> > opportunities due to too conservative dependence analysis or, as in this
> > case, a profitability check failure. The solution is to de-linearize array
> > accesses in Fortran as in C.
>
> Note that C doesn't always have de-linearized arrays (once you access the
> array via a pointer).
>
> For Fortran de-linearizing is "easy" via simple casting to a multi-dimensional
> (variable-bounds) array type. For the middle-end side, that is.
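The idea of casting to a multi-dimensional, variable-bounds array type has a source-level analogue in C, sketched below. This is only an illustration: the suggestion in the thread concerns the GCC middle-end representation for Fortran, not user code, and the function names here are hypothetical. The sketch assumes a compiler that supports variably modified types (C99; optional in C11). Both functions read the same elements, but the second one exposes the two subscripts separately instead of a single linearized offset.

```c
#include <stddef.h>

/* Linearized access: the subscript is one expression i * cols + col,
   which is the form a polyhedral analysis finds hard to constrain. */
static double sum_column_flat(const double *a, size_t rows, size_t cols,
                              size_t col)
{
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        sum += a[i * cols + col];
    return sum;
}

/* De-linearized access: cast the flat pointer to a pointer to rows of
   'cols' doubles (a variable-bounds array type), then use two
   independent subscripts. */
static double sum_column_2d(const double *a, size_t rows, size_t cols,
                            size_t col)
{
    const double (*m)[cols] = (const double (*)[cols]) a;
    double sum = 0.0;
    for (size_t i = 0; i < rows; i++)
        sum += m[i][col];
    return sum;
}
```

For a 2x3 array stored row by row, both functions return the same column sum; the difference lies only in how much structure the access expression carries.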
[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000 --- Comment #2 from Mircea Namolaru --- Again, the problem is due to the representation of arrays in Fortran as arrays with a single dimension (for similar code in C the profitability check works as expected). It is a recurring problem that may lead to compilation time increases (sometimes dramatic) or missed optimization opportunities due to too conservative dependence analysis or, as in this case, a profitability check failure. The solution is to de-linearize array accesses in Fortran as in C.
[Bug tree-optimization/61000] No loop interchange for inner loop along the slow index
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=61000 --- Comment #1 from Mircea Namolaru --- The built-in heuristics assess that loop interchange is not profitable. There is indeed a problem: I would have expected the second loop to be found profitable. I need to look at this in more depth.
[Bug tree-optimization/60997] -fopenmp conflicts with -floop-interchange
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60997 --- Comment #3 from Mircea Namolaru --- It is not that -floop-interchange is disabled, but that the code received by Graphite is different when the option -fopenmp is enabled. In this case the check for data dependencies required by loop interchange fails. I will check in more depth whether the data dependencies are right in this case or whether there is a problem with them (but probably not). I guess that the problem is the same for vectorization (but there the data dependencies for vectorization are not checked by Graphite).
[Bug tree-optimization/55022] [4.8 Regression] air.f90 is miscompliled with -m64 -O2 -fgraphite-identity after revision 190619
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022 --- Comment #27 from Mircea Namolaru --- Hi,

Many thanks. I've gone through the meta-bug you opened for Graphite, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59859, and it seems to me that many of the problems have already been solved (some by you), or at least, as in the case of some compile-time/memory-usage/missed-optimization issues, we know how to solve them: some form of delinearization of array accesses, which for sure was not for GCC 4.9.

Btw, there is also another Graphite bug in this list, http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59586, that was solved but didn't make it into GCC 4.9.0.

Mircea

----- Original Message -----
> From: "rguenth at gcc dot gnu.org"
> To: "mircea namolaru"
> Sent: Tuesday, April 22, 2014 3:34:09 PM
> Subject: [Bug tree-optimization/55022] [4.8 Regression] air.f90 is miscompliled with -m64 -O2 -fgraphite-identity after revision 190619
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022
>
> --- Comment #26 from Richard Biener ---
> Author: rguenth
> Date: Tue Apr 22 13:33:37 2014
> New Revision: 209633
>
> URL: http://gcc.gnu.org/viewcvs?rev=209633&root=gcc&view=rev
> Log:
> 2014-04-22 Richard Biener
>
>     Backport from mainline
>     2014-04-14 Richard Biener
>
>     PR middle-end/55022
>     * fold-const.c (negate_expr_p): Don't negate directional rounding
>     division.
>     (fold_negate_expr): Likewise.
>
>     * gcc.dg/graphite/pr55022.c: New testcase.
>
> Added:
>     branches/gcc-4_9-branch/gcc/testsuite/gcc.dg/graphite/pr55022.c
> Modified:
>     branches/gcc-4_9-branch/gcc/ChangeLog
>     branches/gcc-4_9-branch/gcc/fold-const.c
>     branches/gcc-4_9-branch/gcc/testsuite/ChangeLog
[Bug tree-optimization/59586] [4.8/4.9/4.10 Regression] [graphite] Segmentation fault with -Ofast -floop-parallelize-all -ftree-parallelize-loops
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59586

Mircea Namolaru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mircea.namolaru at inria dot fr

--- Comment #3 from Mircea Namolaru --- Roman Gareev and Tobias Grosser solved this one. There is a proposed patch on gcc-patches (see http://gcc.gnu.org/ml/gcc-patches/2014-03/msg01134.html), posted after Tobias and I reviewed it.
[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121 --- Comment #19 from Mircea Namolaru --- The problem for many of these simple cases lies in Graphite's formulation of the memory access constraints. For Fortran, or for C when arrays are declared as pointers, a memory access is not constrained enough (basically it is expressed as a function of a single induction variable). This may dramatically increase the number of solutions to the constraints, and the time needed to compute them can become prohibitive. Even worse, when the computation does finish, some of the solutions found are false dependencies that restrict otherwise legal transformations.

The solution is to constrain the memory access more tightly, as is done for C arrays (C code similar to this Fortran example doesn't create any problem). One possibility is to express the access in terms of the basic induction variables, as in the original source code, maybe by modifying the front end. But other solutions are possible. This is certainly not work for GCC 4.9.
[Bug tree-optimization/55022] [4.8/4.9 Regression] air.f90 is miscompliled with -m64 -O2 -fgraphite-identity after revision 190619
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55022

Mircea Namolaru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mircea.namolaru at inria dot fr

--- Comment #17 from Mircea Namolaru --- Vladimir Kargov, Tobias Grosser and I found that the problem is caused by incorrect folding of the floord operator. As a result, Cloog translates the expression

  -4294967296*floord(_19-i_17,4294967296)

to the tree-SSA expression

  4294967296*floord(_19-i_17,-4294967296)

This is wrong: in the first case floord is 0 and in the second it is -1.
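The difference between the two expressions can be reproduced with a small C sketch. The `floord` below is a hypothetical stand-alone implementation of floor division (rounding toward negative infinity) written for this illustration; it is not the Cloog or GCC code, and `x` stands in for `_19-i_17`. Moving the minus sign from the multiplier into the divisor is only valid for truncating or exact division, not for floor division with a non-zero remainder.

```c
#include <stdint.h>

/* Floor division: rounds the quotient toward negative infinity.
   C's '/' truncates toward zero, so adjust when the signs differ
   and the division is inexact. */
static int64_t floord(int64_t n, int64_t d)
{
    int64_t q = n / d;
    if ((n % d != 0) && ((n < 0) != (d < 0)))
        q -= 1;
    return q;
}

/* The expression as originally generated. */
static int64_t original_expr(int64_t x)
{
    return -4294967296LL * floord(x, 4294967296LL);
}

/* The expression after the incorrect folding of the negation. */
static int64_t folded_expr(int64_t x)
{
    return 4294967296LL * floord(x, -4294967296LL);
}
```

For a small positive `x` such as 5, `floord(x, 4294967296)` is 0 while `floord(x, -4294967296)` is -1, so the first expression yields 0 and the second yields -4294967296. The two agree only when `x` is an exact multiple of 4294967296.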
[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121 --- Comment #17 from Mircea Namolaru --- Yes, data dependency computation is expensive in the polyhedral model and can take considerable time, but it is worrying that in too many cases it fails to provide an answer for very simple programs (after a few hours of running, at which point I stop it). I will check with the isl people whether this is the expected behaviour of isl_union_map_compute_flow (the function where the data dependency computation gets stuck) and how (and if) they could help us.

Many Fortran programs with loops having non-constant bounds and n-dimensional arrays (n>=3) may be affected by this problem and may work only for small array dimensions. The problem affects Fortran especially, because it uses linearized accesses to multi-dimensional arrays, which creates patterns leading to this problem (in this example we have an array acc of dimension 55,48,49, and the array access acc(j,k,l) is transformed to acc(j + 55*k + 2640*l)).

I've checked the constraints passed to isl_union_map_compute and I see that wrapping is performed. But wrapping requires a modulo operation, expressed by constraints with an existential quantifier, which may be harder to solve. With wrapping disabled, some simple examples that previously got stuck in data dependency computation finish immediately. To what extent is wrapping necessary? As a side effect it may increase compilation time (which may already be considerable).
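The linearization mentioned above can be written out in C for the 55 x 48 x 49 example. This is a sketch under assumptions: it uses zero-based indices for simplicity (Fortran arrays are one-based by default, which only adds a constant offset), and `linearize`, `delinearize` and `struct index3` are hypothetical names. Recovering the individual subscripts from the single offset needs division and modulo, which hints at why a purely linearized access is harder for a constraint solver to reason about.

```c
/* Extents of the example array acc(55,48,49). */
enum { DIM_J = 55, DIM_K = 48, DIM_L = 49 };

/* Column-major linearization: acc(j,k,l) -> j + 55*k + 2640*l,
   where 2640 = 55 * 48. */
static long linearize(long j, long k, long l)
{
    return j + (long) DIM_J * k + (long) DIM_J * DIM_K * l;
}

struct index3 {
    long j;
    long k;
    long l;
};

/* Inverse mapping: recover (j,k,l) from the linear offset.  Each
   subscript requires a division and/or a modulo operation. */
static struct index3 delinearize(long offset)
{
    struct index3 idx;
    idx.j = offset % DIM_J;
    idx.k = (offset / DIM_J) % DIM_K;
    idx.l = offset / ((long) DIM_J * DIM_K);
    return idx;
}
```

As a usage example, `linearize(1, 2, 3)` gives 1 + 110 + 7920 = 8031, and `delinearize(8031)` returns the subscripts (1, 2, 3) again.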
[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121 --- Comment #14 from Mircea Namolaru --- Confirmed. I have started looking at it. This test also enters an endless loop with the options -fgraphite-identity -floop-nest-optimize -O2 -c.
[Bug tree-optimization/59121] [4.8/4.9 Regression] endless loop with -O2 -floop-parallelize-all
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59121

Mircea Namolaru changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mircea.namolaru at inria dot fr

--- Comment #13 from Mircea Namolaru ---
(In reply to Jeffrey A. Law from comment #12)
> The problem Richard is nobody is maintaining the code. What makes this any
> different than a port which has become unmaintained and thus isn't being
> fixed in a timely manner? I'm not in a position to own the code and unless
> someone steps in to own it/maintain it, I'll formally call for its removal
> after 4.9 is released. You called the code out as unmaintained two years
> ago. I called it out again in the 2013 Cauldron. At some point we have to
> face reality and take appropriate action.

I just joined INRIA, and at least for the next year I will maintain the Graphite code. I have started to look at the P1 issue: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58028