[Bug fortran/42118] Slow forall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Thomas Koenig changed:

           What    |Removed                     |Added
             Status|WAITING                     |RESOLVED
                 CC|                            |tkoenig at gcc dot gnu.org
         Resolution|---                         |WONTFIX

--- Comment #11 from Thomas Koenig ---
(In reply to Lionel GUEZ from comment #10)
> (In reply to kargl from comment #9)
> > Fortran 2018 has declared FORALL to be an obsolescent feature.
> > I doubt that anyone will ever try to improve the performance
> > of FORALL, because the next standard is likely to delete it.
> >
> > I think that this bug can be closed with WONTFIX or WORKSFORME.
>
> OK for me.

We have had forall loop interchange for quite some time now, and
that is all the effort that people are likely to put into this.

So, closing as WONTFIX.
[Bug fortran/42118] Slow forall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

--- Comment #10 from Lionel GUEZ ---
(In reply to kargl from comment #9)
> Fortran 2018 has declared FORALL to be an obsolescent feature.
> I doubt that anyone will ever try to improve the performance
> of FORALL, because the next standard is likely to delete it.
>
> I think that this bug can be closed with WONTFIX or WORKSFORME.

OK for me.
[Bug fortran/42118] Slow forall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

kargl at gcc dot gnu.org changed:

           What    |Removed                     |Added
             Status|NEW                         |WAITING
                 CC|                            |kargl at gcc dot gnu.org

--- Comment #9 from kargl at gcc dot gnu.org ---
Fortran 2018 has declared FORALL to be an obsolescent feature.
I doubt that anyone will ever try to improve the performance
of FORALL, because the next standard is likely to delete it.

I think that this bug can be closed with WONTFIX or WORKSFORME.
[Bug fortran/42118] Slow forall
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Dominique d'Humieres changed:

           What    |Removed                     |Added
           Priority|P3                          |P4
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

--- Comment #8 from Tobias Burnus ---
(In reply to Tobias Burnus from comment #7)
> By the way, the Fortran committee is considering to deprecate FORALL in the
> next standard (Fortran 2015) because it considers FORALL superior in nearly
> all aspects.

Change "FORALL" to "DO CONCURRENT" in the last line and "deprecate" to
"obsolescent".

See http://j3-fortran.org/doc/year/13/13-323.txt

The proposal has not been accepted yet, but I also didn't see much
opposition to it. Quoting the reasoning (proposed for Appendix B of the
next Fortran standard):

"The FORALL construct and statement were added to the language in the
expectation that they would enable highly efficient execution, especially
on parallel processors. However, the experience with them indicates that
they are too complex and have too many restrictions for compilers to take
advantage of them. They are redundant with the DO CONCURRENT loop, and
many of the manipulations for which they might be used may be done more
efficiently by use of pointers, especially using pointer rank remapping."
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Tobias Burnus changed:

           What    |Removed                     |Added
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #7 from Tobias Burnus ---
(In reply to Harald Anlauf from comment #5)
> Do not forget that there are constraints for FORALL statements that are
> not required for DO loops so that all assignments are independent.
> This guarantees vectorization

Not quite: The Fortran standard requires that the RHS is evaluated before
the assignment to the LHS is done. This might even imply the generation of
a temporary variable.

By contrast, DO CONCURRENT is much better: The user guarantees that there
is no order dependence, while constraints ensure that many violations of
this are detected at compile time. In addition, DO CONCURRENT permits more
things within its body.

By the way, the Fortran committee is considering to deprecate FORALL in
the next standard (Fortran 2015) because it considers FORALL superior in
nearly all aspects.

For DO CONCURRENT, I have a pending patch which sets the vectorization
safelen to infinity (well, INT_MAX). I wonder whether one could do likewise
for FORALL; that probably needs some dependency fine tuning to ensure that
there is no dependency at all between the LHS and RHS. To avoid
temporaries, it is sufficient to be either forward or backward dependency
free.

(Setting the safelen for whole-array operations probably also makes sense.
There, the same applies.)

(In reply to Lionel GUEZ from comment #6)
> There is also the problem of the order of indices in a forall. I guess this
> is in close relation to the comparison of do and forall.

Try compiling with -floop-interchange (requires a GCC built with Graphite).

Deciding which order is best is not a trivial task, although in simple
cases such as yours, it shouldn't be that difficult. Maybe someone finds
the time to do it.

[Presumably the same issue comes up with DO CONCURRENT, if one places
multiple iteration variables into that statement (as opposed to using
multiple DO CONCURRENT statements with one iteration variable).]

> According to the Fortran standard, the order of indices in the forall header
> is of no consequence.

Well, it isn't with any of the compilers: The resulting value is always the
same. The standard doesn't say anything about the performance (nor about
the index walking order).
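
For illustration, here is a minimal sketch of the two DO CONCURRENT forms
mentioned in the bracketed remark above: one statement with two iteration
variables, and two nested statements with one variable each. The program
name, the array names a and b, the size n = 50 and the assigned expression
are all assumptions made for this example; none of them come from the bug
report. The sketch only checks that both forms give the same values. It
says nothing about which one a given compiler runs faster.

```fortran
program do_concurrent_sketch
  implicit none
  integer, parameter :: n = 50          ! hypothetical size, kept small
  integer :: i, j
  double precision :: a(n, n), b(n, n)  ! hypothetical arrays

  ! Form 1: both iteration variables in a single DO CONCURRENT header.
  ! The programmer asserts that the iterations are independent.
  do concurrent (j = 1:n, i = 1:n)
     a(i, j) = dble(i) + 10.0d0 * dble(j)
  end do

  ! Form 2: nested DO CONCURRENT statements, one variable each,
  ! with the leftmost array index in the innermost loop.
  do concurrent (j = 1:n)
     do concurrent (i = 1:n)
        b(i, j) = dble(i) + 10.0d0 * dble(j)
     end do
  end do

  ! Both forms must produce identical arrays.
  if (any(a /= b)) error stop "results differ"
  print *, "both forms agree, a(n, n) = ", a(n, n)
end program do_concurrent_sketch
```

With gfortran this should build with something like
"gfortran -O2 do_concurrent_sketch.f90"; the file name is again only an
example.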
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Lionel GUEZ changed:

           What    |Removed                     |Added
                 CC|                            |ebay.20.tedlap@spamgourmet.com

--- Comment #6 from Lionel GUEZ ---
There is also the problem of the order of indices in a forall. I guess this
is in close relation to the comparison of do and forall. Consider the
following test program:

    program test_forall

      implicit none

      integer, parameter:: n = 1000
      integer i, j, k
      double precision S(n, n, n)

      forall (i = 1: n, j = 1: n, k = 1: n) S(i, j, k) = i * j * k
      print *, "ijk, sum(s) = ", sum(s)

    end program test_forall

According to the Fortran standard, the order of indices in the forall
header is of no consequence. So, in the above program, we should be able to
write equivalently:

    forall (k = 1: n, j = 1: n, i = 1: n) S(i, j, k) = i * j * k

There is no way for the writer of the program to predict which of the two
versions should be faster. It is interesting to note that, with gfortran,
the forall with kji is much slower, while the inverse is true with the NAG
compiler (version 5.3). I think the two versions should have the same run
time. I have actually tested the two versions of the program with four
compilers:

    Compiler and flags     Order  sum(s)                  real       user       sys
    gfortran 4.4.6 -O3     kji    1.253753751250046E+017  1m32.511s  1m22.342s  0m8.368s
    gfortran 4.4.6 -O3     ijk    1.253753751250046E+017  0m12.962s  0m7.416s   0m5.427s
    nagfor 5.3 -O4         kji    1.2537537512500458E+17  0m13.396s  0m6.833s   0m6.054s
    nagfor 5.3 -O4         ijk    1.2537537512500458E+17  2m37.943s  2m27.723s  0m7.873s
    pgf95 11.10 -fast      kji    1.253753751248E+017     0m12.119s  0m6.051s   0m5.910s
    pgf95 11.10 -fast      ijk    1.253753751248E+017     0m11.979s  0m5.854s   0m5.939s
    ifort 12.1 -O3         kji    1.25375375125E+017      0m5.210s   0m3.028s   0m2.150s
    ifort 12.1 -O3         ijk    1.25375375125E+017      0m5.114s   0m2.981s   0m2.115s

So we see that PG Fortran and Intel Fortran behave well: the two versions
take about the same time. Also, Intel Fortran is much faster than the other
compilers on this test.

I would also like to comment on the use of the forall. Tobias Burnus says
that improving the forall in gfortran is not worth the effort. I think the
forall is useful. It is an elegant way to write some assignments. There is
no idea of time sequence in a forall, and the forall can only contain an
assignment while, as you know, the do construct could contain calls to
subroutines, input-output, recursive computations, anything. So when one
reads a program and sees a forall, it is much quicker to understand what is
going on than when one reads a do loop. Also, the fact that assignments are
independent (comment of Harald Anlauf) should make it easier for the
compiler to produce fast code.
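
As a sketch of the point about index order, the program below fills three
arrays with the same values: once per forall header order from the test
program above, and once with explicit do loops whose innermost loop runs
over the leftmost index (Fortran arrays are stored column-major, so this
order walks memory contiguously). The array names s_ijk, s_kji and s_do and
the reduced size n = 100 are assumptions for this example, chosen to keep
memory use small. It checks only that the results agree and does not
measure run time.

```fortran
program forall_index_order
  implicit none
  integer, parameter :: n = 100   ! smaller than the n = 1000 of the report
  integer :: i, j, k
  double precision, allocatable :: s_ijk(:, :, :), s_kji(:, :, :), s_do(:, :, :)

  allocate (s_ijk(n, n, n), s_kji(n, n, n), s_do(n, n, n))

  ! The two header orders from the report; the standard gives them
  ! the same meaning.
  forall (i = 1:n, j = 1:n, k = 1:n) s_ijk(i, j, k) = i * j * k
  forall (k = 1:n, j = 1:n, i = 1:n) s_kji(i, j, k) = i * j * k

  ! Explicit loops: the leftmost index i varies fastest, which matches
  ! the column-major storage order.
  do k = 1, n
     do j = 1, n
        do i = 1, n
           s_do(i, j, k) = i * j * k
        end do
     end do
  end do

  ! All three variants must produce identical arrays.
  if (any(s_ijk /= s_kji) .or. any(s_ijk /= s_do)) error stop "results differ"
  print *, "all variants agree, sum(s) = ", sum(s_do)
end program forall_index_order
```

To compare run times as in the table above, one could build each variant
separately and run it under the shell's "time" command.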
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Harald Anlauf changed:

           What    |Removed                     |Added
                 CC|                            |anlauf at gmx dot de

--- Comment #5 from Harald Anlauf 2012-03-01 19:54:08 UTC ---
(In reply to comment #4)
> Additionally, as written before (comment 2), a reasonably well written DO loop
> should be always as fast or faster than a FORALL. The definition of FORALL does
> not allow for a good optimization in the general case.

Do not forget that there are constraints for FORALL statements that are
not required for DO loops so that all assignments are independent.
This guarantees vectorization.

> I did a quick run with six compilers. Result: The FORALL construct was between
> 3.2 to 5.25 times slower than the DO loop. Thus, other compilers do not handle
> it better, either.

I tried the SunStudio 12 on i686

    Time of operation was 11.831321 seconds
    Time of operation was 12.235342 seconds

and on x86_64 (AMD barcelona)

    Time of operation was 8.715117 seconds
    Time of operation was 10.525522 seconds

So a small slowdown.

Then I tried NEC's sxf90 rev.441 for SX-9 at -Chopt:

    Time of operation was 4.187261 seconds
    Time of operation was 1.259775 seconds

Whoops! After looking into the transformation listing and instrumenting
the code, it looks like the do loop is poorly optimized, giving lots of
so-called bank conflicts.

Reducing optimization to -Cvopt, I get:

    Time of operation was 1.185673 seconds
    Time of operation was 1.271729 seconds

Looks reasonable. So yes, FORALL is in practice slightly slower
(almost always... ;-)
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

Tobias Burnus changed:

           What    |Removed                     |Added
                 CC|                            |burnus at gcc dot gnu.org

--- Comment #4 from Tobias Burnus 2012-03-01 08:06:01 UTC ---
(In reply to comment #3)
> Also exist in the gcc4.7 trunk. Can we mark it a Regression?

Only if it worked better in some previous GCC version, which does not seem
to be the case.

Additionally, as written before (comment 2), a reasonably well written DO
loop should always be as fast as or faster than a FORALL. The definition of
FORALL does not allow for good optimization in the general case.

You should also consider using Fortran 2008's DO CONCURRENT, which allows
for more optimizations than a normal DO loop. (Though currently gfortran
handles DO CONCURRENT as a normal DO loop.)

As FORALL is rather complicated and not widely used, some possible
optimizations aren't implemented. (I have not checked whether that's the
case for the program in question.)

I did a quick run with six compilers. Result: The FORALL construct was
between 3.2 and 5.25 times slower than the DO loop. Thus, other compilers
do not handle it better, either.
[Bug fortran/42118] Slow forall
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118

xunxun changed:

           What    |Removed                     |Added
                 CC|                            |xunxun1982 at gmail dot com

--- Comment #3 from xunxun 2012-03-01 07:37:08 UTC ---
Also exists in the gcc4.7 trunk. Can we mark it a Regression?
[Bug fortran/42118] Slow forall
--- Comment #2 from burnus at gcc dot gnu dot org 2009-11-20 14:20 ---
(In reply to comment #0)
> I think that forall statement must be at least as fast as equivalent
> do-end do construction.

The Fortran standardization committee thought likewise. However, as it
turned out in practice, it is sometimes not trivial for the compiler to see
whether there is any dependence of the RHS (right-hand side) on the LHS,
and thus it might use a temporary array even if none is needed; temporary
arrays are slow (and memory hungry). Thus, a DO loop should always be
faster than or as fast as a FORALL (assignment) statement (unless one does
something really stupid in the DO loop). [At least that is what I gathered
from the comments at comp.lang.fortran and which matches my knowledge
regarding how it is done in gfortran.]

Having said that, gfortran still should try to make your program as fast
for FORALL as it is for the DO loop.

> But the next program (variant of LU-decomposition) shows that fragment
> containing forall statement is approximately at 2.5(!) times slower then
> fragment with do-end do.

You could check using -fdump-tree-original how the two versions are
handled; my guess is that the FORALL version uses a temporary array.
(-fdump-tree-original creates a .004* file which contains a dump of the
internal representation of your code, which looks similar to C.)

Seemingly, Richard already looked at the dump and confirmed my suspicion.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118
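
To make the remark about temporaries concrete, here is a small sketch of a
case where the right-hand side does depend on the left-hand side. In a
FORALL, every right-hand side is evaluated before any element is stored, so
the compiler must behave as if it had copied the old values first. A plain
DO loop has no such rule and reads values that earlier iterations already
overwrote. The program name, the arrays a and b and the size n = 5 are
assumptions for this example only; this is not the LU-decomposition program
from the report.

```fortran
program forall_temporary
  implicit none
  integer, parameter :: n = 5   ! hypothetical size
  integer :: a(n), b(n), i

  a = [(i, i = 1, n)]           ! a = 1 2 3 4 5
  b = a

  ! FORALL: all right-hand sides are evaluated before any assignment,
  ! so a(i) receives the OLD value of a(i-1): a shift by one element.
  forall (i = 2:n) a(i) = a(i - 1)

  ! Sequential DO: b(i-1) was already overwritten by the previous
  ! iteration, so b(1) propagates through the whole array.
  do i = 2, n
     b(i) = b(i - 1)
  end do

  print *, "forall:", a
  print *, "do:    ", b

  if (any(a /= [1, 1, 2, 3, 4])) error stop "unexpected FORALL result"
  if (any(b /= [1, 1, 1, 1, 1])) error stop "unexpected DO result"
end program forall_temporary
```

When the compiler cannot prove that such an overlap is absent, it has to
keep the copy, which is the temporary array discussed above. Compiling with
-fdump-tree-original, as suggested in the comment, is one way to see
whether a temporary was generated for a given FORALL.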
[Bug fortran/42118] Slow forall
--- Comment #1 from rguenth at gcc dot gnu dot org 2009-11-20 14:03 ---
Confirmed. GFortran seems to split the loops differently and uses a larger
temporary for the forall case.

--

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|-00-00 00:00:00             |2009-11-20 14:03:56
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42118