[Bug fortran/41137] inefficient zeroing of an array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Dominique d'Humieres changed:

           What    |Removed |Added
           Status  |NEW     |WAITING

--- Comment #20 from Dominique d'Humieres ---
For the test in comment 0, I get

   0.107776999
   0.108125009

with gcc6, 7, and trunk (8.0) if I use -O3 or -Ofast, -O2 gives

   0.107547000
   0.569643974

Could this PR be considered FIXED?
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Dominique d'Humieres dominiq at lps dot ens.fr changed:

           What          |Removed |Added
           Known to work |        |4.6.4, 4.7.3, 4.8.2, 4.9.0
           Known to fail |        |4.5.4

--- Comment #17 from Dominique d'Humieres dominiq at lps dot ens.fr ---
With -O3, I get the same timings for the test in comment 1 since gcc 4.6.4.

Could this PR be closed as FIXED or did I miss something in the discussion?
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

--- Comment #18 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch ---
(In reply to Dominique d'Humieres from comment #17)
> With -O3, I get the same timings for the test in comment 1 since gcc 4.6.4.
> Could this PR be closed as FIXED or did I miss something in the discussion?

However, the difference remains if the subroutines are in separate files (comment #14). In fact, with '-O3 -fno-ipa-cp -fno-inline' the timings remain poor:

   ./a.out
   0.156975999
   0.65592

I think the issue is that the frontend could/should generate better code for this.
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137 --- Comment #19 from Thomas Koenig tkoenig at gcc dot gnu.org --- Also see PR 55858.
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed:

           What             |Removed             |Added
           Last reconfirmed |2009-11-01 16:21:21 |2013-03-29
           CC               |                    |Joost.VandeVondele at mat dot ethz.ch
           Blocks           |                    |38654

--- Comment #14 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 2013-03-29 09:46:53 UTC ---
The code in comment #0 is actually a frontend optimization, PR38654.

Noteworthy that the optimizers (ipa-cp plus others) do the right thing for the tester in comment #1 at -O3 (but can't do this in the general case).
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

--- Comment #15 from Thomas Koenig tkoenig at gcc dot gnu.org 2013-03-29 22:19:05 UTC ---
The patch from comment #12 causes memory failure of the following code:

   module zero
     implicit none
   contains
     subroutine foo(a)
       real, contiguous :: a(:,:)
       a(:,:) = 0
     end subroutine foo
   end module zero

   program main
     use zero
     implicit none
     real, dimension(5,5) :: a
     a = 1.
     call foo(a(1:5:2,1:5:2))
     write (*,'(5F12.5)') a
   end program main

which is a bit strange.
[Bug fortran/41137] inefficient zeroing of an array
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Tobias Burnus burnus at gcc dot gnu.org changed:

           What |Removed |Added
           CC   |        |burnus at gcc dot gnu.org

--- Comment #16 from Tobias Burnus burnus at gcc dot gnu.org 2013-03-29 22:38:58 UTC ---
Possible off-topic remark - or hitting right on the nail:

Looking at

   a(:,:,:,:)=0.0

and

   a(5:) = 0.0

I wonder whether it couldn't be handled via RANGE_REF, e.g.

   RANGE_REF(a,5,...) = { };

should work if I am not mistaken. Currently, we only do

   a = 0.0  ->  a = {};

See ARRAY_RANGE_REF in trans-expr.c's class_array_data_assign (gfc_index_zero_node is the offset) for the usage; see also GCC internal manual and Ada.
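For readers following along, the three source forms mentioned in comment #16 can be written out as below. This is only a sketch: the module name `zero_forms_demo`, the procedure names, and the array extents are hypothetical and chosen for illustration. It shows what each form means at the source level and says nothing about which of them a given gfortran version lowers to a block clear.

```fortran
module zero_forms_demo
  implicit none
contains

  ! Whole-array assignment: the form that, per comment #16, is
  ! already translated to "a = {}".
  subroutine zero_whole(a)
    real, intent(out) :: a(2,2,2,2)
    a = 0.0
  end subroutine zero_whole

  ! Full array section: touches exactly the same elements as the
  ! whole-array form, but is written as a section.
  subroutine zero_full_section(a)
    real, intent(out) :: a(2,2,2,2)
    a(:,:,:,:) = 0.0
  end subroutine zero_full_section

  ! Partial section: only elements 5 to the end are zeroed, so the
  ! first four elements must keep their previous values.
  subroutine zero_tail(b)
    real, intent(inout) :: b(10)
    b(5:) = 0.0
  end subroutine zero_tail

end module zero_forms_demo
```

The partial-section case is the one where any block-clear implementation has to start at the right offset, which is the concern raised in comment #12.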
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #12 from burnus at gcc dot gnu dot org 2010-06-22 14:42 ---
(In reply to comment #11)
> What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST?
> At least if it is contiguous (and not assumed size), why can't memset be used
> even for non-constant sizes?

Suggested by Jakub:

-  if (!len || TREE_CODE (len) != INTEGER_CST)
+  if (!len
+      || (TREE_CODE (len) != INTEGER_CST
+          && !gfc_is_simply_contiguous (expr, false)))

Though, one needs to be careful that one zeros the right spot (maybe already taken care of):

   a(5:) = 0

Additionally, one could do the same for arrays which are contiguous but have a descriptor - for which one has to calculate the size manually (as len == NULL).

At least after memset/memcpy middle-end fixes, the change should be profitable.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #13 from jakub at gcc dot gnu dot org 2010-06-22 15:25 --- Well, a(5:)=0.0 doesn't satisfy copyable_array_p, so gfc_trans_zero_assign isn't called at all. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #7 from burnus at gcc dot gnu dot org 2010-06-21 15:02 ---
(In reply to comment #1)
> Just for reference, the difference in time between the two variants is truly
> impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5.

I get for the example the following values, note especially the newly added CONTIGUOUS result:

   0.31601900 - assumed-shape
   0.21601403 - assumed-shape CONTIGUOUS
   0.21601295 - explicit size (n,n,...)
   0.20801300 - explicit size (10,10,...)
   0.21601403 - explicit size (10*10*...)

Ignoring some measuring noise, assumed-shape is 46% (-O0) to 25% (-O3) slower than explicit size, but using the CONTIGUOUS attribute, the performance is regained. I cannot reproduce the factor of 10 results, however.

What surprises me a bit is that -flto -fwhole-program does not reduce the speed penalty of assumed-shape arrays.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
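The difference between the first two rows of comment #7 comes down to one attribute on the dummy argument. The sketch below is an assumption about what such a pair of routines looks like; it is not the attachment referenced in the thread, and the module and procedure names are hypothetical.

```fortran
module zero_shape_demo
  implicit none
contains

  ! Plain assumed-shape dummy: the actual argument may be strided, so
  ! the callee has to address elements through the descriptor strides.
  subroutine zero_assumed(a)
    real, intent(out) :: a(:,:,:,:)
    a(:,:,:,:) = 0.0
  end subroutine zero_assumed

  ! Assumed-shape dummy with the Fortran 2008 CONTIGUOUS attribute:
  ! the callee may assume the elements are adjacent in memory.
  subroutine zero_contiguous(a)
    real, contiguous, intent(out) :: a(:,:,:,:)
    a(:,:,:,:) = 0.0
  end subroutine zero_contiguous

end module zero_shape_demo
```

Both routines give the same result for a contiguous actual argument such as a whole `real :: a(10,10,10,10)` array; only the information available to the compiler differs.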
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #8 from burnus at gcc dot gnu dot org 2010-06-21 15:22 ---
(In reply to comment #7)
> I get for the example the following values, note especially the newly added
> CONTIGUOUS result:

For the test case, see attachment 20966 at PR 44612; that PR I have filed because GCC does not optimize away the loops, which only set but never read the value from the variable. (Ifort does this optimization.) Additionally, if one prints the variable, ifort is twice as fast.

For curiosity: Using NAG, the timing is 0.690 vs. 1.220, i.e. the assumed-shape version is actually faster [though its overall performance is poor].

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #9 from jv244 at cam dot ac dot uk 2010-06-21 15:49 ---
(In reply to comment #7)
> I cannot reproduce the factor of 10 results, however.

Here this still is the case (so might depend on the precise architecture):

   /data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951 test.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase test.f90 -auxbase test -O3 -version -fintrinsic-modules-path /data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/finclude -o /tmp/ccXsKXnD.s

   ./a.out
   0.10800600
   1.0520660

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #10 from burnus at gcc dot gnu dot org 2010-06-21 17:00 ---
(In reply to comment #9)
> (In reply to comment #7)
> > I cannot reproduce the factor of 10 results, however.
> Here this still is the case (so might depend on the precise architecture):

OK, I was using -fwhole-file out of habit - thus the difference is that small (all optimization levels, including -O0). Otherwise, I also get the same factor-of-10 difference. If one splits it in two files, one needs to use -O3 -flto to get a fast program.

For comparison, using two files, ifort also shows a factor of 2 to 5 difference (and is at -O0 ten times slower than gfortran; at -O2 it is twice as fast as gfortran).

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #11 from jakub at gcc dot gnu dot org 2010-06-21 17:43 --- What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST? At least if it is contiguous (and not assumed size), why can't memset be used even for non-constant sizes? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #6 from dfranke at gcc dot gnu dot org 2010-05-07 21:01 --- See also PR40598. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #5 from tkoenig at gcc dot gnu dot org 2009-11-01 16:21 ---
A workaround (which should really be implemented within the compiler):

   subroutine s(a,n)
     integer :: n
     real :: a(n*n*n*n)
     a = 0.0
   end subroutine

This is legal Fortran, equivalent to your routine, and should be much faster.

Confirmed, BTW.

--
tkoenig at gcc dot gnu dot org changed:

           What             |Removed         |Added
           Status           |UNCONFIRMED     |NEW
           Ever Confirmed   |0               |1
           Last reconfirmed |-00-00 00:00:00 |2009-11-01 16:21:21
           date             |                |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
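To make the workaround from comment #5 easier to compare with the original routine, here is a minimal sketch that puts both side by side. The module name `zero_flat_demo` and the procedure names `zero_4d` and `zero_flat` are hypothetical. The sketch assumes the caller passes a contiguous array of exactly n**4 elements, relying on sequence association for the rank-1 explicit-shape dummy. Whether the flat form is actually faster depends on the compiler version and flags discussed in this thread.

```fortran
module zero_flat_demo
  implicit none
contains

  ! Form from comment #1: rank-4 explicit-shape dummy, zeroed through
  ! a full array section. The extents are only known at run time.
  subroutine zero_4d(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n,n,n,n)
    a(:,:,:,:) = 0.0
  end subroutine zero_4d

  ! Workaround from comment #5: the same storage viewed as a rank-1
  ! array of n**4 elements and zeroed with a whole-array assignment.
  subroutine zero_flat(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n*n*n*n)
    a = 0.0
  end subroutine zero_flat

end module zero_flat_demo
```

A caller holding `real :: a(10,10,10,10)` can pass it to either routine as `call zero_flat(a, 10)`; both leave every element set to zero.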
[Bug fortran/41137] inefficient zeroing of an array
-- tkoenig at gcc dot gnu dot org changed: What|Removed |Added Severity|normal |enhancement http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #1 from jv244 at cam dot ac dot uk 2009-08-21 07:02 ---
Just for reference, the difference in time between the two variants is truly impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5. Given that a code like CP2K sometimes spends about 5-10% of its time in zeroing stuff, this would help significantly.

trunk:

   gfortran -O3 -march=native test.f90
   ./a.out
   0.1600
   0.84405303

4.4 branch:

   gfortran -O3 -march=native test.f90
   ./a.out
   0.10400600
   1.1320710

test code:

   SUBROUTINE S(a,n)
     INTEGER :: n
     REAL :: a(n,n,n,n)
     a(:,:,:,:)=0.0
   END SUBROUTINE

   SUBROUTINE S2(a)
     REAL :: a(10,10,10,10)
     a(:,:,:,:)=0.0
   END SUBROUTINE

   REAL :: a(10,10,10,10),t1,t2
   INTEGER :: I,N
   N=10
   CALL CPU_TIME(t1)
   DO I=1,N
     CALL S2(a)
   ENDDO
   CALL CPU_TIME(t2)
   write(6,*) t2-t1
   CALL CPU_TIME(t1)
   DO I=1,N
     CALL S(a,10)
   ENDDO
   CALL CPU_TIME(t2)
   write(6,*) t2-t1
   END

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #2 from dfranke at gcc dot gnu dot org 2009-08-21 07:39 ---
I think PR31009 is similar.

--
dfranke at gcc dot gnu dot org changed:

           What |Removed |Added
           CC   |        |dfranke at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137
[Bug fortran/41137] inefficient zeroing of an array
--- Comment #3 from jv244 at cam dot ac dot uk 2009-08-21 08:29 ---
(In reply to comment #2)
> I think PR31009 is similar.

In fact, this is almost a dup of PR31016, since also here, I'm explicitly talking about the case of known-to-be-contiguous arrays.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137