[Bug fortran/41137] inefficient zeroing of an array

2018-01-27 Thread dominiq at lps dot ens.fr
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Dominique d'Humieres  changed:

   What|Removed |Added

 Status|NEW |WAITING

--- Comment #20 from Dominique d'Humieres  ---
For the test in comment 0, I get

  0.107776999
  0.108125009

with gcc6, 7, and trunk (8.0) if I use -O3 or -Ofast, -O2 gives

  0.107547000
  0.569643974

Could this PR be considered FIXED?

[Bug fortran/41137] inefficient zeroing of an array

2014-05-01 Thread dominiq at lps dot ens.fr
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Dominique d'Humieres dominiq at lps dot ens.fr changed:

   What|Removed |Added

  Known to work||4.6.4, 4.7.3, 4.8.2, 4.9.0
  Known to fail||4.5.4

--- Comment #17 from Dominique d'Humieres dominiq at lps dot ens.fr ---
With -O3, I get the same timings for the test in comment 1 since gcc 4.6.4.
Could this PR be closed as FIXED or did I miss something in the discussion?


[Bug fortran/41137] inefficient zeroing of an array

2014-05-01 Thread Joost.VandeVondele at mat dot ethz.ch
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

--- Comment #18 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 
---
(In reply to Dominique d'Humieres from comment #17)
> With -O3, I get the same timings for the test in comment 1 since gcc 4.6.4.
> Could this PR be closed as FIXED or did I miss something in the discussion?

However, the difference remains if the subroutines are in separate files
(comment #14). In fact, with '-O3 -fno-ipa-cp -fno-inline' the timings remain
poor:

 ./a.out
  0.156975999
  0.65592

I think the issue is that the frontend could/should generate better code for
this.


[Bug fortran/41137] inefficient zeroing of an array

2014-05-01 Thread tkoenig at gcc dot gnu.org
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

--- Comment #19 from Thomas Koenig tkoenig at gcc dot gnu.org ---
Also see PR 55858.


[Bug fortran/41137] inefficient zeroing of an array

2013-03-29 Thread Joost.VandeVondele at mat dot ethz.ch

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch changed:

   What|Removed |Added

   Last reconfirmed|2009-11-01 16:21:21 |2013-03-29
 CC||Joost.VandeVondele at mat dot ethz.ch
 Blocks||38654

--- Comment #14 from Joost VandeVondele Joost.VandeVondele at mat dot ethz.ch 
2013-03-29 09:46:53 UTC ---

The code in comment #0 is actually a frontend optimization, PR38654. 

Noteworthy that the optimizers (ipa-cp plus others) do the right thing for the
tester in comment #1 at -O3 (but can't do this in the general case).


[Bug fortran/41137] inefficient zeroing of an array

2013-03-29 Thread tkoenig at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

--- Comment #15 from Thomas Koenig tkoenig at gcc dot gnu.org 2013-03-29 
22:19:05 UTC ---

The patch from comment #12 causes memory failure of the
following code:

module zero
  implicit none
contains
  subroutine foo(a)
    real, contiguous :: a(:,:)
    a(:,:) = 0
  end subroutine foo
end module zero

program main
  use zero
  implicit none
  real, dimension(5,5) :: a
  a = 1.
  call foo(a(1:5:2,1:5:2))
  write (*,'(5F12.5)') a
end program main

which is a bit strange.


[Bug fortran/41137] inefficient zeroing of an array

2013-03-29 Thread burnus at gcc dot gnu.org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137

Tobias Burnus burnus at gcc dot gnu.org changed:

   What|Removed |Added

 CC||burnus at gcc dot gnu.org

--- Comment #16 from Tobias Burnus burnus at gcc dot gnu.org 2013-03-29 
22:38:58 UTC ---

Possible off-topic remark - or hitting right on the nail: Looking at
  a(:,:,:,:)=0.0
and
  a(5:) = 0.0
I wonder whether it couldn't be handled via RANGE_REF, e.g.
  RANGE_REF(a,5,...) = { };
should work if I am not mistaken. Currently, we only do a = 0.0 -> a = {};.

See ARRAY_RANGE_REF in trans-expr.c's class_array_data_assign
(gfc_index_zero_node is the offset) for the usage; see also GCC internal manual
and Ada.


[Bug fortran/41137] inefficient zeroing of an array

2010-06-22 Thread burnus at gcc dot gnu dot org


--- Comment #12 from burnus at gcc dot gnu dot org  2010-06-22 14:42 ---
(In reply to comment #11)
> What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST?
> At least if it is contiguous (and not assumed size), why can't memset be used
> even for non-constant sizes?

Suggested by Jakub: 

 -  if (!len || TREE_CODE (len) != INTEGER_CST)
 +  if (!len
 +      || (TREE_CODE (len) != INTEGER_CST
 +          && !gfc_is_simply_contiguous (expr, false)))

Though, one needs to be careful that one zeros the right spot (maybe already
taken care of):
  a(5:) = 0

Additionally, one could do the same for arrays which are contiguous but have a
descriptor - for which one has to calculate the size manually (as len ==
NULL). At least after memset/memcpy middle-end fixes, the change should be
profitable.
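
To illustrate the "calculate the size manually" remark, here is a minimal
user-level sketch of the byte count a memset would need for a contiguous array
passed by descriptor. It is not the compiler's internal code; the module and
function names (descriptor_size, byte_length) are hypothetical, and only the
standard intrinsics SIZE and STORAGE_SIZE are used.

```fortran
module descriptor_size
  implicit none
contains
  ! Hypothetical helper (not from the PR): number of bytes occupied by a
  ! contiguous assumed-shape array, derived at run time from its extents.
  function byte_length(a) result(nbytes)
    real, contiguous, intent(in) :: a(:,:)
    integer :: nbytes
    ! STORAGE_SIZE returns the element size in bits.
    nbytes = size(a) * (storage_size(a) / 8)
  end function byte_length
end module descriptor_size

program show_size
  use descriptor_size
  implicit none
  real :: a(10,10)
  a = 0.0
  print *, "elements:", size(a), " bytes:", byte_length(a)
end program show_size
```

A default INTEGER result is enough for this small example; very large arrays
would need a wider integer kind.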





[Bug fortran/41137] inefficient zeroing of an array

2010-06-22 Thread jakub at gcc dot gnu dot org


--- Comment #13 from jakub at gcc dot gnu dot org  2010-06-22 15:25 ---
Well, a(5:)=0.0 doesn't satisfy copyable_array_p, so gfc_trans_zero_assign
isn't called at all.





[Bug fortran/41137] inefficient zeroing of an array

2010-06-21 Thread burnus at gcc dot gnu dot org


--- Comment #7 from burnus at gcc dot gnu dot org  2010-06-21 15:02 ---
(In reply to comment #1)
> Just for reference, the difference in time between the two variants is truly
> impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5.

I get the following values for the example; note especially the newly added
CONTIGUOUS result:

  0.31601900  - assumed-shape
  0.21601403  - assumed-shape CONTIGUOUS
  0.21601295  - explicit size (n,n,...)
  0.20801300  - explicit size (10,10,...)
  0.21601403  - explicit size (10*10*...)

Ignoring some measuring noise, assumed-shape is 46% (-O0) to 25% (-O3) slower
than explicit size, but with the CONTIGUOUS attribute the performance is
regained. I cannot reproduce the factor of 10 results, however. What surprises
me a bit is that -flto -fwhole-program does not reduce the speed penalty of
assumed-shape arrays.
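
For readers without access to the attachment, the three kinds of dummy
argument compared above can be sketched as follows. This is an illustrative
sketch only, not the actual test case; all module and procedure names are
hypothetical, and no timings are implied.

```fortran
module zero_variants
  implicit none
contains
  ! Assumed-shape: the array may be strided, so the compiler must
  ! honour the strides stored in the descriptor.
  subroutine zero_assumed_shape(a)
    real, intent(out) :: a(:,:,:,:)
    a = 0.0
  end subroutine zero_assumed_shape

  ! Assumed-shape with CONTIGUOUS (Fortran 2008): the elements are
  ! guaranteed to be adjacent in memory.
  subroutine zero_contiguous(a)
    real, contiguous, intent(out) :: a(:,:,:,:)
    a = 0.0
  end subroutine zero_contiguous

  ! Explicit shape with run-time bounds: always contiguous.
  subroutine zero_explicit(a, n)
    integer, intent(in) :: n
    real, intent(out) :: a(n,n,n,n)
    a = 0.0
  end subroutine zero_explicit
end module zero_variants

program demo_variants
  use zero_variants
  implicit none
  real :: a(10,10,10,10)

  a = 1.0
  call zero_assumed_shape(a)
  if (any(a /= 0.0)) error stop "assumed-shape variant failed"

  a = 1.0
  call zero_contiguous(a)
  if (any(a /= 0.0)) error stop "contiguous variant failed"

  a = 1.0
  call zero_explicit(a, 10)
  if (any(a /= 0.0)) error stop "explicit-shape variant failed"

  print *, "all three variants zeroed", size(a), "elements"
end program demo_variants
```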





[Bug fortran/41137] inefficient zeroing of an array

2010-06-21 Thread burnus at gcc dot gnu dot org


--- Comment #8 from burnus at gcc dot gnu dot org  2010-06-21 15:22 ---
(In reply to comment #7)
> I get for the example the following values, note especially the newly added
> CONTIGUOUS result:

For the test case, see attachment 20966 at PR 44612. I filed that PR because
GCC does not optimize away the loops, which only set the variable and never
read it. (Ifort does this optimization.) Additionally, if one prints the
variable, ifort is twice as fast. As a curiosity: using NAG, the timing is
0.690 vs. 1.220, i.e. the assumed-shape version is actually faster [though its
overall performance is poor].





[Bug fortran/41137] inefficient zeroing of an array

2010-06-21 Thread jv244 at cam dot ac dot uk


--- Comment #9 from jv244 at cam dot ac dot uk  2010-06-21 15:49 ---
(In reply to comment #7)

> I cannot reproduce the factor of 10 results, however. 

Here this is still the case (so it might depend on the precise architecture):

/data03/vondele/gcc_trunk/build/libexec/gcc/x86_64-unknown-linux-gnu/4.6.0/f951
test.f90 -march=k8-sse3 -mcx16 -msahf --param l1-cache-size=64 --param
l1-cache-line-size=64 --param l2-cache-size=1024 -mtune=k8 -quiet -dumpbase
test.f90 -auxbase test -O3 -version -fintrinsic-modules-path
/data03/vondele/gcc_trunk/build/lib/gcc/x86_64-unknown-linux-gnu/4.6.0/finclude
-o /tmp/ccXsKXnD.s

 ./a.out
  0.10800600
   1.0520660





[Bug fortran/41137] inefficient zeroing of an array

2010-06-21 Thread burnus at gcc dot gnu dot org


--- Comment #10 from burnus at gcc dot gnu dot org  2010-06-21 17:00 ---
(In reply to comment #9)
> (In reply to comment #7)
> > I cannot reproduce the factor of 10 results, however. 
> Here this still is the case (so might depend on the precise architecture):

OK, I was using -fwhole-file out of habit, which is why the difference was so
small (at all optimization levels, including -O0). Without it, I also get the
same factor-of-10 difference. If one splits the code into two files, one needs
to use -O3 -flto to get a fast program.

For comparison, using two files, ifort also shows a factor of 2 to 5 difference
(and is ten times slower than gfortran at -O0; at -O2 it is twice as fast as
gfortran).





[Bug fortran/41137] inefficient zeroing of an array

2010-06-21 Thread jakub at gcc dot gnu dot org


--- Comment #11 from jakub at gcc dot gnu dot org  2010-06-21 17:43 ---
What's the reason why gfc_trans_zero_assign insists that len is INTEGER_CST?
At least if it is contiguous (and not assumed size), why can't memset be used
even for non-constant sizes?





[Bug fortran/41137] inefficient zeroing of an array

2010-05-07 Thread dfranke at gcc dot gnu dot org


--- Comment #6 from dfranke at gcc dot gnu dot org  2010-05-07 21:01 ---
See also PR40598.





[Bug fortran/41137] inefficient zeroing of an array

2009-11-01 Thread tkoenig at gcc dot gnu dot org


--- Comment #5 from tkoenig at gcc dot gnu dot org  2009-11-01 16:21 ---
A workaround (which should really be implemented within the compiler):

subroutine s(a,n)
integer :: n
real :: a(n*n*n*n)
a = 0.0
end subroutine

This is legal Fortran, equivalent to your routine, and should be much faster.

Confirmed, BTW.
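
A small check of the claimed equivalence, as a sketch: both routines are
external procedures with explicit-shape dummies, so the rank-4 actual argument
is passed by sequence association to the rank-1 dummy. The names s_4d, s_flat
and check_workaround are hypothetical and not part of the PR.

```fortran
! Original form: rank-4 explicit-shape dummy, zeroed via an array section.
subroutine s_4d(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n,n,n,n)
  a(:,:,:,:) = 0.0
end subroutine s_4d

! Workaround form: the same storage viewed as a rank-1 array.
subroutine s_flat(a, n)
  implicit none
  integer, intent(in) :: n
  real, intent(out) :: a(n*n*n*n)
  a = 0.0
end subroutine s_flat

program check_workaround
  implicit none
  external :: s_4d, s_flat
  real :: a(10,10,10,10), b(10,10,10,10)

  a = 1.0
  b = 1.0
  call s_4d(a, 10)
  call s_flat(b, 10)   ! rank-4 actual, rank-1 dummy: sequence association

  if (any(a /= 0.0) .or. any(b /= 0.0)) error stop "arrays not zeroed"
  print *, "both variants zeroed", size(a), "elements"
end program check_workaround
```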


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever Confirmed|0   |1
   Last reconfirmed|-00-00 00:00:00 |2009-11-01 16:21:21
   date||


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137



[Bug fortran/41137] inefficient zeroing of an array

2009-11-01 Thread tkoenig at gcc dot gnu dot org


-- 

tkoenig at gcc dot gnu dot org changed:

   What|Removed |Added

   Severity|normal  |enhancement


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137



[Bug fortran/41137] inefficient zeroing of an array

2009-08-21 Thread jv244 at cam dot ac dot uk


--- Comment #1 from jv244 at cam dot ac dot uk  2009-08-21 07:02 ---
Just for reference, the difference in time between the two variants is truly
impressive. About a factor of 11 with gcc 4.4 and 8 with gcc 4.5. Given that a
code like CP2K sometimes spends about 5-10% of its time zeroing arrays, this
would help significantly.

trunk:

 gfortran -O3 -march=native test.f90
 ./a.out
  0.1600
  0.84405303

4.4 branch:
 gfortran -O3 -march=native test.f90
 ./a.out
  0.10400600
  1.1320710

test code:
SUBROUTINE S(a,n)
INTEGER :: n
REAL :: a(n,n,n,n)
a(:,:,:,:)=0.0
END SUBROUTINE

SUBROUTINE S2(a)
REAL :: a(10,10,10,10)
a(:,:,:,:)=0.0
END SUBROUTINE


REAL :: a(10,10,10,10),t1,t2
INTEGER :: I,N
N=10

CALL CPU_TIME(t1)
DO I=1,N
CALL S2(a)
ENDDO
CALL CPU_TIME(t2)
write(6,*) t2-t1

CALL CPU_TIME(t1)
DO I=1,N
CALL S(a,10)
ENDDO
CALL CPU_TIME(t2)
write(6,*) t2-t1

END





[Bug fortran/41137] inefficient zeroing of an array

2009-08-21 Thread dfranke at gcc dot gnu dot org


--- Comment #2 from dfranke at gcc dot gnu dot org  2009-08-21 07:39 ---
I think PR31009 is similar.


-- 

dfranke at gcc dot gnu dot org changed:

   What|Removed |Added

 CC||dfranke at gcc dot gnu dot
   ||org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41137



[Bug fortran/41137] inefficient zeroing of an array

2009-08-21 Thread jv244 at cam dot ac dot uk


--- Comment #3 from jv244 at cam dot ac dot uk  2009-08-21 08:29 ---
(In reply to comment #2)
> I think PR31009 is similar.

In fact, this is almost a dup of PR31016, since also here, I'm explicitly
talking about the case of known-to-be-contiguous arrays.

