[Bug fortran/29549] matmul slow for complex matrices

2008-02-26 Thread jb at gcc dot gnu dot org



--- Comment #14 from jb at gcc dot gnu dot org  2008-02-26 21:15 ---
Closing as fixed. Timings for a small test program comparing matrix
multiplication done manually vs. libgfortran for real and complex.

Results without the committed patch (-O3 -funroll-loops, 1.6 GHz Pentium-M):

Manual real:   0.2140
Real matmul:   0.2390
Complex manual:   0.8259
Complex matmul:   3.8654

with the patch:

Manual real:   0.2130
Real matmul:   0.2520
Complex manual:   0.8149
Complex matmul:   0.8099

I.e. almost a factor of five speedup for complex matmul (3.8654 / 0.8099,
about 4.8x).


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-25 Thread jb at gcc dot gnu dot org



--- Comment #13 from jb at gcc dot gnu dot org  2008-02-25 19:28 ---
Subject: Bug 29549

Author: jb
Date: Mon Feb 25 19:27:28 2008
New Revision: 132638

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132638
Log:
2008-02-25  Janne Blomqvist  <[EMAIL PROTECTED]>

PR fortran/29549
* Makefile.am: Add -fcx-fortran-rules to AM_CFLAGS for all of
libgfortran.
* Makefile.in: Regenerated.


Modified:
trunk/libgfortran/ChangeLog
trunk/libgfortran/Makefile.am
trunk/libgfortran/Makefile.in


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-25 Thread jb at gcc dot gnu dot org

--- Comment #12 from jb at gcc dot gnu dot org  2008-02-25 19:21 ---
Subject: Bug 29549

Author: jb
Date: Mon Feb 25 19:20:48 2008
New Revision: 132636

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=132636
Log:
2008-02-25  Janne Blomqvist  <[EMAIL PROTECTED]>

PR fortran/29549
* doc/invoke.texi (-fcx-limited-range): Document new option.
* toplev.c (process_options): Handle -fcx-fortran-rules.
* common.opt: Add documentation for -fcx-fortran-rules.

Modified:
trunk/gcc/common.opt
trunk/gcc/doc/invoke.texi
trunk/gcc/toplev.c

-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-19 Thread jb at gcc dot gnu dot org



--- Comment #11 from jb at gcc dot gnu dot org  2008-02-19 19:33 ---
Patch here: http://gcc.gnu.org/ml/gcc-patches/2008-02/msg00788.html


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-16 Thread jb at gcc dot gnu dot org



--- Comment #10 from jb at gcc dot gnu dot org  2008-02-16 22:33 ---
Actually, we could compile the entire libgfortran with -fcx-fortran-rules as
well:

Index: Makefile.am
===
--- Makefile.am (revision 132367)
+++ Makefile.am (working copy)
@@ -28,6 +28,9 @@ AM_CPPFLAGS = -iquote$(srcdir)/io -I$(sr
  -I$(srcdir)/$(MULTISRCTOP)../gcc/config \
  -I$(MULTIBUILDTOP)../../$(host_subdir)/gcc -D_GNU_SOURCE

+# Fortran rules for complex multiplication and division
+AM_CFLAGS += -fcx-fortran-rules
+
 gfor_io_src= \
 io/close.c \
 io/file_pos.c \

Regtested on i686-pc-linux-gnu. This might benefit other intrinsics using
complex multiplication and division as well, e.g. PRODUCT.
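
As a rough illustration of why a reduction such as PRODUCT would be affected,
here is a minimal C sketch of a complex product reduction. It is not
libgfortran's actual implementation, and the function name `complex_product`
is made up for this example. Each iteration performs one complex
multiplication, so the loop cost depends directly on how the compiler expands
that operator.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical PRODUCT-style reduction over a complex array.
   The loop body is a single complex multiply, so the chosen complex
   multiplication rules (full C99 vs. simpler ones) govern its speed.  */
float complex
complex_product (const float complex *v, size_t n)
{
  float complex acc = 1.0f;   /* multiplicative identity */
  for (size_t i = 0; i < n; i++)
    acc *= v[i];
  return acc;
}
```

Compiling this file with and without the option discussed here would be one
way to see whether the multiplication is expanded inline or not.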

I'll go ahead and write some documentation as well, and submit the entire thing
once 4.4 opens; assigning to myself.


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |jb at gcc dot gnu dot org
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2006-11-04 14:15:02         |2008-02-16 22:33:12
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-16 Thread rguenth at gcc dot gnu dot org



--- Comment #9 from rguenth at gcc dot gnu dot org  2008-02-16 21:58 ---
Actually the middle-end parts are ok for 4.4 if you add proper documentation
for the flag.  But please post it once stage1 opens.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-16 Thread fxcoudert at gcc dot gnu dot org



--- Comment #8 from fxcoudert at gcc dot gnu dot org  2008-02-16 19:00 ---
The Makefile.am part was messed up by my terminal: 

Index: libgfortran/Makefile.am
===
--- libgfortran/Makefile.am (revision 132353)
+++ libgfortran/Makefile.am (working copy)
@@ -636,7 +636,7 @@
 install-pdf:

 # Turn on vectorization and loop unrolling for matmul.
-$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops
+$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize -funroll-loops -fcx-fortran-rules
 # Logical matmul doesn't vectorize.
 $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-16 Thread fxcoudert at gcc dot gnu dot org



--- Comment #7 from fxcoudert at gcc dot gnu dot org  2008-02-16 18:50 ---
Thomas is right: -fcx-limited-range sets flag_complex_method to 0, but already
with flag_complex_method == 1 we have some rather good figures. Here are the
execution times of 300x300 matmul on my MacBook Pro (i386-apple-darwin8.11.1):

  - a home-made triple do loop in Fortran (Janne's comment #2) is 0.1876 sec
  - unpatched matmul is 0.5499 sec
  - matmul compiled with flag_complex_method == 1 is 0.1448 sec

The following patch is what I used to benchmark: it creates a
-fcx-fortran-rules (of course, we do know that Fortran actually rules, but
hiding it in an option name is a clever way for people to slowly start
realizing it) option that sets flag_complex_method to 1, and uses it to compile
libgfortran's matmul routines.


Index: gcc/toplev.c
===
--- gcc/toplev.c    (revision 132353)
+++ gcc/toplev.c    (working copy)
@@ -2001,6 +2001,10 @@
   if (flag_cx_limited_range)
 flag_complex_method = 0;

+  /* With -fcx-fortran-rules, we do something in-between cheap and C99.  */
+  if (flag_cx_fortran_rules)
+flag_complex_method = 1;
+
   /* Targets must be able to place spill slots at lower addresses.  If the
  target already uses a soft frame pointer, the transition is trivial.  */
   if (!FRAME_GROWS_DOWNWARD && flag_stack_protect)
Index: gcc/common.opt
===
--- gcc/common.opt  (revision 132353)
+++ gcc/common.opt  (working copy)
@@ -390,6 +390,10 @@
 Common Report Var(flag_cx_limited_range) Optimization
 Omit range reduction step when performing complex division

+fcx-fortran-rules
+Common Report Var(flag_cx_fortran_rules) Optimization
+Complex multiplication and division follow Fortran rules
+
 fdata-sections
 Common Report Var(flag_data_sections) Optimization
 Place data items into their own section
Index: libgfortran/Makefile.am
===
--- libgfortran/Makefile.am (revision 132353)
+++ libgfortran/Makefile.am (working copy)
@@ -636,7 +636,7 @@
 install-pdf:

 # Turn on vectorization and loop unrolling for matmul.
-$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
-fs
+$(patsubst %.c,%.lo,$(notdir $(i_matmul_c))): AM_CFLAGS += -ftree-vectorize
-fs
 # Logical matmul doesn't vectorize.
 $(patsubst %.c,%.lo,$(notdir $(i_matmull_c))): AM_CFLAGS += -funroll-loops
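
To make the "in-between cheap and C99" idea concrete for division, here is a
minimal C sketch contrasting the textbook formula (no range reduction) with a
Smith-style division that scales by the ratio of the divisor's components. It
illustrates the general technique only and is not taken from the GCC sources;
the names `div_limited_range` and `div_range_reduced` are made up for this
example.

```c
#include <complex.h>
#include <math.h>

/* Textbook formula: (a+bi)/(c+di) = ((ac+bd) + (bc-ad)i) / (c*c + d*d).
   c*c + d*d can overflow even when the quotient itself is representable.  */
double complex
div_limited_range (double complex x, double complex y)
{
  double a = creal (x), b = cimag (x), c = creal (y), d = cimag (y);
  double denom = c * c + d * d;
  return CMPLX ((a * c + b * d) / denom, (b * c - a * d) / denom);
}

/* Smith-style division: divide through by the larger of |c| and |d|
   so that the intermediate values stay in range.  No NaN rescue is
   attempted afterwards.  */
double complex
div_range_reduced (double complex x, double complex y)
{
  double a = creal (x), b = cimag (x), c = creal (y), d = cimag (y);
  if (fabs (c) >= fabs (d))
    {
      double r = d / c;
      double t = c + d * r;
      return CMPLX ((a + b * r) / t, (b - a * r) / t);
    }
  else
    {
      double r = c / d;
      double t = c * r + d;
      return CMPLX ((a * r + b) / t, (b * r - a) / t);
    }
}
```

With IEEE double arithmetic, dividing 1e200 + 1e200i by itself overflows in
the first function but gives 1 + 0i in the second, which is the kind of
difference range reduction is meant to address.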



-- 

fxcoudert at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fxcoudert at gcc dot gnu dot
                   |                            |org
           Keywords|                            |patch


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-10 Thread tkoenig at gcc dot gnu dot org



--- Comment #6 from tkoenig at gcc dot gnu dot org  2008-02-10 22:47 ---
(In reply to comment #5)
> The big culprit seems to be -fcx-limited-range. The other flags enabled by
> -ffast-math help very little.

C has some strange rules for complex types, which are mandated by the
C standard and aren't much use for other languages.

This is controlled by the variable flag_complex_method.  For C, this
is either 2 (meaning full C rules) or 0, which implies limited range
for complex division.  Complex multiplication can be expanded into
a libcall for flag_complex_method == 2 under circumstances I don't
understand (line 981, tree-complex.c).

Fortran usually has 1, which means sane rules for complex division
and multiplication.

Unfortunately, our matmul routines are written in C, so we
get what we don't need in Fortran - full C rules and possibly
a call to a library routine.

Solutions?  We could introduce an option to set flag_complex_method to
1 in C.  We could also set -fcx-limited-range for our matmul
routines, which should be safe as they don't use complex division
(at least they should not :-)
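
For readers unfamiliar with the difference, here is a minimal C sketch of the
straightforward complex product (four multiplies and two additions) together
with the kind of check that the full C rules add on top. It is an illustration
under my own assumptions, not GCC's expansion; the names `mul_simple` and
`both_parts_nan` are made up for this example.

```c
#include <complex.h>
#include <math.h>

/* Straightforward product: (a+bi)(c+di) = (ac - bd) + (ad + bc)i.
   No attempt is made to recover from NaN results.  */
float complex
mul_simple (float complex x, float complex y)
{
  float a = crealf (x), b = cimagf (x);
  float c = crealf (y), d = cimagf (y);
  return CMPLXF (a * c - b * d, a * d + b * c);
}

/* The situation the full C rules care about: both parts of the result
   are NaN, which may hide an infinite operand.  A C99-conforming
   multiply would then try to recover an infinity; this sketch only
   detects the case.  */
int
both_parts_nan (float complex z)
{
  return isnan (crealf (z)) && isnan (cimagf (z));
}
```

For example, multiplying inf + inf*i by 1 + 0i with `mul_simple` yields NaN in
both parts, because inf * 0 is NaN; finite operands never need the extra
check.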

CC:ing rth as he wrote the code in question.


-- 

tkoenig at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rth at gcc dot gnu dot org,
                   |                            |tkoenig at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2008-02-10 Thread jb at gcc dot gnu dot org



--- Comment #5 from jb at gcc dot gnu dot org  2008-02-10 19:19 ---
The big culprit seems to be -fcx-limited-range. The other flags enabled by
-ffast-math help very little.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2006-11-04 Thread jb at gcc dot gnu dot org



--- Comment #4 from jb at gcc dot gnu dot org  2006-11-04 22:16 ---
For the C version with 1d arrays, the benchmark results, with gfortran results
for comparison, are

Complex version:
-O3 -funroll-loops -mfpmath=sse -msse2
1.32
above + fast-math
0.38
gfortran -O2:
0.32

Real version:
0.07 s
fast-math, same thing.

gfortran -O2 -g
0.07

So it seems the culprit is some optimization that -ffast-math enables that
makes a huge difference for C99 complex arithmetic. However, compiling matmul
in libgfortran with -ffast-math almost certainly won't fly.. 

So ideally we should find exactly which flag enables this performance
improvement, and see if we can enable only that without bringing in all the
-ffast-math baggage. Otherwise we should bug the optimizer guys, if this is an
optimization that could also be enabled without -ffast-math.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2006-11-04 Thread jb at gcc dot gnu dot org



--- Comment #3 from jb at gcc dot gnu dot org  2006-11-04 21:24 ---
Well, redoing the C benchmark above to use 1d arrays and manual index
calculations, the results are now essentially the same as for the Fortran
version. And a commercial compiler produces about the same results for the
Fortran version as gfortran, which means the reason for our poor complex matmul
performance lies elsewhere.

#include <stdio.h>
#include <stdlib.h>
#include <complex.h>
#include <sys/time.h>

int main(void)
{
  int n = 300;
  complex float *a, *b, *c;
  int i, j, k, tc;
  a = malloc (n*n * sizeof (*a));
  b = malloc (n*n * sizeof (*b));
  c = malloc (n*n * sizeof (*c));
  struct timeval tv, tv2;
  float res;
  FILE *fp;

  tc = 0;
  for (i = 0; i < n*n; i++)
    {
      a[i] = i*10.0 + 100.0*I;
      b[i] = 1.0 + 42.0*I;
      c[i] = 0.0 + 0.0*I;
    }

  gettimeofday (&tv, NULL);

  for (i = 0; i < n; i++)
    {
      for (j = 0; j < n; j++)
        {
          c[i*n + j] = 0.0 + 0.0*I;
          for (k = 0; k < n; k++)
            {
              c[i*n + j] = c[i*n + j] + a[i*n + k] * b[k*n + j];
              tc++;
            }
        }
    }
  gettimeofday (&tv2, NULL);
  /* tv_usec is in microseconds, so divide by 1e6 to get seconds.  */
  res = tv2.tv_sec - tv.tv_sec + (tv2.tv_usec - tv.tv_usec) / 1000000.0;
  printf ("gemm time: %f\n", res);
  fp = fopen ("c-matrix", "w");
  for (i = 0; i < n; i++)
    {
      for (j = 0; j [...]


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2006-11-04 Thread jb at gcc dot gnu dot org



--- Comment #2 from jb at gcc dot gnu dot org  2006-11-04 20:34 ---
I did some experimenting, and it seems the C version of a trivial matrix
multiply program is much slower than the same program written in Fortran.

Switch the commented declarations and c[i][j] = 0 in the loop to get the float
version.

#include <stdio.h>
#include <complex.h>
#include <sys/time.h>

int main(void)
{
  const int n = 300;
  complex float a[n][n], b[n][n], c[n][n];
  //float a[n][n], b[n][n], c[n][n];
  int i, j, k, tc;

  struct timeval tv, tv2;
  float res;
  tc = 0;
  gettimeofday (&tv, NULL);
  for (i = 0; i < n; i++)
    {
      for (j = 0; j < n; j++)
        {
          c[i][j] = 0.0 + 0.0*I;
          //c[i][j] = 0.0;
          for (k = 0; k < n; k++)
            {
              //  printf("i %i, j %i, k %i\n", i, j, k);
              c[i][j] = c[i][j] + a[i][k] * b[k][j];
              tc++;
            }
        }
    }
  gettimeofday (&tv2, NULL);
  /* tv_usec is in microseconds, so divide by 1e6 to get seconds.  */
  res = tv2.tv_sec - tv.tv_sec + (tv2.tv_usec - tv.tv_usec) / 1000000.0;
  printf ("gemm time: %f\n", res);
  printf ("trip count: %i\n", tc);
}
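
The elapsed-time arithmetic in the listing above can be factored into a small
helper. This is a minimal sketch; the function name `elapsed_seconds` is made
up for this example. The point it shows is that `tv_usec` counts microseconds,
so the difference has to be scaled by 1e6, and that the microsecond difference
may be negative when the second boundary is crossed.

```c
#include <sys/time.h>

/* Seconds elapsed between two gettimeofday() samples.
   The tv_usec difference can be negative; adding it to the tv_sec
   difference still yields the correct total.  */
double
elapsed_seconds (struct timeval start, struct timeval end)
{
  return (double) (end.tv_sec - start.tv_sec)
         + (double) (end.tv_usec - start.tv_usec) / 1e6;
}
```

In the benchmark this would be called as `elapsed_seconds (tv, tv2)` after the
second `gettimeofday` call.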


Fortran version:

program mymatmul
  implicit none
  integer, parameter :: n = 300
  real, dimension(n,n) :: rr, ri
  complex, dimension(n,n) :: a,b,c
  real :: t1, t2
  integer :: i, j, k

  call random_number (rr)
  call random_number (ri)
  a = cmplx (rr, ri)
  call random_number (rr)
  call random_number (ri)
  b = cmplx (rr, ri)

  call cpu_time (t1)

  do j = 1, n
     do i = 1, n
        c(i,j) = cmplx (0., 0.)
        do k = 1, n
           c(i,j) = c(i,j) + a(i,k) * b(k,j)
        end do
     end do
  end do

  call cpu_time (t2)
  write (*,'(F8.4)') t2-t1
  open (10, file="cmatrix", form='unformatted')
  write (10) c
  close (10)

end program mymatmul

Fortran version with real instead of complex:

program mymatmul
  implicit none
  integer, parameter :: n = 300
  real, dimension(n,n) :: a,b,c
  real :: t1, t2
  integer :: i, j, k, tc

  call random_number (a)
  call random_number (b)

  call cpu_time (t1)

  tc = 0
  do j = 1, n
     do i = 1, n
        c(i,j) = 0.
        do k = 1, n
           c(i,j) = c(i,j) + a(i,k) * b(k,j)
           tc = tc + 1
        end do
     end do
  end do

  call cpu_time (t2)
  write (*,'(F8.4)') t2-t1
  write (*, *) 'Trip count: ', tc
  open (10, file="rmatrix", form='unformatted')
  write (10) c
  close (10)

end program mymatmul

And my results:

C version, complex:
-O2
2.0 s
-ffast-math
0.9
gfortran -O2:
0.32

float:
-O2 0.6 s
fast math makes no difference!

gfortran -O2 -g
0.07


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549

[Bug fortran/29549] matmul slow for complex matrices

2006-11-04 Thread jb at gcc dot gnu dot org



--- Comment #1 from jb at gcc dot gnu dot org  2006-11-04 14:15 ---
Confirmed.

I noticed it too when I was reviewing FX's external-blas patch. But the complex
version of matmul is generated from the same m4 sources as the real versions.
It might be that the middle- and/or back-end generates inefficient code for
complex arithmetic in general?


-- 

jb at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jb at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-11-04 14:15:02
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29549
