[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-03-28 Thread rguenth at gcc dot gnu dot org


--- Comment #11 from rguenth at gcc dot gnu dot org  2009-03-28 10:05 ---
Subject: Bug 38968

Author: rguenth
Date: Sat Mar 28 10:05:24 2009
New Revision: 145171

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145171
Log:
2009-03-28  Richard Guenther  rguent...@suse.de

PR tree-optimization/38968
* tree-vect-analyze.c (vect_compute_data_ref_alignment):
Use FLOOR_MOD_EXPR to compute misalignment.

* gfortran.dg/vect/fast-math-pr38968.f90: New testcase.

Added:
trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr38968.f90
Modified:
trunk/gcc/ChangeLog
trunk/gcc/testsuite/ChangeLog
trunk/gcc/tree-vect-analyze.c


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-03-28 Thread rguenth at gcc dot gnu dot org


--- Comment #12 from rguenth at gcc dot gnu dot org  2009-03-28 10:06 ---
Fixed.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED
   Target Milestone|---                         |4.5.0


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-02-01 Thread dominiq at lps dot ens dot fr


--- Comment #7 from dominiq at lps dot ens dot fr  2009-02-01 10:37 ---
Created an attachment (id=17220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17220&action=view)
testing complex matrix multiplication

Comment #0 is not fully accurate. With some more testing with the
attached code, I get:
- gcc 4.3.3: no vectorization,
- gcc 4.4.0 (trunk) : vectorization for odd n,
- gcc 4.4.0 + patch from 
  http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01271.html:
  vectorization for all values of n (in the tested range).

The attached code also checks the result of the matrix product, which is
OK. However, as shown below (in flops/clock cycle), the timings are quite
disappointing (-m64 -O3 -ffast-math -funroll-loops). For odd n, the
vectorized code is slower than the nonvectorized one. For even n, the code
is faster with vectorization, but still significantly slower than with
ifort.

  n    4.3.3   trunk   trunk    ifort
                       +patch   11.0

124    1.33    1.36    1.81     2.61
125    1.37    1.32    1.32     2.20
126    1.36    1.37    1.79     2.55
127    1.37    1.31    1.31     2.22
128    1.38    1.39    1.86     2.64


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-02-01 Thread dominiq at lps dot ens dot fr


--- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 ---
> Did you try enabling SSE3 btw?

No. How do I get the enabled SSE* by default?

> Can you post the ifort assembly of the loop?

L_B1.14:                        # Preds L_B1.14 L_B1.13
        lea       (%rsi,%r9,8), %r11                #
        lea       mymatmul_$A.0.1(%rip), %r10       #27.33
        movaps    (%r10,%r11), %xmm2                #27.33
        movaps    16(%r10,%r11), %xmm4              #27.33
        movaps    %xmm0, %xmm3                      #27.40
        mulps     %xmm2, %xmm3                      #27.40
        shufps    $177, %xmm2, %xmm2                #27.40
        lea       (%rdx,%r9,8), %r15                #
        lea       mymatmul_$C.0.1(%rip), %r14       #27.24
        movaps    %xmm0, %xmm5                      #27.40
        addq      $4, %r9                           #26.12
        mulps     %xmm1, %xmm2                      #27.40
        cmpq      $128, %r9                         #26.12
        addsubps  %xmm2, %xmm3                      #27.40
        addps     (%r14,%r15), %xmm3                #27.15
        movaps    %xmm3, (%r14,%r15)                #27.15
        mulps     %xmm4, %xmm5                      #27.40
        shufps    $177, %xmm4, %xmm4                #27.40
        mulps     %xmm1, %xmm4                      #27.40
        addsubps  %xmm4, %xmm5                      #27.40
        addps     16(%r14,%r15), %xmm5              #27.15
        movaps    %xmm5, 16(%r14,%r15)              #27.15
        jl        L_B1.14       # Prob 99%          #26.12
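
For readers following the loop above, here is a plain-C, lane-by-lane sketch of what it appears to compute: c(i) = c(i) + a(i)*b on interleaved single-precision (re, im) pairs. The excerpt does not show how %xmm0 and %xmm1 are set up, so the sketch assumes they hold the broadcast real and imaginary parts of the scalar factor b. The function name cmla_addsub and its parameters are made up for illustration and are not taken from the attachment or from ifort.

```c
#include <assert.h>
#include <stddef.h>

/* c[i] += a[i] * b, with complex numbers stored interleaved as
   (re, im) float pairs; n counts complex elements.
   Assumption: br and bi stand for the values broadcast in %xmm0
   and %xmm1, which the posted excerpt does not show.  */
static void
cmla_addsub (float *c, const float *a, float br, float bi, size_t n)
{
  assert (n == 0 || (c != NULL && a != NULL));

  for (size_t i = 0; i < n; i++)
    {
      float re = a[2 * i];
      float im = a[2 * i + 1];

      /* mulps with the broadcast real part of b.  */
      float p_even = re * br;
      float p_odd = im * br;

      /* shufps $177 swaps re and im within each pair, then
         mulps with the broadcast imaginary part of b.  */
      float q_even = im * bi;
      float q_odd = re * bi;

      /* addsubps subtracts in even lanes and adds in odd lanes;
         addps then accumulates into c.  */
      c[2 * i] += p_even - q_even;
      c[2 * i + 1] += p_odd + q_odd;
    }
}
```

In the assembly, each iteration advances the index by 4 with a scale of 8 bytes, so two 16-byte registers cover four complex elements per trip; the scalar loop above handles one element per trip for clarity.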


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-02-01 Thread rguenther at suse dot de


--- Comment #10 from rguenther at suse dot de  2009-02-01 11:11 ---
Subject: Re:  Complex matrix product is not vectorized

On Sun, 1 Feb 2009, dominiq at lps dot ens dot fr wrote:

> --- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 ---
> > Did you try enabling SSE3 btw?
>
> No. How do I get the enabled SSE* by default?

You can enable SSE3 manually with -msse3, or automatically enable what
your local CPU can do with -march=native.

Richard.
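
As an aside on the question of which SSE level is enabled by default, one way to check is through GCC's predefined macros __SSE__, __SSE2__ and __SSE3__. The following is only a minimal sketch; the helper names enabled_sse_level and sse_level_name are made up for illustration.

```c
#include <assert.h>

/* Highest SSE level the compiler was allowed to use for this
   translation unit, judged from GCC's predefined macros:
   0 = none, 1 = SSE, 2 = SSE2, 3 = SSE3.  */
static int
enabled_sse_level (void)
{
#if defined (__SSE3__)
  return 3;
#elif defined (__SSE2__)
  return 2;
#elif defined (__SSE__)
  return 1;
#else
  return 0;
#endif
}

/* Human-readable name for a level returned above.  */
static const char *
sse_level_name (int level)
{
  static const char *const names[] = { "none", "SSE", "SSE2", "SSE3" };

  assert (level >= 0 && level <= 3);
  return names[level];
}
```

Compiling the same file with and without -msse3 (or -march=native) should change what enabled_sse_level returns on a machine that supports SSE3. The same macros can also be listed without a test program using gcc -dM -E on an empty input.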


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-02-01 Thread rguenth at gcc dot gnu dot org


--- Comment #8 from rguenth at gcc dot gnu dot org  2009-02-01 10:49 ---
This is somewhat expected.  We vectorize the complex product using vectors
of real parts and vectors of imaginary parts of two complex numbers (so we
are not using the fancy haddsub SSE instructions).  Did you try enabling
SSE3 btw?
Can you post the ifort assembly of the loop?
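
To illustrate the scheme described above, here is a minimal C sketch of a complex multiply-accumulate in which real and imaginary parts are kept in separate arrays, so each vector lane would hold only real parts or only imaginary parts. This is an illustration under that assumption, not the code GCC generates; the function name cmla_split and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* c[i] += a[i] * b for complex values whose real and imaginary
   parts live in separate arrays (cr/ci and ar/ai).  b = br + i*bi
   is a scalar factor.  Every statement in the loop body is a plain
   multiply/add on like-typed parts, with no shuffles needed.  */
static void
cmla_split (float *cr, float *ci, const float *ar, const float *ai,
            float br, float bi, size_t n)
{
  assert (n == 0 || (cr != NULL && ci != NULL && ar != NULL && ai != NULL));

  for (size_t i = 0; i < n; i++)
    {
      cr[i] += ar[i] * br - ai[i] * bi;   /* real part */
      ci[i] += ar[i] * bi + ai[i] * br;   /* imaginary part */
    }
}
```

When the data is stored interleaved, as Fortran complex arrays are, getting into this split form costs extra loads or permutes, which is one plausible reason for the gap to the ifort numbers.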


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-26 Thread rguenth at gcc dot gnu dot org


--- Comment #2 from rguenth at gcc dot gnu dot org  2009-01-26 11:15 ---
This happens because ivcanon introduces an induction variable that counts
from 2000 to 1.  This confuses data-ref analysis and we get

base_address: a_24(D)
offset from base address: (unnamed-signed:64)
((unnamed-unsigned:64) pretmp.28_148 * 16000)
constant offset from base address: -15996
step: 8
aligned to: 128
base_object: IMAGPART_EXPR (*a_24(D))[0]
symbol tag: SMT.12

notice the negative constant offset from base address.  This in turn
confuses the vectorizer alignment analysis - but only because the alignment
of the base object is known.  We hit (with misalign == -15996, alignment == 16)

  /* Modulo alignment.  */
  misalign = size_binop (TRUNC_MOD_EXPR, misalign, alignment);

  if (!host_integerp (misalign, 1))
    {
      /* Negative or overflowed misalignment value.  */
      if (vect_print_dump_info (REPORT_DETAILS))
        fprintf (vect_dump, "unexpected misalign value");
      return false;
    }

and the modulo is -12.

Now, I wonder why we do not just use alignment + misalign in that case.
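
The arithmetic can be checked with a small C sketch. C's % operator truncates toward zero, which matches TRUNC_MOD_EXPR, while a floor modulo (the FLOOR_MOD_EXPR semantics named in the commit log for this PR) always lands in [0, alignment) for a positive alignment. The helper names trunc_mod and floor_mod are made up for illustration and are not GCC functions.

```c
#include <assert.h>

/* Truncating modulo, as C's % and TRUNC_MOD_EXPR compute it:
   the result takes the sign of the dividend.  */
static long
trunc_mod (long misalign, long alignment)
{
  assert (alignment > 0);
  return misalign % alignment;
}

/* Floor modulo, as FLOOR_MOD_EXPR computes it: for a positive
   alignment the result is always in [0, alignment).  A negative
   truncated remainder is fixed up by adding the alignment, which
   is the "alignment + misalign" idea from the comment.  */
static long
floor_mod (long misalign, long alignment)
{
  long r = trunc_mod (misalign, alignment);
  return r < 0 ? r + alignment : r;
}
```

With misalign == -15996 and alignment == 16, trunc_mod gives -12 and floor_mod gives 4, that is 16 + (-12).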

I have a patch.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |rguenth at gcc dot gnu dot
                   |dot org                     |org
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2009-01-25 17:33:10         |2009-01-26 11:15:23
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-26 Thread irar at il dot ibm dot com


--- Comment #3 from irar at il dot ibm dot com  2009-01-26 13:09 ---
(In reply to comment #2)
> Now, I wonder why we do not just use alignment + misalign in that case.

I think you are right.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-26 Thread rguenth at gcc dot gnu dot org


--- Comment #4 from rguenth at gcc dot gnu dot org  2009-01-26 13:25 ---
Patch posted.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                URL|                            |http://gcc.gnu.org/ml/gcc-
                   |                            |patches/2009-
                   |                            |01/msg01271.html


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-26 Thread howarth at nitro dot med dot uc dot edu


--- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 ---
Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-26 Thread rguenther at suse dot de


--- Comment #6 from rguenther at suse dot de  2009-01-26 14:23 ---
Subject: Re:  Complex matrix product is not vectorized

On Mon, 26 Jan 2009, howarth at nitro dot med dot uc dot edu wrote:

> --- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 ---
> Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?

stage1, it is an enhancement.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968



[Bug tree-optimization/38968] Complex matrix product is not vectorized

2009-01-25 Thread rguenth at gcc dot gnu dot org


--- Comment #1 from rguenth at gcc dot gnu dot org  2009-01-25 17:33 ---
Confirmed.  Note the patch mentioned does not try to address any issue present
in the testcase.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|middle-end                  |tree-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|-00-00 00:00:00             |2009-01-25 17:33:10
               date|                            |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968