[Bug tree-optimization/38968] Complex matrix product is not vectorized
--- Comment #11 from rguenth at gcc dot gnu dot org  2009-03-28 10:05 ---
Subject: Bug 38968

Author: rguenth
Date: Sat Mar 28 10:05:24 2009
New Revision: 145171

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=145171
Log:
2009-03-28  Richard Guenther  <rguent...@suse.de>

        PR tree-optimization/38968
        * tree-vect-analyze.c (vect_compute_data_ref_alignment): Use
        FLOOR_MOD_EXPR to compute misalignment.

        * gfortran.dg/vect/fast-math-pr38968.f90: New testcase.

Added:
    trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr38968.f90
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-analyze.c

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=38968
--- Comment #12 from rguenth at gcc dot gnu dot org  2009-03-28 10:06 ---
Fixed.

rguenth at gcc dot gnu dot org changed:

                 What|Removed                          |Added
--------------------------------------------------------------------------------
               Status|ASSIGNED                         |RESOLVED
           Resolution|                                 |FIXED
     Target Milestone|---                              |4.5.0
--- Comment #7 from dominiq at lps dot ens dot fr  2009-02-01 10:37 ---
Created an attachment (id=17220)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=17220&action=view)
testing complex matrix multiplication

Comment #0 is not fully accurate. With some more testing with the attached code, I get:

- gcc 4.3.3: no vectorization,
- gcc 4.4.0 (trunk): vectorization for odd n,
- gcc 4.4.0 + patch from http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01271.html: vectorization for all values of n (in the tested range).

The attached code also checks the result of the matrix product, which is OK.

Now, as shown below (in flops/clock cycle), the timings are quite disappointing (-m64 -O3 -ffast-math -funroll-loops): for odd n, the vectorized code is slower than the non-vectorized one; for even n, the code is faster with vectorization, but still significantly slower than with ifort.

      n   gcc 4.3.3   trunk   trunk+patch   ifort 11.0
    124     1.33      1.36       1.81          2.61
    125     1.37      1.32       1.32          2.20
    126     1.36      1.37       1.79          2.55
    127     1.37      1.31       1.31          2.22
    128     1.38      1.39       1.86          2.64
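The comment does not say how the attached code counts flops. A common convention, assumed here and not taken from the attachment, is 8 real flops per complex multiply-add (4 multiplies and 2 adds for the product, 2 adds for the accumulation), i.e. 8n^3 flops for an n x n complex matrix product. A minimal C sketch of that arithmetic; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: real flops in an n x n complex product C = C + A*B,
   assuming 8 real flops per complex multiply-add. */
static uint64_t complex_matmul_flops(uint64_t n)
{
    assert(n > 0);
    return 8u * n * n * n;
}

/* Hypothetical helper: flops per clock cycle for a measured cycle count. */
static double flops_per_cycle(uint64_t n, uint64_t cycles)
{
    assert(cycles > 0);
    return (double) complex_matmul_flops(n) / (double) cycles;
}
```

Under this assumption, n = 128 is 16,777,216 flops, so 1.86 flops/cycle corresponds to roughly 9.0 million cycles per product.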
--- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 ---
> Did you try enabling SSE3 btw?

No. How do I get the enabled SSE* by default?

> Can you post the ifort assembly of the loop?

    L_B1.14:                                    # Preds L_B1.14 L_B1.13
            lea       (%rsi,%r9,8), %r11            #
            lea       mymatmul_$A.0.1(%rip), %r10   #27.33
            movaps    (%r10,%r11), %xmm2            #27.33
            movaps    16(%r10,%r11), %xmm4          #27.33
            movaps    %xmm0, %xmm3                  #27.40
            mulps     %xmm2, %xmm3                  #27.40
            shufps    $177, %xmm2, %xmm2            #27.40
            lea       (%rdx,%r9,8), %r15            #
            lea       mymatmul_$C.0.1(%rip), %r14   #27.24
            movaps    %xmm0, %xmm5                  #27.40
            addq      $4, %r9                       #26.12
            mulps     %xmm1, %xmm2                  #27.40
            cmpq      $128, %r9                     #26.12
            addsubps  %xmm2, %xmm3                  #27.40
            addps     (%r14,%r15), %xmm3            #27.15
            movaps    %xmm3, (%r14,%r15)            #27.15
            mulps     %xmm4, %xmm5                  #27.40
            shufps    $177, %xmm4, %xmm4            #27.40
            mulps     %xmm1, %xmm4                  #27.40
            addsubps  %xmm4, %xmm5                  #27.40
            addps     16(%r14,%r15), %xmm5          #27.15
            movaps    %xmm5, 16(%r14,%r15)          #27.15
            jl        L_B1.14       # Prob 99%      #26.12
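The ifort loop handles four single-precision complex elements per iteration (two per 16-byte register) and forms each product with shufps $177 and the SSE3 addsubps instruction. The portable C sketch below models those instructions lane by lane on plain float arrays, to show why the sequence yields a complex multiply-accumulate. It uses no real intrinsics (the function names only mirror the mnemonics), and it assumes %xmm0 and %xmm1 hold the broadcast real and imaginary parts of the loop-invariant scalar factor, which the listing does not show.

```c
#include <assert.h>

/* Scalar stand-in for one 128-bit SSE register holding two interleaved
   single-precision complex numbers: {re0, im0, re1, im1}. */
typedef struct { float v[4]; } vec4;

/* Models mulps: lane-wise multiply. */
static vec4 mulps(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] * b.v[i];
    return r;
}

/* Models addps: lane-wise add. */
static vec4 addps(vec4 a, vec4 b)
{
    vec4 r;
    for (int i = 0; i < 4; i++)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}

/* Models shufps $177, x, x: 177 = 0b10110001 selects lanes {1, 0, 3, 2},
   swapping the real and imaginary part of each complex number. */
static vec4 shufps_177(vec4 a)
{
    vec4 r = {{ a.v[1], a.v[0], a.v[3], a.v[2] }};
    return r;
}

/* Models addsubps (SSE3): subtract in even lanes, add in odd lanes. */
static vec4 addsubps(vec4 a, vec4 b)
{
    vec4 r = {{ a.v[0] - b.v[0], a.v[1] + b.v[1],
                a.v[2] - b.v[2], a.v[3] + b.v[3] }};
    return r;
}

/* Returns c + a * s for the two complex numbers in a, where s = sr + i*si
   is broadcast into two registers (assumed to mirror %xmm0 and %xmm1). */
static vec4 cmul_acc(vec4 c, vec4 a, float sr, float si)
{
    vec4 s_re = {{ sr, sr, sr, sr }};        /* like %xmm0 */
    vec4 s_im = {{ si, si, si, si }};        /* like %xmm1 */
    vec4 t = mulps(s_re, a);                 /* {ar*sr, ai*sr, ...} */
    vec4 u = mulps(s_im, shufps_177(a));     /* {ai*si, ar*si, ...} */
    return addps(c, addsubps(t, u));         /* {ar*sr - ai*si, ai*sr + ar*si, ...} */
}
```

For example, with a = {1, 2, 3, 4} (i.e. 1+2i and 3+4i), s = 5+6i and c = 0, the result is {-7, 16, -9, 38}, which matches (1+2i)(5+6i) = -7+16i and (3+4i)(5+6i) = -9+38i.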
--- Comment #10 from rguenther at suse dot de  2009-02-01 11:11 ---
Subject: Re:  Complex matrix product is not vectorized

On Sun, 1 Feb 2009, dominiq at lps dot ens dot fr wrote:

> --- Comment #9 from dominiq at lps dot ens dot fr  2009-02-01 10:58 ---
> > Did you try enabling SSE3 btw?
>
> No. How do I get the enabled SSE* by default?

You can enable SSE3 manually with -msse3, or automatically enable what your local CPU can do with -march=native.

Richard.
--- Comment #8 from rguenth at gcc dot gnu dot org  2009-02-01 10:49 ---
This is somewhat expected. We vectorize the complex product using vectors of real parts and vectors of imaginary parts of two complex numbers (so we are not using the fancy haddsub SSE codes).

Did you try enabling SSE3 btw?

Can you post the ifort assembly of the loop?
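To illustrate the data movement such a split scheme involves, here is a scalar C sketch: complex operands stored interleaved as {re, im, re, im, ...} are de-interleaved into separate real and imaginary arrays (standing in for vector registers), multiplied with purely lane-wise arithmetic, and re-interleaved. This is a simplified model of the general technique under stated assumptions, not GCC's actual vectorizer output; LANES and cmul_split are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define LANES 4  /* hypothetical vector width: 4 floats per register */

/* out[k] = a[k] * b[k] for LANES interleaved complex numbers. Inputs are
   copied to locals first, so out may alias a or b. */
static void cmul_split(const float *a, const float *b, float *out)
{
    float ar[LANES], ai[LANES], br[LANES], bi[LANES];

    /* De-interleave: {re0, im0, re1, im1, ...} -> separate re[] / im[]. */
    for (size_t k = 0; k < LANES; k++) {
        ar[k] = a[2 * k];
        ai[k] = a[2 * k + 1];
        br[k] = b[2 * k];
        bi[k] = b[2 * k + 1];
    }

    /* Lane-wise complex product: no cross-lane operations are needed. */
    for (size_t k = 0; k < LANES; k++) {
        float re = ar[k] * br[k] - ai[k] * bi[k];
        float im = ar[k] * bi[k] + ai[k] * br[k];
        /* Re-interleave into the original memory layout. */
        out[2 * k] = re;
        out[2 * k + 1] = im;
    }
}
```

The arithmetic itself is cheap; the extra cost relative to the ifort loop in Comment #9 plausibly comes from the de-interleave and re-interleave shuffles, which the addsubps sequence avoids by working on the interleaved layout directly.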
--- Comment #2 from rguenth at gcc dot gnu dot org  2009-01-26 11:15 ---
This happens because ivcanon introduces an induction variable that counts from 2000 to 1. This confuses data-ref analysis and we get

    base_address: a_24(D)
    offset from base address: (<unnamed-signed:64>) ((<unnamed-unsigned:64>) pretmp.28_148 * 16000)
    constant offset from base address: -15996
    step: 8
    aligned to: 128
    base_object: IMAGPART_EXPR <(*a_24(D))[0]>
    symbol tag: SMT.12

Notice the negative constant offset from base address. This in turn confuses the vectorizer alignment analysis, but only because the alignment of the base object is known. We hit (with misalign == -15996, alignment == 16)

          /* Modulo alignment.  */
          misalign = size_binop (TRUNC_MOD_EXPR, misalign, alignment);
          if (!host_integerp (misalign, 1))
            {
              /* Negative or overflowed misalignment value.  */
              if (vect_print_dump_info (REPORT_DETAILS))
                fprintf (vect_dump, "unexpected misalign value");
              return false;
            }

and the modulo is -12. Now, I wonder why we do not just use alignment + misalign in that case. I have a patch.

rguenth at gcc dot gnu dot org changed:

                 What|Removed                          |Added
--------------------------------------------------------------------------------
           AssignedTo|unassigned at gcc dot gnu dot org|rguenth at gcc dot gnu dot org
               Status|NEW                              |ASSIGNED
Last reconfirmed date|2009-01-25 17:33:10              |2009-01-26 11:15:23
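The difference between the two remainder operations can be sketched in plain C. C's % operator truncates toward zero, like TRUNC_MOD_EXPR, so a negative misalignment stays negative; the floor variant below is written by hand and behaves like FLOOR_MOD_EXPR, which the fix in Comment #11 switched to. The helper names are illustrative only, not GCC functions.

```c
#include <assert.h>

/* Truncating remainder (C's %, like TRUNC_MOD_EXPR): the result keeps
   the sign of x, so a negative misalignment stays negative. */
static long trunc_mod(long x, long m)
{
    assert(m > 0);
    return x % m;
}

/* Floor remainder (like FLOOR_MOD_EXPR): always in [0, m) for m > 0.
   When the truncated remainder r is negative, the result is m + r,
   i.e. exactly "alignment + misalign". */
static long floor_mod(long x, long m)
{
    assert(m > 0);
    long r = x % m;
    return r < 0 ? r + m : r;
}
```

With misalign == -15996 and alignment == 16, trunc_mod gives -12 and floor_mod gives 4, which is the 16 + (-12) suggested above; for non-negative offsets both agree.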
--- Comment #3 from irar at il dot ibm dot com  2009-01-26 13:09 ---
(In reply to comment #2)
> Now, I wonder why we do not just use alignment + misalign in that case.

I think you are right.
--- Comment #4 from rguenth at gcc dot gnu dot org  2009-01-26 13:25 ---
Patch posted.

rguenth at gcc dot gnu dot org changed:

                 What|Removed                          |Added
--------------------------------------------------------------------------------
                  URL|                                 |http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01271.html
--- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 ---
Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?
--- Comment #6 from rguenther at suse dot de  2009-01-26 14:23 ---
Subject: Re:  Complex matrix product is not vectorized

On Mon, 26 Jan 2009, howarth at nitro dot med dot uc dot edu wrote:

> --- Comment #5 from howarth at nitro dot med dot uc dot edu  2009-01-26 14:21 ---
> Is the fix for this PR targeted for gcc 4.4.0 or gcc 4.5 stage 1?

Stage 1, it is an enhancement.

Richard.
--- Comment #1 from rguenth at gcc dot gnu dot org  2009-01-25 17:33 ---
Confirmed. Note that the patch mentioned does not try to address any issue present in the testcase.

rguenth at gcc dot gnu dot org changed:

                 What|Removed                          |Added
--------------------------------------------------------------------------------
                   CC|                                 |rguenth at gcc dot gnu dot org
             Severity|normal                           |enhancement
               Status|UNCONFIRMED                      |NEW
            Component|middle-end                       |tree-optimization
       Ever Confirmed|0                                |1
             Keywords|                                 |missed-optimization
Last reconfirmed date|-00-00 00:00:00                  |2009-01-25 17:33:10