[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com


--- Comment #23 from irar at il dot ibm dot com  2009-11-30 12:20 ---
Applied:
http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=154794

Thanks,
Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread rguenther at suse dot de


--- Comment #22 from rguenther at suse dot de  2009-11-30 10:13 ---
Subject: Re:  [4.4/4.5 Regression] Vectorizer
 cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Mon, 30 Nov 2009, irar at il dot ibm dot com wrote:

> --- Comment #20 from irar at il dot ibm dot com  2009-11-30 08:52 ---
> Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard,
> in your original PAREN_EXPR patch
> http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515 ).

Oh, indeed ;)

> The problem here is that vectorizable_assignment does not support multiple
> types. The attached patch adds this support, but I don't know if the patch is
> suitable for the current stage...

Probably not (though it looks small).  If you feel confident about it
you may well apply it still though.

Thanks,
Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com


--- Comment #21 from irar at il dot ibm dot com  2009-11-30 08:54 ---
Created an attachment (id=19183)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19183&action=view)
Multiple types support patch


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-30 Thread irar at il dot ibm dot com


--- Comment #20 from irar at il dot ibm dot com  2009-11-30 08:52 ---
Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard,
in your original PAREN_EXPR patch
http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515 ).

The problem here is that vectorizable_assignment does not support multiple
types. The attached patch adds this support, but I don't know if the patch is
suitable for the current stage...
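
For readers unfamiliar with the term, here is a rough sketch of what "multiple
types" usually means in the vectorizer: when a loop mixes element sizes, the
vectorization factor follows the narrowest type, so statements on wider types
need several vector copies. The function names and the 128-bit vector width
are assumptions for illustration (the initialization loops mix a 4-byte
integer index with 8-byte reals); this is not code from the patch.

```python
def vector_lanes(vector_bits, elem_bits):
    """Number of elements of the given width that fit in one vector."""
    return vector_bits // elem_bits


def copies_per_statement(vector_bits, elem_bits_in_loop):
    """Map each element width to the number of vector statements needed.

    The vectorization factor (VF) is set by the narrowest element type;
    a statement on a wider type covers fewer elements per vector and so
    has to be emitted VF / lanes times.
    """
    vf = max(vector_lanes(vector_bits, b) for b in elem_bits_in_loop)
    return {b: vf // vector_lanes(vector_bits, b) for b in elem_bits_in_loop}


# Assumed 128-bit vectors, 32-bit integer index, 64-bit reals.
plan = copies_per_statement(128, [32, 64])
print(plan)  # {32: 1, 64: 2}
```

Under these assumptions an assignment on the 8-byte values has to be emitted
twice per vector iteration, which is the case a single-copy
vectorizable_assignment would have to reject.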

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-27 Thread rguenth at gcc dot gnu dot org


--- Comment #19 from rguenth at gcc dot gnu dot org  2009-11-27 11:23 ---
I guess this PR should be split further, a bug about the PAREN_EXPR wrt
vectorization and a bug about the yet unanalyzed performance regression.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added

           Severity|enhancement                 |normal
           Priority|P3                          |P2


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-23 Thread irar at il dot ibm dot com


--- Comment #18 from irar at il dot ibm dot com  2009-11-23 09:02 ---
I tried to vectorize eval.f90 with 4.3 and mainline on x86_64-suse-linux. In
both cases no loop gets vectorized in subroutine eval. The k loop is not
vectorizable because the step of x is unknown (function argument), and scalar
evolution analysis fails to analyze it. The j loop is not vectorized first of
all because of the k loop unknown loop bound (this is on our todo list).

Ira


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-21 Thread rguenth at gcc dot gnu dot org


--- Comment #17 from rguenth at gcc dot gnu dot org  2009-11-21 13:58 ---
I have filed PR42131 for the DO loop translation issue.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-21 Thread rguenther at suse dot de


--- Comment #16 from rguenther at suse dot de  2009-11-21 12:19 ---
Subject: Re:  [4.4/4.5 Regression] Vectorizer
 cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Sat, 21 Nov 2009, toon at moene dot org wrote:

> --- Comment #15 from toon at moene dot org  2009-11-21 12:11 ---
> > I don't see that the standard suggests the specific code the Frontend
> > generates.  In fact it should be valid to increment the DO variable
> > by m3 and express the exit test in terms of the DO variable as well.
> 
> The Standard doesn't prescribe the code the Frontend generates - however, to
> be sure one follows the Standard, it's most easy to simply implement the
> steps given.
> 
> To illustrate this with a simple example:
> 
> DO I = M1, M2, M3
>    B(I) = A(I)
> ENDDO
> 
> would be most easily, and straightforwardly, implemented as follows:
> 
>  IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
>  IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
>  ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop count
>  I = M1
>  100 CONTINUE
>  B(I)  = A(I)
>  ITEMP = ITEMP - 1   ! Adjust internal loop counter
>  I = I + M3  ! Adjust DO loop variable
>  IF (ITEMP > 0) GOTO 100
>  200 CONTINUE
> 
> That there are two induction variables in this loop is inconsequential - one
> of them should be eliminated by induction variable elimination (at least,
> that was the case with g77 and the RTL loop optimization pass).

Sure, but the frontend generates

    if (M3 > 0)
      ITEMP = (M2 - M1) / M3
    else
      ITEMP = (M1 - M2) / -M3
    I = M1
100 CONTINUE
    B(I) = A(I)
    I = I + M3
    if (ITEMP == 0) GOTO 200
    ITEMP = ITEMP - 1
    GOTO 100
200 CONTINUE

The conditional setting of ITEMP is what confuses GCC.  Also I don't
see the test for zero-time executing loops (but maybe I omitted it
from my pasting in comment #12).
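
As a sanity check, the frontend's scheme can be simulated in a few lines of
Python. This is only a sketch: `frontend_do_values` is a made-up name, the
zero-trip guard is taken from the dump in comment #12, and Python's unbounded
integers stand in for the unsigned 32-bit countm1 (which is my reading of
`character(kind=4)` in the dump).

```python
def frontend_do_values(m1, m2, m3):
    """Values taken by I when the loop is driven by a count-minus-one."""
    if m3 == 0:
        raise ValueError("m3 shall not be zero")
    if m3 > 0:
        if m2 < m1:            # zero-trip guard from the dump
            return []
        countm1 = (m2 - m1) // m3
    else:
        if m2 > m1:            # zero-trip guard from the dump
            return []
        countm1 = (m1 - m2) // (-m3)
    values = []
    i = m1
    while True:
        values.append(i)       # loop body
        i += m3                # adjust DO variable
        if countm1 == 0:       # exit test on the hidden counter
            break
        countm1 -= 1
    return values


print(frontend_do_values(1, 10, 3))    # [1, 4, 7, 10]
print(frontend_do_values(10, 1, -3))   # [10, 7, 4, 1]
print(frontend_do_values(5, 1, 1))     # []
```

Both operands of each division are non-negative here, so Python's floor
division agrees with Fortran's truncating division.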

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-21 Thread toon at moene dot org


--- Comment #15 from toon at moene dot org  2009-11-21 12:11 ---
> I don't see that the standard suggests the specific code the Frontend
> generates.  In fact it should be valid to increment the DO variable
> by m3 and express the exit test in terms of the DO variable as well.

The Standard doesn't prescribe the code the Frontend generates - however, to be
sure one follows the Standard, it's most easy to simply implement the steps
given.

To illustrate this with a simple example:

DO I = M1, M2, M3
   B(I) = A(I)
ENDDO

would be most easily, and straightforwardly, implemented as follows:

 IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
 IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
 ITEMP = (M2 - M1 + M3) / M3 ! Temporary loop count
 I = M1
 100 CONTINUE
 B(I)  = A(I)
 ITEMP = ITEMP - 1   ! Adjust internal loop counter
 I = I + M3  ! Adjust DO loop variable
 IF (ITEMP > 0) GOTO 100
 200 CONTINUE

That there are two induction variables in this loop is inconsequential - one of
them should be eliminated by induction variable elimination (at least, that was
the case with g77 and the RTL loop optimization pass).

If you think that the Frontend does something different / in addition to the
above, feel free to open a separate PR.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread rguenth at gcc dot gnu dot org


--- Comment #14 from rguenth at gcc dot gnu dot org  2009-11-20 23:48 ---
(In reply to comment #13)
> > The funny conditional initialization of countm1.6 makes the analysis of
> > the number of iterations of this loop impossible (not to mention the
> > conversions to character(kind=4)).
> 
> > Why does the frontend do induction variable "optimization" at all and
> > not simply generate a loop with a non-unit counting IV?
> 
> It's not trying to be funny - it just follows the text of the Fortran Standard
> (hey, what a concept !):
> 
> 8.1.6.6.1 Loop initiation
> 
> When the DO statement is executed, the DO construct becomes active. If
> loop-control is
> 
>    [ , ] do-variable = scalar-int-expr1 , scalar-int-expr2
>          [ , scalar-int-expr3 ]
> 
> the following steps are performed in sequence.
> 
>  (1) The initial parameter m1, the terminal parameter m2, and the
>      incrementation parameter m3 are of type integer with the same kind
>      type parameter as the do-variable. Their values are established by
>      evaluating scalar-int-expr1, scalar-int-expr2, and scalar-int-expr3,
>      respectively, including, if necessary, conversion to the kind type
>      parameter of the do-variable according to the rules for numeric
>      conversion (Table 7.11). If scalar-int-expr3 does not appear, m3 has
>      the value 1. The value of m3 shall not be zero.
>  (2) The DO variable becomes defined with the value of the initial
>      parameter m1.
>  (3) The iteration count is established and is the value of the expression
>      (m2 - m1 + m3)/m3, unless that value is negative, in which case the
>      iteration count is 0.
> 
> Only interprocedural analysis can tell us that this is a simple loop only
> executed 3 times (I got this wrong at first - it's *always* executed 3 times).

I don't see that the standard suggests the specific code the Frontend
generates.  In fact it should be valid to increment the DO variable
by m3 and express the exit test in terms of the DO variable as well.
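
A minimal sketch of that alternative, in Python for brevity
(`do_values_single_iv` is a hypothetical name): the DO variable is the only
induction variable and the exit test compares it against m2. Python integers
do not overflow, so this says nothing about wraparound when m2 is close to
the largest representable integer.

```python
def do_values_single_iv(m1, m2, m3):
    """Values taken by I when the exit test is written on I itself."""
    if m3 == 0:
        raise ValueError("m3 shall not be zero")
    values = []
    i = m1
    # The direction of the comparison depends on the sign of the step.
    while (i <= m2) if m3 > 0 else (i >= m2):
        values.append(i)   # loop body
        i += m3            # the only induction variable
    return values


print(do_values_single_iv(1, 10, 3))    # [1, 4, 7, 10]
print(do_values_single_iv(10, 1, -3))   # [10, 7, 4, 1]
```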


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread toon at moene dot org


--- Comment #13 from toon at moene dot org  2009-11-20 19:45 ---
> The funny conditional initialization of countm1.6 makes the analysis of
> the number of iterations of this loop impossible (not to mention the
> conversions to character(kind=4)).

> Why does the frontend do induction variable "optimization" at all and
> not simply generate a loop with a non-unit counting IV?

It's not trying to be funny - it just follows the text of the Fortran Standard
(hey, what a concept !):

8.1.6.6.1 Loop initiation

When the DO statement is executed, the DO construct becomes active. If
loop-control is

   [ , ] do-variable = scalar-int-expr1 , scalar-int-expr2
         [ , scalar-int-expr3 ]

the following steps are performed in sequence.

 (1) The initial parameter m1, the terminal parameter m2, and the
     incrementation parameter m3 are of type integer with the same kind
     type parameter as the do-variable. Their values are established by
     evaluating scalar-int-expr1, scalar-int-expr2, and scalar-int-expr3,
     respectively, including, if necessary, conversion to the kind type
     parameter of the do-variable according to the rules for numeric
     conversion (Table 7.11). If scalar-int-expr3 does not appear, m3 has
     the value 1. The value of m3 shall not be zero.
 (2) The DO variable becomes defined with the value of the initial
     parameter m1.
 (3) The iteration count is established and is the value of the expression
     (m2 - m1 + m3)/m3, unless that value is negative, in which case the
     iteration count is 0.

Only interprocedural analysis can tell us that this is a simple loop only
executed 3 times (I got this wrong at first - it's *always* executed 3 times).
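
Step (3) is easy to check numerically. The sketch below is in Python rather
than Fortran, with hypothetical helper names; `trunc_div` mimics Fortran's
integer division, which truncates toward zero. The last check assumes
nnd = 3*n, which is what the unrolled version in comment #9 implies for the
test case.

```python
def trunc_div(a, b):
    """Integer division truncating toward zero, as in Fortran."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def iteration_count(m1, m2, m3):
    """Iteration count of DO I = m1, m2, m3 per step (3) of the Standard."""
    if m3 == 0:
        raise ValueError("m3 shall not be zero")
    return max(0, trunc_div(m2 - m1 + m3, m3))


print(iteration_count(1, 10, 3))    # 4  (I = 1, 4, 7, 10)
print(iteration_count(10, 1, -3))   # 4  (I = 10, 7, 4, 1)
print(iteration_count(5, 1, 1))     # 0  (zero-trip loop)

# k loop of the test case: DO k = i, nnd, n with i in 2..n, assuming nnd = 3*n
print(all(iteration_count(i, 3 * n, n) == 3
          for n in range(2, 50) for i in range(2, n + 1)))  # True
```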


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread rguenth at gcc dot gnu dot org


--- Comment #12 from rguenth at gcc dot gnu dot org  2009-11-20 14:13 ---
The loop is not unrolled because the frontend presents us with very funny
obfuscated code:

  do  k=i,nnd,n
    temp=temp+(x(k)-x(k+jmini))**2
  end do

gets translated to

{
  character(kind=4) countm1.6;
  integer(kind=4) D.1551;
  integer(kind=4) D.1550;
  integer(kind=4) D.1549;

  D.1549 = i;
  D.1550 = *nnd;
  D.1551 = *n;
  k = D.1549;
  if (D.1551 > 0)
    {
      if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551;;
    }
  else
    {
      if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 - D.1550) / (character(kind=4)) -D.1551;;
    }
  while (1)
    {
      {
        real(kind=8) D.1556;
        real(kind=8) D.1555;

        D.1555 = (((*x)[(integer(kind=8)) k + -1] - (*x)[(integer(kind=8)) (k + jmini) + -1]));
        D.1556 = D.1555 * D.1555;
        temp = temp + D.1556;
      }
      L.5:;
      k = k + D.1551;
      if (countm1.6 == 0) goto L.6;
      countm1.6 = countm1.6 + 4294967295;
    }
  L.6:;
}


WTF!?

The funny conditional initialization of countm1.6 makes the analysis of
the number of iterations of this loop impossible (not to mention the
conversions to character(kind=4)).

Why does the frontend do induction variable "optimization" at all and
not simply generate a loop with a non-unit counting IV?


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread sfilippone at uniroma2 dot it


--- Comment #11 from sfilippone at uniroma2 dot it  2009-11-20 14:12 ---
(In reply to comment #10)
Again, I am not asking for help in writing better code (I think I know how to
handle this, and I will convince my colleague); I just thought it was worth
mentioning that the optimizer has apparently done a worse job lately (at least
on the platform I am using).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread sfilippone at uniroma2 dot it


--- Comment #10 from sfilippone at uniroma2 dot it  2009-11-20 14:03 ---
(In reply to comment #9)
> I am rather confused by some comments:
> 
> (1) Although I am not fluent with x86 assembly, I am pretty sure that no code
> in eval is vectorized (assembly taken from this pr or from the original post
> http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).
> 
> (2) If I am not mistaken, the k loop always handles 3 elements for i, i+n, and
> i+2*n.
> 
Yup, in the test case; in the original application the factor might be
different from 3. And yes, it may be better to declare the array as 2D.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread dominiq at lps dot ens dot fr


--- Comment #9 from dominiq at lps dot ens dot fr  2009-11-20 13:45 ---
I am rather confused by some comments:

(1) Although I am not fluent with x86 assembly, I am pretty sure that no code
in eval is vectorized (assembly taken from this pr or from the original post
http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).

(2) If I am not mistaken, the k loop always handles 3 elements for i, i+n, and
i+2*n.

(3) On a core2duo 2.1 GHz, I only see small changes in the timing between 4.3.4
and trunk, -O1 to -O3, and 32 or 64 bit mode.

Now if I do the following change:

--- pr42108_1_db.f90    2009-11-20 14:14:05.0 +0100
+++ pr42108_1_db_1.f90  2009-11-20 14:15:24.0 +0100
@@ -7,12 +7,10 @@ subroutine  eval(foo1,foo2,foo3,foo4,x,n
   do i=2,n
     foo3(i)=foo2*foo4(i)
     do  j=1,i-1
-      temp=0.0d0
-      jmini=j-i
-      do  k=i,nnd,n
-        temp=temp+(x(k)-x(k+jmini))**2
-      end do
-      temp = sqrt(temp+foo1)
+      temp = sqrt( (x(i) - x(j))**2 &
+                  +(x(i+n) - x(j+n))**2 &
+                  +(x(i+2*n)-x(j+2*n))**2 &
+                  +foo1)
       foo3(i)=foo3(i)+temp*foo4(j)
       foo3(j)=foo3(j)+temp*foo4(i)
     end do

I go from 9.2s to 5.5s for n=2. So the k loop is not automatically unrolled
even with -funroll-loops.
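
The equivalence of the two forms can be checked outside the compiler. The
following is a Python transcription, not the Fortran test case itself: the
function names are invented, nnd = 3*n is assumed (that is what makes the k
loop run exactly three times), x is filled as in the initialization loop
quoted in comment #2, and the 1-based Fortran subscripts are shifted by one.

```python
import math


def make_x(n):
    """x(i) = 1.d0 + (1.d0*i)/nnd for i = 1..nnd, with nnd = 3*n assumed."""
    nnd = 3 * n
    return [1.0 + (1.0 * i) / nnd for i in range(1, nnd + 1)]


def temp_loop(x, i, j, n, foo1):
    """Original form: strided k loop accumulating into temp."""
    nnd = 3 * n
    jmini = j - i
    temp = 0.0
    for k in range(i, nnd + 1, n):
        temp += (x[k - 1] - x[k + jmini - 1]) ** 2
    return math.sqrt(temp + foo1)


def temp_unrolled(x, i, j, n, foo1):
    """Hand-unrolled form from the diff: three explicit terms."""
    return math.sqrt((x[i - 1] - x[j - 1]) ** 2
                     + (x[i + n - 1] - x[j + n - 1]) ** 2
                     + (x[i + 2 * n - 1] - x[j + 2 * n - 1]) ** 2
                     + foo1)


n = 5
x = make_x(n)
ok = all(math.isclose(temp_loop(x, i, j, n, 0.5),
                      temp_unrolled(x, i, j, n, 0.5))
         for i in range(2, n + 1) for j in range(1, i))
print(ok)  # True
```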


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-20 Thread sfilippone at uniroma2 dot it


--- Comment #8 from sfilippone at uniroma2 dot it  2009-11-20 08:32 ---
(In reply to comment #6)
> Richard Guenther wrote:
> 
> > Well, within eval there's nothing really obvious to me.  The
> > innermost loop is exactly the same:
> 
> But it is a very inefficient way of vectorizing, because the inner loop's body
> is either executed twice or three times per outer loop (depending on the value
> of i).
> 
While I agree that I would code this differently, there is still the change
in the compiler's behaviour, although comment 7 indicates it probably shows up
only in 64-bit mode.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread anlauf at gmx dot de


--- Comment #7 from anlauf at gmx dot de  2009-11-19 22:33 ---
I tried the code on an x86 Core2 system (32 bit mode).

gfortran 4.3, 4.5:
22.74user 0.03system 0:22.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

Intel's ifort 11.1 is only ~ 5% faster, but:

SunStudio 12.1: (sunf95 -fast)
11.50user 0.00system 0:11.51elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

Wow, that gives a 100% improvement potential!

(I added a
  print *, foo3(n)
after the call to eval to make sure that nothing gets optimized away.)


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread toon at moene dot org


--- Comment #6 from toon at moene dot org  2009-11-19 19:53 ---
Richard Guenther wrote:

> Well, within eval there's nothing really obvious to me.  The
> innermost loop is exactly the same:

But it is a very inefficient way of vectorizing, because the inner loop's body
is either executed twice or three times per outer loop (depending on the value
of i).


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread sfilippone at uniroma2 dot it


--- Comment #5 from sfilippone at uniroma2 dot it  2009-11-19 19:42 ---
(In reply to comment #4)
> Subject: Re:  [4.4/4.5 Regression] Vectorizer
>  cannot deal with PAREN_EXPR gracefully, 50% performance regression
> 
> 
> Heh, with -fwhole-program GCC optimizes the test away and I get 0.0s
> runtime.
> 
Not too surprising, after all this was extracted to make the test case
manageable, the original code is not pointless..:-)

> Well, within eval there's nothing really obvious to me.  The
> innermost loop is exactly the same:
> 
> .L39:
>         movsd   (%r15), %xmm0
>         addq    %rsi, %r15
>         subsd   (%rdx), %xmm0
>         addq    %rsi, %rdx
>         subl    $1, %eax
>         mulsd   %xmm0, %xmm0
>         addsd   %xmm0, %xmm1
>         jne     .L39
> 
> the next outer loop has somewhat fewer loads in 4.5 but also different
> induction variables.  So - nothing obvious to me.
> 
Exactly, it's quite surprising to see a difference with such a simple loop. 
However the size of the generated assembler is different, so there must be
something... 

> Richard.
> 


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread rguenther at suse dot de


--- Comment #4 from rguenther at suse dot de  2009-11-19 17:30 ---
Subject: Re:  [4.4/4.5 Regression] Vectorizer
 cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Thu, 19 Nov 2009, sfilippone at uniroma2 dot it wrote:

> --- Comment #3 from sfilippone at uniroma2 dot it  2009-11-19 17:17 ---
> (In reply to comment #2)
> > -ftree-vectorizer-verbose=2 tells you:
> > 
> > eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 =
> > ((D.1683_72));
> > 
> > eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 =
> > ((D.1683_57));
> > 
> > PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> > right now.
> > 
> > The loops are
> > 
> >   do i=1,nnd
> >     x(i) = 1.d0 + (1.d0*i)/nnd
> >   end do
> >   do i=1,n
> >     foo4(i) = 1.d0 + (1.d0*i)/n
> >   end do
> > 
> > where the vectorizer doesn't know how to ensure evaluation order is
> > preserved when trying to vectorize (1.d0*i)/n.  Writing them as
> > 1.d0*i/n vectorizes the function.
> > 
> > Still the performance is lower by a factor of two compared to 4.3
> > (even with -ffast-math).
> > 
> > Probably the bug should be split.
> > 
> 
> Well, the performance drop I am looking at is in the subroutine. The
> initialization loops are (to me) irrelevant; I had posted a previous version
> to the mailing list where the initialization was done with random_number and
> the situation was the same.
> A run with profiling shows that more than 99% of the time is spent in eval_

Heh, with -fwhole-program GCC optimizes the test away and I get 0.0s
runtime.

Well, within eval there's nothing really obvious to me.  The
innermost loop is exactly the same:

.L39:
        movsd   (%r15), %xmm0
        addq    %rsi, %r15
        subsd   (%rdx), %xmm0
        addq    %rsi, %rdx
        subl    $1, %eax
        mulsd   %xmm0, %xmm0
        addsd   %xmm0, %xmm1
        jne     .L39

the next outer loop has somewhat fewer loads in 4.5 but also different
induction variables.  So - nothing obvious to me.

Richard.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread sfilippone at uniroma2 dot it


--- Comment #3 from sfilippone at uniroma2 dot it  2009-11-19 17:17 ---
(In reply to comment #2)
> -ftree-vectorizer-verbose=2 tells you:
> 
> eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 =
> ((D.1683_72));
> 
> eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 =
> ((D.1683_57));
> 
> PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> right now.
> 
> The loops are
> 
>   do i=1,nnd
>     x(i) = 1.d0 + (1.d0*i)/nnd
>   end do
>   do i=1,n
>     foo4(i) = 1.d0 + (1.d0*i)/n
>   end do
> 
> where the vectorizer doesn't know how to ensure evaluation order is
> preserved when trying to vectorize (1.d0*i)/n.  Writing them as
> 1.d0*i/n vectorizes the function.
> 
> Still the performance is lower by a factor of two compared to 4.3
> (even with -ffast-math).
> 
> Probably the bug should be split.
> 

Well, the performance drop I am looking at is in the subroutine. The
initialization loops are (to me) irrelevant; I had posted a previous version
to the mailing list where the initialization was done with random_number and
the situation was the same.
A run with profiling shows that more than 99% of the time is spent in eval_


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108



[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

2009-11-19 Thread rguenth at gcc dot gnu dot org


--- Comment #2 from rguenth at gcc dot gnu dot org  2009-11-19 16:49 ---
-ftree-vectorizer-verbose=2 tells you:

eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 =
((D.1683_72));

eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 =
((D.1683_57));

PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
right now.

The loops are

  do i=1,nnd
    x(i) = 1.d0 + (1.d0*i)/nnd
  end do
  do i=1,n
    foo4(i) = 1.d0 + (1.d0*i)/n
  end do

where the vectorizer doesn't know how to ensure evaluation order is
preserved when trying to vectorize (1.d0*i)/n.  Writing them as
1.d0*i/n vectorizes the function.
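
A small Python illustration of why evaluation order is observable in floating
point at all (Python floats are IEEE doubles). It does not show what GCC does
with this loop; it only shows that dividing by n and multiplying by 1/n, which
are equal in exact arithmetic, can round differently. The values of i and n
are picked for the example.

```python
i, n = 3, 10

as_written = (1.0 * i) / n              # the order the parentheses ask for
via_reciprocal = (1.0 * i) * (1.0 / n)  # algebraically the same value

print(as_written)      # 0.3
print(via_reciprocal)  # 0.30000000000000004
print(as_written == via_reciprocal)  # False
```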

Still the performance is lower by a factor of two compared to 4.3
(even with -ffast-math).

Probably the bug should be split.


-- 

rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added

                 CC|                            |irar at il dot ibm dot com,
                   |                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|fortran                     |tree-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|-00-00 00:00:00             |2009-11-19 16:49:51
               date|                            |
            Summary|Performance drop from 4.3 to|[4.4/4.5 Regression]
                   |4.4/4.5                     |Vectorizer cannot deal with
                   |                            |PAREN_EXPR gracefully, 50%
                   |                            |performance regression
   Target Milestone|---                         |4.4.3


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108