[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
--- Comment #23 from irar at il dot ibm dot com 2009-11-30 12:20 --- Applied: http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=154794 Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #22 from rguenther at suse dot de 2009-11-30 10:13 ---
Subject: Re: [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Mon, 30 Nov 2009, irar at il dot ibm dot com wrote:

> --- Comment #20 from irar at il dot ibm dot com 2009-11-30 08:52 ---
> Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard,
> in your original PAREN_EXPR patch
> http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515 ).

Oh, indeed ;)

> The problem here is that vectorizable_assignment does not support multiple
> types. The attached patch adds this support, but I don't know if the patch is
> suitable for the current stage...

Probably not (though it looks small). If you feel confident about it, you may well still apply it.

Thanks,
Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #21 from irar at il dot ibm dot com 2009-11-30 08:54 --- Created an attachment (id=19183) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19183&action=view) Multiple types support patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #20 from irar at il dot ibm dot com 2009-11-30 08:52 ---
Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard, in your original PAREN_EXPR patch http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515 ). The problem here is that vectorizable_assignment does not support multiple types. The attached patch adds this support, but I don't know if the patch is suitable for the current stage...

Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #19 from rguenth at gcc dot gnu dot org 2009-11-27 11:23 ---
I guess this PR should be split further: a bug about PAREN_EXPR wrt vectorization and a bug about the as yet unanalyzed performance regression.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|enhancement                 |normal
           Priority|P3                          |P2

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #18 from irar at il dot ibm dot com 2009-11-23 09:02 ---
I tried to vectorize eval.f90 with 4.3 and mainline on x86_64-suse-linux. In both cases no loop gets vectorized in subroutine eval. The k loop is not vectorizable because the step of x is unknown (function argument), and scalar evolution analysis fails to analyze it. The j loop is not vectorized, first of all because the loop bound of the k loop is unknown (this is on our todo list).

Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #17 from rguenth at gcc dot gnu dot org 2009-11-21 13:58 --- I have filed PR42131 for the DO loop translation issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #16 from rguenther at suse dot de 2009-11-21 12:19 ---
Subject: Re: [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Sat, 21 Nov 2009, toon at moene dot org wrote:

> --- Comment #15 from toon at moene dot org 2009-11-21 12:11 ---
> > I don't see that the standard suggests the specific code the Frontend
> > generates. In fact it should be valid to increment the DO variable
> > by m3 and express the exit test in terms of the DO variable as well.
>
> The Standard doesn't prescribe the code the Frontend generates - however, to
> be sure one follows the Standard, it's easiest to simply implement the steps
> given.
>
> To illustrate this with a simple example:
>
>       DO I = M1, M2, M3
>          B(I) = A(I)
>       ENDDO
>
> would be most easily, and straightforwardly, implemented as follows:
>
>       IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
>       IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
>       ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop count
>       I = M1
>   100 CONTINUE
>       B(I) = A(I)
>       ITEMP = ITEMP - 1                   ! Adjust internal loop counter
>       I = I + M3                          ! Adjust DO loop variable
>       IF (ITEMP > 0) GOTO 100
>   200 CONTINUE
>
> That there are two induction variables in this loop is inconsequential - one
> of them should be eliminated by induction variable elimination (at least, that
> was the case with g77 and the RTL loop optimization pass).

Sure, but the frontend generates

      if (M3 > 0)
        ITEMP = (M2 - M1) / M3
      else
        ITEMP = (M1 - M2) / -M3
      I = M1
  100 CONTINUE
      B(I) = A(I)
      I = I + M3
      if (ITEMP == 0) GOTO 200
      ITEMP = ITEMP - 1
      GOTO 100
  200 CONTINUE

The conditional setting of ITEMP is what confuses GCC. Also I don't see the test for zero-time executing loops (but maybe I omitted it from my pasting in comment #12).

Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
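For readers comparing the two translations discussed in this comment, here is a minimal C sketch (not GCC or gfortran code) that models both loop shapes and records the values the DO variable takes. The names `visit_count_form` and `visit_countm1_form`, and the `out`/`cap` parameters, are made up for illustration. The second function follows the shape of the frontend dump in comment #12, including its zero-trip test, and the sketch assumes that `m2 - m1` and `-m3` do not overflow.

```c
#include <assert.h>
#include <stddef.h>

/* Count-down form: explicit zero-trip test, then a counter that starts
   at the iteration count (m2 - m1 + m3) / m3 and runs down to zero. */
size_t visit_count_form(long m1, long m2, long m3, long *out, size_t cap)
{
    assert(m3 != 0);                      /* the standard forbids m3 == 0 */
    size_t n = 0;
    if ((m3 > 0 && m1 > m2) || (m3 < 0 && m1 < m2))
        return 0;                         /* loop executed zero times */
    long itemp = (m2 - m1 + m3) / m3;     /* >= 1 once the test above passed */
    long i = m1;
    do {
        if (n < cap)
            out[n] = i;                   /* stands in for the loop body */
        n++;
        itemp--;                          /* adjust internal loop counter */
        i += m3;                          /* adjust DO variable */
    } while (itemp > 0);
    return n;
}

/* "count minus one" form: the counter is set in two different ways
   depending on the sign of m3, and is tested before each decrement. */
size_t visit_countm1_form(long m1, long m2, long m3, long *out, size_t cap)
{
    assert(m3 != 0);
    size_t n = 0;
    unsigned long countm1;
    long i = m1;
    if (m3 > 0) {
        if (m2 < m1)
            return 0;
        countm1 = (unsigned long)(m2 - m1) / (unsigned long)m3;
    } else {
        if (m2 > m1)
            return 0;
        countm1 = (unsigned long)(m1 - m2) / (unsigned long)(-m3);
    }
    for (;;) {
        if (n < cap)
            out[n] = i;                   /* stands in for the loop body */
        n++;
        i += m3;
        if (countm1 == 0)
            break;
        countm1--;
    }
    return n;
}
```

Under those assumptions both functions should visit the same values, for example 1, 4, 7, 10 for `m1=1, m2=10, m3=3`. The difference that matters to the optimizer is only how the counter is initialized.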
--- Comment #15 from toon at moene dot org 2009-11-21 12:11 ---
> I don't see that the standard suggests the specific code the Frontend
> generates. In fact it should be valid to increment the DO variable
> by m3 and express the exit test in terms of the DO variable as well.

The Standard doesn't prescribe the code the Frontend generates - however, to be sure one follows the Standard, it's easiest to simply implement the steps given.

To illustrate this with a simple example:

      DO I = M1, M2, M3
         B(I) = A(I)
      ENDDO

would be most easily, and straightforwardly, implemented as follows:

      IF (M3 > 0 .AND. M1 > M2) GOTO 200  ! Loop executed zero times
      IF (M3 < 0 .AND. M1 < M2) GOTO 200  ! Ditto
      ITEMP = (M2 - M1 + M3) / M3         ! Temporary loop count
      I = M1
  100 CONTINUE
      B(I) = A(I)
      ITEMP = ITEMP - 1                   ! Adjust internal loop counter
      I = I + M3                          ! Adjust DO loop variable
      IF (ITEMP > 0) GOTO 100
  200 CONTINUE

That there are two induction variables in this loop is inconsequential - one of them should be eliminated by induction variable elimination (at least, that was the case with g77 and the RTL loop optimization pass).

If you think that the Frontend does something different / in addition to the above, feel free to open a separate PR.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #14 from rguenth at gcc dot gnu dot org 2009-11-20 23:48 ---
(In reply to comment #13)
> > The funny conditional initialization of countm1.6 makes the analysis of
> > the number of iterations of this loop impossible (not to mention the
> > conversions to character(kind=4)).
>
> > Why does the frontend do induction variable "optimization" at all and
> > not simply generate a loop with a non-unit counting IV?
>
> It's not trying to be funny - it just follows the text of the Fortran Standard
> (hey, what a concept !):
>
> 8.1.6.6.1 Loop initiation
>
> When the DO statement is executed, the DO construct becomes active. If
> loop-control is
>
>    [ , ] do-variable = scalar-int-expr1 , scalar-int-expr2 [ , scalar-int-expr3 ]
>
> the following steps are performed in sequence.
>
> (1) The initial parameter m1, the terminal parameter m2, and the
>     incrementation parameter m3 are of type integer with the same kind type
>     parameter as the do-variable. Their values are established by evaluating
>     scalar-int-expr1, scalar-int-expr2, and scalar-int-expr3, respectively,
>     including, if necessary, conversion to the kind type parameter of the
>     do-variable according to the rules for numeric conversion (Table 7.11).
>     If scalar-int-expr3 does not appear, m3 has the value 1. The value of m3
>     shall not be zero.
> (2) The DO variable becomes defined with the value of the initial
>     parameter m1.
> (3) The iteration count is established and is the value of the expression
>     (m2 - m1 + m3)/m3, unless that value is negative, in which case the
>     iteration count is 0.
>
> Only interprocedural analysis can tell us that this is a simple loop only
> executed 3 times (I got this wrong at first - it's *always* executed 3 times).

I don't see that the standard suggests the specific code the Frontend generates. In fact it should be valid to increment the DO variable by m3 and express the exit test in terms of the DO variable as well.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #13 from toon at moene dot org 2009-11-20 19:45 ---
> The funny conditional initialization of countm1.6 makes the analysis of
> the number of iterations of this loop impossible (not to mention the
> conversions to character(kind=4)).

> Why does the frontend do induction variable "optimization" at all and
> not simply generate a loop with a non-unit counting IV?

It's not trying to be funny - it just follows the text of the Fortran Standard (hey, what a concept !):

8.1.6.6.1 Loop initiation

When the DO statement is executed, the DO construct becomes active. If loop-control is

   [ , ] do-variable = scalar-int-expr1 , scalar-int-expr2 [ , scalar-int-expr3 ]

the following steps are performed in sequence.

(1) The initial parameter m1, the terminal parameter m2, and the incrementation parameter m3 are of type integer with the same kind type parameter as the do-variable. Their values are established by evaluating scalar-int-expr1, scalar-int-expr2, and scalar-int-expr3, respectively, including, if necessary, conversion to the kind type parameter of the do-variable according to the rules for numeric conversion (Table 7.11). If scalar-int-expr3 does not appear, m3 has the value 1. The value of m3 shall not be zero.
(2) The DO variable becomes defined with the value of the initial parameter m1.
(3) The iteration count is established and is the value of the expression (m2 - m1 + m3)/m3, unless that value is negative, in which case the iteration count is 0.

Only interprocedural analysis can tell us that this is a simple loop only executed 3 times (I got this wrong at first - it's *always* executed 3 times).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
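Step (3) of the quoted text can be written down directly. The following is a small C sketch, not anything taken from gfortran; the function name `do_trip_count` is hypothetical, and the sketch assumes the intermediate `m2 - m1 + m3` does not overflow. C integer division truncates toward zero, which matches Fortran integer division.

```c
#include <assert.h>

/* Iteration count of "DO i = m1, m2, m3" as described in step (3):
   (m2 - m1 + m3) / m3, or 0 if that value is negative. */
long do_trip_count(long m1, long m2, long m3)
{
    assert(m3 != 0);                   /* "The value of m3 shall not be zero." */
    long count = (m2 - m1 + m3) / m3;  /* truncating division */
    return count < 0 ? 0 : count;
}
```

With the test case's loop `do k=i,nnd,n` and, say, `i=2`, `n=4`, `nnd=12`, this gives `(12 - 2 + 4) / 4 = 3`, in line with the remark that the loop is executed 3 times when `nnd = 3*n` and `1 <= i <= n`.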
--- Comment #12 from rguenth at gcc dot gnu dot org 2009-11-20 14:13 ---
The loop is not unrolled because the frontend presents us with very funny obfuscated code:

    do k=i,nnd,n
       temp=temp+(x(k)-x(k+jmini))**2
    end do

gets translated to

    {
      character(kind=4) countm1.6;
      integer(kind=4) D.1551;
      integer(kind=4) D.1550;
      integer(kind=4) D.1549;

      D.1549 = i;
      D.1550 = *nnd;
      D.1551 = *n;
      k = D.1549;
      if (D.1551 > 0)
        {
          if (D.1550 < D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1550 - D.1549) / (character(kind=4)) D.1551;;
        }
      else
        {
          if (D.1550 > D.1549) goto L.6;, countm1.6 = (character(kind=4)) (D.1549 - D.1550) / (character(kind=4)) -D.1551;;
        }
      while (1)
        {
          {
            real(kind=8) D.1556;
            real(kind=8) D.1555;

            D.1555 = (((*x)[(integer(kind=8)) k + -1] - (*x)[(integer(kind=8)) (k + jmini) + -1]));
            D.1556 = D.1555 * D.1555;
            temp = temp + D.1556;
          }
          L.5:;
          k = k + D.1551;
          if (countm1.6 == 0) goto L.6;
          countm1.6 = countm1.6 + 4294967295;
        }
      L.6:;
    }

WTF!?

The funny conditional initialization of countm1.6 makes the analysis of the number of iterations of this loop impossible (not to mention the conversions to character(kind=4)).

Why does the frontend do induction variable "optimization" at all and not simply generate a loop with a non-unit counting IV?

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
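One detail of the dump that may look odd at first: `countm1.6 = countm1.6 + 4294967295` is simply a decrement by one, assuming (as the dump suggests) that `countm1.6` is a 32-bit unsigned quantity, since 4294967295 is 2^32 - 1 and unsigned arithmetic wraps modulo 2^32. A tiny C sketch of that arithmetic, with the hypothetical name `dec_by_wraparound`:

```c
#include <stdint.h>

/* Adding 2^32 - 1 to a 32-bit unsigned value is the same as
   subtracting 1, because the sum wraps modulo 2^32. */
uint32_t dec_by_wraparound(uint32_t countm1)
{
    return (uint32_t)(countm1 + UINT32_C(4294967295));
}
```

So `dec_by_wraparound(3)` is 2, and the loop's exit test `countm1.6 == 0` sits before the decrement precisely so that the counter never wraps below zero.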
--- Comment #11 from sfilippone at uniroma2 dot it 2009-11-20 14:12 ---
(In reply to comment #10)
Again, I am not asking for help in writing better code (I think I know how to handle this, and I will convince my colleague); I just thought it was worth mentioning that the optimizer has apparently done a worse job lately (at least on the platform I am using).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #10 from sfilippone at uniroma2 dot it 2009-11-20 14:03 ---
(In reply to comment #9)
> I am rather confused by some comments:
>
> (1) Although I am not fluent with x86 assembly, I am pretty sure that no code
> in eval is vectorized (assembly taken from this pr or from the original post
> http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).
>
> (2) If I am not mistaken, the k loop always handles 3 elements for i, i+n, and
> i+2*n.
>

Yup, in the test case; in the original application the factor might be different from 3. And yes, it may be better to declare the array as 2D.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #9 from dominiq at lps dot ens dot fr 2009-11-20 13:45 ---
I am rather confused by some comments:

(1) Although I am not fluent with x86 assembly, I am pretty sure that no code in eval is vectorized (assembly taken from this pr or from the original post http://gcc.gnu.org/ml/fortran/2009-11/msg00163.html).

(2) If I am not mistaken, the k loop always handles 3 elements for i, i+n, and i+2*n.

(3) On a core2duo 2.1Ghz, I only see small changes in the timing between 4.3.4 to trunk, -O1 to -O3, and 32 or 64 bit mode.

Now if I do the following change:

--- pr42108_1_db.f90	2009-11-20 14:14:05.0 +0100
+++ pr42108_1_db_1.f90	2009-11-20 14:15:24.0 +0100
@@ -7,12 +7,10 @@ subroutine eval(foo1,foo2,foo3,foo4,x,n
   do i=2,n
      foo3(i)=foo2*foo4(i)
      do j=1,i-1
-        temp=0.0d0
-        jmini=j-i
-        do k=i,nnd,n
-           temp=temp+(x(k)-x(k+jmini))**2
-        end do
-        temp = sqrt(temp+foo1)
+        temp = sqrt( (x(i) - x(j))**2 &
+                    +(x(i+n) - x(j+n))**2 &
+                    +(x(i+2*n)-x(j+2*n))**2 &
+                    +foo1)
         foo3(i)=foo3(i)+temp*foo4(j)
         foo3(j)=foo3(j)+temp*foo4(i)
      end do

I go from 9.2s to 5.5s for n=2. So the k loop is not automatically unrolled even with -funroll-loops.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
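To see why the hand-unrolled expression in the diff is equivalent to the k loop, here is a hedged C model of the two variants. It computes only the argument of `sqrt` (the sum of squares plus `foo1`). The function names `sumsq_loop` and `sumsq_unrolled` are made up, the array is 0-based in C so Fortran's `x(k)` becomes `x[k - 1]`, and the equivalence relies on the test case's assumption that `nnd = 3*n` with `1 <= j < i <= n`.

```c
#include <assert.h>

/* Original form: strided k loop over x(i), x(i+n), x(i+2n), ... up to nnd. */
double sumsq_loop(const double *x, int i, int j, int n, int nnd, double foo1)
{
    assert(n > 0);                       /* positive stride, as in the test case */
    double temp = 0.0;
    int jmini = j - i;
    for (int k = i; k <= nnd; k += n) {
        double d = x[k - 1] - x[k + jmini - 1];
        temp += d * d;
    }
    return temp + foo1;
}

/* Hand-unrolled form from the diff: exactly three terms. */
double sumsq_unrolled(const double *x, int i, int j, int n, double foo1)
{
    double d0 = x[i - 1] - x[j - 1];
    double d1 = x[i + n - 1] - x[j + n - 1];
    double d2 = x[i + 2 * n - 1] - x[j + 2 * n - 1];
    return d0 * d0 + d1 * d1 + d2 * d2 + foo1;
}
```

Since `k + jmini` equals `j`, `j+n`, `j+2*n` for `k = i`, `i+n`, `i+2*n`, both functions add the same three squares in the same order. The unrolled variant simply makes the fixed trip count visible in the source, which the compiler could not deduce on its own here.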
--- Comment #8 from sfilippone at uniroma2 dot it 2009-11-20 08:32 ---
(In reply to comment #6)
> Richard Guenther wrote:
>
> > Well, within eval there's nothing really obvious to me. The
> > innermost loop is exactly the same:
>
> But it is a very inefficient way of vectorizing, because the inner loop's body
> is either executed twice or three times per outer loop (depending on the value
> of i).
>

While I agree that I would code it in a different way, there is still the change in the compiler's behaviour. Although comment 7 indicates it probably only shows up in 64-bit mode.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #7 from anlauf at gmx dot de 2009-11-19 22:33 ---
I tried the code on an x86 Core2 system (32 bit mode).

gfortran 4.3, 4.5:

22.74user 0.03system 0:22.82elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

Intel's ifort 11.1 is only ~ 5% faster, but:

SunStudio 12.1: (sunf95 -fast)

11.50user 0.00system 0:11.51elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k

Wow, that gives a 100% improvement potential!

(I added a print *, foo3(n) after the call to eval to make sure that nothing gets optimized away.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #6 from toon at moene dot org 2009-11-19 19:53 ---
Richard Guenther wrote:

> Well, within eval there's nothing really obvious to me. The
> innermost loop is exactly the same:

But it is a very inefficient way of vectorizing, because the inner loop's body is either executed twice or three times per outer loop (depending on the value of i).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #5 from sfilippone at uniroma2 dot it 2009-11-19 19:42 ---
(In reply to comment #4)
> Subject: Re: [4.4/4.5 Regression] Vectorizer
> cannot deal with PAREN_EXPR gracefully, 50% performance regression
>
> Heh, with -fwhole-program GCC optimizes the test away and I get 0.0s
> runtime.
>

Not too surprising; after all, this was extracted to make the test case manageable. The original code is not pointless..:-)

> Well, within eval there's nothing really obvious to me. The
> innermost loop is exactly the same:
>
> .L39:
>         movsd   (%r15), %xmm0
>         addq    %rsi, %r15
>         subsd   (%rdx), %xmm0
>         addq    %rsi, %rdx
>         subl    $1, %eax
>         mulsd   %xmm0, %xmm0
>         addsd   %xmm0, %xmm1
>         jne     .L39
>
> the next outer loop has somewhat fewer loads in 4.5 but also different
> induction variables. So - nothing obvious to me.
>

Exactly, it's quite surprising to see a difference with such a simple loop. However, the size of the generated assembler is different, so there must be something...

> Richard.
>

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #4 from rguenther at suse dot de 2009-11-19 17:30 ---
Subject: Re: [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression

On Thu, 19 Nov 2009, sfilippone at uniroma2 dot it wrote:

> --- Comment #3 from sfilippone at uniroma2 dot it 2009-11-19 17:17 ---
> (In reply to comment #2)
> > -ftree-vectorizer-verbose=2 tells you:
> >
> > eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 =
> > ((D.1683_72));
> >
> > eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 =
> > ((D.1683_57));
> >
> > PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> > right now.
> >
> > The loops are
> >
> >   do i=1,nnd
> >      x(i) = 1.d0 + (1.d0*i)/nnd
> >   end do
> >   do i=1,n
> >      foo4(i) = 1.d0 + (1.d0*i)/n
> >   end do
> >
> > where the vectorizer doesn't know how to ensure evaluation order is
> > preserved when trying to vectorize (1.d0*i)/n. Writing them as
> > 1.d0*i/n vectorizes the function.
> >
> > Still the performance is lower by a factor of two compared to 4.3
> > (even with -ffast-math).
> >
> > Probably the bug should be split.
> >
>
> Well, the performance drop I am looking at is in the subroutine. The
> initialization loops are (to me) irrelevant, I had posted a previous version
> to the mailing list where the initialization was done with random_number and
> the situation was the same.
> A run with profiling shows that more than 99% of the time is spent in eval_

Heh, with -fwhole-program GCC optimizes the test away and I get 0.0s runtime.

Well, within eval there's nothing really obvious to me. The innermost loop is exactly the same:

.L39:
        movsd   (%r15), %xmm0
        addq    %rsi, %r15
        subsd   (%rdx), %xmm0
        addq    %rsi, %rdx
        subl    $1, %eax
        mulsd   %xmm0, %xmm0
        addsd   %xmm0, %xmm1
        jne     .L39

the next outer loop has somewhat fewer loads in 4.5 but also different induction variables. So - nothing obvious to me.

Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #3 from sfilippone at uniroma2 dot it 2009-11-19 17:17 ---
(In reply to comment #2)
> -ftree-vectorizer-verbose=2 tells you:
>
> eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 =
> ((D.1683_72));
>
> eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 =
> ((D.1683_57));
>
> PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off
> right now.
>
> The loops are
>
>   do i=1,nnd
>      x(i) = 1.d0 + (1.d0*i)/nnd
>   end do
>   do i=1,n
>      foo4(i) = 1.d0 + (1.d0*i)/n
>   end do
>
> where the vectorizer doesn't know how to ensure evaluation order is
> preserved when trying to vectorize (1.d0*i)/n. Writing them as
> 1.d0*i/n vectorizes the function.
>
> Still the performance is lower by a factor of two compared to 4.3
> (even with -ffast-math).
>
> Probably the bug should be split.
>

Well, the performance drop I am looking at is in the subroutine. The initialization loops are (to me) irrelevant; I had posted a previous version to the mailing list where the initialization was done with random_number and the situation was the same.
A run with profiling shows that more than 99% of the time is spent in eval_

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
--- Comment #2 from rguenth at gcc dot gnu dot org 2009-11-19 16:49 ---
-ftree-vectorizer-verbose=2 tells you:

eval.f90:35: note: not vectorized: relevant stmt not supported: D.1684_73 = ((D.1683_72));

eval.f90:32: note: not vectorized: relevant stmt not supported: D.1684_58 = ((D.1683_57));

PAREN_EXPRs are new in 4.4 and I believe they cannot be turned off right now.

The loops are

  do i=1,nnd
     x(i) = 1.d0 + (1.d0*i)/nnd
  end do
  do i=1,n
     foo4(i) = 1.d0 + (1.d0*i)/n
  end do

where the vectorizer doesn't know how to ensure evaluation order is preserved when trying to vectorize (1.d0*i)/n. Writing them as 1.d0*i/n vectorizes the function.

Still the performance is lower by a factor of two compared to 4.3 (even with -ffast-math).

Probably the bug should be split.

--
rguenth at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm dot com,
                   |                            |rguenth at gcc dot gnu dot
                   |                            |org
           Severity|normal                      |enhancement
             Status|UNCONFIRMED                 |NEW
          Component|fortran                     |tree-optimization
     Ever Confirmed|0                           |1
           Keywords|                            |missed-optimization
   Last reconfirmed|-00-00 00:00:00             |2009-11-19 16:49:51
               date|                            |
            Summary|Performance drop from 4.3 to|[4.4/4.5 Regression]
                   |4.4/4.5                     |Vectorizer cannot deal with
                   |                            |PAREN_EXPR gracefully, 50%
                   |                            |performance regression
   Target Milestone|---                         |4.4.3

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
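As background on why the evaluation order implied by parentheses can matter at all, here is a small C illustration of floating-point reassociation. It is only a sketch of the general issue, not of what GCC does for this test case; the function names `paren_first` and `reassociated` are hypothetical, and the `volatile` temporaries are there merely to force each intermediate result to be rounded to double.

```c
/* (a * b) / c, evaluated in the order the parentheses demand. */
double paren_first(double a, double b, double c)
{
    volatile double t = a * b;   /* may overflow before the division */
    return t / c;
}

/* a * (b / c), a reassociated form that is algebraically the same. */
double reassociated(double a, double b, double c)
{
    volatile double t = b / c;
    return a * t;
}
```

With IEEE double precision and `a = 1e308`, `b = c = 10.0`, the first form overflows to infinity in the multiplication, while the second yields `1e308`. An optimizer that must honour the parentheses therefore cannot freely switch between the two.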