[Bug tree-optimization/39300] vectorizer confused by predictive commoning and PRE
--- Comment #5 from dorit at gcc dot gnu dot org 2009-03-08 14:25 --- This is a known problem... Indeed when Zdenek introduced predictive-commoning there was a discussion on whether to schedule it before or after vectorization. AFAIR, it ended up getting scheduled before the vectorizer just because this happened to be what Zdenek tested/experimented with, and he didn't have a problem with scheduling it after vectorization as long as it didn't hurt performance (of mgrid in particular). Here are related threads: http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01383.html http://gcc.gnu.org/ml/gcc-patches/2007-02/msg00555.html http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00571.html Regardless of whether we schedule predcom after vectorization, it will still be useful to teach the vectorizer to handle such dependence patterns, as they may (and do) appear in the source code. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39300
[Bug tree-optimization/39068] signed short plus and signed char plus not vectorized
--- Comment #2 from dorit at gcc dot gnu dot org 2009-02-01 21:06 --- (reminds me of a couple missed-optimization PRs where vectorization is also failing due to casts - PR31873 , PR26128 - don't know if this is related) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39068
[Bug tree-optimization/37021] Fortran Complex reduction / multiplication not vectorized
--- Comment #9 from dorit at gcc dot gnu dot org 2009-01-27 12:40 ---
(In reply to comment #4)
The testcase should be

    subroutine to_product_of(self,a,b,a1,a2)
      complex(kind=8) :: self (:)
      complex(kind=8), intent(in) :: a(:,:)
      complex(kind=8), intent(in) :: b(:)
      integer a1,a2
      do i = 1,a1
        do j = 1,a2
          self(i) = self(i) + a(j,i)*b(j)
        end do
      end do
    end subroutine

to be meaningful - otherwise we are accessing a in non-continuous ways in the inner loop which would prevent vectorization.

this change from a(i,j) to a(j,i) is not required if we try to vectorize the outer-loop, where the stride is 1. It's also a better way to vectorize the reduction. A few limitations on the way though are:

1) somehow don't let gcc create guard code around the innermost loop to check that it executes more than zero iterations. This creates a complicated control flow structure within the outer-loop. For now you have to have constant number of iterations for the inner-loop because of that, or insert a statement like if (a2=0) return; before the loop...

2) use -fno-tree-sink cause otherwise it moves the loop iv increment to the latch block and the vectorizer likes to have the latch block empty... (see also PR33113 for related reference).

With the versioning for stride == 1 I get then

    .L13:
        movupd   16(%rax), %xmm1
        movupd   (%rax), %xmm3
        incl     %ecx
        movupd   (%rdx), %xmm4
        addq     $32, %rax
        movapd   %xmm3, %xmm0
        unpckhpd %xmm1, %xmm3
        unpcklpd %xmm1, %xmm0
        movupd   16(%rdx), %xmm1
        movapd   %xmm4, %xmm2
        addq     $32, %rdx
        movapd   %xmm3, %xmm9
        cmpl     %ecx, %r8d
        unpcklpd %xmm1, %xmm2
        unpckhpd %xmm1, %xmm4
        movapd   %xmm4, %xmm1
        movapd   %xmm2, %xmm4
        mulpd    %xmm1, %xmm9
        mulpd    %xmm0, %xmm4
        mulpd    %xmm3, %xmm2
        mulpd    %xmm1, %xmm0
        subpd    %xmm9, %xmm4
        addpd    %xmm2, %xmm0
        addpd    %xmm4, %xmm6
        addpd    %xmm0, %xmm5
        ja       .L13
        haddpd   %xmm5, %xmm5
        cmpl     %r15d, %edi
        movl     -4(%rsp), %ecx
        haddpd   %xmm6, %xmm6
        addsd    %xmm5, %xmm8
        addsd    %xmm6, %xmm7
        jne      .L12
        jmp      .L14

for the innermost loop, followed by a tail loop (peel for niters).
This is about 15% faster on AMD K10 than the non-vectorized loop (if you disable the cost-model and make sure to have enough iterations in the inner loop to pay back for the extra guarding conditions). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37021
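For readers who do not work in Fortran, the loop nest discussed in this comment can be sketched in C roughly as follows. This is only an illustration under stated assumptions: the `cplx` struct and the flat row layout are hypothetical stand-ins for `complex(kind=8)` and the assumed-shape array `a(:,:)`, and the early return mirrors the suggested workaround of bailing out before the loop when the inner trip count is zero, so that no guard is needed around the innermost loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for Fortran complex(kind=8). */
typedef struct { double re, im; } cplx;

/*
 * self[i] += sum over j of a[i*a2 + j] * b[j]
 * The inner index j walks a with stride 1, which corresponds to
 * a(j,i) in column-major Fortran.
 */
void to_product_of(cplx *self, const cplx *a, const cplx *b, int a1, int a2)
{
    assert(a1 >= 0 && a2 >= 0);

    /* Early return: past this point the inner loop always iterates
       at least once, so it needs no zero-trip guard of its own. */
    if (a2 == 0)
        return;

    for (int i = 0; i < a1; i++) {
        double re = self[i].re, im = self[i].im;
        const cplx *row = a + (size_t)i * (size_t)a2;
        for (int j = 0; j < a2; j++) {
            /* complex multiply-accumulate, written out by hand */
            re += row[j].re * b[j].re - row[j].im * b[j].im;
            im += row[j].re * b[j].im + row[j].im * b[j].re;
        }
        self[i].re = re;
        self[i].im = im;
    }
}
```

Whether a given GCC version vectorizes the inner or the outer loop of this sketch depends on the flags and the cost model; the code only shows the access pattern and the zero-trip early exit.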
[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant
--- Comment #7 from dorit at gcc dot gnu dot org 2009-01-27 12:46 --- related testcase/PR: PR37021 and related discussion: http://gcc.gnu.org/ml/gcc-patches/2009-01/msg01322.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113
[Bug tree-optimization/37692] New: [alias-improvements-branch] can't alias fortran function arguments
This happens in testcases gfortran.dg/vect/vect-[2,3,4].f90 - On the alias branch we can't tell that subroutine arguments don't alias. e.g., X,Y in SUBROUTINE SAXPY(X,Y,A). As a result the vectorizer applies loop-versioning with runtime aliasing test, which also means it will handle misalignment using versioning instead of peeling: versioning for alias required: can't determine dependence between (*x_32(D))[D.1518_28] and (*y_29(D))[D.1518_28] vect-3.f90:6: note: mark for run-time aliasing test between (*x_32(D))[D.1518_28] and (*y_29(D))[D.1518_28] ... vect-3.f90:6: note: Alignment of access forced using versioning. vect-3.f90:6: note: Versioning for alignment will be applied. -- Summary: [alias-improvements-branch] can't alias fortran function arguments Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: i386-linux GCC host triplet: i386-linux GCC target triplet: i386-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37692
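What "loop-versioning with runtime aliasing test" means can be sketched by hand in C. The following is a minimal illustration, not what the vectorizer emits: `saxpy_versioned` and `ranges_overlap` are hypothetical names, and the loop body `y[i] += a * x[i]` is an assumption about what SAXPY computes in the testcases. The compiler generates the equivalent of the overlap check itself and keeps a scalar copy of the loop for the overlapping case.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if the n-element ranges starting at p and q share any byte. */
static int ranges_overlap(const float *p, const float *q, size_t n)
{
    uintptr_t ps = (uintptr_t)p, qs = (uintptr_t)q;
    uintptr_t bytes = (uintptr_t)(n * sizeof(float));
    return ps < qs + bytes && qs < ps + bytes;
}

void saxpy_versioned(const float *x, float *y, float a, size_t n)
{
    if (!ranges_overlap(x, y, n)) {
        /* Version 1: the runtime test proved independence, so this copy
           of the loop is a candidate for vectorization. */
        const float *restrict xr = x;
        float *restrict yr = y;
        for (size_t i = 0; i < n; i++)
            yr[i] += a * xr[i];
    } else {
        /* Version 2: possible overlap, keep strict scalar order. */
        for (size_t i = 0; i < n; i++)
            y[i] += a * x[i];
    }
}
```

When the front end can already guarantee that the arguments do not alias, as Fortran dummy arguments allow, the check and the second copy of the loop are pure overhead, which is the regression reported here.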
[Bug tree-optimization/37693] New: [alias-improvements-branch] can't prove non-zero number of iterations
This happens in testcase gfortran.dg/vect/pr32377.f90: On the alias branch we can't prove that the number of iterations is non-zero: Analyzing # of iterations of loop 1 exit condition [2, + , 1](no_overflow) D.1554_60 bounds on difference of bases: -2147483650 ... 2147483645 result: zero if D.1554_60 = 1 # of iterations (character(kind=4)) D.1554_60 + 0x0fffe, bounded by 2147483645 (set_nb_iterations_in_loop = scev_not_known)) (get_loop_exit_condition if (D.1554_60 = S.10_78) ) pr32377.f90:9: note: not vectorized: number of iterations cannot be computed. pr32377.f90:9: note: bad loop form. pr32377.f90:4: note: vectorized 0 loops in function. Using mainline we have: Analyzing # of iterations of loop 1 exit condition [2, + , 1](no_overflow) D.1416_112 bounds on difference of bases: 0 ... 2147483645 result: # of iterations (character(kind=4)) D.1416_112 + 0x0fffe, bounded by 2147483645 (set_nb_iterations_in_loop = (character(kind=4)) D.1416_112 + 0x0fffe)) pr32377.f90:9: note: == get_loop_niters:(character(kind=4)) D.1416_112 + 0x0(get_loop_exit_condition if (S.10_78 = D.1416_112) ) pr32377.f90:9: note: Symbolic number of iterations is (character(kind=4)) D.1416_112 + 0x0 -- Summary: [alias-improvements-branch] can't prove non-zero number of iterations Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37693
[Bug tree-optimization/37694] New: [alias-improvements-branch] can't alias (restrict) function-pointer (read) and local array (write)
This happens in testcases gcc.dg/vect/no-scevccp-outer-6.c and gcc.dg/vect/vect-multitypes-6.c: On the alias branch we can't tell that a read through a (restrict) pointer (which is a function argument) does not overlap with a write to a local array. As a result we try to vectorize the loop using loop-versioning controlled by a run-time aliasing test. In no-scevccp-outer-6.c this capability is not yet supported for outer-loops so we can't vectorize the outer-loop (the inner loop does get vectorized). In vect-multitypes-6.c there are too many runtime checks required, so we bail out: === vect_prune_runtime_alias_test_list === vect-multitypes-6.c:34: note: disable versioning for alias - max number of generated checks exceeded. vect-multitypes-6.c:34: note: too long list of versioning for alias run-time tests. (with --param vect-max-version-for-alias-checks=20 we do vectorize the loop). -- Summary: [alias-improvements-branch] can't alias (restrict) function-pointer (read) and local array (write) Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37694
[Bug tree-optimization/37695] New: [alias-improvements-branch] can't alias a restrict pointer write and a local array read
This happens in gcc.dg/vect/vect-42.c: On the alias branch we can't tell that a write through a restrict pointer (which is a function argument) does not overlap with reads from local arrays. As a result we vectorize using loop-versioning controlled by a run-time aliasing test. This in turn forces us to handle misalignment using loop-versioning (rather than peeling, cause right now we don't support peeling combined with versioning, and these are the only ways we currently support misaligned stores). Without the aliasing problem, the loop is vectorized using peeling to align the store. === vect_analyze_dependences === vect-42.c:36: note: versioning for alias required: can't determine dependence between pb[i_59] and *D.2074_6 vect-42.c:36: note: mark for run-time aliasing test between pb[i_59] and *D.2074_6 vect-42.c:36: note: versioning for alias required: can't determine dependence between pc[i_59] and *D.2074_6 vect-42.c:36: note: mark for run-time aliasing test between pc[i_59] and *D.2074_6 ... vect-42.c:36: note: === vect_enhance_data_refs_alignment === vect-42.c:36: note: Unknown misalignment, is_packed = 0 vect-42.c:36: note: Alignment of access forced using versioning. vect-42.c:36: note: Versioning for alignment will be applied. -- Summary: [alias-improvements-branch] can't alias a restrict pointer write and a local array read Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37695
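The peeling alternative mentioned in this report can be sketched by hand in C. This is a rough illustration under assumptions, not the vectorizer's output: `peel_count` and `add_arrays_peeled` are hypothetical names, 16 bytes is assumed as the vector size, and the pointer is assumed to be at least `int`-aligned. A few scalar iterations run first until the store address reaches a vector boundary, and the remaining loop then starts on an aligned store.

```c
#include <stddef.h>
#include <stdint.h>

enum { VECTOR_BYTES = 16 };  /* assumed vector size, e.g. SSE2 */

/* Number of leading scalar iterations needed before p is
   VECTOR_BYTES-aligned, capped at the total trip count n. */
size_t peel_count(const int *p, size_t n)
{
    uintptr_t mis = (uintptr_t)p % VECTOR_BYTES;
    size_t peel = mis ? (VECTOR_BYTES - mis) / sizeof(int) : 0;
    return peel < n ? peel : n;
}

void add_arrays_peeled(int *dst, const int *src, size_t n)
{
    size_t i = 0;
    size_t peel = peel_count(dst, n);

    /* Prologue: peeled scalar iterations. */
    for (; i < peel; i++)
        dst[i] += src[i];

    /* Main loop: dst + i is aligned on entry, so a vectorizer could use
       aligned stores here; src may still be misaligned. */
    for (; i < n; i++)
        dst[i] += src[i];
}
```

Peeling costs a few extra scalar iterations but needs no second copy of the loop, which is why losing it to versioning (as happens here once a runtime alias test is required) counts as a regression.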
[Bug tree-optimization/37698] New: [alias-improvements-branch] pre makes latch-block non-empty
This happens in testcase gcc.dg/vect/vect-62.c: looks like on the alias branch pre is more powerful, as it moves the load into the latch block; as a result the latch block is not empty, and we fail to vectorize (with -fno-tree-pre vectorization succeeds). Related non-empty-latch PRs that prevent vectorization: PR28643, PR33447 -- Summary: [alias-improvements-branch] pre makes latch-block non-empty Product: gcc Version: unknown Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37698
[Bug tree-optimization/37699] New: [alias-improvements-branch] can't alias ptr and local array
This happens in gcc.dg/vect/vect-96.c and gcc.dg/vect/no-vfa-vect-43.c. In the first, we can't distinguish between a write through a (local) pointer to a global array (which is a field in a struct), and a read from a local array. As a result we vectorize the loop using loop-versioning controlled by a run-time aliasing test, which also means we'll use versioning instead of peeling to align a misaligned store. In the second, we can't tell that reads through a pointer (which is a function argument) do not overlap with a write to a local array. As a result we try to vectorize the loop using loop-versioning controlled by a run-time aliasing test, however this testcase does not allow that (--param vect-max-version-for-alias-checks=0), so vectorization fails. -- Summary: [alias-improvements-branch] can't alias ptr and local array Product: gcc Version: unknown Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37699
[Bug tree-optimization/37700] New: [alias-improvements-branch] redundant load doesn't get eliminated
This happens in testcase gcc.dg/vect/slp-19.c: The problem is with the loop at line 17: with trunk we detect that one of the elements of array 'in' is read twice, so we generate overall 8 loads (reusing one of them). On the alias branch we do not eliminate the extra load. All the reads and writes are from/to local arrays, by the way. This results in 9 loads, which the vectorizer interprets as a complicated SLP permutation, so instead it is vectorized across iterations rather than using SLP: slp-19.c:17: note: Load permutation 0 1 2 4 5 6 7 8 slp-19.c:17: note: Build SLP failed: unsupported load permutation out [D.2646_11] = D.2647_12; -- Summary: [alias-improvements-branch] redundant load doesn't get eliminated Product: gcc Version: unknown Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37700
[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC
--- Comment #4 from dorit at gcc dot gnu dot org 2008-09-26 06:29 --- Subject: Bug 37574 Author: dorit Date: Fri Sep 26 06:28:01 2008 New Revision: 140685 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=140685 Log: PR tree-optimization/37574 * tree-vectorizer.c (vect_is_simple_use): Fix indentation. * tree-vect-transform.c (vect_get_constant_vectors): Use vectype instead of vector_type for constants. Take computation out of loop. (vect_get_vec_def_for_operand): Use only vectype for constant case, and use only vector_type for invariant case. (get_initial_def_for_reduction): Use vectype instead of vector_type. Added: trunk/gcc/testsuite/gcc.dg/vect/ggc-pr37574.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/vect.exp trunk/gcc/tree-vect-transform.c trunk/gcc/tree-vectorizer.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574
[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC
-- dorit at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |dorit at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED Last reconfirmed|2008-09-19 14:12:43 |2008-09-21 13:17:55 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574
[Bug tree-optimization/37574] [4.4 Regression] ICE with the vectorizer and GC
--- Comment #3 from dorit at gcc dot gnu dot org 2008-09-21 13:18 --- happens during outer-loop vectorization. I'm looking into it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37574
[Bug tree-optimization/37194] Autovectorization of small constant iteration loop degrades performance
--- Comment #3 from dorit at gcc dot gnu dot org 2008-08-22 13:31 --- (In reply to comment #2) The x86_64 generated code looks like ... I wonder why we do not use movups instead. t.i:3: note: Alignment of access forced using peeling. t.i:3: note: Peeling for alignment will be applied. because the vectorizer doesn't support misaligned stores. I think it should be easy to add - see this old patch: http://gcc.gnu.org/ml/gcc-patches/2007-01/msg00604.html (and also on http://gcc.gnu.org/wiki/VectorizationTasks, under todo). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37194
[Bug bootstrap/37152] tree-vect-transform.c: use of = where == may have been intended
--- Comment #2 from dorit at gcc dot gnu dot org 2008-08-19 07:15 --- Subject: Bug 37152 Author: dorit Date: Tue Aug 19 07:14:26 2008 New Revision: 139224 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=139224 Log: PR bootstrap/37152 * tree-vect-transform.c (vect_create_epilog_for_reduction): Change = to == in assert statement. (vectorizable_reduction): Fix typo. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37152
[Bug bootstrap/37152] tree-vect-transform.c: use of = where == may have been intended
--- Comment #1 from dorit at gcc dot gnu dot org 2008-08-18 20:11 --- (In reply to comment #0) I just tried to compile GNU CC version 4.4 snapshot 20080815 with the Intel C compiler and it said gcc/tree-vect-transform.c(2488): warning #187: use of = where == may have been intended The source code is gcc_assert (ncopies = 1); Perhaps gcc_assert (ncopies == 1); was intended ? no... thanks for the catch, I'll commit a fix -- dorit at gcc dot gnu dot org changed: What|Removed |Added Summary|tree-vect-transform.c: use |tree-vect-transform.c: use |of = where == may have |of = where == may have |been intended |been intended http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37152
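Why `gcc_assert (ncopies = 1)` is harmful can be seen in a small C sketch. The two helper functions below are hypothetical and only mimic the condition inside the assertion: the assignment form always evaluates to 1, so the check can never fire, and it also overwrites the variable it was meant to test.

```c
/* Mimics the buggy condition: assigns 1 to *ncopies and yields 1,
   so an assertion on it never fails, whatever the old value was. */
int buggy_condition(int *ncopies)
{
    return (*ncopies = 1);
}

/* Mimics the intended condition: a pure comparison, no side effect. */
int fixed_condition(const int *ncopies)
{
    return (*ncopies == 1);
}
```

Calling `buggy_condition` on a variable holding 4 returns 1 and leaves the variable at 1, while `fixed_condition` returns 0 and leaves it untouched.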
[Bug tree-optimization/36844] Vectorizer doesn't support INT-FP conversions with different size
--- Comment #2 from dorit at gcc dot gnu dot org 2008-07-22 10:39 --- (In reply to comment #1) One problem is vectorizable_conversion. Is there a way to support V4DF/V4DI - D4SI/V4SF V8SI - V8SF With the current framework, the only way to support V8SI - V8SF is to implement the TARGET_VECTORIZE_BUILTIN_CONVERSION for these modes. There's no way in the current framework to support V4DF - V4SI V4DI - V4SF because of the single-vector-size assumption. These however would be supported: V4DF - V8SI V4DI - V8SF by modeling the idioms unpack[u/s]_float_[lo/hi] and vec_pack_[u/s]fix_trunc for the respective modes. I think that in order to really support AVX the vectorizer would need to be extended to consider multiple vector sizes (which would probably involve more than just extending the support for conversions). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36844
[Bug middle-end/35343] Sum-reduction loop not recognized
--- Comment #1 from dorit at gcc dot gnu dot org 2008-02-25 10:21 --- (In reply to comment #0) It is beneficial to unroll reduction loop (and split the reduction target) to reduce dependence height due to recurrence, but GCC does not perform such optimization (-O3 -fno-tree-vectorize) it does, if you use -fvariable-expansion-in-unroller -funroll-loops (this splits the reduction target into 2 accumulators. For more aggressive splitting you can use --param max-variable-expansions-in-unroller=[n]) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35343
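Splitting the reduction target into 2 accumulators corresponds roughly to the following hand-written C. This is a sketch of the idea only, with a hypothetical function name, not the code the unroller produces: each accumulator carries its own dependence chain, so the two additions in one unrolled iteration do not wait on each other.

```c
#include <stddef.h>

double sum_two_accumulators(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;  /* two independent partial sums */
    size_t i = 0;

    /* Unrolled by two: s0 and s1 form separate recurrences. */
    for (; i + 1 < n; i += 2) {
        s0 += a[i];
        s1 += a[i + 1];
    }

    /* Leftover element when n is odd. */
    if (i < n)
        s0 += a[i];

    /* Combine the partial sums once, after the loop. */
    return s0 + s1;
}
```

Note that for floating point this changes the order of the additions, so the result can differ in the last bits from a strictly sequential sum.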
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #19 from dorit at gcc dot gnu dot org 2008-01-28 13:20 --- Fixed? In a way, yes. The problem is avoided by generating too conservative code. AFAIU, a better solution may be expected in 4.4 from the stack alignment branch. In any case this segfault PR can be closed, and instead a missed optimization PR could be opened. -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|WAITING |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
--- Comment #6 from dorit at gcc dot gnu dot org 2008-01-03 10:08 --- (In reply to comment #5) I can confirm that pulseaudio 0.9.8 sources which caused the crash, compile fine now with the latest gcc 4.3 snapshot. thanks. (I usually prefer to wait for the person who reported the bug to confirm that it can be closed) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591
[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
--- Comment #7 from dorit at gcc dot gnu dot org 2008-01-03 10:17 --- fixed -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591
[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
--- Comment #3 from dorit at gcc dot gnu dot org 2007-12-27 19:14 --- Subject: Bug 34591 Author: dorit Date: Thu Dec 27 19:14:17 2007 New Revision: 131206 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131206 Log: PR tree-optimization/34591 * tree-vect-transform.c (vect_estimate_min_profitable_iters): Skip stmts (including reduction stmts) that are not live. Added: trunk/gcc/testsuite/gcc.dg/vect/pr34591.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591
[Bug tree-optimization/34591] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
-- dorit at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |dorit at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED Last reconfirmed|2007-12-26 13:55:29 |2007-12-26 15:29:56 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34591
[Bug tree-optimization/34330] -ftree-parallelize-loops=4 ICE with the vectorizer also
--- Comment #3 from dorit at gcc dot gnu dot org 2007-12-19 09:38 --- This is a vectorizer vs not being able to run may_alias after it can you please remind me why we can't run may_alias after the vectorizer? (and what do you think can be done about it?) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34330
[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
--- Comment #5 from dorit at gcc dot gnu dot org 2007-12-17 11:14 --- Subject: Bug 34445 Author: dorit Date: Mon Dec 17 11:13:56 2007 New Revision: 131006 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=131006 Log: PR tree-optimization/34445 * tree-vect-transform.c (vect_estimate_min_profitable_iters): Skip stmts (including live stmts) that are not relevant. Added: trunk/gcc/testsuite/gfortran.dg/vect/cost-model-pr34445.f trunk/gcc/testsuite/gfortran.dg/vect/cost-model-pr34445a.f Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/vect/vect.exp trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445
[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
--- Comment #4 from dorit at gcc dot gnu dot org 2007-12-16 13:06 --- testing this patch: *** tree-vect-transform.c (revision 130987) --- tree-vect-transform.c (working copy) *** vect_estimate_min_profitable_iters (loop *** 197,214 factor = 1; for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (si)) ! { ! tree stmt = bsi_stmt (si); ! stmt_vec_info stmt_info = vinfo_for_stmt (stmt); ! if (!STMT_VINFO_RELEVANT_P (stmt_info) !!STMT_VINFO_LIVE_P (stmt_info)) ! continue; ! scalar_single_iter_cost += cost_for_stmt (stmt) * factor; ! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor; /* FIXME: for stmts in the inner-loop in outer-loop vectorization, some of the outside costs are generated inside the outer-loop. */ ! vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info); ! } } /* Add additional cost for the peeled instructions in prologue and epilogue --- 197,215 factor = 1; for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (si)) ! { ! tree stmt = bsi_stmt (si); ! stmt_vec_info stmt_info = vinfo_for_stmt (stmt); ! /* Skip stmts that are not vectorized inside the loop. */ ! if (!STMT_VINFO_RELEVANT_P (stmt_info) ! STMT_VINFO_DEF_TYPE (stmt_info) != vect_reduction_def) ! continue; ! scalar_single_iter_cost += cost_for_stmt (stmt) * factor; ! vec_inside_cost += STMT_VINFO_INSIDE_OF_LOOP_COST (stmt_info) * factor; /* FIXME: for stmts in the inner-loop in outer-loop vectorization, some of the outside costs are generated inside the outer-loop. */ ! vec_outside_cost += STMT_VINFO_OUTSIDE_OF_LOOP_COST (stmt_info); ! } } /* Add additional cost for the peeled instructions in prologue and epilogue (It fixes both testcases) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445
[Bug tree-optimization/34445] [4.3 Regression] internal compiler error: in cost_for_stmt, at tree-vect-transform.c:98
-- dorit at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |dorit at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED Last reconfirmed|2007-12-12 20:00:40 |2007-12-15 20:50:23 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34445
[Bug tree-optimization/33319] [4.2 regression] g++.dg/tree-ssa/pr27549.C ICE with vectorization
--- Comment #14 from dorit at gcc dot gnu dot org 2007-11-22 15:22 --- closed, given recent feedback -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|NEW |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319
[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)
--- Comment #15 from dorit at gcc dot gnu dot org 2007-11-22 15:17 --- (In reply to comment #12) ... Richard, is this related to the issue you reported in http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01127.html (looks like the same error)? ... Yes, these are likely similar problems. The only difference I see is that this one doesn't involve unions? Richard, any chance you could take a look? (I'm asking just cause it sounds like you've had recent experience at looking at potentially exactly this kind of problem...)? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869
[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)
--- Comment #14 from dorit at gcc dot gnu dot org 2007-11-22 15:14 --- (In reply to comment #13) Dorit, can you please take a look again? I will not be able to look into this in the next couple of weeks, sorry. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869
[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503
--- Comment #7 from dorit at gcc dot gnu dot org 2007-11-13 13:29 --- fixed -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860
[Bug rtl-optimization/34011] Memory load is not eliminated from tight vectorized loop
--- Comment #1 from dorit at gcc dot gnu dot org 2007-11-07 18:06 --- (In reply to comment #0) Following testcase exposes optimization problem with current SVN gcc: ... the same address is accessed with unaligned access (3) as well as aligned access. This is a missed-optimization in the vectorizer - we use loop-versioning to deal with the fact that we don't yet support misaligned stores; so the vectorized version of the loop is guarded by a runtime test that checks that the address of the store is aligned. However, we don't use the information that there's a load from the same address that is therefore also guaranteed to be aligned. We actually have this information (we detect DRs that have the same alignment and collect them in STMT_VINFO_SAME_ALIGN_REFS), but we don't use it when we do the versioning. We *do* use this information when instead of versioning the loop, we peel the loop to make the store aligned. In this case we also mark the relevant SAME_ALIGN_REFS as aligned and generate aligned accesses for them. (By the way, the reason we decide to use loop-versioning and not loop-peeling is because we can't determine whether the pointers overlap at compile time. So we have to use runtime dependence testing (i.e. versioning for aliasing), and since we currently don't support both versioning and peeling together, this dictates that we will use runtime alignment testing instead of peeling.) Here is how it looks in the vectorizer dump file: pr34011.c:14: note: === vect_analyze_dependences === pr34011.c:14: note: dependence distance = 0. pr34011.c:14: note: accesses have the same alignment. pr34011.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and *D.1529_9 pr34011.c:14: note: versioning for alias required: can't determine dependence between *D.1531_14 and *D.1529_9 pr34011.c:14: note: mark for run-time aliasing test between *D.1531_14 and *D.1529_9 ... 
pr34011.c:14: note: === vect_enhance_data_refs_alignment === pr34011.c:14: note: Unknown misalignment, is_packed = 0 pr34011.c:14: note: Alignment of access forced using versioning. pr34011.c:14: note: Versioning for alignment will be applied. pr34011.c:14: note: Vectorizing an unaligned access. pr34011.c:14: note: Vectorizing an unaligned access. Instead, if I add __restrict__ qualifiers to the pointer arguments, we get this: pr34011b.c:14: note: === vect_analyze_dependences === pr34011b.c:14: note: dependence distance = 0. pr34011b.c:14: note: accesses have the same alignment. pr34011b.c:14: note: dependence distance modulo vf == 0 between *D.1529_9 and *D.1529_9 ... pr34011b.c:14: note: === vect_enhance_data_refs_alignment === pr34011b.c:14: note: Unknown misalignment, is_packed = 0 ... pr34011b.c:14: note: Alignment of access forced using peeling. pr34011b.c:14: note: Peeling for alignment will be applied. pr34011b.c:14: note: Vectorizing an unaligned access. i.e. we don't need to use runtime dependence testing and version the loop, so we can use peeling to align the store along with anything that has the same alignment as the store: bb 6: MEM[base: D.1676, index: ivtmp.142] = M*(vect_p.111 + ivtmp.142){misalignment: 0} srcshift | MEM[base: D.1676, index: ivtmp.142]; ... Missing IV elimination could be attributed to tree loop optimizations, but others are IMO RTL optimization problems, (except for the misaligned access, which the vectorizer can avoid). because we enter RTL generation with: bad: bb 4: MEM[index: ivtmp.127] = M*(vector int *) ivtmp.130{misalignment: 0} srcshift.3 | M*(vector int *) ivtmp.127{misalignment: 0}; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34011
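The "versioning for alignment" case from this comment can be sketched by hand in C. Everything here is illustrative and assumed rather than taken from the testcase: the function names, the 16-byte vector size, the shift amount of 3, and the loop body, which is only modeled on the general shape visible in the dump (a store to an address that is also loaded, combined with a shifted load from a second pointer).

```c
#include <stddef.h>
#include <stdint.h>

enum { VECTOR_BYTES = 16 };  /* assumed vector size */

static int is_vector_aligned(const void *p)
{
    return ((uintptr_t)p % VECTOR_BYTES) == 0;
}

/* Returns 1 if the alignment-guarded version ran, 0 for the fallback. */
int or_shifted(unsigned *dst, const unsigned *src, size_t n)
{
    if (is_vector_aligned(dst)) {
        /* Guarded version: the store to dst[i] starts aligned. The load
           from dst[i] uses the same address, so it is aligned as well;
           this is the fact the comment says is not exploited when
           versioning. Only the load from src may be misaligned. */
        for (size_t i = 0; i < n; i++)
            dst[i] = (src[i] << 3) | dst[i];
        return 1;
    }

    /* Fallback version: scalar loop, no alignment assumptions. */
    for (size_t i = 0; i < n; i++)
        dst[i] = (src[i] << 3) | dst[i];
    return 0;
}
```

With `__restrict__` on the pointer arguments, as in the second dump, no runtime dependence test is needed and the compiler is free to use peeling instead of this kind of guard.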
[Bug tree-optimization/34005] [4.3 Regression] ICE: verify_ssa failed (expected an SSA_NAME object)
--- Comment #3 from dorit at gcc dot gnu dot org 2007-11-06 18:11 --- I don't think these are related to PR33680. Sounds like we may be generating a stmt with a cond_expr at the rhs. The data-reference analysis results in: base_address: blocks offset from base address: k_4(D) == 0 ? 8 : 0 constant offset from base address: 0 step: 1 aligned to: 8 base_object: blocks[0][0] symbol tag: blocks (Note the cond_expr used to represent the offset). We probably need to call the gimplifier (if we don't already) and also apply Zdenek's patch that allows gimplifying rhs cond_exprs - http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html. -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2007-11-06 18:11:35 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34005
[Bug tree-optimization/34005] [4.3 Regression] ICE: verify_ssa failed (expected an SSA_NAME object)
--- Comment #4 from dorit at gcc dot gnu dot org 2007-11-06 18:29 --- We probably need to call the gimplifier (if we don't already) and also apply Zdenek's patch that allows gimplifying rhs cond_exprs - http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02052.html. Yep - I just tried applying Zdenek's patch to the gimplifier, and it indeed solves the ICE in both tests. I'll go back and propose this patch for mainline again. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=34005
[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2
--- Comment #3 from dorit at gcc dot gnu dot org 2007-11-04 03:49 --- Subject: Bug 33987 Author: dorit Date: Sun Nov 4 03:48:58 2007 New Revision: 129880 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129880 Log: PR tree-optimization/33987 * tree-vect-transform.c (get_initial_def_for_reduction): Fix assert. Fix indentation. (vectorizable_reduction): Add type check. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987
[Bug tree-optimization/33319] [4.2 regression] g++.dg/tree-ssa/pr27549.C ICE with vectorization
--- Comment #11 from dorit at gcc dot gnu dot org 2007-11-04 04:09 --- (In reply to comment #10) Doesn't fail on trunk since r129797: 2007-10-31 Sebastian Pop [EMAIL PROTECTED] PR tree-optimization/32377 ... before: pr27549.C:58: note: create runtime check for data references *D.2383_45 and *D.2381_41 pr27549.C:58: note: LOOP VECTORIZED. after: pr27549.C:58: note: not vectorized, possible dependence between data-refs *D.2383_45 and *D.2381_41 the assumption was that the ICE was related to versioning-for-aliasing (run-time dependence testing), which, now that the dependence-tester was fixed, is not required anymore, but: Still fails on 4.2 branch. ...we don't have versioning-for-aliasing in 4.2, so this loop could not be vectorized with 4.2 (unless our dependence tester in 4.2 is able to determine the dependence without this fix?). Interesting. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319
[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2
-- dorit at gcc dot gnu dot org changed:

                  What|Removed            |Added
                Status|UNCONFIRMED        |NEW
        Ever Confirmed|0                  |1
 Last reconfirmed date|0000-00-00 00:00:00|2007-11-03 03:35:30

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987
[Bug tree-optimization/33987] [4.3 regression] internal compiler error: in get_initial_def_for_reduction, at tree-vect-transform.c:2110 with -O3 -msse2
--- Comment #2 from dorit at gcc dot gnu dot org 2007-11-03 04:06 --- testing this fix:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c (revision 129763)
--- tree-vect-transform.c (working copy)
*** get_initial_def_for_reduction (tree stmt
*** 2107,2113
  tree vector_type;
  bool nested_in_vect_loop = false;

! gcc_assert (INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
  if (nested_in_vect_loop_p (loop, stmt))
    nested_in_vect_loop = true;
  else
--- 2107,2113
  tree vector_type;
  bool nested_in_vect_loop = false;

! gcc_assert (POINTER_TYPE_P (type) || INTEGRAL_TYPE_P (type) || SCALAR_FLOAT_TYPE_P (type));
  if (nested_in_vect_loop_p (loop, stmt))
    nested_in_vect_loop = true;
  else
*** get_initial_def_for_reduction (tree stmt
*** 2120,2136
  case WIDEN_SUM_EXPR:
  case DOT_PROD_EXPR:
  case PLUS_EXPR:
!   if (nested_in_vect_loop)
!     *adjustment_def = vecdef;
!   else
!     *adjustment_def = init_val;
!   /* Create a vector of zeros for init_def. */
!   if (INTEGRAL_TYPE_P (type))
!     def_for_init = build_int_cst (type, 0);
    else
      def_for_init = build_real (type, dconst0);
!   for (i = nunits - 1; i >= 0; --i)
!     t = tree_cons (NULL_TREE, def_for_init, t);
    vector_type = get_vectype_for_scalar_type (TREE_TYPE (def_for_init));
    gcc_assert (vector_type);
    init_def = build_vector (vector_type, t);
--- 2120,2136
  case WIDEN_SUM_EXPR:
  case DOT_PROD_EXPR:
  case PLUS_EXPR:
!   if (nested_in_vect_loop)
!     *adjustment_def = vecdef;
    else
+     *adjustment_def = init_val;
+   /* Create a vector of zeros for init_def. */
+   if (SCALAR_FLOAT_TYPE_P (type))
      def_for_init = build_real (type, dconst0);
!   else
!     def_for_init = build_int_cst (type, 0);
!   for (i = nunits - 1; i >= 0; --i)
!     t = tree_cons (NULL_TREE, def_for_init, t);
    vector_type = get_vectype_for_scalar_type (TREE_TYPE (def_for_init));
    gcc_assert (vector_type);
    init_def = build_vector (vector_type, t);
*** vectorizable_reduction (tree stmt, block
*** 2716,2721
--- 2716,2724
    return false;
  scalar_dest = GIMPLE_STMT_OPERAND (stmt, 0);
  scalar_type = TREE_TYPE (scalar_dest);
+ if (!POINTER_TYPE_P (scalar_type) && !INTEGRAL_TYPE_P (scalar_type)
+     && !SCALAR_FLOAT_TYPE_P (scalar_type))
+   return false;

  /* All uses but the last are expected to be defined in the loop.
     The last use is the reduction variable. */

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33987
[Bug target/33958] Using -ftree-vectorize , creates an illegal movaps instruction
--- Comment #3 from dorit at gcc dot gnu dot org 2007-10-31 17:46 --- (In reply to comment #2) Works for me. Try a newer 4.2.x release. I wonder if the fix for PR25413 fixed this problem - it went into 4.2 on July 25th, just shortly after 4.2.1 was released :-( but should be in 4.2.2 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33958
[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant
--- Comment #6 from dorit at gcc dot gnu dot org 2007-11-01 00:55 --- thanks! but the problem is that in the vectorizer, DR_STEP has to be an INTEGER_CST: for instance, step = TREE_INT_CST_LOW (DR_STEP (dra)); ... || tree_int_cst_compare (DR_STEP (dra), DR_STEP (drb))) and plenty of other places will ICE if we feed them with symbolic strides. This can be fixed. I'll try to look into that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113
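A minimal sketch of the kind of guard that would avoid such ICEs, assuming a much-simplified node type. `struct node`, `NODE_SYMBOLIC` and `get_constant_step` are hypothetical names for illustration only and are not GCC's tree API: the point is just to test that the step is a constant before extracting its value, and to bail out otherwise.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a tree node; not the real GCC data structure.  */
enum node_kind { NODE_INTEGER_CST, NODE_SYMBOLIC };

struct node
{
  enum node_kind kind;
  long value;   /* Meaningful only when kind == NODE_INTEGER_CST.  */
};

/* Store the step in *OUT and return true when STEP is a compile-time
   constant; return false for a missing or symbolic step, so that the
   caller can reject the data reference instead of reading a value
   that does not exist.  */
static bool
get_constant_step (const struct node *step, long *out)
{
  if (step == NULL || step->kind != NODE_INTEGER_CST)
    return false;
  *out = step->value;
  return true;
}
```

A caller would treat a `false` return as "stride not analyzable" and skip the loop, which is more conservative than supporting symbolic strides properly.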
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #17 from dorit at gcc dot gnu dot org 2007-10-30 05:25 --- Subject: Bug 32893 Author: dorit Date: Tue Oct 30 05:25:10 2007 New Revision: 129764 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129764 Log: PR tree-optimization/32893 * tree-vectorize.c (vect_can_force_dr_alignment_p): Check STACK_BOUNDARY instead of PREFERRED_STACK_BOUNDARY. Added: trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6-global.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-34.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-36.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-64.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-65.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-66.c trunk/gcc/testsuite/gcc.dg/vect/no-section-anchors-vect-68.c trunk/gcc/testsuite/gcc.dg/vect/vect-77-alignchecks.c trunk/gcc/testsuite/gcc.dg/vect/vect-77-global.c trunk/gcc/testsuite/gcc.dg/vect/vect-78-alignchecks.c trunk/gcc/testsuite/gcc.dg/vect/vect-78-global.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gcc.dg/vect/costmodel/i386/costmodel-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/costmodel/x86_64/costmodel-vect-31.c trunk/gcc/testsuite/gcc.dg/vect/no-scevccp-outer-6.c trunk/gcc/testsuite/gcc.dg/vect/slp-25.c trunk/gcc/testsuite/gcc.dg/vect/vect-13.c trunk/gcc/testsuite/gcc.dg/vect/vect-17.c trunk/gcc/testsuite/gcc.dg/vect/vect-18.c trunk/gcc/testsuite/gcc.dg/vect/vect-19.c trunk/gcc/testsuite/gcc.dg/vect/vect-2.c trunk/gcc/testsuite/gcc.dg/vect/vect-20.c trunk/gcc/testsuite/gcc.dg/vect/vect-21.c trunk/gcc/testsuite/gcc.dg/vect/vect-22.c trunk/gcc/testsuite/gcc.dg/vect/vect-27.c trunk/gcc/testsuite/gcc.dg/vect/vect-29.c trunk/gcc/testsuite/gcc.dg/vect/vect-3.c trunk/gcc/testsuite/gcc.dg/vect/vect-31.c trunk/gcc/testsuite/gcc.dg/vect/vect-34.c trunk/gcc/testsuite/gcc.dg/vect/vect-36.c trunk/gcc/testsuite/gcc.dg/vect/vect-4.c
trunk/gcc/testsuite/gcc.dg/vect/vect-5.c trunk/gcc/testsuite/gcc.dg/vect/vect-6.c trunk/gcc/testsuite/gcc.dg/vect/vect-64.c trunk/gcc/testsuite/gcc.dg/vect/vect-65.c trunk/gcc/testsuite/gcc.dg/vect/vect-66.c trunk/gcc/testsuite/gcc.dg/vect/vect-68.c trunk/gcc/testsuite/gcc.dg/vect/vect-7.c trunk/gcc/testsuite/gcc.dg/vect/vect-72.c trunk/gcc/testsuite/gcc.dg/vect/vect-73.c trunk/gcc/testsuite/gcc.dg/vect/vect-76.c trunk/gcc/testsuite/gcc.dg/vect/vect-77.c trunk/gcc/testsuite/gcc.dg/vect/vect-78.c trunk/gcc/testsuite/gcc.dg/vect/vect-86.c trunk/gcc/testsuite/gcc.dg/vect/vect-all.c trunk/gcc/testsuite/gcc.dg/vect/vect.exp trunk/gcc/testsuite/gcc.dg/vect/wrapv-vect-7.c trunk/gcc/testsuite/lib/target-supports.exp trunk/gcc/tree-vectorizer.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503
--- Comment #5 from dorit at gcc dot gnu dot org 2007-10-23 19:50 --- Subject: Bug 33860 Author: dorit Date: Tue Oct 23 19:50:18 2007 New Revision: 129587 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129587 Log: PR tree-optimization/33860 * tree-vect-transform.c (vect_analyze_data_ref_access): Don't allow interleaved accesses in case the dr is inside the inner-loop during outer-loop vectorization. Added: trunk/gcc/testsuite/g++.dg/vect/pr33860.cc trunk/gcc/testsuite/g++.dg/vect/pr33860a.cc Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-analyze.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860
[Bug tree-optimization/33860] [4.3 Regression] ICE in vectorizable_load, at tree-vect-transform.c:5503
--- Comment #4 from dorit at gcc dot gnu dot org 2007-10-22 22:54 --- There's some bad interaction here between the data-interleaving support and the outer-loop support - these are not yet supported together, but the combination still slipped through the checks during the analysis phase. This patch fixes that by not allowing us to detect interleaved accesses in the inner-loop during outer-loop vectorization:

--- tree-vect-analyze.c 2007-10-22 08:34:45.0 +0200
+++ tree-vect-analyze.dn.c 2007-10-22 22:23:01.0 +0200
@@ -2321,6 +2321,10 @@
   if (nested_in_vect_loop_p (loop, stmt))
     {
+      /* Interleaved accesses are not yet supported within outer-loop
+         vectorization for references in the inner-loop. */
+      DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt)) = NULL_TREE;
+
       /* For the rest of the analysis we use the outer-loop step. */
       step = STMT_VINFO_DR_STEP (stmt_info);
       dr_step = TREE_INT_CST_LOW (step);

(yet to be bootstrapped etc.)

By the way, on powerpc-linux, this testcase gets vectorized with this fix (after changing the doubles to floats, and forcing alignment of the data array with attribute aligned), without taking advantage of the fact that the two loads are interleaved. I also suspect that the vectorized code here is considerably worse than the original scalar code; instead of:
  (ld,ld,add,store) * 16
we have:
  (vload,realign,splat,vload,realign,splat,vadd,vstore) * 4
with additional overhead outside the loop. After the ICE is fixed we should probably add this as a missed-optimization PR (both in terms of the cost model, and in terms of exploiting the data reuse of the interleaved accesses).

-- dorit at gcc dot gnu dot org changed:

                  What|Removed                         |Added
            AssignedTo|unassigned at gcc dot gnu dot org|dorit at gcc dot gnu dot org
                Status|UNCONFIRMED                     |ASSIGNED
        Ever Confirmed|0                               |1
 Last reconfirmed date|0000-00-00 00:00:00             |2007-10-22 22:54:32

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33860
[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829
--- Comment #7 from dorit at gcc dot gnu dot org 2007-10-23 03:24 --- Subject: Bug 33834 Author: dorit Date: Tue Oct 23 03:24:06 2007 New Revision: 129571 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129571 Log: PR tree-optimization/33834 PR tree-optimization/33835 * tree-vect-analyze.c (vect_analyze_operations): RELEVANT and LIVE stmts need to be checked for success separately. * tree-vect-transform.c (vectorizable_call, vectorizable_conversion): Remove the check that stmt is not LIVE. (vectorizable_assignment, vectorizable_induction): Likewise. (vectorizable_operation, vectorizable_type_demotion): Likewise. (vectorizable_type_promotion, vectorizable_load, vectorizable_store): Likewise. (vectorizable_live_operation): Check that op is not NULL. Added: trunk/gcc/testsuite/g++.dg/vect/pr33834_1.cc trunk/gcc/testsuite/g++.dg/vect/pr33834_2.cc trunk/gcc/testsuite/g++.dg/vect/pr33835.cc Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-analyze.c trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33834
[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use
--- Comment #5 from dorit at gcc dot gnu dot org 2007-10-23 03:24 --- Subject: Bug 33835 Author: dorit Date: Tue Oct 23 03:24:06 2007 New Revision: 129571 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=129571 Log: PR tree-optimization/33834 PR tree-optimization/33835 * tree-vect-analyze.c (vect_analyze_operations): RELEVANT and LIVE stmts need to be checked for success separately. * tree-vect-transform.c (vectorizable_call, vectorizable_conversion): Remove the check that stmt is not LIVE. (vectorizable_assignment, vectorizable_induction): Likewise. (vectorizable_operation, vectorizable_type_demotion): Likewise. (vectorizable_type_promotion, vectorizable_load, vectorizable_store): Likewise. (vectorizable_live_operation): Check that op is not NULL. Added: trunk/gcc/testsuite/g++.dg/vect/pr33834_1.cc trunk/gcc/testsuite/g++.dg/vect/pr33834_2.cc trunk/gcc/testsuite/g++.dg/vect/pr33835.cc Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-analyze.c trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835
[Bug tree-optimization/33833] [4.3 Regression] ICE in build2_stat, at tree.c:3110 at -O3, tree-vectorizer
--- Comment #4 from dorit at gcc dot gnu dot org 2007-10-21 06:39 --- I was able to reproduce this on i386-linux. Looks like it's related to PLUS_EXPR vs. POINTER_PLUS_EXPR. The following patch fixes this testcase:

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 129521)
--- tree-vect-analyze.c (working copy)
*** vect_analyze_data_refs (loop_vec_info lo
*** 3249,3255
     inner-loop: *(BASE+INIT). (The first location is actually
     BASE+INIT+OFFSET, but we add OFFSET separately later. */
  tree inner_base = build_fold_indirect_ref
!   (fold_build2 (PLUS_EXPR, TREE_TYPE (base), base, init));

  if (vect_print_dump_info (REPORT_DETAILS))
    {
--- 3249,3256
     inner-loop: *(BASE+INIT). (The first location is actually
     BASE+INIT+OFFSET, but we add OFFSET separately later. */
  tree inner_base = build_fold_indirect_ref
!   (fold_build2 (POINTER_PLUS_EXPR,
!                 TREE_TYPE (base), base, init));

  if (vect_print_dump_info (REPORT_DETAILS))
    {

... but breaks some of the current vectorizer testcases:

WARNING: gcc.dg/vect/vect-62.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-63.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-64.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-65.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-66.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-67.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-70.c compilation failed to produce executable
WARNING: gcc.dg/vect/vect-align-2.c compilation failed to produce executable
WARNING: gcc.dg/vect/no-section-anchors-vect-69.c compilation failed to produce executable
WARNING: gcc.dg/vect/no-scevccp-slp-30.c compilation failed to produce executable

I looked into one of these failures, and it fails with:
/home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/vect/vect-62.c:10: internal compiler error: in build2_stat, at tree.c:3115
Looks like it doesn't like the POINTER_PLUS_EXPR in this case because arg1 is
not compatible with sizetype: Breakpoint 4, useless_type_conversion_p (outer_type=0xb7cbf000, inner_type=0xb7cc521c) at ../../gcc/gcc/tree-ssa.c:1074 1074{ (gdb) p debug_tree(outer_type) integer_type 0xb7cbf000 unsigned int public unsigned sizetype SI size integer_cst 0xb7cb2658 type integer_type 0xb7cbf06c bit_size_type constant invariant 32 unit size integer_cst 0xb7cb2444 type integer_type 0xb7cbf000 unsigned int constant invariant 4 align 32 symtab -1210758772 alias set -1 canonical type 0xb7cc50d8 precision 32 min integer_cst 0xb7cb2674 0 max integer_cst 0xb7cb2c08 -1 $8 = void (gdb) p debug_tree(inner_type) integer_type 0xb7cc521c public sizetype SI size integer_cst 0xb7cb2658 type integer_type 0xb7cbf06c bit_size_type constant invariant 32 unit size integer_cst 0xb7cb2444 type integer_type 0xb7cbf000 unsigned int constant invariant 4 align 32 symtab 0 alias set -1 canonical type 0xb7cc521c precision 32 min integer_cst 0xb7cb2b98 -2147483648 max integer_cst 0xb7cb2bb4 2147483647 $9 = void I think POINTER_PLUS_EXPR makes sense here - need to check why we have this mismatch between unsigned and signed sizetypes (and if that's also the problem in the other testcases). -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |NEW Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2007-10-21 06:39:08 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33833
[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829
--- Comment #5 from dorit at gcc dot gnu dot org 2007-10-21 07:14 --- This patch fixes it:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c (revision 129521)
--- tree-vect-transform.c (working copy)
*** vectorizable_live_operation (tree stmt,
*** 5870,5875
--- 5870,5878
  gcc_assert (STMT_VINFO_LIVE_P (stmt_info));

+ if (STMT_VINFO_RELEVANT_P (stmt_info))
+   return false;
+
  if (STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)
    return false;

(but doesn't allow vectorization. I may try a different fix that does allow vectorization)

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33834
[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use
--- Comment #3 from dorit at gcc dot gnu dot org 2007-10-21 08:07 --- The proposed fix/work-around for PR33834 also happens to fix this PR. But the real problem is that we try to access a NULL argument (operand 2 of a CALL_EXPR may be NULL). So we should probably at least add something like this:

*** vectorizable_live_operation (tree stmt,
*** 5893,5899
  for (i = 0; i < op_type; i++)
    {
      op = TREE_OPERAND (operation, i);
!     if (!vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt))
        {
          if (vect_print_dump_info (REPORT_DETAILS))
            fprintf (vect_dump, "use not simple.");
--- 5896,5902
  for (i = 0; i < op_type; i++)
    {
      op = TREE_OPERAND (operation, i);
!     if (op && !vect_is_simple_use (op, loop_vinfo, &def_stmt, &def, &dt))
        {
          if (vect_print_dump_info (REPORT_DETAILS))
            fprintf (vect_dump, "use not simple.");

This would help us pass the analysis stage, but we would later fail in the transform stage just like in PR33834. So this PR would require the same fix as PR33834.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835
[Bug tree-optimization/33834] [4.3 Regression] ICE in vect_get_vec_def_for_operand, at tree-vect-transform.c:1829
--- Comment #6 from dorit at gcc dot gnu dot org 2007-10-22 04:28 --- I'm testing this patch. It fixes the two testcases, while allowing the first testcase to get vectorized. (the last bit in the patch is the fix for PR33835): Index: tree-vect-analyze.c === *** tree-vect-analyze.c (revision 129521) --- tree-vect-analyze.c (working copy) *** vect_analyze_operations (loop_vec_info l *** 481,487 need_to_vectorize = true; } ! ok = (vectorizable_type_promotion (stmt, NULL, NULL) || vectorizable_type_demotion (stmt, NULL, NULL) || vectorizable_conversion (stmt, NULL, NULL, NULL) || vectorizable_operation (stmt, NULL, NULL, NULL) --- 481,489 need_to_vectorize = true; } ! if (STMT_VINFO_RELEVANT_P (stmt_info) ! || STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def) ! ok = (vectorizable_type_promotion (stmt, NULL, NULL) || vectorizable_type_demotion (stmt, NULL, NULL) || vectorizable_conversion (stmt, NULL, NULL, NULL) || vectorizable_operation (stmt, NULL, NULL, NULL) *** vect_analyze_operations (loop_vec_info l *** 492,508 || vectorizable_condition (stmt, NULL, NULL) || vectorizable_reduction (stmt, NULL, NULL)); /* Stmts that are (also) live (i.e. - that are used out of the loop) need extra handling, except for vectorizable reductions. */ if (STMT_VINFO_LIVE_P (stmt_info) STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type) ! ok |= vectorizable_live_operation (stmt, NULL, NULL); if (!ok) { if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) { ! fprintf (vect_dump, not vectorized: stmt not supported: ); print_generic_expr (vect_dump, stmt, TDF_SLIM); } return false; --- 494,522 || vectorizable_condition (stmt, NULL, NULL) || vectorizable_reduction (stmt, NULL, NULL)); + if (!ok) + { + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) + { + fprintf (vect_dump, not vectorized: relevant stmt not ); + fprintf (vect_dump, supported: ); + print_generic_expr (vect_dump, stmt, TDF_SLIM); + } + return false; + } + /* Stmts that are (also) live (i.e. 
- that are used out of the loop) need extra handling, except for vectorizable reductions. */ if (STMT_VINFO_LIVE_P (stmt_info) STMT_VINFO_TYPE (stmt_info) != reduc_vec_info_type) ! ok = vectorizable_live_operation (stmt, NULL, NULL); if (!ok) { if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) { ! fprintf (vect_dump, not vectorized: live stmt not ); ! fprintf (vect_dump, supported: ); print_generic_expr (vect_dump, stmt, TDF_SLIM); } return false; Index: tree-vect-transform.c === *** tree-vect-transform.c (revision 129521) --- tree-vect-transform.c (working copy) *** vectorizable_call (tree stmt, block_stmt *** 2961,2974 if (STMT_SLP_TYPE (stmt_info)) return false; - /* FORNOW: not yet supported. */ - if (STMT_VINFO_LIVE_P (stmt_info)) - { - if (vect_print_dump_info (REPORT_DETAILS)) - fprintf (vect_dump, value used after loop.); - return false; - } - /* Is STMT a vectorizable call? */ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT) return false; --- 2961,2966 *** vectorizable_conversion (tree stmt, bloc *** 3307,3320 if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def) return false; - if (STMT_VINFO_LIVE_P (stmt_info)) - { - /* FORNOW: not yet supported. */ - if (vect_print_dump_info (REPORT_DETAILS)) - fprintf (vect_dump, value used after loop.); - return false; - } - if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT) return false; --- 3299,3304 *** vectorizable_assignment (tree stmt, bloc *** 3585,3598 if (STMT_VINFO_DEF_TYPE (stmt_info) != vect_loop_def) return false; - /* FORNOW: not yet supported. */ - if (STMT_VINFO_LIVE_P (stmt_info)) - { - if (vect_print_dump_info (REPORT_DETAILS)) - fprintf (vect_dump, value used after loop.); - return false; - } - /* Is vectorizable assignment? */ if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT) return false; --- 3569,3574 *** vectorizable_induction (tree
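The idea of checking RELEVANT and LIVE stmts separately can be illustrated with a much-simplified boolean model. This is only a sketch under stated assumptions: `analyze_combined` and `analyze_separately` are hypothetical helpers, and the flags stand in for the results of the `vectorizable_*` calls; the real functions have further side conditions that are not modelled here.

```c
#include <stdbool.h>

/* Model of the combined check: the result of the live-operation check
   is OR-ed into OK, so a failing live check cannot turn a passing
   "relevant" check into a failure.  */
static bool
analyze_combined (bool relevant_ok, bool is_live, bool live_ok)
{
  bool ok = relevant_ok;
  if (is_live)
    ok |= live_ok;
  return ok;
}

/* Model of the separate checks: each property that applies to the stmt
   must pass on its own, and the first failure rejects the loop.  */
static bool
analyze_separately (bool is_relevant, bool relevant_ok,
                    bool is_live, bool live_ok)
{
  if (is_relevant && !relevant_ok)
    return false;
  if (is_live && !live_ok)
    return false;
  return true;
}
```

For a stmt that is both relevant and live, where only the live check fails, the combined model reports success while the separate model reports failure.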
[Bug tree-optimization/33835] [4.3 Regression] Segfault in vect_is_simple_use
--- Comment #4 from dorit at gcc dot gnu dot org 2007-10-22 04:37 --- I'm testing a patch that would fix both this PR and PR33834 (posted it under the PR33834 entry). By the way, this testcase does not get vectorized with current mainline (an Oct21 snapshot) because the call to cos is not taken out of the inner-loop, although it's invariant; it was taken out of the loop with an older snapshot (Sept10). Another data point - in the testcase in PR33834 the call to cos is taken out of the inner-loop with current snapshot. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33835
[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC
--- Comment #35 from dorit at gcc dot gnu dot org 2007-10-15 05:52 --- bootstrap with vectorization enabled with your patch applied passes for me on ppc64-linux. thanks!! -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #16 from dorit at gcc dot gnu dot org 2007-10-03 18:52 --- Ryan, thanks a lot for the info. FYI, I started a discussion about this here: http://gcc.gnu.org/ml/gcc-patches/2007-10/msg00202.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #7 from dorit at gcc dot gnu dot org 2007-09-19 14:28 --- (In reply to comment #6)
> It looks like zlib compiled w/ -O -msse -ftree-vectorize (built with fedora's rpm package gcc-4.1.2-17) has same problem. In my environment, rpm-4.4.2.1-7.fc8 and seamonkey-1.1.3-6.fc8 segfault like below:
> Program received signal SIGSEGV, Segmentation fault.
> 0x003a869d in inflate_table (type=CODES, lens=0x913b5c8, codes=19, table=0x913b5c4, bits=0x913b5ac, work=0x913b848) at inftrees.c:108
> 108 count[len] = 0;

could you please provide a complete (reduced...) testcase that could be used to reproduce this? In the meantime, other things that may help:
- could you please try to add __attribute__ ((__aligned__(16))) to the definition of count, as suggested in comment 5?
- could you please show the relevant generated assembly up to the offending insn (with and without the attribute aligned)?
- could you also check (with gdb) what is the address accessed and what is the address of the stack pointer?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
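For reference, the alignment experiment suggested above might look like the following sketch. The array's element type and length are assumptions made for illustration (the actual declaration in inftrees.c is not quoted in this thread), and `count_is_16_byte_aligned` is a hypothetical helper, not part of zlib.

```c
#include <stdint.h>

/* Declare a local array with an explicit 16-byte alignment request and
   report whether the address the compiler chose honours it.  The type
   and size of COUNT are assumed, not taken from the zlib sources.  */
static int
count_is_16_byte_aligned (void)
{
  unsigned short count[16] __attribute__ ((__aligned__ (16)));

  count[0] = 0;   /* Touch the array so that it is really allocated.  */
  return ((uintptr_t) count % 16) == 0;
}
```

If this returned 0 on an affected compiler, it would suggest that the requested stack alignment is not being honoured, which is what the gdb check of the accessed address and the stack pointer is meant to reveal.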
[Bug bootstrap/21335] [meta-bug] bootstrap fails with -ftree-vectorize
--- Comment #7 from dorit at gcc dot gnu dot org 2007-09-14 18:49 --- (In reply to comment #6)
> I can bootstrap current trunk (r128479) with -ftree-vectorize on x86_64-unknown-linux-gnu for some time now, and, according to http://gcc.gnu.org/ml/gcc-patches/2007-09/msg00327.html, this problem is gone on powerpc64 too.

actually the link you give above explicitly states that we don't pass bootstrap with vectorization enabled on powerpc-linux, but there's already a PR for that (PR32582), so it's ok to close this one.

> So closing as fixed.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21335
[Bug tree-optimization/33373] [4.3 Regression] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098
--- Comment #5 from dorit at gcc dot gnu dot org 2007-09-14 20:53 --- (In reply to comment #4)
> Very similar testcase with the difference that it is not fixed by r128415 and makes current trunk segfault in VEC_tree_base_pop():
>
> void f (unsigned int *d, unsigned int *s, int w)
> {
>   int i;
>   for (i = 0; i < w; ++i)
>     d [i] = s [i] * (unsigned short) (~d [i] >> 24);
> }

this should fix it:

Index: tree-vect-transform.c
===
*** tree-vect-transform.c (revision 128501)
--- tree-vect-transform.c (working copy)
*** vect_get_vec_defs_for_stmt_copy (enum ve
*** 1938,1944
  vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[0], vec_oprnd);
  VEC_quick_push (tree, *vec_oprnds0, vec_oprnd);

! if (vec_oprnds1)
    {
      vec_oprnd = VEC_pop (tree, *vec_oprnds1);
      vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd);
--- 1938,1944
  vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[0], vec_oprnd);
  VEC_quick_push (tree, *vec_oprnds0, vec_oprnd);

! if (vec_oprnds1 && *vec_oprnds1)
    {
      vec_oprnd = VEC_pop (tree, *vec_oprnds1);
      vec_oprnd = vect_get_vec_def_for_stmt_copy (dt[1], vec_oprnd);

(and by the way, I think this is a totally different issue than what this PR was originally opened for, and should be a separate PR. I think this regression is due to r128289)

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373
[Bug tree-optimization/33373] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098
--- Comment #3 from dorit at gcc dot gnu dot org 2007-09-12 07:10 --- Subject: Bug 33373 Author: dorit Date: Wed Sep 12 07:09:38 2007 New Revision: 128415 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128415 Log: PR tree-optimization/33373 * tree-vect-analyze (vect_determine_vectorization_factor): Call TREE_INT_CST_LOW when comparing TYPE_SIZE_UNIT. Added: trunk/gcc/testsuite/gcc.dg/vect/pr33373.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-analyze.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373
[Bug tree-optimization/33373] ICE in vectorizable_type_demotion, at tree-vect-transform.c:4098
--- Comment #2 from dorit at gcc dot gnu dot org 2007-09-10 09:08 --- Testing this patch (it's a bug in the fix for PR33301. I accidentally treated TYPE_SIZE_UNIT as a constant, whereas it's really a tree...):

Index: tree-vect-analyze.c
===
*** tree-vect-analyze.c (revision 128322)
--- tree-vect-analyze.c (working copy)
*** vect_determine_vectorization_factor (loo
*** 242,252
  operation = GIMPLE_STMT_OPERAND (stmt, 1);
  if (TREE_CODE (operation) == NOP_EXPR
      || TREE_CODE (operation) == CONVERT_EXPR
!     || TREE_CODE (operation) == WIDEN_MULT_EXPR)
    {
      tree rhs_type = TREE_TYPE (TREE_OPERAND (operation, 0));
!     if (TYPE_SIZE_UNIT (rhs_type) < TYPE_SIZE_UNIT (scalar_type))
!       scalar_type = TREE_TYPE (TREE_OPERAND (operation, 0));
    }

  if (vect_print_dump_info (REPORT_DETAILS))
--- 242,253
  operation = GIMPLE_STMT_OPERAND (stmt, 1);
  if (TREE_CODE (operation) == NOP_EXPR
      || TREE_CODE (operation) == CONVERT_EXPR
!     || TREE_CODE (operation) == WIDEN_MULT_EXPR)
    {
      tree rhs_type = TREE_TYPE (TREE_OPERAND (operation, 0));
!     if (TREE_INT_CST_LOW (TYPE_SIZE_UNIT (rhs_type))
!         < TREE_INT_CST_LOW (TYPE_SIZE_UNIT (scalar_type)))
!       scalar_type = rhs_type;
    }

  if (vect_print_dump_info (REPORT_DETAILS))

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33373
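The mistake described above, treating a node as if it were the number it holds, can be sketched with a simplified model. `struct size_node`, `node_low` and `size_less` are hypothetical names used only for illustration; they are not GCC's tree accessors.

```c
/* Hypothetical constant node: the size lives inside the node, so two
   nodes must be compared through their values.  */
struct size_node
{
  unsigned long low;
};

/* Accessor playing the role of "extract the integer value".  */
static unsigned long
node_low (const struct size_node *n)
{
  return n->low;
}

/* Return nonzero when the size held by A is smaller than the size held
   by B.  Comparing A and B directly would compare two addresses, which
   says nothing about the sizes they describe.  */
static int
size_less (const struct size_node *a, const struct size_node *b)
{
  return node_low (a) < node_low (b);
}
```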
[Bug tree-optimization/33301] wrong vectorization factor due to an invariant type-promotion in the loop
--- Comment #1 from dorit at gcc dot gnu dot org 2007-09-08 09:19 --- Subject: Bug 33301 Author: dorit Date: Sat Sep 8 09:19:39 2007 New Revision: 128265 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128265 Log: PR tree-optimization/33301 * tree-vect-analyze (analyze_operations): Look at the type of the rhs when relevant. Added: trunk/gcc/testsuite/gfortran.dg/vect/pr33301.f Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-analyze.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301
[Bug tree-optimization/33301] wrong vectorization factor due to an invariant type-promotion in the loop
--- Comment #2 from dorit at gcc dot gnu dot org 2007-09-08 09:23 --- fix committed

-- dorit at gcc dot gnu dot org changed:

           What|Removed    |Added
         Status|UNCONFIRMED|RESOLVED
     Resolution|           |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
--- Comment #6 from dorit at gcc dot gnu dot org 2007-09-08 09:24 --- fix committed

-- dorit at gcc dot gnu dot org changed:

           What|Removed |Added
         Status|ASSIGNED|RESOLVED
     Resolution|        |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33320] ICE with vectorization in the testsuite during dataref analysis
--- Comment #2 from dorit at gcc dot gnu dot org 2007-09-08 09:42 --- (In reply to comment #1)
> (In reply to comment #0)
> > When the testcase gcc.dg/tree-ssa/predcom-3.c is compiled with vectorization it ICes when the dataref analysis called from vectorizer:
> I can't get the compiler (current mainline) to segfault with the compile flags form the description on x86_64-pc-linux-gnu or i686-pc-linux-gnu (with and without -msse2).

I can't reproduce it anymore either... I actually opened this PR at least a week after I saw this failure, so maybe it got fixed in the meantime? I guess I'll close it then.

-- dorit at gcc dot gnu dot org changed:

           What|Removed    |Added
         Status|UNCONFIRMED|RESOLVED
     Resolution|           |WORKSFORME

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33320
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
--- Comment #5 from dorit at gcc dot gnu dot org 2007-09-07 15:00 --- Subject: Bug 33299 Author: dorit Date: Fri Sep 7 15:00:11 2007 New Revision: 128242 URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=128242 Log: PR tree-optimization/33299 * tree-vect-transform.c (vect_create_epilog_for_reduction): Update uses for all relevant loop-exit phis, not just the first. Added: trunk/gcc/testsuite/gfortran.dg/vect/fast-math-pr33299.f90 Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/testsuite/gfortran.dg/vect/vect.exp trunk/gcc/tree-vect-transform.c -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33319] New: ICE with vectorization in
When the testcase g++.dg/tree-ssa/pr27549.C is compiled with -ftree-vectorize it ICEs with: Unable to coalesce ssa_names 141 and 280 which are marked as MUST COALESCE. s$b_141(ab) and s$b_280(ab) /Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C: In function ‘const char* foo()’: /Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C:72: internal compiler error: SSA corruption The testcase is vectorized using versioning-for-aliasing, although it is known at compile time that there is a dependence for sure, so there's no point in testing this at runtime (as pointed out here: http://gcc.gnu.org/ml/gcc-patches/2007-08/msg01211.html). With --param vect-max-version-for-alias-checks=0 the testcase doesn't get vectorized and doesn't ICE. So disabling versioning-for-aliasing when it's redundant (as in the above case) would avoid the ICE, but we should probably figure out what is really causing the ICE. (this is how the testcase is compiled: /Develop/mainline-dn/build1/gcc/testsuite/g++/../../g++ -B/Develop/mainline-dn/build1/gcc/testsuite/g++/../../ /Develop/mainline-dn/gcc/gcc/testsuite/g++.dg/tree-ssa/pr27549.C -nostdinc++ -I/Develop/mainline-dn/build1/powerpc64-unknown-linux-gnu/libstdc++-v3/include/powerpc64-unknown-linux-gnu -I/Develop/mainline-dn/build1/powerpc64-unknown-linux-gnu/libstdc++-v3/include -I/Develop/mainline-dn/gcc/libstdc++-v3/libsupc++ -I/Develop/mainline-dn/gcc/libstdc++-v3/include/backward -I/Develop/mainline-dn/gcc/libstdc++-v3/testsuite/util -fmessage-length=0 -O2 -S -m64 -O2 -ftree-vectorize -maltivec -fdump-tree-vect-details -o pr27549.s) -- Summary: ICE with vectorization in Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: powerpc-linux GCC host triplet: powerpc-linux GCC target triplet: powerpc-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33319
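For readers unfamiliar with versioning-for-aliasing, a hand-written source-level sketch of the idea follows. It is only an illustration under assumptions: `ranges_overlap` and `add_one` are hypothetical names, and the vectorizer performs this transformation internally on its own representation, not by emitting C like this.

```c
#include <stddef.h>
#include <stdint.h>

/* Run-time test: do the N-element int ranges starting at A and B
   overlap?  The addresses are compared as integers.  */
static int
ranges_overlap (const int *a, const int *b, size_t n)
{
  uintptr_t pa = (uintptr_t) a;
  uintptr_t pb = (uintptr_t) b;
  size_t bytes = n * sizeof (int);

  return pa < pb + bytes && pb < pa + bytes;
}

/* Two versions of the same loop, selected by the run-time check.  */
static void
add_one (int *dst, const int *src, size_t n)
{
  size_t i;

  if (!ranges_overlap (dst, src, n))
    {
      /* No dependence between the accesses: this copy of the loop is
         the one that may safely be vectorized.  */
      for (i = 0; i < n; i++)
        dst[i] = src[i] + 1;
    }
  else
    {
      /* Possible dependence: keep the original scalar order.  */
      for (i = 0; i < n; i++)
        dst[i] = src[i] + 1;
    }
}
```

When the dependence is already known at compile time, as in this testcase, the check always selects the scalar version, so the extra version is pure overhead.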
[Bug tree-optimization/33320] New: ICE with vectorization in the testsuite during dataref analysis
When the testcase gcc.dg/tree-ssa/predcom-3.c is compiled with vectorization it ICEs in the dataref analysis called from the vectorizer: /home/dorit/mainline/build2/gcc/xgcc -B/home/dorit/mainline/build2/gcc/ /home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c -O2 -fpredictive-commoning -fdump-tree-pcom-details -fno-show-column -S -O2 -ftree-vectorize -fdump-tree-vect-details -o predcom-3.s /home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c: In function 'test': /home/dorit/mainline/gcc/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c:7: internal compiler error: Segmentation fault -- Summary: ICE with vectorization in the testsuite during dataref analysis Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: i386-linux GCC host triplet: i386-linux GCC target triplet: i386-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33320
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
--- Comment #2 from dorit at gcc dot gnu dot org 2007-09-04 11:44 --- (In reply to comment #1) Confirmed. It looks like the vectorizer forgets to update the PHI node for stmp_var: yes. I suspect I didn't expect at the time that there would be two loop-closed-ssa-form phi-nodes at the loop exit for s_3, so I probably update just one of them (s_10) and not the other (s_4). This is how it looks before vectorization: bb 7: # s_4 = PHI s_3(3) # s_10 = PHI s_3(3) D.1368_15 = *x_14(D); if (D.1368_15 0.0) goto bb 8; else goto bb 9; bb 8: s_16 = -s_10; bb 9: # s_1 = PHI s_4(7), s_16(8) return s_1; I'll prepare a fix. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
-- dorit at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |dorit at gcc dot gnu dot org |dot org | Status|NEW |ASSIGNED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33301] New: wrong vectorization factor due to an invariant type-promotion in the loop
(operation, 0)); + if (TYPE_SIZE_UNIT (rhs_type) TYPE_SIZE_UNIT (scalar_type)) + scalar_type = TREE_TYPE (TREE_OPERAND (operation, 0)); + } + if (vect_print_dump_info (REPORT_DETAILS)) { fprintf (vect_dump, get vectype for scalar type: ); -- Summary: wrong vectorization factor due to an invariant type- promotion in the loop Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: dorit at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: i386-linux GCC host triplet: i386-linux GCC target triplet: i386-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33301
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
--- Comment #3 from dorit at gcc dot gnu dot org 2007-09-04 19:11 --- I'm testing this patch: Index: tree-vect-transform.c === *** tree-vect-transform.c (revision 128037) --- tree-vect-transform.c (working copy) *** vect_create_epilog_for_reduction (tree v *** 1964,1969 --- 1964,1971 tree operation = GIMPLE_STMT_OPERAND (stmt, 1); bool nested_in_vect_loop = false; int op_type; + VEC(tree,heap) *phis = NULL; + int i; if (nested_in_vect_loop_p (loop, stmt)) { *** vect_finalize_reduction: *** 2260,2270 epilog_stmt = build_gimple_modify_stmt (new_dest, expr); new_temp = make_ssa_name (new_dest, epilog_stmt); GIMPLE_STMT_OPERAND (epilog_stmt, 0) = new_temp; - #if 0 - bsi_insert_after (exit_bsi, epilog_stmt, BSI_NEW_STMT); - #else bsi_insert_before (exit_bsi, epilog_stmt, BSI_SAME_STMT); - #endif } --- 2262,2268 *** vect_finalize_reduction: *** 2274,2318 Find the loop-closed-use at the loop exit of the original scalar result. (The reduction result is expected to have two immediate uses - one at the latch block, and one at the loop exit). */ ! exit_phi = NULL; FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest) { if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p { exit_phi = USE_STMT (use_p); ! break; } } /* We expect to have found an exit_phi because of loop-closed-ssa form. */ ! gcc_assert (exit_phi); ! if (nested_in_vect_loop) { ! stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi); ! /* FORNOW. Currently not supporting the case that an inner-loop reduction !is not used in the outer-loop (but only outside the outer-loop). */ ! gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) ! !STMT_VINFO_LIVE_P (stmt_vinfo)); ! ! epilog_stmt = adjustment_def ? epilog_stmt : new_phi; ! STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt; ! set_stmt_info (get_stmt_ann (epilog_stmt), ! new_stmt_vec_info (epilog_stmt, loop_vinfo)); ! if (vect_print_dump_info (REPORT_DETAILS)) ! { ! fprintf (vect_dump, vector of partial results after inner-loop:); ! 
print_generic_expr (vect_dump, epilog_stmt, TDF_SLIM); ! } ! return; } - - /* Replace the uses: */ - orig_name = PHI_RESULT (exit_phi); - FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name) - FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) - SET_USE (use_p, new_temp); } --- 2272,2313 Find the loop-closed-use at the loop exit of the original scalar result. (The reduction result is expected to have two immediate uses - one at the latch block, and one at the loop exit). */ ! phis = VEC_alloc (tree, heap, 10); FOR_EACH_IMM_USE_FAST (use_p, imm_iter, scalar_dest) { if (!flow_bb_inside_loop_p (loop, bb_for_stmt (USE_STMT (use_p { exit_phi = USE_STMT (use_p); ! VEC_quick_push (tree, phis, exit_phi); } } /* We expect to have found an exit_phi because of loop-closed-ssa form. */ ! gcc_assert (!VEC_empty (tree, phis)); ! for (i = 0; VEC_iterate (tree, phis, i, exit_phi); i++) { ! if (nested_in_vect_loop) ! { ! stmt_vec_info stmt_vinfo = vinfo_for_stmt (exit_phi); ! /* FORNOW. Currently not supporting the case that an inner-loop reduction !is not used in the outer-loop (but only outside the outer-loop). */ ! gcc_assert (STMT_VINFO_RELEVANT_P (stmt_vinfo) ! !STMT_VINFO_LIVE_P (stmt_vinfo)); ! ! epilog_stmt = adjustment_def ? epilog_stmt : new_phi; ! STMT_VINFO_VEC_STMT (stmt_vinfo) = epilog_stmt; ! set_stmt_info (get_stmt_ann (epilog_stmt), ! new_stmt_vec_info (epilog_stmt, loop_vinfo)); ! continue; ! } ! /* Replace the uses: */ ! orig_name = PHI_RESULT (exit_phi); ! FOR_EACH_IMM_USE_STMT (use_stmt, imm_iter, orig_name) ! FOR_EACH_IMM_USE_ON_STMT (use_p, imm_iter) ! SET_USE (use_p, new_temp); } } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33299] [4.3 Regression] miscompilation with gfortran -O2 -ffast-math -ftree-vectorize
--- Comment #4 from dorit at gcc dot gnu dot org 2007-09-04 19:14 --- (by the way, fast-math should not be required here, but that's a different bug... will fix that soonish) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33299
[Bug tree-optimization/33245] Missed opportunities for vectorization due to invariant condition
--- Comment #1 from dorit at gcc dot gnu dot org 2007-08-31 13:39 --- (In reply to comment #0) The innermost loop in j cannot be vectorized because of the irregular code in that loop, i.e. the condition IF ( l.NE.k ). But the cond expression is invariant in that loop, so the whole condition can be hoisted outside that loop, versioning the loop, and potentially allowing the vectorization of the innermost loop. if you use -O3 the condition *is* taken out of the loop by loop-unswitch (at least that's what I see with revision 127623). SUBROUTINE DGEFA(A,Lda,N,Ipvt,Info) INTEGER Lda , N , Ipvt(*) , Info DOUBLE PRECISION A(Lda,*) DOUBLE PRECISION t INTEGER IDAMAX , j , k , kp1 , l , nm1 Info = 0 nm1 = N - 1 IF ( nm1.GE.1 ) THEN DO k = 1 , nm1 kp1 = k + 1 l = IDAMAX(N-k+1,A(k,k),1) + k - 1 Ipvt(k) = l IF ( A(l,k).EQ.0.0D0 ) THEN Info = k ELSE IF ( l.NE.k ) THEN t = A(l,k) A(l,k) = A(k,k) A(k,k) = t ENDIF t = -1.0D0/A(k,k) CALL DSCAL(N-k,t,A(k+1,k),1) DO j = kp1 , N t = A(l,j) IF ( l.NE.k ) THEN A(l,j) = A(k,j) A(k,j) = t ENDIF CALL DAXPY(N-k,t,A(k+1,k),1,A(k+1,j),1) ENDDO ENDIF ENDDO ENDIF Ipvt(N) = N IF ( A(N,N).EQ.0.0D0 ) Info = N CONTINUE END The result of the vectorizer on this testcase is: /home/seb/ex/linpk.f90:24: note: not vectorized: too many BBs in loop. /home/seb/ex/linpk.f90:24: note: bad loop form. /home/seb/ex/linpk.f90:1: note: vectorized 0 loops in function. Okay, if I'm versioning that loop by hand, I get the same error due to the PRE as for capacita.f90: the PRE inserts in the loop-latch block some code: bb 11: # VUSE PARM_NOALIAS.16_252 { PARM_NOALIAS.16 } pretmp.47_297 = *n_13(D); goto bb 10; Looks like -fno-tree-pre is not enough, because if PRE doesn't do it, then sink does it. 
When I use -O3 -ftree-vectorize -msse2 -fno-tree-pre -fno-tree-sink I get the dataref problem you report below, without manual modifications to the code And with PRE disabled, the fail occurs in the data ref analysis: ./linpk_corrected.f90:26: note: not vectorized: data ref analysis failed t.8_70 = (*a_25(D))[D.1406_69] ./linpk_corrected.f90:26: note: bad data references. Just for the record, this is the dataref problem that the dataref analyzer reports: Creating dr for t analyze_innermost: (analyze_scalar_evolution (loop_nb = 3) (scalar = t) (get_scalar_evolution (scalar = t) (scalar_evolution = )) ) success. base_address: t offset from base address: 0 constant offset from base address: 0 step: 0 aligned to: 128 base_object: t symbol tag: t FAILED as dr address is invariant -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33245
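The hoisting discussed in this comment (a condition that is invariant in the j-loop is tested once outside it, and the loop is versioned) can be sketched in C. This is only an illustration under stated assumptions: the function names `row_swap_branchy` and `row_swap_unswitched` are hypothetical, the body is reduced to the swap alone, and it is not the DGEFA code nor the exact output of loop-unswitch.

```c
#include <assert.h>
#include <stddef.h>

/* Original shape: the test (l != k) does not depend on j, yet it sits
   inside the j-loop and adds control flow to the loop body. */
void row_swap_branchy(double *row_l, double *row_k, size_t n, int l, int k)
{
    assert(row_l != NULL && row_k != NULL);
    for (size_t j = 0; j < n; j++) {
        double t = row_l[j];
        if (l != k) {
            row_l[j] = row_k[j];
            row_k[j] = t;
        }
    }
}

/* Unswitched shape: the invariant test is evaluated once, and the
   remaining loop has a single straight-line body. */
void row_swap_unswitched(double *row_l, double *row_k, size_t n, int l, int k)
{
    assert(row_l != NULL && row_k != NULL);
    if (l != k) {
        for (size_t j = 0; j < n; j++) {
            double t = row_l[j];
            row_l[j] = row_k[j];
            row_k[j] = t;
        }
    }
    /* When l == k the original body has no effect, so that version is empty. */
}
```

Both functions leave the arrays in the same state; only the second gives the loop a body without a branch. Whether a particular GCC revision then vectorizes it is not claimed here.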
[Bug tree-optimization/33246] Missed opportunities for vectorization due to data ref analysis
--- Comment #3 from dorit at gcc dot gnu dot org 2007-08-31 13:57 --- ... This is due to data ref analysis problems: ./fatigue.f90:14: note: not vectorized: data ref analysis failed (*stress_tensor.0_16)[D.1508_168] = D.1513_173 ./fatigue.f90:14: note: bad data references. and ./fatigue.f90:14: note: not vectorized: data ref analysis failed D.1489_133 = (*strain_tensor.0_41)[D.1488_132] ./fatigue.f90:14: note: bad data references. The data-ref analyzer reports: failed: evolution of offset is not affine. As a result, the DR fields that represent the access relative to the inner-most loop are almost all empty: base_address: offset from base address: constant offset from base address: step: aligned to: base_object: (*(real8[0:D.1433] *) D.1437_15)[0] symbol tag: SMT.79 However note that the DR fields relative to the outer-loop are computable: outer base_address: A.23 outer offset from base address: 0 outer constant offset from base address: 0 outer step: 24 outer aligned to: 128 If the data-ref analyzer can return the expression for the evolution in the inner-loop, instead of failing, we would at least have a chance to do outer-loop vectorization. This is a duplicate of PR33113. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33246
[Bug tree-optimization/33245] Missed opportunities for vectorization due to invariant condition
--- Comment #3 from dorit at gcc dot gnu dot org 2007-08-31 14:18 --- (In reply to comment #2) Subject: Re: Missed opportunities for vectorization due to invariant condition Looks like -fno-tree-pre is not enough, because if PRE doesn't do it, then sink does it. When I use -O3 -ftree-vectorize -msse2 -fno-tree-pre -fno-tree-sink I get the dataref problem you report below, without manual modifications to the code Apparently this is sink is triggered on -O3, Daniel also warned yesterday about the fact that it's not PRE specific. Actually, can't we move that code back in the loop body when the vectorizer detects that code in the latch bb? here's a related discussion from a couple years ago: http://gcc.gnu.org/ml/gcc-patches/2005-11/msg02045.html (and also a somewhat related PR - PR28643) I'm thinking that it is not really difficult to consider these scalars as arrays with a single element, and then just pass these to the rest of data deps. I'll try to figure out a patch for this problem that would bring us more vectorized cases. great. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33245
[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)
--- Comment #3 from dorit at gcc dot gnu dot org 2007-08-30 08:12 --- (In reply to comment #2) I suspect this might be due to not updating the rd information after unrolling. Can you check if analyze_insns_in_loop() (which calls df_analyze()) is being called just before the problematic unrolling ? it looks like it's called just before the unroller actually transforms something, but not before the (failing) analysis. But when I add a call to it in decide_peel_completely the analysis still fails. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224
[Bug tree-optimization/33243] Missed opportunities for vectorization due to unhandled real_type
--- Comment #2 from dorit at gcc dot gnu dot org 2007-08-30 10:12 --- There are two time consuming routines in air.f90 of the Polyhedron benchmark that are not vectorized: lines 1328 and 1354. These appear in the top counting of execution time with oprofile: SUBROUTINE DERIVY(D,U,Uy,Al,Np,Nd,M) IMPLICIT REAL*8(A-H,O-Z) PARAMETER (NX=150,NY=150) DIMENSION D(NY,33) , U(NX,NY) , Uy(NX,NY) , Al(30) , Np(30) DO jm = 1 , M jmax = 0 jmin = 1 DO i = 1 , Nd jmax = jmax + Np(i) + 1 DO j = jmin , jmax uyt = 0. DO k = 0 , Np(i) uyt = uyt + D(j,k+1)*U(jm,jmin+k) ENDDO Uy(jm,j) = uyt*Al(i) ENDDO jmin = jmin + Np(i) + 1 ENDDO ENDDO CONTINUE END ./poly_air_1354.f90:12: note: def_stmt: uyt_1 = PHI 0.0(9), uyt_42(11) ./poly_air_1354.f90:12: note: Unsupported pattern. ./poly_air_1354.f90:12: note: not vectorized: unsupported use in stmt. ./poly_air_1354.f90:12: note: unexpected pattern. ./poly_air_1354.f90:1: note: vectorized 0 loops in function. This is due to an unsupported type, real_type, for the reduction variable uyt: (this is on an i686-linux machine) There is no unhandled real_type problem, you just need to use -ffast-math to allow vectorization of summation of fp types (or the new reassociation flag): pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI 0.0(9), uyt_42(11) pr33243b.f90:12: note: reduction: unsafe fp math optimization: D.1386_41 + uyt_1 pr33243b.f90:12: note: Unknown def-use cycle pattern. If you use -ffast-math the reduction is detected: pr33243b.f90:12: note: Analyze phi: uyt_1 = PHI 0.0(9), uyt_42(11) pr33243b.f90:12: note: detected reduction:D.1386_41 + uyt_1 pr33243b.f90:12: note: Detected reduction. However, the loop will still not get vectorized because there is a non-consecutive access in the loop: pr33243b.f90:12: note: === vect_analyze_data_ref_accesses === pr33243b.f90:12: note: not consecutive access pr33243b.f90:12: note: not vectorized: complicated access pattern. 
This is because the stride of the accesses to D(j,k+1) and U(jm,jmin+k) in the inner-loop (k-loop) between inner-loop iterations is 1200B: DO j = jmin , jmax uyt = 0. DO k = 0 , NP(i) uyt = uyt + D(j,k+1)*U(jm,jmin+k) ENDDO Uy(jm,j) = uyt*Al(i) ENDDO In the outer-loop (j-loop) these accesses are consecutive, and also you don't need to use the -ffast-math flag. However there are other problems: 1) the compiler creates a guard to control whether to enter the inner-loop or not (because it may execute 0 times). This creates a more involved control-flow than the outer-loop vectorizer is willing to work with. A solution would be to create this guard outside the outer-loop (in case it is invariant, as is the case here), which is like versioning the loop (or unswitching the loop). 2) if you change the loop count to something constant (just to bypass the above problem), then indeed no guard code is generated, but there is a computation (advancing an iv) in the latch block of the outer-loop (so it is not empty, and we are not willing to work with such loops). We need to clean that away. 3) After these problems are solved, we still need to deal with a non-consecutive access in the outer-loop - the store to Uy(jm,j). AFAICS, this requires either transposing the Uy array in advance, or teaching the vectorizer to scatter the results to the non-adjacent locations (which would be quite expensive, but we could give it a try). Alternatively, vectorizing the inner-loop would require transposing the D and U matrices. Another option is to interchange the jm loop with the j loop - I think this way all accesses would be consecutive, and we could vectorize the jm loop (which would now be a doubly-nested loop that the outer-loop vectorizer could handle). So, the PR for this testcase would be better classified under one of the above problems/missed-optimizations rather than unhandled real_type. 
Another similar routine that also appears in the top ranked and not vectorized due to the same unsupported real_type reasons is in air.f90:1181 SUBROUTINE FVSPLTX2 IMPLICIT REAL*8(A-H,O-Z) PARAMETER (NX=150,NY=150) DIMENSION DX(NX,33) , ALX(30) , NPX(30) DIMENSION FP1(NX,NY) , FM1(NX,NY) , FP1x(30,NX) , FM1x(30,NX) DIMENSION FP2(NX,NY) , FM2(NX,NY) , FP2x(30,NX) , FM2x(30,NX) DIMENSION FP3(NX,NY) , FM3(NX,NY) , FP3x(30,NX) , FM3x(30,NX) DIMENSION FP4(NX,NY) , FM4(NX,NY) , FP4x(30,NX) , FM4x(30,NX) DIMENSION FV2(NX,NY) , DXP2(30,NX) , DXM2(30,NX) DIMENSION FV3(NX,NY) , DXP3(30,NX) , DXM3(30,NX) DIMENSION FV4(NX,NY) , DXP4(30,NX) , DXM4(30,NX) COMMON /XD1 / FP1 , FM1 , FP2 , FM2 , FP3 , FM3 , FP4 , FM4
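As background to the remark in this comment that -ffast-math (or the reassociation flag) is needed to vectorize a summation of fp types: vectorizing a reduction changes the order in which the terms are added, and floating-point addition is not associative. A minimal sketch in C, with hypothetical helper names and values picked so that 1.0f is absorbed when added to 1e20f:

```c
#include <assert.h>

/* Sum three floats left to right: (a + b) + c. */
float sum_left(float a, float b, float c)
{
    volatile float t = a + b;   /* volatile forces the intermediate to float */
    return t + c;
}

/* Sum the same floats right to left: a + (b + c). */
float sum_right(float a, float b, float c)
{
    volatile float t = b + c;   /* 1.0f is lost here when b is -1e20f */
    return a + t;
}

/* Self-check: the two orders give different results for these inputs. */
void check_reassociation(void)
{
    assert(sum_left(1e20f, -1e20f, 1.0f) == 1.0f);
    assert(sum_right(1e20f, -1e20f, 1.0f) == 0.0f);
}
```

This is why the vectorizer reports "unsafe fp math optimization" for such a reduction unless reassociation has been explicitly allowed.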
[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)
--- Comment #5 from dorit at gcc dot gnu dot org 2007-08-30 16:29 --- dorit, i am having trouble exactly reproducing this example because you did not give the svn revision and so all of the numbers are a little bit different. it's revision 127623 However, I am going to submit a patch which improves the dump information a lot for these passes and we should talk about it after we can get on the same page. I applied your patch, and I'll send you the dump shortly. However, from looking at your posting, there are some issues that you may want to look at before we talk: The reaching defs problem makes a scan for all of the defs in the blocks in the region. Once all of the defs are found, they are sorted where the primary key is the regno. The id's (DF_REF_ID) are then assigned based on this sorting. The reaching defs problem actually depends on all of the defs for a regno to be contiguous. The DF_REF_IDs are not stable between calls to df_set_blocks and any def outside of the region has an undefined DF_REF_ID. In your posting you have: Below is the output of df_ref_debug for adef in each iteration of the loop in latch_dominating_def: d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain { } d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { } The number after the first d is the DF_REF_ID. Note that they are not contiguous. Given the sorting that occurred, they must be contiguous. I assume from this that someone is holding on to old id's. This is not correct. If you are going to play the game with df_set_blocks, you are allowed to hold onto a def, but not the DF_REF_ID, you cannot look at the DF_REF_ID for a def that is not in the blocks set by df_set_blocks. are you saying it's safer not to call df_set_blocks in iv_analysis_loop_init? (iv-analysis still fails when I do that, but maybe that in turn requires other changes?) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224
[Bug rtl-optimization/33222] New: failing rtl iv analysis (maybe due to df)
In the testcase below, after the inner-loop gets completely unrolled, the enclosing i-loop does not get unrolled because of failure to analyze the loop iv, possibly due to a bug in df: #define N 40 #define M 10 float in[N+M], coeff[M], out[N]; void fir (){ int i,j,k; float diff; for (i = 0; i < N; i++) { diff = 0; for (j = 0; j < M; j++) { diff += in[j+i]*coeff[j]; } out[i] = diff; } } Compiler options used: /Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize (without -ftree-vectorize the i-loop does get unrolled). Detailed description and discussion here: http://gcc.gnu.org/ml/gcc/2007-08/msg00482.html Here are the relevant pieces from the RTL dump (at loop3_unroll): bb2: (insn 40 39 41 2 vect-outer-fir2-kernel.c:38 (set (reg:DI 187 [ ivtmp.59 ]) (mem/u/c:DI (plus:DI (reg:DI 2 2) (const:DI (minus:DI (symbol_ref/u:DI (*.LC4) [flags 0x2]) (symbol_ref:DI (*.LCTOC1) [7 S8 A8])) 344 {*movdi_internal64} (expr_list:REG_EQUAL (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (nil))) ... (insn 289 288 68 2 (set (reg/f:DI 319) (plus:DI (reg:DI 187 [ ivtmp.59 ]) (const_int 160 [0xa0]))) 80 {*adddi3_internal1} (expr_list:REG_DEAD (reg:DI 2 2) (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (const_int 160 [0xa0]))) (nil ... loop: bb3 (loop-header): ... (insn 255 254 256 3 vect-outer-fir2-kernel.c:47 (set (reg:DI 187 [ ivtmp.59 ]) (plus:DI (reg:DI 187 [ ivtmp.59 ]) (const_int 16 [0x10]))) 80 {*adddi3_internal1} (nil)) ... 
(insn 265 263 266 3 vect-outer-fir2-kernel.c:47 (set (reg:CC 316) (compare:CC (reg:DI 187 [ ivtmp.59 ]) (reg/f:DI 319))) 459 {*cmpdi_internal1} (expr_list:REG_EQUAL (compare:CC (reg:DI 187 [ ivtmp.59 ]) (const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (const_int 160 [0xa0] (nil))) Below is the output of df_ref_debug for adef in each iteration of the loop in latch_dominating_def: d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain { } d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { } For both the bitmap is set. -- Summary: failing rtl iv analysis (maybe due to df) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33222
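For anyone wanting to try the testcase, here is a self-contained C version of the FIR kernel from this report. It is a sketch under assumptions: `<` is assumed in both loop conditions, the unused variable `k` is dropped, and `fir_self_check` is a hypothetical helper added only to exercise the loop; it is not part of the original testcase.

```c
#include <assert.h>

#define N 40
#define M 10

float in[N + M], coeff[M], out[N];

/* FIR kernel: the inner j-loop is the one that gets completely unrolled,
   the outer i-loop is the one whose iv analysis fails in the report. */
void fir(void)
{
    int i, j;
    float diff;

    for (i = 0; i < N; i++) {
        diff = 0;
        for (j = 0; j < M; j++) {
            diff += in[j + i] * coeff[j];
        }
        out[i] = diff;
    }
}

/* Hypothetical self-check: with all-ones input and coefficients,
   every output element is the sum of M ones. */
void fir_self_check(void)
{
    for (int i = 0; i < N + M; i++)
        in[i] = 1.0f;
    for (int j = 0; j < M; j++)
        coeff[j] = 1.0f;
    fir();
    for (int i = 0; i < N; i++)
        assert(out[i] == (float)M);
}
```

The compiler options needed to reproduce the unrolling failure are the ones listed in the report; this sketch only checks that the kernel computes what it should.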
[Bug rtl-optimization/33224] New: failing rtl iv analysis (maybe due to df)
In the testcase below, after the inner-loop gets completely unrolled, the enclosing i-loop does not get unrolled because of failure to analyze the loop iv, possibly due to a bug in df: #define N 40 #define M 10 float in[N+M], coeff[M], out[N]; void fir (){ int i,j,k; float diff; for (i = 0; i < N; i++) { diff = 0; for (j = 0; j < M; j++) { diff += in[j+i]*coeff[j]; } out[i] = diff; } } Compiler options used: /Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize (without -ftree-vectorize the i-loop does get unrolled). Detailed description and discussion here: http://gcc.gnu.org/ml/gcc/2007-08/msg00482.html Here are the relevant pieces from the RTL dump (at loop3_unroll): bb2: (insn 40 39 41 2 vect-outer-fir2-kernel.c:38 (set (reg:DI 187 [ ivtmp.59 ]) (mem/u/c:DI (plus:DI (reg:DI 2 2) (const:DI (minus:DI (symbol_ref/u:DI (*.LC4) [flags 0x2]) (symbol_ref:DI (*.LCTOC1) [7 S8 A8])) 344 {*movdi_internal64} (expr_list:REG_EQUAL (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (nil))) ... (insn 289 288 68 2 (set (reg/f:DI 319) (plus:DI (reg:DI 187 [ ivtmp.59 ]) (const_int 160 [0xa0]))) 80 {*adddi3_internal1} (expr_list:REG_DEAD (reg:DI 2 2) (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (const_int 160 [0xa0]))) (nil ... loop: bb3 (loop-header): ... (insn 255 254 256 3 vect-outer-fir2-kernel.c:47 (set (reg:DI 187 [ ivtmp.59 ]) (plus:DI (reg:DI 187 [ ivtmp.59 ]) (const_int 16 [0x10]))) 80 {*adddi3_internal1} (nil)) ... 
(insn 265 263 266 3 vect-outer-fir2-kernel.c:47 (set (reg:CC 316) (compare:CC (reg:DI 187 [ ivtmp.59 ]) (reg/f:DI 319))) 459 {*cmpdi_internal1} (expr_list:REG_EQUAL (compare:CC (reg:DI 187 [ ivtmp.59 ]) (const:DI (plus:DI (symbol_ref:DI (fir_out) [flags 0x80] var_decl 0xf7d571c0 fir_out) (const_int 160 [0xa0] (nil))) Below is the output of df_ref_debug for adef in each iteration of the loop in latch_dominating_def: d40 reg 187 bb 3 insn 255 flag 0x0 type 0x0 loc 0xf7da4608(0xf7d9a4e0) chain { } d93 reg 187 bb 2 insn 40 flag 0x0 type 0x0 loc 0xf7d89cc8(0xf7d9a4e0) chain { } For both the bitmap is set. -- Summary: failing rtl iv analysis (maybe due to df) Product: gcc Version: 4.3.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: rtl-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224
[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)
--- Comment #1 from dorit at gcc dot gnu dot org 2007-08-29 09:04 --- In the testcase below, after the inner-loop gets completely unrolled, the enclosing i-loop does not get unrolled because of failure to analyze the loop iv, possibly due to a bug in df: ... Compiler options used: /Develop/mainline-dn1/bin/gcc -O3 -maltivec -funroll-loops vect-outer-fir2-kernel.c -S --param max-completely-peeled-insns=5000 --param max-completely-peel-times=40 -fdump-tree-all -da -ftree-vectorize (without -ftree-vectorize the i-loop does get unrolled). (it could of course be a result of something the vectorizer does. like, maybe the vectorizer is not updating the dominance information correctly or something. but I'd think most such information would be recomputed and verified between vectorization and rtl unrolling? anyhow, verify_dominance seems to pass). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224
[Bug rtl-optimization/33222] failing rtl iv analysis (maybe due to df)
--- Comment #1 from dorit at gcc dot gnu dot org 2007-08-29 09:08 --- I accidentally entered this bug twice. I'm closing this one, and will use PR33224 instead. -- dorit at gcc dot gnu dot org changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33222
[Bug rtl-optimization/33224] failing rtl iv analysis (maybe due to df)
-- dorit at gcc dot gnu dot org changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |dorit at gcc dot gnu dot org |dot org | Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2007-08-29 09:13:05 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33224
[Bug target/28629] [4.1] Segfault with --march=pentium-m -O2 when compiling faac
--- Comment #10 from dorit at gcc dot gnu dot org 2007-08-26 07:49 --- (In reply to comment #9) I've confirmed that the problem is caused by '-ftree-vectorize' passed to compile gcc. More precisely, a 'movdqa' instruction in constraint_operands() accessed an unaligned memory. since this is reported to work on 4.2 and 4.3, I wonder if it's related to the fix for PR25413 (which was committed to 4.2 and 4.3). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28629
[Bug tree-optimization/33113] New: Failing to represent the stride of a dataref when it is not a constant
In the following testcase: subroutine sub(aa,bb,n,m) implicit none integer, intent(in) :: n,m real, intent(inout) :: aa(n,m) real, intent(in):: bb(n,m) integer :: i,j do i = 1,m do j= 2,n aa(i,j)= aa(i,j-1)+bb(i,j-1) enddo enddo end subroutine end The stride of the accesses in the inner-loop is a parameter (m is not a compile-time known constant). As a result the data dataref analyzer reports: failed: evolution of offset is not affine ... base_address: offset from base address: constant offset from base address: step: aligned to: base_object: (*aa_54(D))[0] symbol tag: SMT.25 Any chance that the dataref analysis can return an (invariant) expression in step, so that further analysis could continue? (for example, the access in the outer-loop is consecutive, so if we had an expression to represent the inner-loop stride, we could vectorize the outer-loop). -- Summary: Failing to represent the stride of a dataref when it is not a constant Product: gcc Version: 4.3.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at gcc dot gnu dot org GCC build triplet: powerpc64-linux GCC host triplet: powerpc64-linux GCC target triplet: powerpc64-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113
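To make the stride issue concrete, here is a C analogue of the access pattern, assuming Fortran-style column-major storage with the leading dimension passed at run time. The names `idx`, `sweep`, `ld`, `rows` and `cols` are hypothetical and the sketch is not the testcase itself; it only shows that the step between consecutive inner-loop iterations is a loop-invariant expression rather than a compile-time constant.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major indexing as in Fortran: element (i, j) of an array whose
   leading dimension is ld, with 0-based i and j. */
static size_t idx(size_t i, size_t j, size_t ld)
{
    return i + j * ld;
}

/* Inner j-loop: consecutive iterations touch addresses ld elements apart,
   so the step is known only as the invariant value ld.
   Outer i-loop: consecutive iterations touch adjacent elements. */
void sweep(float *aa, const float *bb, size_t ld, size_t rows, size_t cols)
{
    assert(rows <= ld);
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 1; j < cols; j++)
            aa[idx(i, j, ld)] = aa[idx(i, j - 1, ld)] + bb[idx(i, j - 1, ld)];
}
```

If the analyzer could report the inner-loop step as the symbolic value `ld` instead of failing, the unit-stride outer loop would remain a candidate for outer-loop vectorization, which is the request made in this report.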
[Bug tree-optimization/32378] can't determine dependence (distinct sections of an array)
--- Comment #6 from dorit at gcc dot gnu dot org 2007-08-19 13:47 --- Sebastian - any thoughts/plans? Here's another testcase: subroutine sub(aa,bb,n,m) implicit none integer, intent(in) :: n,m real, intent(inout) :: aa(n,m) real, intent(in):: bb(n,m) integer :: i,j do j= 2,n do i = 1,m aa(i,j)= aa(i,j-1)+bb(i,j-1) enddo enddo end subroutine end Here too we get: (compute_affine_dependence (stmt_a = D.1385_55 = (*aa_54(D))[D.1384_53]) (stmt_b = (*aa_54(D))[D.1380_49] = D.1390_62) (subscript_dependence_tester (analyze_overlapping_iterations (chrec_a = {pretmp.34_76 + 1, +, 1}_2) (chrec_b = {pretmp.34_32 + 1, +, 1}_2) (analyze_siv_subscript siv test failed: unimplemented. ) (overlap_iterations_a = not known ) (overlap_iterations_b = not known ) ) (dependence classified: scev_not_known) ) ) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32378
[Bug tree-optimization/33113] Failing to represent the stride (with array) of a dataref when it is not a constant
--- Comment #2 from dorit at gcc dot gnu dot org 2007-08-20 05:55 --- Making us return symbolic stride would not be hard. The problem is that data dependence analysis would fail anyway, sometimes (not in this testcase) there won't be a need for dependence testing - e.g. a reduction computation where there are no stores, or initialization with a constant (i.e. a store and no loads), so there's already a value in doing this. since we cannot tell whether n is zero. can we do the data-dependence analysis conditioned on a maybe_zero (like the number-of-iterations analysis)? (by the way, I was told that ifort vectorizes this. I think we'd need loop reversal to vectorize the inner-loop though. on top of overcoming the unknown-stride issue in the DR and DDR analysis) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33113
[Bug tree-optimization/25621] Missed optimization when unrolling the loop (splitting up the sum) (only with -ffast-math)
--- Comment #9 from dorit at gcc dot gnu dot org 2007-08-14 20:17 --- PR32824 discusses a similar issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25621
[Bug tree-optimization/32824] Missed reduction vectorizer after store to global is LIM'd
--- Comment #5 from dorit at gcc dot gnu dot org 2007-08-14 20:47 --- Additional testcases: (1) see loop in lines 23 and 32 in http://gcc.gnu.org/ml/gcc-help/2007-08/msg00171.html (2) SUBROUTINE SUSCEP(L,Iz) IMPLICIT NONE INTEGER L , Iz(L,L) , iznum, ix, iy iznum = 0 DO ix = 1 , L DO iy = 1 , L iznum = iznum + Iz(iy,ix) ENDDO ENDDO PRINT* iznum END subroutine end The above is a slightly modified testcase taken from Polyhedron test suite (ac.f90). We get: b.f90:6: note: Analyze phi: iznum_lsm.74_31 = PHI iznum_lsm.74_32(4), iznum_lsm.74_12(6) b.f90:6: note: reduction: not commutative/associative: iznum.10_37 tobias2b.f90:6: note: Unknown def-use cycle pattern. ... b.f90:6: note: worklist: examine stmt: iznum.9_36 = iznum_lsm.74_31 b.f90:6: note: vect_is_simple_use: operand iznum_lsm.74_31 b.f90:6: note: def_stmt: iznum_lsm.74_31 = PHI iznum_lsm.74_32(4), iznum_lsm.74_12(6) b.f90:6: note: Unsupported pattern. b.f90:6: note: not vectorized: unsupported use in stmt. 2b.f90:6: note: unexpected pattern. This happens because we get the following pattern: # iznum_lsm.74_31 = PHI iznum_lsm.74_32(4), iznum_lsm.74_12(6) ... iznum.9_36 = iznum_lsm.74_31; iznum.10_37 = D.1420_35 + iznum.9_36; iznum_lsm.74_12 = iznum.10_37; ... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32824
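A C analogue of testcase (2) may help show the two shapes involved. It is a sketch with hypothetical function names, assuming the accumulator is a global as in the PR title; it does not claim which form a given GCC revision vectorizes. The first function accumulates directly into the global, the second keeps the sum in a local and stores it once.

```c
#include <assert.h>

int iznum;  /* global accumulator, standing in for the Fortran variable */

/* Reduction written directly on the global. Once the store has been moved
   out of the loop, extra copies can sit between the PHI and the add,
   as in the pattern quoted above. */
void sum_global(const int *iz, int l)
{
    assert(l >= 0);
    iznum = 0;
    for (int ix = 0; ix < l; ix++)
        for (int iy = 0; iy < l; iy++)
            iznum += iz[iy + ix * l];
}

/* Same reduction on a local with a single store at the end: a plain
   "sum = sum + x" cycle. */
void sum_local(const int *iz, int l)
{
    assert(l >= 0);
    int sum = 0;
    for (int ix = 0; ix < l; ix++)
        for (int iy = 0; iy < l; iy++)
            sum += iz[iy + ix * l];
    iznum = sum;
}
```

Both compute the same value; the difference is only in the def-use cycle the reduction detection has to recognize.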
[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC
--- Comment #24 from dorit at gcc dot gnu dot org 2007-08-01 10:08 --- I do; however, I got stuck with another bootstrap problem at the moment (vectorization changes alignment of variables, which causes a miscompilation of crtend.o on my machine); I wonder if this is related to PR32893? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #5 from dorit at gcc dot gnu dot org 2007-08-01 11:57 --- Ryan, I wonder what happens if you force alignment in the source code, like so: unsigned short count[MAXBITS+1] __attribute__ ((__aligned__(16))) ; In this case the vectorizer does not change the alignment of the array. I wonder if the compiler honors the alignment attribute when the user asks for it, rather than the vectorizer. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
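A small self-contained way to try the suggestion above outside of zlib might look like the sketch below. It is an illustration only: the helper name `count_misalignment` is made up, and `MAXBITS` is defined locally as 15 on the assumption that this matches zlib's value. The function reports how far the address of the local array is from a 16-byte boundary, so a result of 0 means the attribute was honored in that build.

```c
#include <assert.h>
#include <stdint.h>

#define MAXBITS 15  /* assumption: mirrors zlib's definition; local to this sketch */

/* Returns the address of the local array modulo 16; 0 means the
   requested 16-byte alignment was honored for this stack variable. */
unsigned count_misalignment(void)
{
    unsigned short count[MAXBITS + 1] __attribute__ ((__aligned__(16)));
    volatile unsigned short *p = count;  /* take the address so the array stays in memory */
    assert(p != 0);
    p[0] = 0;
    return (unsigned) ((uintptr_t) count % 16);
}
```

Calling `count_misalignment()` from a test program built with the same flags as the failing zlib build (for example -O -msse2 -ftree-vectorize) would show whether a user-requested alignment on a stack array is respected independently of the vectorizer.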
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #4 from dorit at gcc dot gnu dot org 2007-08-01 11:36 --- Also just for the record - the testcase for this PR is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c14 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug regression/32582] Bootstrap with vectorization enabled fails with ICE on PPC
--- Comment #8 from dorit at gcc dot gnu dot org 2007-07-28 19:20 --- v0 (and v10) are scratch registers and not saved. So does it look like a register allocation bug then? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32582
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #3 from dorit at gcc dot gnu dot org 2007-07-28 21:03 --- (In reply to comment #2)
> Andrew, makes sense to you?
> I think my patch only checks PREFERRED_STACK_BOUNDARY and not STACK_BOUNDARY which is why it does not work but I have not looked into it at all.

I see references in the patch to both PREFERRED_STACK_BOUNDARY and STACK_BOUNDARY. Could you please check which of these needs to be fixed? (cause I think your fix is the more desirable one).

(just for the record, the link to the patch in question is here: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413#c21)
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #17 from dorit at gcc dot gnu dot org 2007-07-25 08:40 --- This looks like an unrelated problem - the vectorizer does not perform loop peeling here so it's not an issue of natural alignment. Let's open a separate PR for this one, unless there's already one open. In the meantime, would you please try this patch?:

Index: tree-vectorizer.c
===================================================================
*** tree-vectorizer.c (revision 126902)
--- tree-vectorizer.c (working copy)
*************** vect_can_force_dr_alignment_p (tree decl
*** 1527,1533 ****
       PREFERRED_STACK_BOUNDARY is honored by all translation units.
       However, until someone implements forced stack alignment, SSE
       isn't really usable without this. */
!     return (alignment <= PREFERRED_STACK_BOUNDARY);
  }
--- 1527,1533 ----
       PREFERRED_STACK_BOUNDARY is honored by all translation units.
       However, until someone implements forced stack alignment, SSE
       isn't really usable without this. */
!     return (alignment <= STACK_BOUNDARY);
  }

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #18 from dorit at gcc dot gnu dot org 2007-07-25 08:51 ---
Subject: Bug 25413
Author: dorit
Date: Wed Jul 25 08:51:12 2007
New Revision: 126904
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126904
Log:
2007-07-25 Dorit Nuzman [EMAIL PROTECTED]
           Devang Patel [EMAIL PROTECTED]

        PR tree-optimization/25413
        * targhooks.c (default_builtin_vector_alignment_reachable): New.
        * targhooks.h (default_builtin_vector_alignment_reachable): New.
        * tree.h (contains_packed_reference): New.
        * expr.c (contains_packed_reference): New.
        * tree-vect-analyze.c (vector_alignment_reachable_p): New.
        (vect_enhance_data_refs_alignment): Call vector_alignment_reachable_p.
        * target.h (vector_alignment_reachable): New builtin.
        * target-def.h (TARGET_VECTOR_ALIGNMENT_REACHABLE): New.
        * config/rs6000/rs6000.c (rs6000_vector_alignment_reachable): New.
        (TARGET_VECTOR_ALIGNMENT_REACHABLE): Define.

2007-07-25 Dorit Nuzman [EMAIL PROTECTED]
           Devang Patel [EMAIL PROTECTED]
           Uros Bizjak [EMAIL PROTECTED]

        PR tree-optimization/25413
        * lib/target-supports.exp (check_effective_target_vect_aligned_arrays): New procedure to check if arrays are naturally aligned to the vector alignment boundary.
        * gcc.dg/vect/vect-align-1.c: New.
        * gcc.dg/vect/vect-align-2.c: New.
        * gcc.dg/vect/pr25413.c: New.
        * gcc.dg/vect/pr25413a.c: New.

Added:
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/pr25413a.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-1.c
    branches/gcc-4_2-branch/gcc/testsuite/gcc.dg/vect/vect-align-2.c
Modified:
    branches/gcc-4_2-branch/gcc/ChangeLog
    branches/gcc-4_2-branch/gcc/config/rs6000/rs6000.c
    branches/gcc-4_2-branch/gcc/expr.c
    branches/gcc-4_2-branch/gcc/target-def.h
    branches/gcc-4_2-branch/gcc/target.h
    branches/gcc-4_2-branch/gcc/targhooks.c
    branches/gcc-4_2-branch/gcc/targhooks.h
    branches/gcc-4_2-branch/gcc/testsuite/ChangeLog
    branches/gcc-4_2-branch/gcc/testsuite/lib/target-supports.exp
    branches/gcc-4_2-branch/gcc/tree-vect-analyze.c
    branches/gcc-4_2-branch/gcc/tree.h
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #19 from dorit at gcc dot gnu dot org 2007-07-25 08:52 --- problem fixed.
--
dorit at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #21 from dorit at gcc dot gnu dot org 2007-07-25 11:11 --- Of course after my patch for PR 16660, the patch here should be changed to just return true always. In this case, Ryan, could you please also try to see if Andrew's patch (http://gcc.gnu.org/ml/gcc-patches/2007-07/msg00177.html) fixes the problem? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug target/32893] zlib segfault in inflate_table() compiled w/ -O -msse2 ftree-vectorize
--- Comment #1 from dorit at gcc dot gnu dot org 2007-07-25 20:43 ---
thanks a lot for checking both patches!

> With this patch zlib appears to compile successfully. The loop is vectorized with an "alignment of access forced using peeling" note and linked apps no longer segfault.

I'd like to try to verify if the problem is indeed related to the STACK_BOUNDARY, or whether this has to do with some weird interplay with the compilation of some other function, possibly after inlining (i.e. something like what we had in PR27770). I'm not sure how to suggest to check that...

> I also tested using Andrew's patch from bug #16660 and always returning true in vect_can_force_dr_alignment_p but it does not fix this error.

Andrew, makes sense to you?

> Let me know if I can provide any other info that would be useful to you.

thanks, I'll think about it...
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32893
[Bug bootstrap/31776] Bootstrap fails with error: conflicting types for strsignal
--- Comment #8 from dorit at gcc dot gnu dot org 2007-07-24 07:50 ---
Subject: Bug 31776
Author: dorit
Date: Tue Jul 24 07:50:10 2007
New Revision: 126868
URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=126868
Log:
2007-07-23 Dorit Nuzman [EMAIL PROTECTED]

        merge revision 124373 from trunk:
        2007-05-02 Brooks Moses [EMAIL PROTECTED]

        PR bootstrap/31776
        * system.h: Remove inclusion of double-int.h
        * tree.h: Include double-int.h
        * gengtype.c: Likewise
        * cfgloop.h: Likewise
        * Makefile.in: Adjust dependencies on double-int.h

Modified:
    branches/autovect-branch/ (props changed)
    branches/autovect-branch/gcc/ChangeLog.autovect
    branches/autovect-branch/gcc/Makefile.in
    branches/autovect-branch/gcc/cfgloop.h
    branches/autovect-branch/gcc/gengtype.c
    branches/autovect-branch/gcc/system.h
    branches/autovect-branch/gcc/tree.h

Propchange: branches/autovect-branch/ ('svnmerge-integrated' modified)
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31776
[Bug tree-optimization/32093] BOOT_CFLAGS=-O2 -g -msse2 -ftree-vectorize causes dfp tests to fail
--- Comment #4 from dorit at gcc dot gnu dot org 2007-07-24 08:50 ---
> i'm wondering if this could be related to a problem we're seeing with segfaults caused by misaligned movdqa instructions in zlib compiled with -ftree-vectorize.

A fix for PR25413 was committed to mainline. Ryan, could you please check if it solves the zlib miscompilation? Andrew, would you please check if it solves the libgcc miscompilation that you are seeing?
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32093
[Bug target/32218] [4.2/4.3 Regression] segfault with -O1 -ftree-vectorize
--- Comment #5 from dorit at gcc dot gnu dot org 2007-07-24 08:53 --- (In reply to comment #4)
> I just tried to reproduce this bug on IA64 Linux (and HP-UX) with ToT sources (version 126242) and was not able to. Can anyone else reproduce this with ToT sources?

Does the fact that no one has responded yet mean that this failure cannot be reproduced anymore?
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32218