[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault
--- Comment #7 from irar at il dot ibm dot com 2010-09-20 06:43 ---
Fixed.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714
[Bug tree-optimization/45733] [4.6 Regression] ICE: verify_stmts failed: invalid conversion in gimple call with -fstrict-overflow -ftree-vectorize
--- Comment #2 from irar at il dot ibm dot com 2010-09-20 12:17 ---
Looks like it is caused by revision 164367:
http://gcc.gnu.org/ml/gcc-cvs/2010-09/msg00661.html

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |matz at gcc dot gnu dot org
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-09-20 12:17:14
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45733
[Bug tree-optimization/45733] [4.6 Regression] ICE: verify_stmts failed: invalid conversion in gimple call with -fstrict-overflow -ftree-vectorize
--- Comment #3 from irar at il dot ibm dot com 2010-09-20 13:08 --- For vector(2) void * we get vec_perm_v2di_u builtin declaration, because the mode of vector(2) void * is unsigned V2DI. I wonder if this can happen for every builtin call, and we should convert back to the original type everywhere? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45733
[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault
--- Comment #3 from irar at il dot ibm dot com 2010-09-19 08:52 --- gimple_bb (stmt) returns NULL for that statement (D.1575_33 = __builtin_pow (D.1542_14, D.1574_32)). We can avoid vectorization in such cases, but looks like it should be fixed to return the actual basic block. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714
[Bug tree-optimization/45714] [4.6 Regression] Vectorization of double pow function causes a segmentation fault
--- Comment #5 from irar at il dot ibm dot com 2010-09-19 10:08 ---
Right. This patch fixes it:

Index: tree-vect-stmts.c
===
--- tree-vect-stmts.c (revision 164332)
+++ tree-vect-stmts.c (working copy)
@@ -4478,6 +4478,7 @@ vect_transform_stmt (gimple stmt, gimple
     case call_vec_info_type:
       gcc_assert (!slp_node);
       done = vectorizable_call (stmt, gsi, vec_stmt);
+      stmt = gsi_stmt (*gsi);
       break;

     case reduc_vec_info_type:

I am going to test it now.

Thanks,
Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45714
[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions
--- Comment #9 from irar at il dot ibm dot com 2010-09-12 09:46 ---
OK, thanks. I am going to test this patch; it only checks data-refs and function calls:

Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c (revision 164227)
+++ tree-vect-data-refs.c (working copy)
@@ -2542,6 +2542,17 @@ vect_analyze_data_refs (loop_vec_info lo
       offset = unshare_expr (DR_OFFSET (dr));
       init = unshare_expr (DR_INIT (dr));

+      if (stmt_could_throw_p (stmt))
+        {
+          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
+            {
+              fprintf (vect_dump, "not vectorized: statement can throw an "
+                       "exception ");
+              print_gimple_stmt (vect_dump, stmt, 0, TDF_SLIM);
+            }
+          return false;
+        }
+
       /* Update DR field in stmt_vec_info struct.  */

       /* If the dataref is in an inner-loop of the loop that is considered for
Index: tree-vect-stmts.c
===
--- tree-vect-stmts.c (revision 164227)
+++ tree-vect-stmts.c (working copy)
@@ -1343,6 +1343,9 @@ vectorizable_call (gimple stmt, gimple_s
   if (TREE_CODE (gimple_call_lhs (stmt)) != SSA_NAME)
     return false;

+  if (stmt_could_throw_p (stmt))
+    return false;
+
   vectype_out = STMT_VINFO_VECTYPE (stmt_info);

   /* Process function arguments.  */

Thanks,
Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470
[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions
--- Comment #3 from irar at il dot ibm dot com 2010-09-01 09:06 ---
r163260 only made this BB vectorizable.

I checked lookup_stmt_eh_lp for the last stmt of the BB and EDGE_EH flags before and after vectorization (basic block SLP), and in both cases lookup_stmt_eh_lp returns 0 and there is an EH edge from the basic block.

I also tried to add a cleanup_eh pass after SLP. If it is scheduled somewhere before pass_tree_loop_done, there is no ICE:

Index: passes.c
===
--- passes.c (revision 163538)
+++ passes.c (working copy)
@@ -925,6 +925,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_parallelize_loops);
          NEXT_PASS (pass_loop_prefetch);
          NEXT_PASS (pass_iv_optimize);
+         NEXT_PASS (pass_cleanup_eh);
          NEXT_PASS (pass_tree_loop_done);
        }
       NEXT_PASS (pass_cse_reciprocals);

If cleanup_eh is scheduled after tree_loop_done, there is an ICE:

Index: passes.c
===
--- passes.c (revision 163538)
+++ passes.c (working copy)
@@ -926,6 +926,7 @@ init_optimization_passes (void)
          NEXT_PASS (pass_loop_prefetch);
          NEXT_PASS (pass_iv_optimize);
          NEXT_PASS (pass_tree_loop_done);
+         NEXT_PASS (pass_cleanup_eh);
        }
       NEXT_PASS (pass_cse_reciprocals);
       NEXT_PASS (pass_reassoc);

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-09-01 09:06:19
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470
[Bug tree-optimization/45470] [4.6 Regression] ICE: verify_flow_info failed: BB 2 can not throw but has an EH edge with -ftree-vectorize -fnon-call-exceptions
--- Comment #6 from irar at il dot ibm dot com 2010-09-01 11:54 ---
(In reply to comment #5)
I see before SLP:

<bb 2>:
  MEM[(struct A *)this_1(D)].a = 0;
  MEM[(struct A *)this_1(D)].b = 0;
  MEM[(struct A *)this_1(D)].c = 0;
  [LP 2] MEM[(struct A *)this_1(D) + 12B].a = 0;

and after:

<bb 2>:
  vect_cst_.1_16 = { 0, 0, 0, 0 };
  vect_p.5_17 = &MEM[(struct A *)this_1(D)].a;
  M*vect_p.5_17{misalignment: 0} = vect_cst_.1_16;

so EH info has not been properly transferred. How should it be done? Is it ok to assume that if one of the old stmts can throw, then we can set TREE_THIS_NOTRAP for the new access to 0? (and then we can call maybe_duplicate_eh_stmt (new_stmt, old_stmt)). Or maybe it's better to avoid vectorization?...

Thanks,
Ira

Now that only MEM[(struct A *)this_1(D) + 12B].a can throw internally but not MEM[(struct A *)this_1(D)].c = 0; is a fact that the frontend establishes.

The following mitigates the problem by simply removing the dead EH edges.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 163721)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2474,6 +2474,9 @@ vect_schedule_slp (loop_vec_info loop_vi
         }
     }

+  if (bb_vinfo)
+    gimple_purge_dead_eh_edges (BB_VINFO_BB (bb_vinfo));
+
   return is_store;
 }

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.0                       |---

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45470
[Bug tree-optimization/41881] [4.5/4.6 regression] Complete unrolling (inner) versus vectorization of reduction
--- Comment #7 from irar at il dot ibm dot com 2010-08-11 10:24 ---
(In reply to comment #6)
> I think that SLP doesn't handle reduction.

Not all kinds of reduction. We handle

#a1 = phi <a0, a2>
#b1 = phi <b0, b2>
...
a2 = a1 + x
b2 = b1 + y

Here we also have:

#a1 = phi <a0, a9>
...
a2 = a1 + x
...
a3 = a2 + y
...
a9 = a8 + z

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41881
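The two reduction shapes in the comment above can be written as source loops. This is only an illustrative sketch: the function names, the element type, and the unroll factor of 4 are assumptions, not taken from the bug's testcase.

```c
#include <stddef.h>

/* Independent reductions: each accumulator has its own cycle,
   i.e. a1 = phi <a0, a2> and b1 = phi <b0, b2>.  */
void two_sums (const int *x, const int *y, size_t n, int *a_out, int *b_out)
{
  int a = 0, b = 0;
  for (size_t i = 0; i < n; i++)
    {
      a += x[i];   /* a2 = a1 + x */
      b += y[i];   /* b2 = b1 + y */
    }
  *a_out = a;
  *b_out = b;
}

/* Reduction chain: a single accumulator is updated several times per
   iteration (a2 = a1 + x, a3 = a2 + y, ...), which is what a partially
   unrolled reduction looks like.  Elements past the last full group of
   four are ignored to keep the sketch short.  */
int chain_sum (const int *v, size_t n)
{
  int a = 0;
  for (size_t i = 0; i + 4 <= n; i += 4)
    {
      a += v[i];
      a += v[i + 1];
      a += v[i + 2];
      a += v[i + 3];
    }
  return a;
}
```

In the first function the statements of one iteration are independent of each other; in the second each statement consumes the result of the previous one, which is the case the comment says is not handled.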
[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre
--- Comment #4 from irar at il dot ibm dot com 2010-08-10 09:06 ---
I am testing the same patch as in comment #1.

Testcase that shows the problem:

int foo(short x)
{
  short i, y;
  int sum;

  for (i = 0; i < x; i++)
    y = x * i;

  for (i = x; i > 0; i--)
    sum += y;

  return sum;
}

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241
[Bug tree-optimization/45241] CPU2006 465.tonto ICE in the vectorizer with -fno-tree-pre
--- Comment #5 from irar at il dot ibm dot com 2010-08-10 10:23 ---
(In reply to comment #1)
> This patch should be a valid fix, because the recognition of the dot_prod pattern is known to fail at this point if the stmt is outside the loop. (I am not sure whether we should not see this case in the vectorizer at this point -- should previous analysis already filter out?):

I don't understand this. Where do we check if the stmt (which one?) is outside the loop?

> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index 19f0ae6..5f81a73 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -259,6 +259,10 @@ vect_recog_dot_prod_pattern (gimple last_stmt, tree *type_in, tree *type_out)
>       inside the loop (in case we are analyzing an outer-loop).  */
>    if (!is_gimple_assign (stmt))
>      return NULL;
> +
> +  if (!flow_bb_inside_loop_p (loop, gimple_bb (stmt)))
> +    return NULL;
> +
>    stmt_vinfo = vinfo_for_stmt (stmt);
>    gcc_assert (stmt_vinfo);
>    if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_internal_def)

I was looking at PR 45239 and didn't notice that there is another PR and didn't see this comment. So I tested the same fix (successfully on x86_64-suse-linux). You can commit it if you like (just please note that the bug exists on 4.5 as well).

Thanks,
Ira

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2010-08-10 10:24:00
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45241
[Bug lto/44152] ICE on compiling xshow.f of xplor-nih with -O3 -ffast-math -fwhopr
--- Comment #4 from irar at il dot ibm dot com 2010-07-27 09:25 ---
I am testing a patch.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-07-22 14:47:20         |2010-07-27 09:25:25
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44152
[Bug tree-optimization/44861] internal compiler error: in vectorizable_load, at tree-vect-stmts.c:3812
--- Comment #1 from irar at il dot ibm dot com 2010-07-08 09:14 ---
The failure is in vectorizable_store():

  /* If accesses through a pointer to vectype do not alias the original
     memory reference we have a problem.  This should never happen.  */
  gcc_assert (alias_sets_conflict_p (get_alias_set (data_ref),
              get_alias_set (gimple_assign_lhs (stmt))));

Since the MEM_REF merge, the types struct Foo * and struct counted_base * pass the types_compatible_p() test in vect_check_interleaving(). But in revision 161655 (the merge) the basic block gets vectorized and there is no ICE.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |richard dot guenther at
                   |                            |gmail dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44861
[Bug tree-optimization/44710] New: If-conversion generates redundant statements
Starting from revision 160625 (http://gcc.gnu.org/ml/gcc-patches/2010-06/msg01155.html) if-conversion generates redundant statements for

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

  # pos_22 = PHI <pos_1(4), 1(2)>
  # i_23 = PHI <prephitmp.8_2(4), 0(2)>
  # limit_24 = PHI <limit_4(4), 1.28e+2(2)>
  # ivtmp.9_18 = PHI <ivtmp.9_17(4), 64(2)>
  limit_9 = arr[i_23];
  pos_10 = i_23 + 1;
  D.4534_12 = limit_9 < limit_24;     <-
  pretmp.7_3 = i_23 + 1;
  D.4535_20 = limit_9 >= limit_24;    <-
  pos_1 = [cond_expr] limit_9 >= limit_24 ? pos_22 : pos_10;
  limit_4 = [cond_expr] limit_9 >= limit_24 ? limit_24 : limit_9;
  prephitmp.8_2 = [cond_expr] limit_9 >= limit_24 ? pretmp.7_3 : pos_10;
  ivtmp.9_17 = ivtmp.9_18 - 1;
  D.4536_19 = D.4534_12 || D.4535_20; <-
  if (ivtmp.9_17 != 0)
    goto <bb 4>;
  else
    goto <bb 5>;

The statements are removed by a later dce pass, but they interfere with my attempts to vectorize this loop.

--
           Summary: If-conversion generates redundant statements
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: irar at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710
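For readers unfamiliar with if-conversion, the transformation applied to this loop can be sketched at source level. The function names, the float element type, and the explicit parameters are assumptions made for illustration; the sketch uses !(v < limit) rather than v >= limit so that both versions also agree for NaN inputs.

```c
#include <stddef.h>

/* Original shape: the update of pos and limit is guarded by a branch.  */
int min_pos_branchy (const float *arr, size_t n, float limit)
{
  int pos = 1;
  for (size_t i = 0; i < n; i++)
    if (arr[i] < limit)
      {
        pos = (int) i + 1;
        limit = arr[i];
      }
  return pos;
}

/* If-converted shape: the branch is replaced by conditional selects,
   mirroring the [cond_expr] statements in the dump.  Only the selects
   and the load are needed; any extra predicate computations are dead.  */
int min_pos_select (const float *arr, size_t n, float limit)
{
  int pos = 1;
  for (size_t i = 0; i < n; i++)
    {
      float v = arr[i];
      int keep = !(v < limit);          /* keep the old pos and limit */
      pos = keep ? pos : (int) i + 1;
      limit = keep ? limit : v;
    }
  return pos;
}
```

The loop body of the second version is straight-line code, which is the form the vectorizer needs.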
[Bug tree-optimization/44710] If-conversion generates redundant statements
--- Comment #1 from irar at il dot ibm dot com 2010-06-29 09:11 ---
Created an attachment (id=21036)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21036&action=view)
Full testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44710
[Bug tree-optimization/44711] New: PRE doesn't remove equivalent computations of induction variables
For the following loop

  for (i = 0; i < N; i++)
    if (arr[i] < limit)
      {
        pos = i + 1;
        limit = arr[i];
      }

PRE fails to eliminate the redundant i_24 + 1 computation. Here is Richard's analysis from http://gcc.gnu.org/ml/gcc-patches/2010-06/msg02982.html:

So the reason is our heuristic in PRE to not introduce new IVs:

Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Skipping insertion of phi for partial redundancy: Looks like an induction variable
Inserted pretmp.4_2 = i_13 + 1; in predecessor 8
Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Inserted pretmp.4_22 = i_24 + 1; in predecessor 7
Created phi prephitmp.5_21 = PHI <pretmp.4_22(7), pos_11(4)> in block 5
Found partial redundancy for expression {plus_expr,i_24,1} (0005)
Skipping insertion of phi for partial redundancy: Looks like an induction variable
Replaced i_24 + 1 with prephitmp.5_21 in i_13 = i_24 + 1;
Removing unnecessary insertion:pretmp.4_2 = i_13 + 1;

we do not want to insert into block 3, so we are left with

<bb 3>:
  # pos_23 = PHI <pos_1(8), 1(2)>
  # i_24 = PHI <i_13(8), 0(2)>
  # limit_25 = PHI <limit_4(8), 1.28e+2(2)>
  limit_9 = arr[i_24];
  D.3841_10 = limit_9 < limit_25;
  if (D.3841_10 != 0)
    goto <bb 4>;
  else
    goto <bb 7>;

<bb 7>:
  pretmp.4_22 = i_24 + 1;
  goto <bb 5>;

<bb 4>:
  pos_11 = i_24 + 1;

<bb 5>:
  # pos_1 = PHI <pos_23(7), pos_11(4)>
  # limit_4 = PHI <limit_25(7), limit_9(4)>
  # prephitmp.5_21 = PHI <pretmp.4_22(7), pos_11(4)>
  i_13 = prephitmp.5_21;

where there is no full redundancy for i_24 + 1 now (that is, we did some useless half-way PRE because of that IV heuristic ...).

--
           Summary: PRE doesn't remove equivalent computations of induction variables
           Product: gcc
           Version: 4.6.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: tree-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: irar at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44711
[Bug tree-optimization/44711] PRE doesn't remove equivalent computations of induction variables
--- Comment #1 from irar at il dot ibm dot com 2010-06-29 11:00 ---
Created an attachment (id=21037)
 --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=21037&action=view)
Full testcase

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44711
[Bug tree-optimization/44507] [4.5/4.6 Regression] vectorization ANDs array elements together incorrectly
--- Comment #5 from irar at il dot ibm dot com 2010-06-13 10:29 --- The bug is in creation of a neutral value for BIT_AND_EXPR. What is the correct way to create it for all types? I found double-int.h:#define ALL_ONES (~((unsigned HOST_WIDE_INT) 0)) but it won't work for signed. Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44507
[Bug tree-optimization/44507] [4.5/4.6 Regression] vectorization ANDs array elements together incorrectly
--- Comment #7 from irar at il dot ibm dot com 2010-06-13 12:01 ---
(In reply to comment #6)
> (In reply to comment #5)
> > The bug is in creation of a neutral value for BIT_AND_EXPR. What is the correct way to create it for all types? I found double-int.h:#define ALL_ONES (~((unsigned HOST_WIDE_INT) 0)) but it won't work for signed.
> build_int_cst (type, -1)

OK, thanks.

> At least in tree-vect-slp.c:1669 this seems to be buggy. The case for BIT_AND_EXPR should be separated from that of MULT_EXPR.

Right, this is buggy too, but the failure here is in reduction (get_initial_def_for_reduction), not in SLP.

Is it safe to assume that operands of BIT_AND_EXPR are of integral type? If so, I'll test the following patch:

Index: tree-vect-loop.c
===
--- tree-vect-loop.c (revision 160524)
+++ tree-vect-loop.c (working copy)
@@ -2871,12 +2871,15 @@ get_initial_def_for_reduction (gimple st
             *adjustment_def = init_val;
           }

-        if (code == MULT_EXPR || code == BIT_AND_EXPR)
+        if (code == MULT_EXPR)
           {
             real_init_val = dconst1;
             int_init_val = 1;
           }

+        if (code == BIT_AND_EXPR)
+          int_init_val = -1;
+
         if (SCALAR_FLOAT_TYPE_P (scalar_type))
           def_for_init = build_real (scalar_type, real_init_val);
         else
Index: tree-vect-slp.c
===
--- tree-vect-slp.c (revision 160524)
+++ tree-vect-slp.c (working copy)
@@ -1662,7 +1662,6 @@ vect_get_constant_vectors (slp_tree slp_
           break;

         case MULT_EXPR:
-        case BIT_AND_EXPR:
           if (SCALAR_FLOAT_TYPE_P (TREE_TYPE (op)))
             neutral_op = build_real (TREE_TYPE (op), dconst1);
           else
@@ -1670,6 +1669,10 @@ vect_get_constant_vectors (slp_tree slp_

           break;

+        case BIT_AND_EXPR:
+          neutral_op = build_int_cst (TREE_TYPE (op), -1);
+          break;
+
         default:
           neutral_op = NULL;
       }

Thanks,
Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44507
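Why 1 is the wrong neutral value for a bitwise AND reduction, and -1 (all bits set) the right one, can be checked with a small scalar sketch. The helper name and its signature are hypothetical; it only models the "initial value" that the vectorizer puts into the unused accumulator lanes.

```c
#include <stddef.h>

/* AND-reduce n values, seeding the accumulator with init.  A neutral
   seed must leave the result unchanged: init & x == x for every x.  */
int and_reduce (const int *v, size_t n, int init)
{
  int acc = init;
  for (size_t i = 0; i < n; i++)
    acc &= v[i];
  return acc;
}
```

With the values 0x6, 0xE and 0x7 the true result is 0x6. Seeding with -1 gives 0x6, while seeding with 1 (the neutral value of multiplication) clears every bit except bit 0 and gives 0.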
[Bug tree-optimization/44183] Vectorizer may generate invalid memory access
--- Comment #1 from irar at il dot ibm dot com 2010-05-20 07:13 --- Do you mean that extract_even implementation does something illegal with this last element? Misaligned load also accesses elements outside the array, but the problem is in extract_even? Other than doing something in the backend, we can reduce the number of vector iterations in cases that may access elements outside array bounds for specific targets... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183
[Bug tree-optimization/44183] Vectorizer may generate invalid memory access
--- Comment #3 from irar at il dot ibm dot com 2010-05-20 10:04 --- I am curious what is the problem with that? These elements are not used, they are just loaded... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183
[Bug tree-optimization/44183] Vectorizer may generate invalid memory access
--- Comment #5 from irar at il dot ibm dot com 2010-05-20 10:24 --- Even if we are talking about less than vector size from array boundary? And that boundary is not (vector) aligned. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #16 from irar at il dot ibm dot com 2010-05-10 08:17 ---
Fixed.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #14 from irar at il dot ibm dot com 2010-05-05 09:02 ---
> It tries to get a _vector_ type of the same size. In theory each vectorization method can choose whatever vector size suits them most (as for external defs they need to build up a vector of equivalent elements anyway). So with AVX we can do V4DF -> V4SF vectorization, if the double is an external def the vectorization method could choose to create a vector with double size. But the reasonable default for now is to force a same-sized vector type as that is what the vectorizer was tested for until now (well, until I get the followup patch cleaned up and posted again).

OK, thanks for the explanation.

> So yes, if we can return false we should probably do so instead of asserting (maybe assert that if we are supposed to create vectorized stmts and thus cannot fail that we indeed have a vector type here).

I'll prepare a patch.

Thanks,
Ira

> Richard.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-05-02 10:44:22         |2010-05-05 09:02:26
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #12 from irar at il dot ibm dot com 2010-05-03 12:30 ---
> Well. For loops we'd have disqualified it as there is no vector type for the external def (well, the stmt inside the loop).

I don't think that's true. With -fno-tree-pre we get the same ICE for loop vectorization for:

#define N 64
union U { __complex__ int ci; __complex__ float cf; };
union U u[N];

void
foo (double f1, double f2)
{
  int i;
  for (i=0; i<N; i++)
    {
      __real__ u[i].cf = f1;
      __imag__ u[i].cf = f2;
    }
}

> So we do not do this for SLP? In that case yes, if we can return false at this point then we should replace this (and similar) asserts with return false. Or we should fix the code that scans the BB initially and sets vector types properly?

The loop scan that sets vector types only checks lhs types (or the smallest type in stmt) in order to decide on the vectorization factor. There is a similar scan for BBs in vect_analyze_stmt (only to set vector types for stmts) and it also looks only at the lhs. The failure occurs in analysis, so it's ok to return false at this point.

But I don't understand why an external def has to have the same size as the lhs? (And it is, of course, possible that both types are vectorizable, but still the rhs type is bigger than the lhs type).

Thanks,
Ira

> Thanks,
> Richard.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #9 from irar at il dot ibm dot com 2010-05-02 11:08 --- Thanks, Uros! I reproduced the ICE using your instructions. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #10 from irar at il dot ibm dot com 2010-05-02 12:12 ---
Looks like it's caused by:

r158157 | rguenth | 2010-04-09 13:40:14 +0300 (Fri, 09 Apr 2010) | 28 lines

The problem is in getting vectype for f1_2:

foo (int b, double f1, double f2, int c1, int c2)
{
  ...
  float D.1999;
  float D.1998;
  ...
<bb 3>:
  D.1998_3 = (float) f1_2(D);
  REALPART_EXPR <u.cf> = D.1998_3;
  D.1999_5 = (float) f2_4(D);
  IMAGPART_EXPR <u.cf> = D.1999_5;
  D.2012_10 = u.ci;
  goto <bb 5>;

An immediate fix would be to replace the assert in

  /* If op0 is an external or constant def use a vector type with
     the same size as the output vector type.  */
  if (!vectype)
    vectype = get_same_sized_vectype (TREE_TYPE (op0), vectype_out);
  gcc_assert (vectype);

with 'return false', since get_same_sized_vectype currently just redirects to get_vectype_for_scalar_type.

But the comment (and the future intent) seems incorrect for external defs, such as f1 and f2 in this test.

Ira

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenther at suse dot de

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #4 from irar at il dot ibm dot com 2010-05-02 05:51 ---
I don't have access to ia64. I tried to change the types in the test to make the basic blocks vectorizable on x86_64, but didn't get any error. So I still need the SLP dump in order to solve this.

Thanks,
Ira

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.6.0                       |---

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug middle-end/43901] [4.6 Regression] FAIL: gcc.c-torture/compile/pr42196-2.c
--- Comment #1 from irar at il dot ibm dot com 2010-04-27 05:53 --- Could you please give some more information? It doesn't fail on x86_64-linux. (For SLP dump please use -fdump-tree-slp-details). Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43901
[Bug tree-optimization/43842] [4.6 Regression] ice in vect_create_epilog_for_reduction
--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-04-22 08:51:50         |2010-04-22 11:46:44
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43842
[Bug testsuite/43482] Fix *.log tests merged output containing ===
--- Comment #6 from irar at il dot ibm dot com 2010-04-22 18:11 --- Yes, sorry about that. I updated the ChangeLogs. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43482
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
--- Comment #8 from irar at il dot ibm dot com 2010-04-21 11:33 ---
Yes, it's possible to add this to SLP. But I don't understand how

  D.3154_3 = COMPLEX_EXPR <D.3163_8, D.3164_9>;

should be vectorized. D.3154_3 is complex and the rhs will be a vector {D.3163_8, D.3164_9} (btw, we have to change float to double, otherwise, we don't have complete vectors and this is not supported).

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity
--- Comment #10 from irar at il dot ibm dot com 2010-04-21 18:33 --- Thanks. So, it is not always profitable and requires a cost model. I am now working on cost model for basic block vectorization, I can look at this once we have one. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
[Bug tree-optimization/43771] [4.5/4.6 Regression] ICE on valid when compiling ParMetis with gcc 4.5.0 and -O3
--- Comment #7 from irar at il dot ibm dot com 2010-04-19 07:48 ---
Fixed on 4.6, 4.5 and 4.4.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771
[Bug tree-optimization/37027] SLP loop vectorization missing support for reductions
--- Comment #5 from irar at il dot ibm dot com 2010-04-19 14:35 ---
Fixed.

--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37027
[Bug tree-optimization/43771] [4.5/4.6 Regression] ICE on valid when compiling ParMetis with gcc 4.5.0 and -O3
--
irar at il dot ibm dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         AssignedTo|unassigned at gcc dot gnu   |irar at il dot ibm dot com
                   |dot org                     |
             Status|NEW                         |ASSIGNED
   Last reconfirmed|2010-04-16 21:16:37         |2010-04-18 08:12:31
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43771
[Bug tree-optimization/43692] small loop not vectorized
--- Comment #1 from irar at il dot ibm dot com 2010-04-08 17:14 --- It probably happens because the vectorization is not profitable. Try -fno-vect-cost-model flag. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692
[Bug tree-optimization/43692] small loop not vectorized
--- Comment #3 from irar at il dot ibm dot com 2010-04-08 17:33 --- Both loops get vectorized for me with -O3 on x86_64-suse-linux. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692
[Bug tree-optimization/43692] small loop not vectorized
--- Comment #5 from irar at il dot ibm dot com 2010-04-08 17:59 --- In GCC 4.4 the smaller loop gets completely unrolled before the vectorizer. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43692
[Bug tree-optimization/43425] enhance scalar expansion to vectorize this loop
--- Comment #2 from irar at il dot ibm dot com 2010-03-28 08:59 --- I think PR 35229 covers this issue. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43425
[Bug tree-optimization/43431] Diagnostic message is not clear for vectorization profitability analysis
--- Comment #1 from irar at il dot ibm dot com 2010-03-28 09:41 ---
(In reply to comment #0)
> What does this message mean?
> vector iteration cost = 2056 is divisible by scalar iteration cost = 4 by a factor greater than or equal to the vectorization factor = 4 .
> Is the vectorization not profitable? Why?

The cost of one vector iteration is 2056.
The cost of one scalar iteration is 4.
2056/4 = 514
514 > 4 (= vectorization factor)

The vectorization is not profitable. We want to vectorize only if one vector iteration cost is lower than one scalar iteration cost multiplied by vectorization factor.

(Vector cost is so high here, because of the j,i access. We should vectorize the outer loop, but we fail because of some unsupported features: unknown inner loop bound, need for versioning (for alias) in outer loop.)

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43431
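The rule stated in the comment above can be written as a one-line check. This is a simplified sketch, not the vectorizer's cost model: the function name is hypothetical, and prologue, epilogue and versioning costs are deliberately left out.

```c
#include <stdbool.h>

/* Vectorization pays off only if one vector iteration is cheaper than
   the vf scalar iterations it replaces.  */
bool vector_iteration_profitable (int vec_iter_cost, int scalar_iter_cost,
                                  int vf)
{
  return vec_iter_cost < scalar_iter_cost * vf;
}
```

For the numbers in the report, 2056 is compared against 4 * 4 = 16, so the check fails, which is the same conclusion as 2056 / 4 = 514 being greater than the vectorization factor 4.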
[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref
--- Comment #2 from irar at il dot ibm dot com 2010-03-28 10:58 ---
(In reply to comment #0)
> sub_hfyu_median_prediction.c:18: note: not vectorized: unhandled data-ref
> Looking with GDB at it, I get:
> (gdb) p debug_data_references (datarefs)
> (Data Ref:
>   stmt: D.2736_16 = *D.2735_15;
>   ref: *D.2735_15;
>   base_object: *src1_14(D);
>   Access function 0: {0B, +, 1}_1
> )
> (Data Ref:
>   stmt:
>   ref:
>   base_object:
> )
> I think it is the dst data ref that is NULL. Might be an aliasing problem for the data dep analysis, but still, the data ref should be analyzed correctly first.

Data refs analysis fails because of the function call in the loop. The vectorizer should check the return value of compute_data_dependences_for_loop() and print some better error message though.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436
[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref
--- Comment #3 from irar at il dot ibm dot com 2010-03-28 11:07 ---
(In reply to comment #1)
> hadamard8_diff.c:44: note: not vectorized: unhandled data-ref

There is a function call in this loop as well.

> hadamard8_diff.c:26: note: not vectorized: data ref analysis failed D.2771_12 = *D.2770_11;

Scalar evolution analysis fails here with: failed: evolution of base is not affine.

  D.2768_8 = i_361 * stride_7(D);
  D.2769_9 = (long unsigned int) D.2768_8;
  D.2770_11 = src_10(D) + D.2769_9;
  D.2771_12 = *D.2770_11;

stride is a function parameter.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436
[Bug tree-optimization/43543] Reorder the statements in the loop can vectorize it
--- Comment #1 from irar at il dot ibm dot com 2010-03-28 11:16 --- Looks similar to PR 32806. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43543
[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref
--- Comment #6 from irar at il dot ibm dot com 2010-03-28 18:05 ---
(In reply to comment #4)
> What about fixing the diagnostic message like this:

It would be nice to do the same for SLP (compute_data_dependences_for_bb) for completeness.

Thanks,
Ira

> diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
> index 37ae9b5..44248b3 100644
> --- a/gcc/tree-vect-data-refs.c
> +++ b/gcc/tree-vect-data-refs.c
> @@ -1866,10 +1866,21 @@ vect_analyze_data_refs (loop_vec_info loop_vinfo, bb_vec_info bb_vinfo)
>    if (loop_vinfo)
>      {
> +      bool res;
> +
>        loop = LOOP_VINFO_LOOP (loop_vinfo);
> -      compute_data_dependences_for_loop (loop, true,
> -                                         LOOP_VINFO_DATAREFS (loop_vinfo),
> -                                         LOOP_VINFO_DDRS (loop_vinfo));
> +      res = compute_data_dependences_for_loop
> +        (loop, true, LOOP_VINFO_DATAREFS (loop_vinfo),
> +         LOOP_VINFO_DDRS (loop_vinfo));
> +
> +      if (!res)
> +        {
> +          if (vect_print_dump_info (REPORT_UNVECTORIZED_LOCATIONS))
> +            fprintf (vect_dump, "not vectorized: loop contains function calls"
> +                     " or data references that cannot be analyzed");
> +          return false;
> +        }
> +
>        datarefs = LOOP_VINFO_DATAREFS (loop_vinfo);
>      }
>    else

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436
[Bug tree-optimization/43436] Missed vectorization: unhandled data-ref
--- Comment #7 from irar at il dot ibm dot com 2010-03-28 18:22 ---
(In reply to comment #5)
> When defining the missing function like this:
>
> static inline int mid_pred(int a, int b, int c)
> {
>     int t= (a-b)&((a-b)>>31);
>     a-=t;
>     b+=t;
>     b-= (b-c)&((b-c)>>31);
>     b+= (a-b)&((a-b)>>31);
>     return b;
> }
>
> The vectorization reports: not vectorized: unsupported use in stmt.

Yes, we have unsupported cycles for l and lt, since they don't match the regular reduction pattern.

> When this function is defined like this:
>
> static inline int mid_pred(int a, int b, int c)
> {
>     if(a>b){
>         if(c>b){
>             if(c>a) b=a;
>             else    b=c;
>         }
>     }else{
>         if(b>c){
>             if(c>a) b=c;
>             else    b=a;
>         }
>     }
>     return b;
> }
>
> the vectorizer stops with: not vectorized: control flow in loop.

if-conversion fails with

  l_34 = *D.2750_33;
  tree could trap...

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43436
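The two mid_pred definitions quoted above are meant to compute the same value, the median of three integers. A quick sketch to check that follows; the function names are made up here, and the branchless form assumes a 32-bit int, an arithmetic right shift of negative values (implementation-defined in C, but what GCC provides), and inputs small enough that the subtractions do not overflow.

```c
/* Branchless median of three: (x)&((x)>>31) is x when x is negative
   and 0 otherwise, so each step is a conditional swap, max or min.  */
int mid_pred_branchless (int a, int b, int c)
{
  int t = (a - b) & ((a - b) >> 31);
  a -= t;                                /* a = max (a, b) */
  b += t;                                /* b = min (a, b) */
  b -= (b - c) & ((b - c) >> 31);        /* b = max (b, c) */
  b += (a - b) & ((a - b) >> 31);        /* b = min (a, b) */
  return b;
}

/* The same median written with explicit control flow.  */
int mid_pred_branchy (int a, int b, int c)
{
  if (a > b)
    {
      if (c > b)
        b = (c > a) ? a : c;
    }
  else
    {
      if (b > c)
        b = (c > a) ? c : a;
    }
  return b;
}
```

The first form has no branches but chains several dependent updates, while the second form needs if-conversion before the loop that calls it can be vectorized.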
[Bug tree-optimization/42652] vectorizer created unaligned vector insns
--- Comment #17 from irar at il dot ibm dot com 2010-02-22 09:01 ---
Is there a way to pass alignment information similar to PR 39954? Otherwise, a proper fix would be some inter-procedural analysis...

In the meantime, we can do intra-procedural analysis and fail when we reach a function argument, i.e., use runtime checks. We already have several types of versioning, so adding another one will complicate things even more, and will not always be possible (because of code size constraints).

Thanks,
Ira

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652
[Bug tree-optimization/43074] [4.4/4.5 Regression] ICE in vectorizable_reduction, at tree-vect-loop.c:3491
-- irar at il dot ibm dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |irar at il dot ibm dot com |dot org | Status|UNCONFIRMED |ASSIGNED Ever Confirmed|0 |1 Last reconfirmed|-00-00 00:00:00 |2010-02-15 12:39:54 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43074
[Bug tree-optimization/42846] GCC sometimes ignores information about pointer target alignment
--- Comment #3 from irar at il dot ibm dot com 2010-01-24 07:39 --- This has already been discussed in PR 41464. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42846
[Bug tree-optimization/42652] vectorizer created unaligned vector insns
--- Comment #13 from irar at il dot ibm dot com 2010-01-18 12:17 --- Does something like this make sense? (With this patch we will never use peeling for function parameters, unless the builtin returns OK to peel for packed types).

Index: tree-vect-data-refs.c
===
--- tree-vect-data-refs.c (revision 155880)
+++ tree-vect-data-refs.c (working copy)
@@ -1010,10 +1010,29 @@ vector_alignment_reachable_p (struct dat
       tree type = (TREE_TYPE (DR_REF (dr)));
       tree ba = DR_BASE_OBJECT (dr);
       bool is_packed = false;
+      tree tmp = TREE_TYPE (DR_BASE_ADDRESS (dr));

       if (ba)
         is_packed = contains_packed_reference (ba);

+      is_packed = is_packed || contains_packed_reference (DR_BASE_ADDRESS (dr));
+
+      if (!is_packed)
+        {
+          while (tmp)
+            {
+              is_packed = TYPE_PACKED (tmp);
+              if (is_packed)
+                break;
+
+              tmp = TREE_TYPE (tmp);
+            }
+        }
+
+      if (TREE_CODE (DR_BASE_ADDRESS (dr)) == SSA_NAME
+          && TREE_CODE (SSA_NAME_VAR (DR_BASE_ADDRESS (dr))) == PARM_DECL)
+        is_packed = true;
+
       if (vect_print_dump_info (REPORT_DETAILS))
         fprintf (vect_dump, "Unknown misalignment, is_packed = %d",is_packed);
       if (targetm.vectorize.vector_alignment_reachable (type, is_packed))

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652
[Bug tree-optimization/42652] vectorizer created unaligned vector insns
--- Comment #10 from irar at il dot ibm dot com 2010-01-13 09:35 --- Yes, I understand that we can't assume that an access is aligned if we can't prove it's aligned. I don't understand how we can prove that a COMPONENT_REF is aligned, i.e., if there is a way to check if a struct is packed, or we'd better decide that we always use versioning for COMPONENT_REFs? Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652
[Bug tree-optimization/42709] [4.5 Regression] error: type mismatch in pointer plus expression
-- irar at il dot ibm dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |irar at il dot ibm dot com |dot org | Status|NEW |ASSIGNED Last reconfirmed|2010-01-12 16:12:10 |2010-01-13 11:36:55 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42709
[Bug tree-optimization/42652] vectorizer created unaligned vector insns
--- Comment #8 from irar at il dot ibm dot com 2010-01-12 08:08 --- So, to be on the safe side, we should assume that COMPONENT_REFs are not naturally aligned and never use peeling for them? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652
[Bug tree-optimization/42652] vectorizer created unaligned vector insns
--- Comment #5 from irar at il dot ibm dot com 2010-01-10 08:22 --- In vector_alignment_reachable_p() we check if an access is packed using contains_packed_reference(). For packed accesses we return false, meaning alignment is unreachable and peeling cannot be used. In the attached testcase contains_packed_reference() returns false for palette_5. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42652
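As a reminder of why packed accesses matter here, a small C sketch (not taken from the attached testcase; the struct and function names are made up) showing that a member of a packed struct can sit at an offset that is not a multiple of its natural alignment. In that case no amount of peeling by whole elements can make the access vector-aligned.

```c
#include <stddef.h>

/* Ordinary layout: the compiler pads after 'tag' so 'value' is
   naturally aligned.  */
struct plain_rec {
    char tag;
    int  value;
};

/* Packed layout (GCC attribute): no padding, so 'value' starts right
   after the one-byte 'tag'.  */
struct packed_rec {
    char tag;
    int  value;
} __attribute__((packed));

/* Byte offset of 'value' in each layout.  */
size_t plain_value_offset(void)
{
    return offsetof(struct plain_rec, value);
}

size_t packed_value_offset(void)
{
    return offsetof(struct packed_rec, value);
}
```

With the packed layout the offset of value is 1, so an array of such structs has int members at odd addresses.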
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #43 from irar at il dot ibm dot com 2010-01-10 13:43 --- Since -O2 -ftree-vectorize doesn't cause bad code, it has to be some other optimization on top of the vectorized code that causes the problem.

Bad code is generated when the alignment of 'reduce' is forced and the reduction 'sum(reduce)' is vectorized. However, the result of the reduction is correct, and the vector store element does not do any damage (as far as I can see in the debugger). So, the vector stores don't corrupt anything.

The part that goes wrong is in the scalar code that implements the decision on whether to add the (correctly computed) reduction value to temp[9] and temp[10]. The code that sets the condition (which, by the way, is not using any vectorized code) is not using the values of reduce[9] and reduce[10], even though the value of the condition depends on them:

reduce(1:3) = -1
reduce(4:6) = 0
reduce(7:8) = 5
reduce(9:10) = 10
...
WHERE (reduce > 6) temp = temp + sum(reduce)

Here is the code for adding the result of the sum(reduce) to temp[9]:

L29:
        lbz r11,152(r1)         # **
        cmpwi cr7,r11,0         # reduce > 6 ?
        beq cr7,L30
        lwz r11,240(r1)         # load temp[9]
        add r11,r11,r9          # temp[9] + sum(reduce)
        stw r11,240(r1)         # store temp[9]

** - The calculation of 152(r1) is based only on the value of reduce[8]! The values of reduce[9] and reduce[10] are only used in the reduction calculation and not compared to 6 at all.

In case we don't vectorize (but force the alignment), there is a cmpwi cr7,r29,6 instruction, where r29 is reduce[9] (and the code is correct). The same happens when the alignment of reduce is not forced and the reduction is vectorized using peeling. I.e., as far as I can see, in the bad code, the comparisons of reduce[9] and reduce[10] with 6 do not exist. I wonder which optimization can be responsible for that?

Also, some values of reduce are copied to a temporary array and are further compared with 6. In the version with peeling the values that are copied are reduce[4:8]: there is no need to keep the first three, and the last two are kept in registers and compared to 6 (and also used in the reduction epilogue). In the bad version, however, the kept values are reduce[3:8], and reduce[8] is put before the values of reduce[3:7] (reduce[3:7] are in 276(r1) to 292(r1), and reduce[8] is in 272(r1)). (And in the bad code the last two values, reduce[9] and reduce[10], are only used in the reduction epilogue.)

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
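To spell out what the scalar code is supposed to compute, here is a C sketch of the masked statement from the testcase. It assumes the comparison stripped by the archive is `>`, which matches the discussion of temp[9] and temp[10] (the only elements where reduce is 10). The function name masked_add_sum is made up, and the initial contents of temp are left to the caller because they are not quoted in the comment.

```c
#define N 10

/* C sketch of:  WHERE (reduce > 6) temp = temp + sum(reduce)
   The mask has to be evaluated for every element, including the last
   two, independently of how sum(reduce) itself is computed.  */
void masked_add_sum(int temp[N], const int reduce[N])
{
    int sum = 0;
    int i;

    /* The reduction: this is the loop that gets vectorized.  */
    for (i = 0; i < N; i++)
        sum += reduce[i];

    /* The scalar masked update: one comparison per element.  */
    for (i = 0; i < N; i++)
        if (reduce[i] > 6)
            temp[i] += sum;
}
```

With reduce set as in the comment (-1, -1, -1, 0, 0, 0, 5, 5, 10, 10) the sum is 27, and only the last two elements of temp should change.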
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #42 from irar at il dot ibm dot com 2010-01-05 09:09 --- So, it's enough to force alignment of reduce only (and to vectorize its loop) to get wrong code. On the other hand, the result of the vectorized loop is correct, and the problem is in choosing the correct index of temp. The assembly looks fine to me. So, for me the only way to proceed is to debug. Dominique, is it possible to access your machine? Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41956] Segfault in vectorizer
--- Comment #7 from irar at il dot ibm dot com 2009-12-30 10:16 --- The bug is in SLP load permutation analysis. I am testing a patch. -- irar at il dot ibm dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |irar at il dot ibm dot com |dot org | Status|NEW |ASSIGNED Last reconfirmed|2009-12-29 17:42:02 |2009-12-30 10:16:22 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41956
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #40 from irar at il dot ibm dot com 2009-12-23 14:49 --- (In reply to comment #39) I have regtested the patch in comment #31 and I have ~75 regressions on x86_64-apple-darwin10 in the gcc vect test suite (~100 on powerpc-apple-darwin9). Is this expected? and do you want the list? Yes, it is expected, it is not a bug fixing patch (as well as the rest of the hacks I asked you to check), it disables a feature - alignment forcing, so some tests are supposed to fail. Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #30 from irar at il dot ibm dot com 2009-12-22 11:42 --- We can try to verify the alignment issue by applying the two hacks I am attaching. The first one disables alignment forcing for all the data-refs (and marks the alignment as unknown). The loops are still vectorizable using peeling - hopefully, they are also vectorizable on darwin. So, if the results are correct and the two loops are vectorized, then the problem is in alignment. If the results are incorrect, the problem is in vectorization. The second one still forces alignment of the vectorized arrays, but not of the other arrays. With -fdump-tree-vect-details (or verbosity 9) it prints force alignment of data-ref, so we can verify that the correct arrays were aligned (reduce line 11 and temp line 5). So, here, the loops should be vectorized as before and only the alignment of not vectorized arrays will not be forced. Dominique, could you please check this? Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #31 from irar at il dot ibm dot com 2009-12-22 11:43 --- Created an attachment (id=19370) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19370&action=view) disable alignment forcing -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #32 from irar at il dot ibm dot com 2009-12-22 11:44 --- Created an attachment (id=19371) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19371&action=view) force alignment of vectorized arrays only -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #36 from irar at il dot ibm dot com 2009-12-23 07:54 --- Thanks! So, it is alignment of the vectorized arrays. I'd like to do two more checks: 1. Just force alignment of the two arrays (temp and reduce) and do not vectorize. 2. Force alignment of reduce only (and vectorize both loops). I am attaching the hacks. Could you please check this as well? Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #37 from irar at il dot ibm dot com 2009-12-23 07:54 --- Created an attachment (id=19377) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19377&action=view) Force alignment but don't vectorize -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #38 from irar at il dot ibm dot com 2009-12-23 07:55 --- Created an attachment (id=19378) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19378&action=view) Force alignment of reduce only -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #23 from irar at il dot ibm dot com 2009-12-20 12:18 --- The code that now gets vectorized is the summation of array 'reduce': sum(reduce). It looks like the problem is with adding the reduction result to the correct index of 'temp' (scalar code), and not with the reduction itself. Could you please verify that by printing the reduction result? Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #26 from irar at il dot ibm dot com 2009-12-20 13:46 --- I think the problem is in alignment. We force alignment of temp.6 and temp.20 - the arrays of relevant comparison results - even though we don't vectorize their loop. The decision whether we can force alignment is made in vect_can_force_dr_alignment_p(), and it seems that the only target-specific query there is the comparison with MAX_STACK_ALIGNMENT. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #28 from irar at il dot ibm dot com 2009-12-20 13:59 --- Hm, I don't know, but this is my best guess - we change something in the code that goes wrong... We also force alignment of reduce, but the reduction computation looks ok. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #21 from irar at il dot ibm dot com 2009-12-16 12:01 --- Thanks. I'll be able to look at this only on Sunday due to holidays. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #7 from irar at il dot ibm dot com 2009-12-15 08:25 --- I can't reproduce it with current mainline on powerpc64-suse-linux. Could you please attach the vectorizer dump? Does the good old version get vectorized? If so, could you please attach it as well? Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #11 from irar at il dot ibm dot com 2009-12-15 10:59 --- It looks like it has to be my patch that enables vectorization of conditions:

r149806 | irar | 2009-07-20 14:59:10 +0300 (Mon, 20 Jul 2009) | 19 lines

        * tree-vectorizer.h (vectorizable_condition): Add parameters.
        * tree-vect-loop.c (vect_is_simple_reduction): Support COND_EXPR.
        (get_initial_def_for_reduction): Likewise.
        (vectorizable_reduction): Skip the check of first operand in case
        of COND_EXPR. Add check that it is outer loop vectorization if
        nested cycle was detected. Call vectorizable_condition() for
        COND_EXPR. If reduction epilogue cannot be created do not fail for
        nested cycles (if it is not double reduction). Assert that there
        is only one type in the loop in case of COND_EXPR. Call
        vectorizable_condition() to vectorize COND_EXPR.
        * tree-vect-stmts.c (vectorizable_condition): Update comment.
        Add parameters. Allow nested cycles if called from
        vectorizable_reduction(). Use reduction vector variable if provided.
        (vect_analyze_stmt): Call vectorizable_reduction() before
        vectorizable_condition().
        (vect_transform_stmt): Update call to vectorizable_condition().

I'll try to find out what's wrong with it. Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #13 from irar at il dot ibm dot com 2009-12-15 13:07 --- (In reply to comment #12) Looks that it has to be my patch that enables vectorization of conditions: I am doing a clean bootstrap of C and FORTRAN of revision 149805 to see if the test works for it (allow for ~6h on my poor G5). Then I'll update to 149806.

1) Thanks. I got confused by the var names, but actually there is no COND_EXPR there. But still, it can be this patch. So it is worth checking.

2) The vectorizer's code for powerpc64-suse-linux I got is identical to darwin's except that the first has the calls:

  _gfortran_set_args (argc_1(D), argv_2(D));
  _gfortran_set_options (8, &options.36[0]);

at the beginning, and the second one has this bb:

<bb 43>:
  dt_parm.33.common.filename = &"where_2.f90"[1]{lb: 1 sz: 1};
  dt_parm.33.common.line = 20;
  dt_parm.33.common.flags = 128;
  dt_parm.33.common.unit = 6;
  _gfortran_st_write (&dt_parm.33);
  parm.34.dtype = 265;
  parm.34.dim[0].lbound = 1;
  parm.34.dim[0].ubound = 10;
  parm.34.dim[0].stride = 1;
  parm.34.data = &temp[0];
  parm.34.offset = -1;
  _gfortran_transfer_array (&dt_parm.33, &parm.34, 4, 0);
  _gfortran_st_write_done (&dt_parm.33);

(I am attaching my dump).

3) The only difference between the targets I am aware of is natural alignment, but we don't do peeling, so it shouldn't make any difference here.

4) We do force alignment. In the revision range there is this patch that may be somehow related:

r149853 | pbrook | 2009-07-21 15:35:38 +0300 (Tue, 21 Jul 2009) | 12 lines

2009-07-21 Paul Brook p...@codesourcery.com

        gcc/
        * tree-vectorizer.c (increase_alignment): Handle nested arrays.
        Terminate debug dump with newline.

        gcc/testsuite/
        * gcc.dg/vect/section-anchors-nest-1.c: New test.
        * lib/target-supports.exp (check_effective_target_section_anchors):
        Add arm*-*-*.

5) Also, looking at the assembly may help. Could you please attach it as well?

Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #14 from irar at il dot ibm dot com 2009-12-15 13:08 --- Created an attachment (id=19311) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19311&action=view) powerpc64-suse-linux vect dump -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug middle-end/41082] [4.5 Regression] FAIL: gfortran.fortran-torture/execute/where_2.f90 execution, -O3 -g with -m64
--- Comment #16 from irar at il dot ibm dot com 2009-12-15 13:35 --- But in comment #5 you wrote that it passes with the print, right? So, this dump contains correct or incorrect code? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41082
[Bug tree-optimization/42286] October 23rd change to tree-ssa-pre.c breaks calculix on powerpc with -ffast-math
--- Comment #3 from irar at il dot ibm dot com 2009-12-06 13:25 --- On powerpc64-suse-linux with current trunk calculix failed after a couple of minutes with

-O3 -maltivec -ffast-math
-O3 -maltivec -ffast-math -fno-tree-vectorize
-O2 -maltivec -ffast-math
-O1 -maltivec -ffast-math

It is currently running for about an hour with

-O0 -maltivec -ffast-math

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42286
[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
--- Comment #20 from irar at il dot ibm dot com 2009-11-30 08:52 --- Actually, PAREN_EXPRs are vectorizable (the support was added by you, Richard, in your original PAREN_EXPR patch http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=132515). The problem here is that vectorizable_assignment does not support multiple types. The attached patch adds this support, but I don't know if the patch is suitable for the current stage... Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
--- Comment #21 from irar at il dot ibm dot com 2009-11-30 08:54 --- Created an attachment (id=19183) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=19183&action=view) Multiple types support patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
--- Comment #23 from irar at il dot ibm dot com 2009-11-30 12:20 --- Applied: http://gcc.gnu.org/viewcvs?limit_changes=0&view=revision&revision=154794 Thanks, Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
[Bug middle-end/42193] [4.5 Regression] 454.calculix in SPEC CPU 2006 failed to compile at -O3
-- irar at il dot ibm dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |irar at il dot ibm dot com |dot org | Status|NEW |ASSIGNED Last reconfirmed|2009-11-27 10:36:38 |2009-11-29 12:24:11 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42193
[Bug tree-optimization/42108] [4.4/4.5 Regression] Vectorizer cannot deal with PAREN_EXPR gracefully, 50% performance regression
--- Comment #18 from irar at il dot ibm dot com 2009-11-23 09:02 --- I tried to vectorize eval.f90 with 4.3 and mainline on x86_64-suse-linux. In both cases no loop gets vectorized in subroutine eval. The k loop is not vectorizable because the step of x is unknown (function argument), and scalar evolution analysis fails to analyze it. The j loop is not vectorized first of all because of the k loop unknown loop bound (this is on our todo list). Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=42108
[Bug tree-optimization/41879] [4.5 Regression] 172.mgrid regression, vectorizer prevents predictive commoning
--- Comment #5 from irar at il dot ibm dot com 2009-11-12 07:51 --- (In reply to comment #4) I didn't check yet. We'll work on a simple cost-model integration of predcom. You mean, vectorizer cost model will take predcom into account? If the vectorization is not profitable (vs. scalar without predcom), it can be a matter of vectorizer cost model tuning (looks easier). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41879
[Bug tree-optimization/41879] [4.5 Regression] 172.mgrid regression, vectorizer prevents predictive commoning
--- Comment #3 from irar at il dot ibm dot com 2009-11-10 10:02 --- (In reply to comment #0) This causes mgrid score to drop by almost 40% on x86_64 and the vectorized code is pretty bad because it uses unaligned accesses. Is the vectorized code worse than the scalar one even without predcom? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41879
[Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
--- Comment #4 from irar at il dot ibm dot com 2009-09-27 08:06 --- (In reply to comment #1) The interesting thing is that data-ref analysis sees 128bit alignment but the vectorizer still produces

  vect_var_.24_59 = M*vect_p.20_57{misalignment: 0};
  D.2564_12 = *D.2563_11;
  vect_var_.25_61 = vect_var_.24_59 * vect_cst_.26_60;
  D.2565_13 = D.2564_12 * 2.299523162841796875e+0;
  M*vect_p.27_64{misalignment: 0} = vect_var_.25_61;

thus, unknown misalignment.

(instantiate_scev
  (instantiate_below = 3)
  (evolution_loop = 1)
  (chrec = {i_10(D), +, 4}_1)
  (res = {i_10(D), +, 4}_1))
        base_address: i_10(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *i_10(D)
Creating dr for *D.2562_7
  (res = {f_6(D), +, 4}_1))
        base_address: f_6(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 4
        aligned to: 128
        base_object: *f_6(D)

t2.i:5: note: === vect_enhance_data_refs_alignment ===
t2.i:5: note: Vectorizing an unaligned access.
t2.i:5: note: Vectorizing an unaligned access.

"aligned to" refers to the offset misalignment and not to the misalignment of base. attribute aligned works only for arrays, i.e., declarations, and not for pointer arguments. For pointers the vectorizer only checks TYPE_ALIGN_UNIT of the base type.

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
[Bug tree-optimization/41464] vector loads are unnecessarily split into high and low loads
--- Comment #6 from irar at il dot ibm dot com 2009-09-27 09:56 --- (In reply to comment #5) "aligned to" refers to the offset misalignment and not to the misalignment of base. Hmm, I believe it refers to base + offset + constant offset. tree-data-ref.h: /* Alignment information. ALIGNED_TO is set to the largest power of two that divides OFFSET. */ tree aligned_to; tree-data-ref.c: DR_ALIGNED_TO (dr) = size_int (highest_pow2_factor (offset_iv.base)); attribute aligned works only for arrays, i.e., declarations, and not for pointer arguments. I have to check that - I believe that in principle it should work. For pointers the vectorizer only checks TYPE_ALIGN_UNIT of the base type. That should be ok. But we need TYPE_ALIGN_UNIT to be 16, and we are checking scalar type here, so without user defined alignment it will be 4. Ira I guess I have to see what's going on here. Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41464
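The quoted header comment says ALIGNED_TO is the largest power of two that divides OFFSET. For a plain integer that quantity is easy to compute, as in the sketch below. Note that largest_pow2_divisor is a hypothetical stand-in written for this note: GCC's highest_pow2_factor works on tree expressions and has its own handling of special cases, so this shows only the arithmetic idea.

```c
/* Largest power of two dividing x, for x > 0.  Isolates the lowest set
   bit with two's-complement arithmetic.  Returns 0 for x == 0 (an
   arbitrary choice for this sketch; every power of two divides 0).  */
unsigned long largest_pow2_divisor(unsigned long x)
{
    return x & (~x + 1UL);
}
```

For example, an offset of 12 gives 4 and an offset of 16 gives 16, regardless of how the base address itself is aligned. That is the distinction drawn above between the alignment of the offset and the alignment of the base.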
[Bug target/41288] [4.5 Regression] gcc.target/x86_64/abi/test_struct_returning.c regressions on *-apple-darwin* at -m64
--- Comment #9 from irar at il dot ibm dot com 2009-09-08 05:51 --- Looks related to PR 39907. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41288
[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.
--- Comment #10 from irar at il dot ibm dot com 2009-08-13 11:34 --- Reduced testcase:

#include <stdlib.h>
#include <stdio.h>

#define N 4

long int a[N];

int main ()
{
  int k;

  for (k = 0; k < N; ++k)
    a[k] = a[k] != 5 ? 12 : 10;

  for (k = 0; k < N; ++k)
    printf ("%u ", a[k]);
  printf ("\n");

  return 0;
}

% gcc -O3 t.c
% ./a.out
0 0 0 0
% gcc -O2 t.c
% ./a.out
12 12 12 12

If the type of 'a' is int, there is no problem. The vectorizer produces almost the same code in both cases (except for number of iterations and types). I am attaching the assembly for int and long int versions.

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019
[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.
--- Comment #11 from irar at il dot ibm dot com 2009-08-13 11:36 --- Created an attachment (id=18350) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18350&action=view) The assembly for the long int version (wrong code) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019
[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.
--- Comment #12 from irar at il dot ibm dot com 2009-08-13 11:37 --- Created an attachment (id=18351) -- (http://gcc.gnu.org/bugzilla/attachment.cgi?id=18351&action=view) The assembly for the int version (correct) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019
[Bug tree-optimization/41019] Variate_generator with mt19937 and normal_distribution produces wrong sequence for -O3.
--- Comment #6 from irar at il dot ibm dot com 2009-08-12 12:14 --- Looks like a problem in data-ref analysis:

Creating dr for this_6(D)->_M_x[__k_87]
...
        base_address: this_6(D)
        offset from base address: 0
        constant offset from base address: 0
        step: 8
        aligned to: 128
        base_object: this_6(D)->_M_x[0]

And the vectorizer creates accesses relative to this_6(D) (base_address above) with zero offset (instead of this_6(D)->_M_x[0] or with an offset of _M_x).

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41019
[Bug tree-optimization/41008] [4.5 Regression] ICE in vect_is_simple_reduction, at tree-vect-loop.c:1708
--- Comment #3 from irar at il dot ibm dot com 2009-08-09 12:15 --- Fixed. -- irar at il dot ibm dot com changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=41008
[Bug middle-end/37150] vectorizer misses some loops
--- Comment #10 from irar at il dot ibm dot com 2009-08-06 10:49 --- Yes. The problem is that only a basic implementation was added. To vectorize this code several improvements must be done: support stmt group sizes greater than vector size, allow loads and stores to the same location, initiate SLP analysis from groups of loads, support misaligned access, etc. Finding a benchmark could really help to push these items to the top of vectorizer's todo list. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37150
[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
--- Comment #41 from irar at il dot ibm dot com 2009-07-28 08:12 --- That requires pattern recognition. MIN/MAX_EXPR are recognized by the first phiopt pass, so MIN/MAXLOC should be either also recognized there or in the vectorizer. (The phiopt pass transforms if clause to MIN/MAX_EXPR. The vectorizer gets COND_EXPR after if-conversion pass). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
--- Comment #34 from irar at il dot ibm dot com 2009-07-27 08:36 --- (In reply to comment #33) Using the example from comment 23 with ... gfortran shows: test.f90:12: note: not vectorized: unsupported use in stmt. and needs 2.272s. (By comparison. 4.4 needs 3.688s.)

This is for the inner loop vectorization. For the outer loop we get:

tmp.f90:11: note: not vectorized: control flow in loop.

because of the if's. Maybe loop unswitching can help us. Vectorizable outer-loops look like this:

        (pre-header)
           |
          header <---+
           |         |
          inner-loop |
           |         |
          tail ------+
           |
        (exit-bb)

Does ifort vectorize the exact same implementation of minloc?

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
--- Comment #38 from irar at il dot ibm dot com 2009-07-27 12:44 --- I am not sure that that kind of computation can be generated automatically, since in general the order of calculation of cond_expr cannot be changed. However, the loop can be split:

  for (i = 0; i < end; i++)
    if (arr[i] < limit)
      limit = arr[i];

  for (i = 0; i < end; i++)
    if (arr[i] == limit)
      {
        pos = i + 1;
        break;
      }

making the first loop vectorizable (inner-most loop vectorization).

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
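A self-contained C sketch of the split form, in case someone wants to experiment with it. The comparison operators are assumed to be `<` (the archive lost them, and `<` is what a MINLOC needs), and the function name minloc_split and the choice of seeding limit from arr[0] are illustrative, not taken from the Fortran front end.

```c
#include <stddef.h>

/* Returns the 1-based position of the first minimum of arr[0..n-1],
   or 0 for an empty array (following Fortran MINLOC numbering).  */
size_t minloc_split(const int *arr, size_t n)
{
    size_t i, pos = 0;
    int limit;

    if (n == 0)
        return 0;

    /* Loop 1: plain MIN reduction, no position tracking.  This is the
       loop that fits the reduction pattern.  */
    limit = arr[0];
    for (i = 0; i < n; i++)
        if (arr[i] < limit)
            limit = arr[i];

    /* Loop 2: scalar search for the first element equal to the minimum.  */
    for (i = 0; i < n; i++)
        if (arr[i] == limit) {
            pos = i + 1;
            break;
        }

    return pos;
}
```

The price is a second pass over the array, which on average stops early; whether that pays off would have to be measured.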
[Bug tree-optimization/40801] internal compiler error: in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1096
--- Comment #5 from irar at il dot ibm dot com 2009-07-26 07:04 --- Fixed. -- irar at il dot ibm dot com changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40801
[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
--- Comment #32 from irar at il dot ibm dot com 2009-07-26 07:48 --- (In reply to comment #30) Regarding the just committed inline version: It would be interesting to know whether it is vectorizable (with/without -ffinite-math-only [i.e. -ffast-math]). It depends on where it is inlined. It has to be vectorized in outer loop (see my previous comment), so it needs another loop around it. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
--- Comment #7 from irar at il dot ibm dot com 2009-07-20 11:18 --- AFAIU, querying for the component type of complex type is not difficult to implement. I think that loop-based vectorization is preferable here, so we should stay with a vectorization factor of 2 for doubles.

The next problem is to vectorize D.1611_4 = IMAGPART_EXPR <sincostmp.1_1>; and D.1612_6 = REALPART_EXPR <sincostmp.1_1>;

Currently, we support only loads and stores with IMAGPART/REALPART_EXPR, vectorizing them as strided accesses, with extract odd and even operations for loads. So, we will have to support interleaving of non-memory variables.

Does __builtin_cexpi have a vector implementation? If so, does it return two vectors? If not, I guess, we need something like:

  sincostmp.1 = __builtin_cexpi (xd[i]);
  sincostmp.2 = __builtin_cexpi (xd[i+1]);
  v1 = VEC_EXTRACT_EVEN (sincostmp.1, sincostmp.2);
  v2 = VEC_EXTRACT_ODD (sincostmp.1, sincostmp.2);
  sf[i:i+1] = v1;
  cf[i:i+1] = v2;
  i = i + 2;

Or we can use the two vectors from vectorized __builtin_cexpi as parameters of extract operations. Does that make sense?

Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770
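For readers not familiar with the extract even/odd operations, here is a scalar C sketch of what they do to interleaved data. It uses plain arrays rather than vector types, and extract_even_odd is a made-up name; it only models the data movement, not the actual VEC_EXTRACT_EVEN/ODD tree codes or any target instruction.

```c
#include <stddef.h>

/* 'interleaved' holds 'pairs' two-element records laid out as
   {e0, o0, e1, o1, ...}, e.g. the two parts of consecutive complex
   values.  Elements at even positions go to 'even', elements at odd
   positions go to 'odd'.  */
void extract_even_odd(const double *interleaved, size_t pairs,
                      double *even, double *odd)
{
    size_t i;

    for (i = 0; i < pairs; i++) {
        even[i] = interleaved[2 * i];      /* positions 0, 2, 4, ... */
        odd[i]  = interleaved[2 * i + 1];  /* positions 1, 3, 5, ... */
    }
}
```

With two complex results stored back to back, the even output collects one part of each value and the odd output collects the other, which is the shape needed for the two separate stores.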
[Bug fortran/31067] MINLOC should sometimes be inlined (gas_dyn is sooooo sloooow)
--- Comment #28 from irar at il dot ibm dot com 2009-07-20 12:03 --- I've just committed a patch that adds support of cond_expr in reductions in nested cycles (http://gcc.gnu.org/ml/gcc-patches/2009-07/msg01124.html). cond_expr cannot be vectorized in reduction of inner-most loop, because such reduction changes the order of computation, and that cannot be done for cond_expr. Ira -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31067
[Bug tree-optimization/40801] internal compiler error: in vect_get_vec_def_for_stmt_copy, at tree-vect-stmts.c:1096
--- Comment #3 from irar at il dot ibm dot com 2009-07-19 09:35 --- Testing a fix. Ira -- irar at il dot ibm dot com changed: What|Removed |Added AssignedTo|unassigned at gcc dot gnu |irar at il dot ibm dot com |dot org | Status|NEW |ASSIGNED Last reconfirmed|2009-07-18 19:15:43 |2009-07-19 09:35:55 date|| http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40801
[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
--- Comment #2 from irar at il dot ibm dot com 2009-07-16 12:29 --- pr40770.c:20: note: == examining statement: sincostmp.21_1 = __builtin_cexpi (D.1625_3); pr40770.c:20: note: get vectype for scalar type: complex double pr40770.c:20: note: not vectorized: unsupported data-type complex double make_vector_type returns NULL for this type. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770
[Bug tree-optimization/40770] Vectorization of complex types, vectorization of sincos missing
--- Comment #6 from irar at il dot ibm dot com 2009-07-16 17:31 --- (In reply to comment #3) make_vector_type returns NULL for this type. Yes - there is no vector type for complex double. But the vectorizer could query for a vector type for the complex component type (double) and divide the vector element count by 2 (for complex) to get the vectorization factor which would be 1 here. I see. Should SLP then be possible for that loop? Not with the current implementation - SLP needs strided stores to start. Here the stores are not even adjacent. I think it would be better to vectorize this loop with regular loop-based vectorization to avoid permutations. I'll take a better look on Sunday. Ira Richard. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40770