[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 --- Comment #8 from Ulrich Weigand uweigand at gcc dot gnu.org 2012-02-28 23:40:37 UTC --- Author: uweigand Date: Tue Feb 28 23:40:32 2012 New Revision: 184645 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=184645 Log: Partially revert: 2012-02-20 Richard Guenther rguent...@suse.de PR tree-optimization/52298 * tree-vect-stmts.c (vectorizable_load): Properly use STMT_VINFO_DR_STEP instead of DR_STEP when vectorizing outer loops. Modified: trunk/gcc/ChangeLog trunk/gcc/tree-vect-stmts.c
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|NEW AssignedTo|jakub at gcc dot gnu.org|unassigned at gcc dot ||gnu.org --- Comment #2 from Jakub Jelinek jakub at gcc dot gnu.org 2012-02-20 08:57:40 UTC --- Unassigning myself, not familiar enough with the nested_in_vect_loop stuff, Richard, could you please have a look at this? Just random comments: Shorter testcase (-O3): int a, b, i[2]; int foo (void) { int l = 0; for (a = 0; a = 3; a++) for (b = 1; b = 0; b--) l |= i[b]; return l; } The immediate problem I see is that in vectorizable_load we initialize negative = tree_int_cst_compare (DR_STEP (dr), size_zero_node) 0; i.e. from the inner loop's step (which is really negative), while we actually use STMT_VINFO_DR_STEP when deciding if it is invariant and how to adjust the read in each loop. STMT_VINFO_DR_STEP is 0 here, i.e. invariant, and vectorizable_load in that case inserts stmts after the original stmt, but as negative is true, we then put a permutation of it (which doesn't make any sense for invariant loads, as all the vector elements are the same) before the original stmt. So, a quick hack of ignoring negative if inv_p !bb_vinfo fixes this, but can't be the right fix. I guess negative flag should be taken from STMT_VINFO_DR_STEP if nested_in_vect_loop, but not sure if always, and the question is what to do with the testing for the negative case, which needs to pass a dr around, and for the outer loop we don't have any dr with the right step etc. I think the testcase shows several other issues, which can be observed also on (-O3): int a, b, i[2]; int foo (void) { int l = 0; for (a = 0; a = 3; a++) for (b = 0; b = 1; b++) l |= i[b]; return l; } 1) it is unfortunately cunrolli pass doesn't do anything here, then the vectorizer wouldn't make such weird decisions; but when cunrolli is run, lim hasn't moved out the b variable loads/stores from the loop yet, and the next complete unrolling happens only after vectorization. Perhaps another inner loop complete unrolling a couple of passes before vectorization would help here, but the question is if it would help even real-world apps. 2) if the outer loop has zero step and inner non-zero step, the question is if it is worthwhile to do the vectorization at all, and at least on the above testcase the cost model should say that it can't be beneficial given the large cost of the reduction for a single iteration. 3) after complete unrolling, we end up with: vect_var_.18_27 = vect_cst_.17_14 | { 0, 0, 0, 0 }; but nothing at the tree level optimizes that away into just vect_var_.18_27 = vect_cst_.17_14; So, the resulting assembly: movdi(%rip), %xmm0 movl$2, b(%rip) movdi+4(%rip), %xmm2 movl$4, a(%rip) pshufd $0, %xmm0, %xmm1 pshufd $0, %xmm2, %xmm0 por %xmm1, %xmm0 movdqa %xmm0, %xmm1 psrldq $8, %xmm1 por %xmm1, %xmm0 movdqa %xmm0, %xmm1 psrldq $4, %xmm1 por %xmm1, %xmm0 movd%xmm0, -12(%rsp) movl-12(%rsp), %eax ret really can't be faster than movli+4(%rip), %edx movli(%rip), %eax movl$2, b(%rip) movl$4, a(%rip) orl %edx, %eax orl %edx, %eax ret (but even for the non-vectorized loop, we should optimize away the second orl). While playing with this, I ended up with a wrong-code testcase as opposed to ice-on-valid: /* { dg-options -O1 -ftree-vectorize -fno-tree-pre -fno-tree-loop-im } */ extern void abort (void); int c[80]; __attribute__((noinline)) int foo (void) { int l = 0; int a, b; for (a = 3; a = 0; a--) for (b = 7; b = 0; b--) l |= c[a+60]; return l; } int main () { int i; for (i = 0; i 60; i++) c[i] = 1; for (; i 64; i++) c[i] = 1 (i - 59); if (foo () != 30) abort (); return 0; } And last minor thing (shouldn't we have a warning for it, at least static code analyzers should and usually do warn about it): vect_analyze_data_ref_access does: tree step = DR_STEP (dr); ... HOST_WIDE_INT dr_step = TREE_INT_CST_LOW (step); ... if (loop_vinfo !step) { if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, bad data-ref access in loop); return false; } So, either the if should go, because we'd segfault if step is NULL, or initialization of dr_step needs to be moved after this test.
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 Richard Guenther rguenth at gcc dot gnu.org changed: What|Removed |Added Status|NEW |ASSIGNED AssignedTo|unassigned at gcc dot |rguenth at gcc dot gnu.org |gnu.org | --- Comment #3 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-20 12:14:17 UTC --- Mine.
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 --- Comment #4 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-20 13:41:59 UTC --- DOM optimizes the |s with zero if you adjust the integer_..._p predicates like Index: gcc/tree.c === --- gcc/tree.c (revision 184390) +++ gcc/tree.c (working copy) @@ -1714,12 +1714,25 @@ integer_zerop (const_tree expr) { STRIP_NOPS (expr); - return ((TREE_CODE (expr) == INTEGER_CST - TREE_INT_CST_LOW (expr) == 0 - TREE_INT_CST_HIGH (expr) == 0) - || (TREE_CODE (expr) == COMPLEX_CST - integer_zerop (TREE_REALPART (expr)) - integer_zerop (TREE_IMAGPART (expr; + switch (TREE_CODE (expr)) +{ +case INTEGER_CST: + return (TREE_INT_CST_LOW (expr) == 0 + TREE_INT_CST_HIGH (expr) == 0); +case COMPLEX_CST: + return (integer_zerop (TREE_REALPART (expr)) + integer_zerop (TREE_IMAGPART (expr))); +case VECTOR_CST: + { + tree elt; + for (elt = TREE_VECTOR_CST_ELTS (expr); elt; elt = TREE_CHAIN (elt)) + if (!integer_zerop (TREE_VALUE (elt))) + return false; + return true; + } +default: + return false; +} } /* Return 1 if EXPR is the integer constant one or the corresponding @@ -1730,12 +1743,25 @@ integer_onep (const_tree expr) { STRIP_NOPS (expr); - return ((TREE_CODE (expr) == INTEGER_CST - TREE_INT_CST_LOW (expr) == 1 - TREE_INT_CST_HIGH (expr) == 0) - || (TREE_CODE (expr) == COMPLEX_CST - integer_onep (TREE_REALPART (expr)) - integer_zerop (TREE_IMAGPART (expr; + switch (TREE_CODE (expr)) +{ +case INTEGER_CST: + return (TREE_INT_CST_LOW (expr) == 1 + TREE_INT_CST_HIGH (expr) == 0); +case COMPLEX_CST: + return (integer_onep (TREE_REALPART (expr)) + integer_zerop (TREE_IMAGPART (expr))); +case VECTOR_CST: + { + tree elt; + for (elt = TREE_VECTOR_CST_ELTS (expr); elt; elt = TREE_CHAIN (elt)) + if (!integer_onep (TREE_VALUE (elt))) + return false; + return true; + } +default: + return false; +} } /* Return 1 if EXPR is an integer containing all 1's in as much precision as @@ -1754,6 +1780,15 @@ integer_all_onesp (const_tree expr) integer_zerop (TREE_IMAGPART (expr))) return 1; + else if (TREE_CODE (expr) == VECTOR_CST) +{ + tree elt; + for (elt = TREE_VECTOR_CST_ELTS (expr); elt; elt = TREE_CHAIN (elt)) + if (!integer_all_onesp (TREE_VALUE (elt))) + return 0; + return 1; +} + else if (TREE_CODE (expr) != INTEGER_CST) return 0; I'll queue that for 4.8.
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 --- Comment #5 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-20 14:14:48 UTC --- Not sure why in vect_model_reduction_cost we do not consider the reduction at all if nested_in_vect_loop_p (loop, orig_stmt). I think simply dropping that conditional would model the cost more appropriately (though the question is how outer loop vectorization uses the costs).
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 --- Comment #6 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-20 15:16:00 UTC --- Author: rguenth Date: Mon Feb 20 15:15:52 2012 New Revision: 184396 URL: http://gcc.gnu.org/viewcvs?root=gccview=revrev=184396 Log: 2012-02-20 Richard Guenther rguent...@suse.de PR tree-optimization/52298 * tree-vect-stmts.c (vectorizable_store): Properly use STMT_VINFO_DR_STEP instead of DR_STEP when vectorizing outer loops. (vectorizable_load): Likewise. * tree-vect-data-refs.c (vect_analyze_data_ref_access): Access DR_STEP after ensuring it is not NULL. * gcc.dg/torture/pr52298.c: New testcase. * gcc.dg/vect/pr52298.c: Likewise. Added: trunk/gcc/testsuite/gcc.dg/torture/pr52298.c trunk/gcc/testsuite/gcc.dg/vect/pr52298.c Modified: trunk/gcc/ChangeLog trunk/gcc/testsuite/ChangeLog trunk/gcc/tree-vect-data-refs.c trunk/gcc/tree-vect-stmts.c
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 Richard Guenther rguenth at gcc dot gnu.org changed: What|Removed |Added Status|ASSIGNED|RESOLVED Resolution||FIXED --- Comment #7 from Richard Guenther rguenth at gcc dot gnu.org 2012-02-20 15:16:25 UTC --- Fixed.
[Bug tree-optimization/52298] ICE: verify_ssa failed: definition in block follows use
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52298 Jakub Jelinek jakub at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |ASSIGNED Last reconfirmed||2012-02-18 Component|c |tree-optimization CC||jakub at gcc dot gnu.org AssignedTo|unassigned at gcc dot |jakub at gcc dot gnu.org |gnu.org | Ever Confirmed|0 |1 Target Milestone|--- |4.7.0 --- Comment #1 from Jakub Jelinek jakub at gcc dot gnu.org 2012-02-18 10:05:50 UTC --- Confirmed, looking into it.