[Bug tree-optimization/33869] [4.3 Regression] ICE verify_ssa failed (missing definition for SSA_NAME)
--- Comment #11 from dorit at il dot ibm dot com 2007-10-30 05:48 --- (In reply to comment #6) Richard, is this related to the issue you reported in http://gcc.gnu.org/ml/gcc-patches/2007-10/msg01127.html (looks like the same error)? Any idea why the fix you committed doesn't cover this case? (I haven't looked into this PR yet, it just reminded me of that thread) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33869
[Bug tree-optimization/25371] -ftree-vectorize results in internal compiler error on AMD64
--- Comment #12 from dorit at il dot ibm dot com 2007-07-01 09:30 --- > Subject: Re: -ftree-vectorize results in internal compiler error on AMD64 > Zdenek's patch for cleaning the dataref analysis is also fixing this bug. > http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00634.html So now that Zdenek's patch went in, can someone confirm if this problem doesn't occur anymore on x86_64? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25371
[Bug tree-optimization/24659] Conversions are not vectorized
--- Comment #19 from dorit at il dot ibm dot com 2007-06-29 16:46 ---
testing this patch for Altivec:

Index: config/rs6000/altivec.md
===================================================================
*** config/rs6000/altivec.md (revision 126053)
--- config/rs6000/altivec.md (working copy)
***************
*** 147,152 ****
--- 147,156 ----
     (UNSPEC_VPERMHI 321)
     (UNSPEC_INTERHI 322)
     (UNSPEC_INTERLO 323)
+    (UNSPEC_VUPKHS_V4SF 324)
+    (UNSPEC_VUPKLS_V4SF 325)
+    (UNSPEC_VUPKHU_V4SF 326)
+    (UNSPEC_VUPKLU_V4SF 327)
  ])

  (define_constants
***************
*** 2933,2935 ****
--- 2937,2995 ----
    emit_insn (gen_altivec_vmrgl (operands[0], operands[1], operands[2]));
    DONE;
  }")
+
+ (define_expand "vec_unpacks_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKHS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacks_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKLS_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacks_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfsx (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_hi_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKHU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_hi_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")
+
+ (define_expand "vec_unpacku_float_lo_v8hi"
+  [(set (match_operand:V4SF 0 "register_operand" "")
+        (unspec:V4SF [(match_operand:V8HI 1 "register_operand" "")]
+                     UNSPEC_VUPKLU_V4SF))]
+   "TARGET_ALTIVEC"
+   "
+ {
+   rtx tmp = gen_reg_rtx (V4SImode);
+
+   emit_insn (gen_vec_unpacku_lo_v8hi (tmp, operands[1]));
+   emit_insn (gen_altivec_vcfux (operands[0], tmp, const0_rtx));
+   DONE;
+ }")

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24659
[Bug rtl-optimization/32084] gfortran 4.3 13%-18% slower for induct.f90 than gcc 4.0-based competitor
--- Comment #5 from dorit at il dot ibm dot com 2007-06-27 11:57 --- (In reply to comment #4) > (In reply to comment #3) > > The problem is in -ftree-vectorize > The difference is, that without -ftree-vectorize the inner loop (do k = 1, 9) > is completely unrolled, but with vectorization, the loop is vectorized, but > _not_ unrolled. Since the vectorization factor is only 2 for V2DF mode > vectors, > we loose big time at this point. > My best guess for unroller problems would be rtl-optimization. Could it be the tree-level complete unroller? (does the vectorizer peel the loop to handle a misaligned store by any chance? if so, and if the misalignment amount is unknown, then the number of iterations of the vectorized loop is unknown, in which case the complete unroller wouldn't work). In autovect-branch the tree-level complete unroller is before the vectorizer - wonder what happens there. Another thing to consider is using -fvect-cost-model (it's very preliminary and hasn't been tuned much, but this could be a good data point for whoever wants to tune the vectorizer cost-model for x86_64). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32084
[Bug tree-optimization/32378] can't determine dependence (distinct sections of an array)
--- Comment #4 from dorit at il dot ibm dot com 2007-06-18 11:08 --- I see this in the vectorizer dump file (with mainline from a few days ago): (compute_affine_dependence (stmt_a = D.1423_50 = (*a_49(D))[D.1422_48]) (stmt_b = (*a_49(D))[D.1420_51] = D.1425_54) Data ref a: (Data Ref: stmt: D.1423_50 = (*a_49(D))[D.1422_48]; ref: (*a_49(D))[D.1422_48]; base_object: (*a_49(D))[0]; Access function 0: {pretmp.48_45 + 1, +, 1}_1 Access function 1: 0B ) Data ref b: (Data Ref: stmt: (*a_49(D))[D.1420_51] = D.1425_54; ref: (*a_49(D))[D.1420_51]; base_object: (*a_49(D))[0]; Access function 0: {0, +, 1}_1 Access function 1: 0B ) affine dependence test not usable: access function not affine or constant. (dependence classified: scev_not_known) ) (compute_affine_dependence (stmt_a = D.1424_53 = (*b_52(D))[D.1420_51]) (stmt_b = (*a_49(D))[D.1420_51] = D.1425_54) ) (the IR looks a bit different than PR32075, but the data-dependence analysis fails with the same problem). pinskia - are you still planning to address this issue? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32378
[Bug tree-optimization/32075] can't determine dependence between p->a[x+i] and p->a[x+i+1] where x is invariant but defined in the function
--- Comment #2 from dorit at il dot ibm dot com 2007-06-18 11:03 --- I see this in the vectorizer dump file (with mainline from a few days ago): (compute_affine_dependence (stmt_a = D.3027_19 = p_7->a[D.3026_18]) (stmt_b = p_7->a[D.3025_17] = D.3027_19) Data ref a: (Data Ref: stmt: D.3027_19 = p_7->a[D.3026_18]; ref: p_7->a[D.3026_18]; base_object: p_7->a[0]; Access function 0: {x1_5 + 1, +, 1}_2 Access function 1: 0B ) Data ref b: (Data Ref: stmt: p_7->a[D.3025_17] = D.3027_19; ref: p_7->a[D.3025_17]; base_object: p_7->a[0]; Access function 0: {x1_5, +, 1}_2 Access function 1: 0B ) affine dependence test not usable: access function not affine or constant. (dependence classified: scev_not_known) ) (In reply to comment #1) > Ok, I have a patch for this issue, I am going to test it with -ftree-vectorize so how is that coming along? do you think it will also address PRs 32375/6/7/8/9 ? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32075
[Bug target/32274] FAIL: gcc.dg/vect/pr32224.c
--- Comment #1 from dorit at il dot ibm dot com 2007-06-13 08:41 --- Sorry about the breakage. Does it work for you if you change the testcase as follows?: Index: pr32224.c === --- pr32224.c (revision 125641) +++ pr32224.c (working copy) @@ -10,7 +10,7 @@ for (i = 0; i < count; i++) { -__asm__ ("bswap %q0": "=r" (*__dst):"0" (*(__src))); +__asm__ ("checkme": "=r" (*__dst):"0" (*(__src))); __src++; } } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32274
[Bug tree-optimization/32309] Unnecessary conversion from short to unsigned short breaks vectorization
--- Comment #4 from dorit at il dot ibm dot com 2007-06-12 17:46 --- it's on my (long) todo list... -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32309
[Bug tree-optimization/32224] [4.3 Regression] ICE in vect_analyze_operations, at tree-vect-analyze.c:374
--- Comment #4 from dorit at il dot ibm dot com 2007-06-07 18:40 --- You're right. I'm testing this obvious patch: Index: tree-vect-analyze.c === *** tree-vect-analyze.c (revision 125526) --- tree-vect-analyze.c (working copy) *** vect_determine_vectorization_factor (loo *** 173,181 print_generic_expr (vect_dump, stmt, TDF_SLIM); } - if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT) - continue; - gcc_assert (stmt_info); /* skip stmts which do not need to be vectorized. */ --- 173,178 *** vect_determine_vectorization_factor (loo *** 187,192 --- 184,199 continue; } + if (TREE_CODE (stmt) != GIMPLE_MODIFY_STMT) + { + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) + { + fprintf (vect_dump, "not vectorized: irregular stmt."); + print_generic_expr (vect_dump, stmt, TDF_SLIM); + } + return false; + } + if (!GIMPLE_STMT_P (stmt) && VECTOR_MODE_P (TYPE_MODE (TREE_TYPE (stmt { -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32224
[Bug tree-optimization/32216] [4.3 Regression] ICE: verify_stmts failed (invalid reference prefix) with -ftree-vectorize
--- Comment #5 from dorit at il dot ibm dot com 2007-06-06 08:33 --- (In reply to comment #4) > (In reply to comment #3) > > Probably something similar is required for the VEC_UNPACK_FLOAT_*_EXPR > > tree-codes ? > But these tree-codes are already there: sorry, I guess I was looking at autovect-branch or something -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32216
[Bug tree-optimization/32216] [4.3 Regression] ICE: verify_stmts failed (invalid reference prefix) with -ftree-vectorize
--- Comment #3 from dorit at il dot ibm dot com 2007-06-06 03:28 --- (In reply to comment #1) veclower expands things when it wrongly concludes that they are not supported by the target in vector mode. For demotion/promotion/conversion kinda operations this may be because it does not check the optab table using the right type. For example, I had to add the following in expand_vector_operations_1: " /* For widening/narrowing vector operations, the relevant type is of the arguments, not the widened result. */ if (code == WIDEN_SUM_EXPR || code == VEC_WIDEN_MULT_HI_EXPR || code == VEC_WIDEN_MULT_LO_EXPR || code == VEC_UNPACK_HI_EXPR || code == VEC_UNPACK_LO_EXPR || code == VEC_PACK_TRUNC_EXPR || code == VEC_PACK_SAT_EXPR) type = TREE_TYPE (TREE_OPERAND (rhs, 0)); " Probably something similar is required for the VEC_UNPACK_FLOAT_*_EXPR tree-codes ? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32216
[Bug target/32107] New: bad codegen for vector initialization in Altivec
Compiling the following testcase:

#define vector __attribute__((__vector_size__(16) ))
float fa[100] __attribute__ ((__aligned__(16)));
vector float foo ()
{
  float f = fa[0];
  vector float vf = {f, f, f, f};
  return vf;
}

...with gcc -O2 -maltivec, we get:

        ld r9,0(r2)
        lfs f0,0(r9)
        addi r9,r1,-16
        stfs f0,-16(r1)
        lvewx v2,r0,r9
        vspltw v2,v2,0
        blr

My problem is with the {lfs,stfs,lvewx} sequence: we load a value into f0, and then store it (with stfs) into an aligned memory location, so that it could be loaded from there into a vector (with lvewx). However, since the address from which f0 was loaded is known to be aligned, we could directly do an lvewx from there, and avoid the extra {lfs,stfs}, so the following should be enough:

        ld r9,0(r2)
        lvewx v2,r0,r9
        vspltw v2,v2,0
        blr

The problem is that rs6000_expand_vector_init doesn't know that f0 originates from an aligned address. It gets the following as vals:

(parallel:V4SF [
    (reg/v:SF 119 [ f ])
    (reg/v:SF 119 [ f ])
    (reg/v:SF 119 [ f ])
    (reg/v:SF 119 [ f ])
])

We somehow want to expand 'f = fa[0]' and '{f,f,f,f}' together... if expand_vector_init could get this as vals: '{fa[0],fa[0],fa[0],fa[0]}', it could see that the original address is aligned. Alternatively, the prospects of getting rid of the redundant load and store later on during some kind of a peephole optimization don't seem so high to me... Thoughts?

This may be related to PR31334 (though there the issue is about initialization with constants, so I'm not sure if the idea for a solution proposed there would help us here).

--
Summary: bad codegen for vector initialization in Altivec
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: powerpc-linux
GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=32107
[Bug middle-end/31738] Fortran dot product vectorization is restricted
--- Comment #3 from dorit at il dot ibm dot com 2007-05-16 20:45 --- (In reply to comment #2) > Here is what happens in the three loops that don't get vectorized: > (1) the loop in testvectdp2: ... > so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to > > be used anywhere in the function, so this stmt looks dead to me, but for > some reason it is not cleaned away before the vectorizer... Still need to > investigate why. So looks like the stmt D.1437_32 = prephitmp.192_37 became dead by pass pr31738a.f90.089t.copyprop3. So the question is what's the most appropriate fix: (1) fix copyprop3 to also clean away any dead code it creates? (2) add a dce pass after copyprop3? (3) work around it in the vectorizer. I think it should be easy - just move the check of the uses of the reduction in the loop until after the vectorizer analysis pass that marks relevant stmts. If (3) sounds like the way to go - I can prepare a patch for that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31738
[Bug tree-optimization/31946] New: missed vectorization due to too strict peeling-for-alignment policy
The vectorizer is too restrictive in the way it decides by how many iterations to peel a loop in order to align a certain memory reference in a loop. It considers only the first (potentially) misaligned store it encounters in the loop. For this reason the testcases vect-multitypes-1.c, vect-multitypes-4.c and vect-iv-4.c don't get vectorized.

For example (using a vector size of 16 bytes), in vect-multitypes-1.c we have:

short sa[N], sb[N];
int ia[N], ib[N];
for (i = 0; i < n; i++)
  {
    ia[i+3] = ib[i];
    sa[i+3] = sb[i];
  }

The current peeling-for-alignment scheme will consider the 'ia[i+3]' access for peeling, and therefore will examine the option of using a peeling factor = (4-3)%4 = 1. This will not align the access 'sa[i+3]', for which we need to peel 5 iterations. As a result the loop doesn't get vectorized (because we currently can't handle misaligned stores unless we align them by peeling). However, if we had considered the 'sa[i+3]' access as well for peeling, we would have examined the option of using a peeling factor = (8-3)%8 = 5, which would align both accesses, and would allow us to vectorize the loop. So the vectorizer needs to be extended to consider more peeling factors, and not just one.

--
Summary: missed vectorization due to too strict peeling-for-alignment policy
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31946
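The peeling arithmetic in this report can be sketched in a few lines of C. This is only an illustration, not vectorizer code: the function names are hypothetical, misalignment is counted in elements past a vector boundary (assuming the arrays themselves start on a 16-byte boundary), and the brute-force search in common_peel is an assumption about how "considering more peeling factors" might look, not the scheme GCC uses.

```c
#include <assert.h>

/* Number of scalar iterations to peel so that an access starting
   MISALIGN elements past a vector boundary becomes aligned.  NELEMS is
   the number of elements of that type in one vector.  */
int
peel_factor (int nelems, int misalign)
{
  assert (nelems > 0 && misalign >= 0 && misalign < nelems);
  return (nelems - misalign) % nelems;
}

/* Nonzero if peeling PEEL iterations aligns the access.  */
int
aligned_after_peel (int nelems, int misalign, int peel)
{
  assert (nelems > 0 && misalign >= 0 && peel >= 0);
  return (misalign + peel) % nelems == 0;
}

/* Hypothetical search: smallest peel count that aligns all COUNT
   accesses, or -1 if no such count exists.  */
int
common_peel (const int *nelems, const int *misalign, int count)
{
  int max_nelems = 0;

  for (int k = 0; k < count; k++)
    if (nelems[k] > max_nelems)
      max_nelems = nelems[k];

  for (int peel = 0; peel < max_nelems; peel++)
    {
      int ok = 1;

      for (int k = 0; k < count; k++)
        if (!aligned_after_peel (nelems[k], misalign[k], peel))
          ok = 0;
      if (ok)
        return peel;
    }
  return -1;
}
```

With 16-byte vectors, 'ia[i+3]' has 4 elements per vector and 'sa[i+3]' has 8, both with a misalignment of 3 elements. peel_factor gives 1 and 5 respectively; peeling 1 iteration leaves 'sa[i+3]' misaligned, while peeling 5 aligns both.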
[Bug tree-optimization/31945] New: missing type vector conversions patterns on spu
Since the following patch: 2007-04-22 Uros Bizjak <[EMAIL PROTECTED]> PR tree-optimization/24659 GCC supports vectorization of float<-->double conversions. These can also be modelled for the spu target by implementing the following patterns: vec_pack_trunc_v2df vec_unpacks_lo_v4sf vec_unpacks_hi_v4sf (also see PR24659) -- Summary: missing type vector conversions patterns on spu Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com GCC build triplet: spu GCC host triplet: spu GCC target triplet: spu http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31945
[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops
--- Comment #8 from dorit at il dot ibm dot com 2007-05-09 07:14 --- > So I guess this should be handled somewhere else. I'll open a new > missed-optimization PR instead (not against PRE this time). thanks. This is now PR31873 -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809
[Bug tree-optimization/31873] New: missed optimization: we don't move "invariant casts" out of loops
This PR was originally opened against PRE (PR25809), but it turns out PRE can't solve this problem, so here's a new PR instead:

In testcases that have a reduction, like gcc.dg/vect/vect-reduc-2char.c and gcc.dg/vect/vect-reduc-2short.c, the following casts appear:

signed char sdiff;
unsigned char ux, udiff;
sdiff_0 = ...
loop:
  # sdiff_41 = PHI ;
  .
  ux_36 =
  udiff_37 = (unsigned char) sdiff_41;
  udiff_38 = ux_36 + udiff_37;
  sdiff_39 = (signed char) udiff_38;
end_loop

although these casts could be taken out of the loop altogether, i.e., transform the code into something like the following:

signed char sdiff;
unsigned char ux, udiff;
sdiff_0 = ...
udiff_1 = (unsigned char) sdiff_0;
loop:
  # udiff_3 = PHI ;
  .
  ux_36 =
  udiff_2 = ux_36 + udiff_3;
end_loop
sdiff_39 = (signed char) udiff_2;

see this discussion thread: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01827.html

--
Summary: missed optimization: we don't move "invariant casts" out of loops
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC host triplet: powerpc64-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31873
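At source level, the transformation requested here looks roughly like the following sketch. The function names and the array parameter x are hypothetical stand-ins for the testcase's reduction; the sketch also assumes that converting an out-of-range value to signed char wraps modulo 256, which is what GCC does but which the C standard leaves implementation-defined.

```c
#include <assert.h>

/* Reduction with the casts inside the loop, as in the first snippet.  */
signed char
reduce_casts_in_loop (const unsigned char *x, int n, signed char sdiff)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      unsigned char udiff = (unsigned char) sdiff;

      udiff = (unsigned char) (x[i] + udiff);
      sdiff = (signed char) udiff;
    }
  return sdiff;
}

/* Same reduction with the casts moved out of the loop: one cast before
   the loop, one after it, and an all-unsigned loop body.  */
signed char
reduce_casts_hoisted (const unsigned char *x, int n, signed char sdiff)
{
  unsigned char udiff = (unsigned char) sdiff;

  assert (n >= 0);
  for (int i = 0; i < n; i++)
    udiff = (unsigned char) (x[i] + udiff);
  return (signed char) udiff;
}
```

Both functions return the same value for the same inputs under the stated assumption; the second form keeps a single type in the loop body, which is the shape the vectorizer would like to see for the reduction.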
[Bug middle-end/31738] Fortran dot product vectorization is restricted
--- Comment #2 from dorit at il dot ibm dot com 2007-05-08 21:00 --- Here is what happens in the three loops that don't get vectorized: (1) the loop in testvectdp2: This is the loop we analyze: # prephitmp.192_37 = PHI # i_1 = PHI <1(3), i_44(5)> :; D.1437_32 = prephitmp.192_37; D.1438_33 = (int8) i_1; D.1439_34 = D.1438_33 + -1; D.1440_36 = (*a_35(D))[D.1439_34]; D.1441_40 = (*b_39(D))[D.1439_34]; D.1442_41 = D.1441_40 * D.1440_36; D.1443_42 = prephitmp.192_37 + D.1442_41; storetmp.191_38 = D.1443_42; c__lsm.199_17 = D.1443_42; i_44 = i_1 + 1; if (i_1 == D.1429_5) goto (); else goto (); We recognize the reduction, but we think that it is used in the loop: pr31738.f90:14: note: reduction used in loop. and indeed, prephitmp.192_37 is used in: D.1443_42 = prephitmp.192_37 + D.1442_41; which is ok, because this is the reduction stmt, but also used here: D.1437_32 = prephitmp.192_37; which is indeed something that we normally don't allow. so the vectorizer is ok, except that in this case D.1437_32 doesn't seem to be used anywhere in the function, so this stmt looks dead to me, but for some reason it is not cleaned away before the vectorizer... Still need to investigate why. (2) the loop in testvecm: This looks like the problem reported in PR31756: failed to compute offset or step for (*a.0_11)[D.1559_52] create_data_ref: failed to create a dr for (*a.0_11)[D.1559_52] pr31738.f90:24: note: not vectorized: unhandled data-ref (3) the loop in testvecm2 Same story (the PR31756 problem): failed to compute offset or step for (*a.0_10)[D.1509_52] create_data_ref: failed to create a dr for (*a.0_10)[D.1509_52] pr31738.f90:32: note: not vectorized: unhandled data-ref -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31738
[Bug middle-end/31699] [4.3 Regression] -march=opteron -ftree-vectorize generates wrong code
--- Comment #6 from dorit at il dot ibm dot com 2007-05-02 20:38 --- patch: http://gcc.gnu.org/ml/gcc-patches/2007-05/msg00111.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699
[Bug testsuite/31589] gcc.dg/vect failures due to missing target specifiers
--- Comment #4 from dorit at il dot ibm dot com 2007-04-27 05:44 --- patch: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01739.html requires retesting on ia64 before I can commit it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31589
[Bug middle-end/31699] [Regression 4.3] -march=opteron -ftree-vectorize generates wrong code
--- Comment #4 from dorit at il dot ibm dot com 2007-04-26 19:37 --- I'm testing the attached patch. The problem is that we don't compute the peel factor correctly (when peeling to align a store) when we have multiple data-types in the loop (the computation assumes that VF is the number of elements in a vector, but that doesn't hold for all the datarefs in the loop if their types are of different sizes) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699
[Bug middle-end/31699] [Regression 4.3] -march=opteron -ftree-vectorize generates wrong code
--- Comment #3 from dorit at il dot ibm dot com 2007-04-26 19:34 --- Created an attachment (id=13450) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=13450&action=view) patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31699
[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90
--- Comment #7 from dorit at il dot ibm dot com 2007-04-25 21:30 --- > Are you going to submit/install your patch? yes, I'll go ahead and submit the patch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615
[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90
--- Comment #5 from dorit at il dot ibm dot com 2007-04-19 07:27 --- (In reply to comment #4) > (In reply to comment #3) > > But then I wonder why we don't see the same failure on ia64? > Because the failing part of the testcase is only done on ilp32 targets: > ! { dg-final { scan-tree-dump-times "Alignment of access forced using > versioning." 3 "vect" { target { ilp32 && vect_no_align > } } } } ah, ok. so, in that case we probably want to just change the '3' to '2' in the above test: Index: testsuite/gfortran.dg/vect/vect-5.f90 === --- testsuite/gfortran.dg/vect/vect-5.f90 (revision 123954) +++ testsuite/gfortran.dg/vect/vect-5.f90 (working copy) @@ -38,7 +38,7 @@ ! { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" } } ! { dg-final { scan-tree-dump-times "Alignment of access forced using peeling" 1 "vect" { xfail { vect_no_align } } } } ! { dg-final { scan-tree-dump-times "Vectorizing an unaligned access" 1 "vect" { xfail { vect_no_align } } } } -! { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 3 "vect" { target { ilp32 && vect_no_align } } } } +! { dg-final { scan-tree-dump-times "Alignment of access forced using versioning." 2 "vect" { target { ilp32 && vect_no_align } } } } ! We also expect to vectorize one loop for lp64 targets that support ! misaligned access: -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615
[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90
--- Comment #3 from dorit at il dot ibm dot com 2007-04-18 10:18 --- > Created dump file using -fdump-tree-vect-details thanks. So I don't understand why we expect to version for 3 different data-references, since there are only 2 in the loop that is vectorized. But then I wonder why we don't see the same failure on ia64? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615
[Bug fortran/31615] testsuite failure in gfortran.dg/vect/vect-5.f90
--- Comment #1 from dorit at il dot ibm dot com 2007-04-18 06:42 --- could you please provide the .vect dump file, as generated with -fdump-tree-vect-details? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31615
[Bug testsuite/31589] gcc.dg/vect failures due to missing target specifiers
--- Comment #2 from dorit at il dot ibm dot com 2007-04-17 20:10 --- > 2 more are under investigation: > no-section-anchors-vect-69.c > vect-reduc-dot-u16a.c In the first testcase, the vectorizer can only prove that the data reference in the third loop is aligned on 8 bytes. This is enough for targets like ia64 in which the vector size is 8 bytes, and therefore we don't need to peel in order to force alignment for this loop. So overall in this testcase we peel only twice. On targets that require 16-byte alignment, a guaranteed 8-byte alignment is not enough, and therefore we peel this loop to align the data-reference (and overall in the testcase we peel 3 times). I guess the way to solve this is to add a keyword that lists the targets with 8byte-wide-vectors and targets with 16byte-wide-vectors, or just hard code the targets that are expected to fail/pass here. I'll sleep on it and supply a patch soon. The second test needs the same fix as a lot of the other tests: add { target vect_pack_mod } to the check. This is because the loop in main has a cast from int to short in it. However, in this testcase we already have two target keywords that we are checking: { target { vect_short_mult && vect_widen_sum_hi_to_si } }, and I don't think the testsuite engine currently provides the flexibility to add a third keyword, so I suggest just changing the loop slightly to avoid the cast (it's not the point of this testcase anyway): Index: vect-reduc-dot-u16a.c === --- vect-reduc-dot-u16a.c (revision 123909) +++ vect-reduc-dot-u16a.c (working copy) @@ -30,7 +30,7 @@ int main (void) { unsigned int dot1; - int i; + unsigned short i; check_vect (); -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31589
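The 8-byte versus 16-byte point in the comment above comes down to one divisibility check. A minimal sketch, assuming the required alignment equals the target's vector size; needs_peeling is a hypothetical name, not a vectorizer function:

```c
#include <assert.h>

/* Nonzero if a loop must be peeled to align an access whose address is
   only known to be a multiple of KNOWN_ALIGN bytes, on a target whose
   vectors are VECTOR_SIZE bytes wide.  */
int
needs_peeling (int known_align, int vector_size)
{
  assert (known_align > 0 && vector_size > 0);
  return known_align % vector_size != 0;
}
```

A known 8-byte alignment is sufficient for 8-byte vectors (no peeling for the third loop, so two peels overall) but not for 16-byte vectors (the third loop is peeled too, so three peels overall).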
[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops
--- Comment #7 from dorit at il dot ibm dot com 2007-04-17 19:31 --- > so I will look into it. (for reference: http://gcc.gnu.org/ml/gcc-patches/2007-04/msg01103.html). So I guess this should be handled somewhere else. I'll open a new missed-optimization PR instead (not against PRE this time). thanks. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809
[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops
--- Comment #6 from dorit at il dot ibm dot com 2007-04-17 07:38 --- > can you please send me the patch so that I could look at these failures before > you close this PR? I'm going over my inbox top down, so I just saw that you had already sent the patch... so I will look into it. (thanks!) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809
[Bug tree-optimization/25809] missed PRE optimization - move "invariant casts" out of loops
--- Comment #5 from dorit at il dot ibm dot com 2007-04-17 07:22 --- > Doing cast motion actually causes about 25 *more* failures in the vectorizer > testsuite. > I'm closing this as won't fix since it seems there was no other reason to do > this. can you please send me the patch so that I could look at these failures before you close this PR? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809
[Bug fortran/31561] FAIL: gfortran.dg/vect/vect-4.f90
--- Comment #4 from dorit at il dot ibm dot com 2007-04-14 09:38 --- > I think the only thing that really matters is that the loop is vectorized. I > don't think the alignment details are important checking, even on platforms > where they are relevant. So we should remove all scan-tree-dump-times except > the first one, I guess. I think it's not such a bad idea to check that the handling of alignment is working as expected. Also, it's relevant both for targets that support alignment, and for targets that don't, because on those that don't - we can still vectorize misaligned accesses using loop-versioning. > I'm adding Ira and Dorit to the CC list, as they wrote and modified the > original test. Ira, Dorit, I'm not sure how to proceed here, do you agree with > the paragraph above about what is the right thing to do? see - http://gcc.gnu.org/ml/gcc/2007-04/msg00479.html. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31561
[Bug target/31334] Bad codegen for vector initializer with constants prop'd into a vector initializer
--- Comment #8 from dorit at il dot ibm dot com 2007-04-03 20:46 --- (In reply to comment #7) > Something like: > (define_insn_and_split "altivec_dup" > [(set (match_operand:V 0 "register_operand" "v") > (vec_duplicate: (match_operand: 0 "r"))) >(clobber (match_operand:V 3 "memory_operand" "=Z"))] > "TARGET_ALTIVEC" > "#" > "&& reload_completed" > > Which then will be generated from rs6000_expand_vector_init. I can write this > if you want, it is just I cannot test this until Monday. yes, please... I'll be very happy to see this fixed. (thanks!!) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #6 from dorit at il dot ibm dot com 2007-04-03 20:22 --- So I see Devang had sent a patch for this over a year ago: http://gcc.gnu.org/ml/gcc-patches/2006-03/msg00167.html I don't know what ever happened to it. Maybe you want to give it a try? (you may need to implement the new target hook for Pentium4). If you have problems applying the patch (it is a bit old) - I could try to help update the patch (not before next week though). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug tree-optimization/31460] if(a) a[i] = xxx; else a[i] = yyy; is not converted to if (a) ddd= xxx; else ddd = yyy; a[i] = ddd;
--- Comment #4 from dorit at il dot ibm dot com 2007-04-03 19:56 --- yes, this is indeed a known problem (I don't know if there's a PR open for it). It is one of the tree-ifcvt enhancements that Victor was going to tackle for 4.3 (item (2.3) in http://gcc.gnu.org/wiki/AutovectBranchOptimizations?). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31460
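For reference, the transformation named in the PR title can be written out as two source-level loops. This is a hand-written sketch, not output of tree-ifcvt: the function names are hypothetical, and a separate cond array stands in for the condition so that it is not confused with the array being stored to.

```c
#include <assert.h>

/* Original form: a store to a[i] on both arms of the branch.  */
void
store_in_both_arms (int *a, const int *cond, int n, int xxx, int yyy)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      if (cond[i])
        a[i] = xxx;
      else
        a[i] = yyy;
    }
}

/* Desired form: select the value into a temporary, then store once.  */
void
store_once (int *a, const int *cond, int n, int xxx, int yyy)
{
  assert (n >= 0);
  for (int i = 0; i < n; i++)
    {
      int ddd;

      if (cond[i])
        ddd = xxx;
      else
        ddd = yyy;
      a[i] = ddd;
    }
}

/* Returns nonzero if both forms fill an n-element array identically.  */
int
forms_agree (const int *cond, int n, int xxx, int yyy)
{
  int a1[16], a2[16];

  assert (n >= 0 && n <= 16);
  store_in_both_arms (a1, cond, n, xxx, yyy);
  store_once (a2, cond, n, xxx, yyy);
  for (int i = 0; i < n; i++)
    if (a1[i] != a2[i])
      return 0;
  return 1;
}
```

In the second form the branch only selects a scalar value and the store is unconditional, which is the shape if-conversion can turn into a select followed by a single store.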
[Bug target/31334] New: Bad codegen for vectorized induction with altivec
Turns out the code we are generating for vectorized induction on ppc is quite terrible - the vector induction variable is advanced by a constant step in the loop (e.g., {4,4,4,4} as in the testcase below). This is the sequence gcc currently creates for altivec in order to generate the {4,4,4,4} vector:

        li 0,4
        stw 0,-48(1)
        lvewx 0,0,9
        vspltw 0,0,0

So, one thing to figure out is why we don't use the immediate form of the splat (vspltiw); the other is why this sequence ends up getting generated not only before the loop (see insns marked with "<<<1" below), but also inside the loop (see insns marked with "<<<2" below).

This is the testcase (it is basically the testcase gcc.dg/vect/no-tree-scev-cprop-vect-iv-1.c with a larger loop count to avoid complete unrolling):

int main1 (int X)
{
  int s = X;
  int i;
  for (i = 0; i < 96; i++)
    s += i;
  return s;
}

compiled as follows:
gcc -O2 -ftree-vectorize -maltivec -fno-tree-scev-cprop -S t.c

        li 0,4              <<<1
        stw 0,-48(1)        <<<1
        ld 9,[EMAIL PROTECTED](2)
        li 0,23
        mr 11,3
        mtctr 0
        lvx 1,0,9
        addi 9,1,-48
        vor 13,1,1
        lvewx 0,0,9         <<<1
        vspltw 0,0,0        <<<1
        vadduwm 1,1,0
        .p2align 4,,15
.L2:
        li 0,4              <<<2
        addi 9,1,-48
        vadduwm 13,13,1
        stw 0,-48(1)        <<<2
        lvewx 0,0,9         <<<2
        vspltw 0,0,0        <<<2
        vadduwm 1,1,0
        bdnz .L2

--
Summary: Bad codegen for vectorized induction with altivec
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: powerpc-linux
GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31334
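To show what "the vector induction variable is advanced by a constant step" means for this testcase, here is a scalar C emulation of the vectorized loop. It is a sketch under assumptions: VF = 4 (four 32-bit ints per 16-byte vector), plain arrays play the role of vector registers, the trip count is a parameter rather than the fixed 96, and the function names are hypothetical.

```c
#include <assert.h>

#define VF 4  /* assumed: four 32-bit ints per 16-byte vector */

/* Scalar loop from the testcase, with the trip count as a parameter.  */
int
sum_scalar (int x, int n)
{
  int s = x;

  for (int i = 0; i < n; i++)
    s += i;
  return s;
}

/* Emulation of the vectorized loop using arrays as "vectors".  */
int
sum_vector_iv (int x, int n)
{
  int iv[VF] = { 0, 1, 2, 3 };          /* vector induction variable */
  const int step[VF] = { 4, 4, 4, 4 };  /* loop-invariant step vector */
  int acc[VF] = { 0, 0, 0, 0 };         /* vector accumulator */
  int s = x;

  assert (n >= 0 && n % VF == 0);
  for (int i = 0; i < n; i += VF)
    for (int lane = 0; lane < VF; lane++)
      {
        acc[lane] += iv[lane];
        iv[lane] += step[lane];
      }

  /* Reduce the partial sums after the loop.  */
  for (int lane = 0; lane < VF; lane++)
    s += acc[lane];
  return s;
}
```

The step array never changes inside the loop, so it only needs to be built once before the loop starts; the insns marked "<<<2" rebuild it on every iteration.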
[Bug tree-optimization/31333] New: ICE with -fno-tree-dominator-opts -ftree-vectorize -msse
The testcase gcc.dg/vect/no-tree-dom-vect-bug.c ICEs on i386-linux when compiled as follows:

gcc no-tree-dom-vect-bug.c -O2 -fno-tree-dominator-opts -ftree-vectorize -msse

no-tree-dom-vect-bug.c: In function 'main1':
no-tree-dom-vect-bug.c:15: internal compiler error: in expand_simple_binop, at optabs.c:1192

This happens somewhere between these passes:
no-tree-dom-vect-bug.c.036t.release_ssa
no-tree-dom-vect-bug.c.044t.apply_inline
(maybe when the vectorized main1 is inlined into main?)

gdb traceback:

function=0x8703852 "expand_simple_binop") at ../../gcc/gcc/diagnostic.c:656
656 internal_error ("in %s, at %s:%d", function, trim_filename (file), line);
(gdb) backtrace
#0 fancy_abort (file=0x8703282 "../../gcc/gcc/optabs.c", line=1192, function=0x8703852 "expand_simple_binop") at ../../gcc/gcc/diagnostic.c:656
#1 0x08271467 in expand_simple_binop (mode=Variable "mode" is not available. ) at ../../gcc/gcc/optabs.c:1192
#2 0x081a2a3f in force_operand (value=0xb7d01d98, target=0xb7a9d960) at ../../gcc/gcc/expr.c:6069
#3 0x08622a4d in move_invariant_reg (loop=0xa363f60, invno=0) at ../../gcc/gcc/loop-invariant.c:1180
#4 0x086236bd in move_loop_invariants () at ../../gcc/gcc/loop-invariant.c:1242
#5 0x08621757 in rtl_move_loop_invariants () at ../../gcc/gcc/loop-init.c:256
#6 0x082775c6 in execute_one_pass (pass=0x87ad8a0) at ../../gcc/gcc/passes.c:1058
#7 0x082777b7 in execute_pass_list (pass=0x87ad8a0) at ../../gcc/gcc/passes.c:1110
#8 0x082777ca in execute_pass_list (pass=0x87ad7e0) at ../../gcc/gcc/passes.c:
#9 0x082777ca in execute_pass_list (pass=0x87aab60) at ../../gcc/gcc/passes.c:
#10 0x08356638 in tree_rest_of_compilation (fndecl=0xb7ce9a80) at ../../gcc/gcc/tree-optimize.c:412
#11 0x0808db8c in c_expand_body (fndecl=0xb7ce9a80) at ../../gcc/gcc/c-common.c:4285
#12 0x084b4ab1 in cgraph_expand_function (node=0xb7c1b700) at ../../gcc/gcc/cgraphunit.c:1015
#13 0x084b6e96 in cgraph_optimize () at ../../gcc/gcc/cgraphunit.c:1084
#14 0x0805b4df in c_write_global_declarations () at ../../gcc/gcc/c-decl.c:7930
#15 0x082f5bd6 in toplev_main (argc=20, argv=0xbfd26434) at ../../gcc/gcc/toplev.c:1063
#16 0x080d5e02 in main (argc=Cannot access memory at address 0x1 ) at ../../gcc/gcc/main.c:35
(gdb) up
#2 0x081a2a3f in force_operand (value=0xb7d01d98, target=0xb7a9d960) at ../../gcc/gcc/expr.c:6069
6069 return expand_simple_binop (GET_MODE (value), code, op1, op2,
(gdb) p code
$1 = VEC_SELECT

--
Summary: ICE with -fno-tree-dominator-opts -ftree-vectorize -msse
Product: gcc
Version: 4.3.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: i386-linux
GCC host triplet: i386-linux
GCC target triplet: i386-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31333
[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)
--- Comment #7 from dorit at il dot ibm dot com 2007-03-24 08:00 --- patch: http://gcc.gnu.org/ml/gcc/2007-03/msg00918.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784
[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)
--- Comment #5 from dorit at il dot ibm dot com 2007-03-14 12:29 --- this is the testcase I have ICE-ing on powerpc64-yellowdog, when compiled with -ftree-vectorize -maltivec -m64 -O2: long stack_vars_sorted[32]; void partition_stack_vars (long stack_vars_num) { long si, n = stack_vars_num; for (si = 0; si < n; ++si) stack_vars_sorted[si] = si; } (extracted from cfgexpand.c which ICEs during bootstrap with vectorization) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784
[Bug target/30784] [4.3 regression] ICE on loop vectorization (-O1 -march=athlon-xp -ftree-vectorize)
--- Comment #4 from dorit at il dot ibm dot com 2007-03-14 12:13 --- I also saw this on powerpc64, on a different testcase (vectorizing longs with -m64). seems like constant propagation during dom3 propagates the vector initializer into a BIT_FIELD_EXPR, which results in invalid gimple? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30784
[Bug tree-optimization/31041] [4.3 Regression] verify_stmts failed: invalid operand to binary operator with -O2 -ftree-vectorize
--- Comment #5 from dorit at il dot ibm dot com 2007-03-05 20:15 --- I'm travelling now, but can prepare a fix when I'm back (next week). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31041
[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize
--- Comment #8 from dorit at il dot ibm dot com 2007-02-21 19:31 --- > Is this acceptable ? sure, thanks -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858
[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize
--- Comment #6 from dorit at il dot ibm dot com 2007-02-20 22:56 --- proposed patches - http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01734.html > I have thrown most of Suse Linux 10.3 at it and it has crashed > in a few places. would you mind giving these patches a try? (to see what's the next ICE...?) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858
[Bug tree-optimization/30858] [4.3 Regression] ice for legal code with -O2 -ftree-vectorize
--- Comment #5 from dorit at il dot ibm dot com 2007-02-19 14:12 --- Looks like I wasn't careful enough with my fix for PR30771. Here is a fix for that fix I'm now testing: Index: tree-vect-analyze.c === --- tree-vect-analyze.c (revision 122128) +++ tree-vect-analyze.c (working copy) @@ -124,10 +124,11 @@ /* Two cases of "relevant" phis: those that define an induction that is used in the loop, and those that -define a reduction. */ +directly define a reduction. */ if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def) - || (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction + || (STMT_VINFO_RELEVANT (stmt_info) == + vect_used_directly_by_reduction && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)) { gcc_assert (!STMT_VINFO_VECTYPE (stmt_info)); @@ -328,8 +329,12 @@ return false; } - if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop - && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def) + if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def) + || (STMT_VINFO_RELEVANT (stmt_info) > + vect_used_directly_by_reduction + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)) + { /* Most likely a reduction-like computation that is used in the loop. */ @@ -2313,9 +2318,11 @@ if (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_reduction_def) { gcc_assert (relevant == vect_unused_in_loop && live_p); - relevant = vect_used_by_reduction; + relevant = vect_used_directly_by_reduction; live_p = false; } + else if (relevant == vect_used_directly_by_reduction) + relevant = vect_used_by_reduction; i = 0; FOR_EACH_SSA_TREE_OPERAND (use, stmt, iter, SSA_OP_USE) Index: tree-vectorizer.h === --- tree-vectorizer.h (revision 122128) +++ tree-vectorizer.h (working copy) @@ -175,6 +175,7 @@ /* Indicates whether/how a variable is used in the loop. 
*/ enum vect_relevant { vect_unused_in_loop = 0, + vect_used_directly_by_reduction, vect_used_by_reduction, vect_used_in_loop }; -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858
[Bug c/30858] ice for legal code with -O2 -ftree-vectorize
--- Comment #3 from dorit at il dot ibm dot com 2007-02-19 12:56 --- (In reply to comment #0) Thanks for exercising the vectorizer and reporting these bugs! > On the wider issue of the quality of the vectorizer, I > have thrown most of Suse Linux 10.3 at it and it has crashed > in a few places. only a few? :-) > I suspect there would be considerable value at getting some > other distribution [ Debian ?], maybe on another type of > machine [ PPC64 ?], and flushing out a few more bugs in the optimizer. > You would need to ensure that -ftree-vectorize was switched > on for every compile. > Just a suggestion. I agree. We are working on a cost model these days to make the vectorizer less greedy, hopefully as a step towards enabling vectorization on by default - which would help in flushing bugs out. (Just as a side comment - FYI - most of the vectorizer bugs you opened so far in the last few days (30771, 30795, 30843) are related to features that were added *very* recently). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858
[Bug c/30858] ice for legal code with -O2 -ftree-vectorize
--- Comment #2 from dorit at il dot ibm dot com 2007-02-19 12:45 --- Reduced testcase: int foo (int ko) { int j,i; for (j = 0; j < ko; j++) i += (i > 10) ? -5 : 7; return i; } Looking into it... -- dorit at il dot ibm dot com changed: What|Removed |Added Summary|ice for legal code with -O2 |ice for legal code with -O2 |-ftree-vectorize|-ftree-vectorize http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30858
[Bug c/30843] ice for legal code with -ftree-vectorize -O2
--- Comment #3 from dorit at il dot ibm dot com 2007-02-19 08:28 --- > Looks like possibly some bad interaction between vectorization of induction > and > vectorization of strided-access. Will investigate. I looked into it with Ira, and looks like the problem is that during transformation we remove each of the stores of the interleaved-group as we scan the stmts, but we actually vectorize them all together only when we reach the last store of the interleaved-group, at which point, we attempt to insert the loop-update for the vectorized induction before one of the stores - and crash. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30843
[Bug c/30843] ice for legal code with -ftree-vectorize -O2
--- Comment #2 from dorit at il dot ibm dot com 2007-02-18 21:50 --- I was able to reproduce it. Here's a reduced testcase: void dacP98FillRGBMap( unsigned char *pBuffer ) { unsigned long dw, dw1; unsigned long *pdw = (unsigned long *)(pBuffer); for( dw = 256, dw1 = 0; dw; dw--, dw1 += 0x01010101 ) { *pdw++ = dw1; *pdw++ = dw1; *pdw++ = dw1; *pdw++ = dw1; } } Looks like possibly some bad interaction between vectorization of induction and vectorization of strided-access. Will investigate. -- dorit at il dot ibm dot com changed: What|Removed |Added Summary|ice for legal code with - |ice for legal code with - |ftree-vectorize -O2 |ftree-vectorize -O2 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30843
[Bug tree-optimization/30795] [4.3 Regression] ice for legal code with -ftree-vectorize -O2
--- Comment #4 from dorit at il dot ibm dot com 2007-02-18 16:42 --- patch: http://gcc.gnu.org/ml/gcc-patches/2007-02/msg01555.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30795
[Bug tree-optimization/30795] [4.3 Regression] ice for legal code with -ftree-vectorize -O2
--- Comment #3 from dorit at il dot ibm dot com 2007-02-15 10:21 --- I'll look into it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30795
[Bug tree-optimization/30771] ice for legal code with -O2 -ftree-vectorize
--- Comment #4 from dorit at il dot ibm dot com 2007-02-12 14:23 --- I'm testing the patch below. (wasn't able to reproduce the problem in the attached testcase, but here's a reduced testcase for the problem that Richi described - thanks!: int a[128]; int main() { short i; for (i=0; i<64; i++){ a[i] = (int)i; } return 0; } ) Index: tree-vect-analyze.c === --- tree-vect-analyze.c (revision 121843) +++ tree-vect-analyze.c (working copy) @@ -97,8 +97,12 @@ int nbbs = loop->num_nodes; block_stmt_iterator si; unsigned int vectorization_factor = 0; + tree scalar_type; + tree phi; + tree vectype; + unsigned int nunits; + stmt_vec_info stmt_info; int i; - tree scalar_type; if (vect_print_dump_info (REPORT_DETAILS)) fprintf (vect_dump, "=== vect_determine_vectorization_factor ==="); @@ -107,12 +111,67 @@ { basic_block bb = bbs[i]; + for (phi = phi_nodes (bb); phi; phi = PHI_CHAIN (phi)) + { + stmt_info = vinfo_for_stmt (phi); + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "==> examining phi: "); + print_generic_expr (vect_dump, phi, TDF_SLIM); + } + + gcc_assert (stmt_info); + + /* Two cases of "relevant" phis: those that define an +induction that is used in the loop, and those that +define a reduction. 
*/ + if ((STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_induction_def) + || (STMT_VINFO_RELEVANT (stmt_info) == vect_used_by_reduction + && STMT_VINFO_DEF_TYPE (stmt_info) == vect_reduction_def)) +{ + gcc_assert (!STMT_VINFO_VECTYPE (stmt_info)); + scalar_type = TREE_TYPE (PHI_RESULT (phi)); + + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "get vectype for scalar type: "); + print_generic_expr (vect_dump, scalar_type, TDF_SLIM); + } + + vectype = get_vectype_for_scalar_type (scalar_type); + if (!vectype) + { + if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) + { + fprintf (vect_dump, + "not vectorized: unsupported data-type "); + print_generic_expr (vect_dump, scalar_type, TDF_SLIM); + } + return false; + } + STMT_VINFO_VECTYPE (stmt_info) = vectype; + + if (vect_print_dump_info (REPORT_DETAILS)) + { + fprintf (vect_dump, "vectype: "); + print_generic_expr (vect_dump, vectype, TDF_SLIM); + } + + nunits = TYPE_VECTOR_SUBPARTS (vectype); + if (vect_print_dump_info (REPORT_DETAILS)) + fprintf (vect_dump, "nunits = %d", nunits); + + if (!vectorization_factor + || (nunits > vectorization_factor)) + vectorization_factor = nunits; + } + } + for (si = bsi_start (bb); !bsi_end_p (si); bsi_next (&si)) { tree stmt = bsi_stmt (si); - unsigned int nunits; - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - tree vectype; + stmt_info = vinfo_for_stmt (stmt); if (vect_print_dump_info (REPORT_DETAILS)) { @@ -269,10 +328,11 @@ return false; } - if (STMT_VINFO_RELEVANT_P (stmt_info)) + if (STMT_VINFO_RELEVANT (stmt_info) == vect_used_in_loop + && STMT_VINFO_DEF_TYPE (stmt_info) != vect_induction_def) { /* Most likely a reduction-like computation that is used -in the loop. */ +in the loop. 
*/ if (vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS)) fprintf (vect_dump, "not vectorized: unsupported pattern."); return false; @@ -2235,17 +2295,7 @@ (case 2) If STMT has been identified as defining a reduction variable, then - we have two cases: - (case 2.1) -The last use of STMT is the reduction-variable, which is defined -by a loop-header-phi. We don't want to mark the phi as live or -relevant (because it does not need to be vectorized, it is handled - as part of the vectorization of the reduction), so in this case we -skip the call to vect_mark_relevant. - (case 2.2) -The rest of the uses of STMT are defined in the loop body. For - the def_stmt of these uses we want to set l
[Bug c++/30771] ice for legal code with -O2 -ftree-vectorize
--- Comment #3 from dorit at il dot ibm dot com 2007-02-12 10:11 --- I'll look into it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30771
[Bug tree-optimization/29145] unsafe use of restrict qualifier
--- Comment #11 from dorit at il dot ibm dot com 2007-02-06 08:18 --- (In reply to comment #10)
> One thing I can think of that this description misses is that the two pointers must be based-on *different* restrict-qualified pointers, unless that case is already handled elsewhere.
Yes, at the beginning of this function we check if the two pointers are the same, and if so, we don't reach this part of the code. Since our implementation of "based on" is the pointer itself (i.e. "is ptr_a based on some restricted pointer ptr_b" is implemented as "is ptr_a a restricted pointer"), we are safe. You're right though that when the implementation of "based on" is extended, we'd need to compare the two restricted pointers (we now compare the two ptr_a's, we'd need to compare the two ptr_b's).
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145
[Bug tree-optimization/29145] unsafe use of restrict qualifier
--- Comment #8 from dorit at il dot ibm dot com 2007-01-07 20:22 --- I'm testing this patch, that makes us more conservative, and concludes that two pointers don't overlap only if both are "based on" restricted pointers, with "based on" trivially implemented as the pointer used in the reference itself. In addition, we check that the declarations of both pointers are in the parameter list of the same function (to be safe w.r.t the scope of the pointer declarations). Looks like this should be safe enough? Most of the vectorizer testcases still get vectorized with this patch. Two testcases that don't, however, are - vect-[74,80].c, for which we need a bit less trivial implementation of "based on". We can start with this conservative implementation, switch this PR to a "missed optimization", and gradually work on relaxing the restrictions as much as we can. Index: tree-data-ref.c === --- tree-data-ref.c (revision 120551) +++ tree-data-ref.c (working copy) @@ -490,6 +490,7 @@ tree addr_a = DR_BASE_ADDRESS (dra); tree addr_b = DR_BASE_ADDRESS (drb); tree type_a, type_b; + tree decl_a, decl_b; bool aliased; if (!addr_a || !addr_b) @@ -547,14 +548,25 @@ } /* An instruction writing through a restricted pointer is "independent" of any - instruction reading or writing through a different pointer, in the same - block/scope. */ - else if ((TYPE_RESTRICT (type_a) && !DR_IS_READ (dra)) - || (TYPE_RESTRICT (type_b) && !DR_IS_READ (drb))) + instruction reading or writing through a different restricted pointer, + in the same block/scope. 
*/ + else if (TYPE_RESTRICT (type_a) + && TYPE_RESTRICT (type_b) + && (!DR_IS_READ (drb) || !DR_IS_READ (dra)) + && TREE_CODE (DR_BASE_ADDRESS (dra)) == SSA_NAME + && (decl_a = SSA_NAME_VAR (DR_BASE_ADDRESS (dra))) + && TREE_CODE (decl_a) == PARM_DECL + && TREE_CODE (DECL_CONTEXT (decl_a)) == FUNCTION_DECL + && TREE_CODE (DR_BASE_ADDRESS (drb)) == SSA_NAME + && (decl_b = SSA_NAME_VAR (DR_BASE_ADDRESS (drb))) + && TREE_CODE (decl_b) == PARM_DECL + && TREE_CODE (DECL_CONTEXT (decl_b)) == FUNCTION_DECL + && DECL_CONTEXT (decl_a) == DECL_CONTEXT (decl_b)) { *differ_p = true; return true; } + return false; } -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145
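As a hypothetical illustration (not taken from the PR), a loop of the following shape is the kind of case that, as I read the patch, would still be treated as independent: both references go directly through restrict-qualified pointers that are parameters of the same function, and one of them is a write. The name scale_add and its signature are made up for this sketch.

```c
#include <stddef.h>

/* Hypothetical example: 'dst' and 'src' are both restrict-qualified
   parameters of the same function, and 'dst' is written through.
   Under the conservative rule described above, the two references
   may be assumed not to overlap.  */
void
scale_add (int *restrict dst, const int *restrict src, size_t n, int k)
{
  size_t i;

  for (i = 0; i < n; i++)
    dst[i] += k * src[i];
}
```

If either pointer were a local variable derived from a parameter, or only one of the two were restrict-qualified, this trivial notion of "based on" would not apply and the references would have to be treated as possibly aliasing.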
[Bug tree-optimization/30038] Call to sin(x), cos(x) should be transformed to sincos(x)
--- Comment #16 from dorit at il dot ibm dot com 2006-12-12 20:59 --- (In reply to comment #13) Looks like what's blocking vectorization of the loop is: sinc.f90:8: note: value used after loop. sinc.f90:8: note: not vectorized: relevant stmt not supported: D.1408_32 = (*radius_31)[D.1407_30] i.e., there is a value computed in the loop that is also used after the loop (coefficient__lsm.61_26), and the above stmt is in its def-use chain, as can be seen from the loop snippet below: # n_3 = PHI <1(3), n_70(5)>; :; ... D.1408_32 = (*radius_31)[D.1407_30]; ... D.1410_35 = reciptmp.60_24 * D.1408_32; ... D.1419_63 = D.1410_35 * pretmp.53_51; coefficient__lsm.61_26 = D.1419_63; ... (*tmp_49)[D.1426_67] = D.1419_63; n_70 = n_3 + 1; if (n_3 == D.1398_5) goto ; else goto ; :; goto (); # coefficient__lsm.61_54 = PHI ; :; *coefficient_44 = coefficient__lsm.61_54; :; return; We currently support a computation that is used after the loop only if the computation is a reduction. We have a patch in autovect branch that provides the first step towards supporting this situation in general, but it needs more work. How important is this feature do you think? In the meantime you can try to use a different variable for the coefficient inside the loop, and after the loop read the desired value from memory to set the coefficient function argument (hopefully this will disconnect the use outside the loop from the def inside the loop). -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30038
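The suggested workaround can be sketched in C. The original code is Fortran, so this is only an analogy, and fill_scaled, tmp, radius and coefficient are hypothetical names. Instead of assigning the output argument inside the loop, the loop only stores to memory, and the value needed after the loop is read back from the array.

```c
#include <stddef.h>

/* Hypothetical C analogue of the workaround: the loop body only stores
   into tmp[], so no scalar computed in the loop is used after it.  The
   value wanted by the caller is then reloaded from memory.  */
void
fill_scaled (float *tmp, const float *radius, size_t n, float recip,
             float *coefficient)
{
  size_t i;

  for (i = 0; i < n; i++)
    tmp[i] = recip * radius[i];

  /* Read the last computed value back from memory instead of carrying
     it out of the loop in a scalar.  */
  if (n > 0)
    *coefficient = tmp[n - 1];
}
```

Whether this is enough to disconnect the use from the def depends on what earlier passes do with the reload, so it is worth checking the -fdump-tree-vect-details output again after the change.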
[Bug tree-optimization/30038] Call to sin(x), cos(x) should be transformed to sincos(x)
--- Comment #11 from dorit at il dot ibm dot com 2006-12-07 20:19 --- (In reply to comment #10) > Using the three patches: ... > gfortran is able to use sincos - and does so for my example (comment #0; the > example, however, cannot be vectorized). why? (what does -fdump-tree-vect-details say?) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=30038
[Bug fortran/29779] [4.3 Regression] vectorizer fortran testcases failing
--- Comment #12 from dorit at il dot ibm dot com 2006-12-06 22:22 --- > By the way, you wrote 2006-11-17: > > Should be submitted this weekend > Any new ETA? It was already submitted: http://gcc.gnu.org/ml/gcc-patches/2006-12/msg00110.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779
[Bug fortran/29779] [4.3 Regression] vectorizer fortran testcases failing
--- Comment #7 from dorit at il dot ibm dot com 2006-11-17 06:46 --- (In reply to comment #6) > This patch should fix the problem: indeed it does, thanks! are you going to submit it to mainline? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779
[Bug tree-optimization/29777] missed optimization: model missing widen_mult* idioms for SSE
--- Comment #2 from dorit at il dot ibm dot com 2006-11-09 20:26 ---
> But these files can be successfully vectorized using current (gcc version 4.3.0 20061109) version on i686:
> gcc -O2 -msse2 -ftree-vectorize -fdump-tree-vect-all vect-widen-mult-sum.c
> vect-widen-mult-sum.c:16: note: LOOP VECTORIZED.
> vect-widen-mult-sum.c:12: note: vectorized 1 loops in function.
Probably because the i386 port models the "vect_unpack" and "vect_int_mult" idioms (see target-supports.exp:check_effective_target_vect_widen_mult_hi_to_si()): i.e., instead of recognizing it's a widening multiplication and vectorizing it as such, it's vectorized by first unpacking (widening) the shorts to ints, and then doing int multiplication, which is probably less efficient. Sorry for the lack of clarity.
> > The missing insns (that should be merged from autovect-branch and debugged):
> > vec_widen_umult_hi_v8hi
> > vec_widen_umult_lo_v8hi
> These patterns _are_ present in gcc version 4.3.0 20061109 (experimental) in sse.md.
I'm sorry - I meant vec_widen_smult_hi_v8hi and vec_widen_smult_lo_v8hi.
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29777
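For illustration, a loop of roughly the following shape is what the widening-multiplication idiom targets. This is a sketch written for this comment, not the contents of vect-widen-mult-sum.c, and dot_s16 is a made-up name. The product of two shorts is needed as an int, so the vectorizer can either use a widening multiply directly or unpack the shorts to ints and multiply those.

```c
/* Hypothetical example: sum of products of 16-bit elements,
   accumulated in a 32-bit int.  Each a[i] * b[i] is a short-by-short
   multiplication whose full int result is required.  */
int
dot_s16 (const short *a, const short *b, int n)
{
  int i;
  int sum = 0;

  for (i = 0; i < n; i++)
    sum += a[i] * b[i];
  return sum;
}
```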
[Bug middle-end/29779] New: vectorizer fortran testcases failing
Looks like sometime between Oct27 (http://gcc.gnu.org/ml/gcc-testresults/2006-10/msg01336.html) and Oct30 (http://gcc.gnu.org/ml/gcc-testresults/2006-10/msg01538.html) the fortran vectorizer testcases started ICEing on:
gfortran.dg/vect/vect-3.f90:0: warning: 'const' attribute directive ignored
gfortran.dg/vect/vect-3.f90:4: internal compiler error: in vect_setup_realignment, at tree-vect-transform.c:2534
Should be related somehow to this code in rs6000.c:
/* Initialize target builtin that implements
   targetm.vectorize.builtin_mask_for_load.  */
decl = add_builtin_function ("__builtin_altivec_mask_for_load",
                             v16qi_ftype_long_pcvoid,
                             ALTIVEC_BUILTIN_MASK_FOR_LOAD,
                             BUILT_IN_MD, NULL,
                             tree_cons (get_identifier ("const"),
                                        NULL_TREE, NULL_TREE));
Does anybody know which patch caused this?
-- Summary: vectorizer fortran testcases failing Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com GCC build triplet: ppc*-*-linux GCC host triplet: ppc*-*-linux GCC target triplet: ppc*-*-linux http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29779
[Bug tree-optimization/29778] New: missed optimization: model missing vec_pack/unpack idioms for ia64
We need to port the ia64 support for vectorization of multiple-datatypes from autovect-branch. This is the patch missing from mainline (it wasn't included in http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00166.html because I couldn't test it):
2005-12-02 Richard Henderson <[EMAIL PROTECTED]>
* config/ia64/ia64.c (TARGET_VECTORIZE_BUILTIN_EXTRACT_EVEN): New.
(TARGET_VECTORIZE_BUILTIN_EXTRACT_ODD): New.
(TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_EVEN, TARGET_VECTORIZE_BUILTIN_MUL_WIDEN_ODD, ia64_builtin_mul_widen_even, ia64_builtin_mul_widen_odd, builtin_ia64_pmpy_r, builtin_ia64_pmpy_l, IA64_BUILTIN_PMPY_R, IA64_BUILTIN_PMPY_L): New.
(ia64_init_builtins): Initialize builtin_ia64_pmpy_[rl].
(ia64_expand_builtin): Expand them.
(ia64_expand_unpack): New.
* config/ia64/vect.md (smulv4hi3_highpart, umulv4hi3_highpart): New.
(vec_pack_ssat_v4hi): Rename from pack2_sss.
(vec_pack_usat_v4hi): Rename from pack2_uss.
(vec_pack_ssat_v2si): Rename from pack4_sss.
(vec_pack_mod_v4hi, vec_pack_mod_v2si): New.
(vec_interleave_lowv8qi): Rename from unpack1_l.
(vec_interleave_highv8qi): Rename from unpack1_h.
(vec_interleave_lowv4hi): Rename from unpack2_l.
(vec_interleave_highv4hi): Rename from unpack2_h.
(vec_interleave_lowv2si): Rename from unpack4_l.
(vec_interleave_highv2si): Rename from unpack4_h.
(vec_unpacku_hi_v8qi, vec_unpacks_hi_v8qi): New.
(vec_unpacku_lo_v8qi, vec_unpacks_lo_v8qi): New.
(vec_unpacku_hi_v4hi, vec_unpacks_hi_v4hi): New.
(vec_unpacku_lo_v4hi, vec_unpacks_lo_v4hi): New.
* config/ia64/ia64-protos.h (ia64_expand_unpack): Declare.
Once the above is merged, we can add ia64 to the lists of targets that support the following functions in testsuite/lib/target-supports.exp:
check_effective_target_vect_sdot_hi
check_effective_target_vect_udot_qi
check_effective_target_vect_sdot_qi
check_effective_target_vect_widen_sum_qi_to_hi
check_effective_target_vect_widen_sum_hi_to_si
-- Summary: missed optimization: model missing vec_pack/unpack idioms for ia64 Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com GCC build triplet: ia64-*-* GCC host triplet: ia64-*-* GCC target triplet: ia64-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29778
[Bug tree-optimization/29777] New: missed optimization: model missing widen_mult* idioms for SSE
The patch that adds support for vectorization of multiple data-types (http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00166.html) was missing a few bits from the i386 port that rth contributed to autovect-branch a while back. This is because a couple of testcases were failing with these features. The testcases that failed (on assembler error) are two of the tests that require "vect_widen_mult_hi_to_si":
testsuite/gcc.dg/vect/vect-reduc-dot-s16a.c
testsuite/gcc.dg/vect/vect-widen-mult-s16.c
testsuite/gcc.dg/vect/vect-widen-mult-sum.c
The missing insns (that should be merged from autovect-branch and debugged):
vec_widen_umult_hi_v8hi
vec_widen_umult_lo_v8hi
When these are back in, we'll want to add i?86-*-* and x86_64-*-* to the list of targets that return true in the function "vect_widen_mult_hi_to_si" in testsuite/lib/target-supports.exp.
-- Summary: missed optimization: model missing widen_mult* idioms for SSE Product: gcc Version: 4.3.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com GCC build triplet: i?86-*-* and x86_64-*-* GCC host triplet: i?86-*-* and x86_64-*-* GCC target triplet: i?86-*-* and x86_64-*-* http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29777
[Bug tree-optimization/29145] unsafe use of restrict qualifier
--- Comment #6 from dorit at il dot ibm dot com 2006-11-05 15:48 --- (In reply to comment #5) > This was something that slipped in, IIRC. I was of Ian's viewpoint, that > may_alias_p should handle it, and it shouldn't be special to data-references. yes, it was originally added as a temporary hack until alias analysis did something with restrict -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29145
[Bug middle-end/29269] New: missing documentation for "vcond" (vector conditional operation)
missing documentation for "vcond" (vector conditional operation). -- Summary: missing documentation for "vcond" (vector conditional operation) Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29269
[Bug middle-end/29268] New: missed optimization: need to generalize realignment support in the vectorizer
Details in this thread: http://gcc.gnu.org/ml/gcc/2006-09/msg00503.html We need to add other ways to handle realignment that are applicable to targets that can't support realign_load the way it is currently defined. -- Summary: missed optimization: need to generalize realignment support in the vectorizer Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29268
[Bug tree-optimization/29170] autovec cannot handle short+=short
--- Comment #4 from dorit at il dot ibm dot com 2006-09-21 19:30 --- By the way, the testcase gets vectorized if you compile with -fwrapv. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29170
[Bug middle-end/29160] New: missed optimization: redundant casts prevent vectorization
Details in this thread: http://gcc.gnu.org/ml/gcc/2006-09/msg00167.html
"A silly little testcase which the vectorizer doesn't vectorize:
unsigned char qa[128];
unsigned char qb[128];
unsigned char qc[128];
unsigned char qd[128];
void autovectqi (void)
{
  int i;
  for (i = 0; i < 128; i ++)
    qd[i] = qa[i] ^ qb[i] + qc[i];
}
...
If I change 'qb[i] + qc[i]' to e.g. 'qb[i] & qc[i]' the vectorizer works fine.
autovecttest.c:11: note: not vectorized: relevant stmt not supported: D.1861_9 = (signed char) D.1860_8"
Devang suggested the solution should be part of a "tree-combin" pass: http://gcc.gnu.org/ml/gcc/2006-09/msg00182.html
Dorit suggested adding it as part of the vectorizer's pattern-recognition engine: http://gcc.gnu.org/ml/gcc/2006-09/msg00281.html
-- Summary: missed optimization: redundant casts prevent vectorization Product: gcc Version: 4.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end AssignedTo: unassigned at gcc dot gnu dot org ReportedBy: dorit at il dot ibm dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=29160
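A small sketch of why the casts are redundant here (written for illustration; xor_add_wide and xor_add_narrow are hypothetical names). In C the unsigned char operands are promoted to int before the addition, but since the result is stored back into an unsigned char, only the low 8 bits matter, and those are the same if the whole computation is done in 8-bit arithmetic. Note that + binds tighter than ^, so the expression is qa[i] ^ (qb[i] + qc[i]).

```c
/* What the C source says: operands are promoted to int, and the
   result is truncated on the store to unsigned char.  */
unsigned char
xor_add_wide (unsigned char a, unsigned char b, unsigned char c)
{
  int sum = (int) b + (int) c;
  return (unsigned char) ((int) a ^ sum);
}

/* The same computation with every intermediate truncated to 8 bits,
   which is what a vector of unsigned chars would compute.  */
unsigned char
xor_add_narrow (unsigned char a, unsigned char b, unsigned char c)
{
  unsigned char sum = (unsigned char) (b + c);
  return (unsigned char) (a ^ sum);
}
```

Both functions return the same value for every input, which is why a pattern recognizer (or a combine-like pass) could drop the intermediate conversions and let the loop be vectorized on byte elements.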
[Bug middle-end/28684] Imprecise -funsafe-math-optimizations definition
--- Comment #4 from dorit at il dot ibm dot com 2006-09-11 10:57 --- > You could help by looking at the source code (there are only a few dozens > places mentioning flag_unsafe_math_optimizations) and auditing which places > would be more suited to a new flag_reassociate_fp variable. we'd be very interested in allowing the vectorizer to work under flag_reassociate_fp rather than flag_unsafe_math_optimizations, so we'll give this a try. -- dorit at il dot ibm dot com changed: What|Removed |Added CC||eres at il dot ibm dot com http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28684
[Bug tree-optimization/26969] [4.1 Regression] ICE with -O1 -funswitch-loops -ftree-vectorize
--- Comment #12 from dorit at il dot ibm dot com 2006-09-01 05:43 --- oops - I didn't notice it was open against 4.1. So hopefully porting Victor's patch to 4.1 would fix it. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26969
[Bug tree-optimization/26969] [4.1 Regression] ICE with -O1 -funswitch-loops -ftree-vectorize
--- Comment #10 from dorit at il dot ibm dot com 2006-08-31 08:22 --- I think this can be closed? (I opened a missed-optimization PR instead - PR28643) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26969
[Bug tree-optimization/27742] [4.2 regression] ICE with -ftree-vectorizer-verbose
--- Comment #9 from dorit at il dot ibm dot com 2006-08-31 08:08 --- I have been unsuccessful in reproducing this problem on i386-redhat-linux. I don't get a failure compiling the testcase from comment 8. I tried to compile the testcase from comment 7 and got the following errors:
g++ -O1 -g -ftree-vectorize -ftree-vectorizer-verbose=5 -S G2\[1].ii
G2[1].ii:2154: error: integer constant is too large for 'long' type
G2[1].ii:2154: error: integer constant is too large for 'long' type
G2[1].ii:2156: error: integer constant is too large for 'long' type
G2[1].ii:425: warning: '__malloc__' attribute ignored
G2[1].ii:1662: warning: no matching push for '#pragma GCC visibility pop'
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii:2065: error: 'operator new' takes type 'size_t' ('unsigned int') as first parameter
G2[1].ii: In static member function 'static long int std::numeric_limits::min()':
G2[1].ii:2154: warning: overflow in implicit constant conversion
G2[1].ii: In static member function 'static long int std::numeric_limits::max()':
G2[1].ii:2154: warning: overflow in implicit constant conversion
G2[1].ii: In static member function 'static long unsigned int std::numeric_limits::max()':
G2[1].ii:2156: warning: overflow in implicit constant conversion
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27742
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #55 from dorit at il dot ibm dot com 2006-08-09 19:10 --- Subject: Re: [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
> Here's some questions I need to figure out:
> (1) Why do I have to throw the -funsafe-math-optimizations flag to enable this?
> -- I see where the .vect file warns of it, but it refers to an SSA line, so I'm not sure what's going on.
This flag is needed in order to allow vectorization of reduction (summation in your case) of floating-point data. This is because vectorization of reduction changes the order of the computation, which may result in different behavior: instead of summing this way: ((((((a0+a1)+a2)+a3)+a4)+a5)+a6)+a7, we sum this way: (((a0+a2)+a4)+a6)+(((a1+a3)+a5)+a7).
> (2) Is there any pragma or assertion, etc, that I can put in the code to notify the compiler that certain pointers point to 16-byte aligned data?
> -- Only the output array (C) is possibly misaligned in ATLAS
Not really, I'm afraid - there is something that's not entirely supported in gcc yet - see details in PR20794.
dorit
-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
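The effect of the changed summation order can be seen with a small sketch. This is an illustration written for this comment, not vectorizer output; sum_sequential and sum_two_lanes are made-up names, and the second function only mimics a 2-lane reduction in scalar code.

```c
/* Scalar order: (((a0 + a1) + a2) + a3) + ...  */
float
sum_sequential (const float *a, int n)
{
  int i;
  float s = 0.0f;

  for (i = 0; i < n; i++)
    s += a[i];
  return s;
}

/* Order resembling a 2-lane vectorized reduction: even and odd
   elements are accumulated separately and combined at the end.
   n is assumed to be even.  */
float
sum_two_lanes (const float *a, int n)
{
  int i;
  float even = 0.0f, odd = 0.0f;

  for (i = 0; i < n; i += 2)
    {
      even += a[i];
      odd += a[i + 1];
    }
  return even + odd;
}
```

For the input {1e8f, 1.0f, -1e8f, 1.0f, 0, 0, 0, 0}, and assuming float arithmetic is evaluated in single precision (as with SSE; x87 excess precision can hide the effect), the sequential sum is 1 while the two-lane sum is 2, because 1e8f + 1.0f rounds back to 1e8f. That difference is why the transformation is only done under -funsafe-math-optimizations.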
[Bug tree-optimization/28643] redundant phi-node in latch-block prevents vectorization
--- Comment #3 from dorit at il dot ibm dot com 2006-08-08 07:38 --- > Err, SSA copy prop should be enough, actually, since after copy-prop, > the phi will have no users (and they shouldn't care about code with no > uses that doesn't access memory). > Though it's interesting that this redundant phi survives so long. What > is creating it? I think it's loop-unswitch -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28643
[Bug middle-end/28643] New: redundant phi-node in latch-block prevents vectorization
Since the fix for PR26969, we now fail to vectorize loops that have redundant phi-nodes in their (otherwise empty) latch block. The testcase committed with the PR fix is an example of such a case. See http://gcc.gnu.org/ml/gcc-patches/2006-08/msg00034.html for more details.

--
Summary: redundant phi-node in latch-block prevents vectorization
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: powerpc-linux
GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28643
[Bug target/27827] [4.0/4.1 Regression] gcc 4 produces worse x87 code on all platforms than gcc 3
--- Comment #43 from dorit at il dot ibm dot com 2006-08-07 20:35 ---
> I'm all for this. info gcc says that w/o a guarantee of alignment, loops are
> duped, with an if selecting between vector and scalar loops, is this not
> accurate?

Yes, that is accurate.

> I spent a day trying to get gcc to vectorize any of the generator's
> loops, and did not succeed (can you make it vectorize the provided benchmark
> code?).

The aggressive unrolling in the provided example seems to be the first obstacle to vectorizing the code.

> I also tried various unrollings of the inner loop, particularly no
> unrolling and unroll=2 (vector length). I was unable to truly decipher the
> warning messages explaining the lack of vectorization, and I would truly
> welcome some help in fixing this.

I'd be happy to help decipher the vectorizer's dump file. Please send the un-unrolled version and the dump file generated by -fdump-tree-vect-details, and I'll see if I can help.

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
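The loop duplication confirmed in comment #43 can be pictured at the source level roughly as follows. This is only a hand-written sketch under assumptions: the function name, the 16-byte vector size, and the return value are made up for illustration, and the compiler performs this versioning internally rather than on C source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Adds two float arrays. Returns 1 if the "aligned" version of the loop
   ran, 0 if the fallback version ran. Both versions compute the same
   result; only the first one is a candidate for aligned vector accesses. */
int add_arrays(float *c, const float *a, const float *b, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);

    /* Runtime test: are all three pointers 16-byte aligned? */
    int aligned = (((uintptr_t)a | (uintptr_t)b | (uintptr_t)c) & 15u) == 0;

    if (aligned) {
        /* Version a vectorizer could implement with aligned loads/stores. */
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    } else {
        /* Scalar fallback, kept when alignment cannot be guaranteed. */
        for (size_t i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }
    return aligned;
}
```

The point of the guard is that the alignment question is answered once, before the loop, so neither loop body pays for per-iteration alignment checks.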
[Bug middle-end/27770] [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec
--- Comment #25 from dorit at il dot ibm dot com 2006-08-07 07:09 --- (In reply to comment #24) > Fixed, a new different bug for the missed optimization should be opened. It's PR28628. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27770
[Bug middle-end/28628] New: Not forcing alignment of arrays in structs with -fsection-anchors
Since the fix to PR27770, we now miss opportunities to align some arrays when -fsection-anchors is enabled. The patch for PR27770 increases the alignment of (global) arrays only. We have a few testcases though (e.g. section-anchors-vect-69.c) that have global structs that contain fields that are arrays. Aligning the beginning of these structs can sometimes align one or more of their array fields. Since the new function cgraph_increase_alignment does not attempt to do that, we have cases that will be vectorized less efficiently. To solve this we need to extend the optimization to align global structs that have array fields that could become aligned as a result.

--
Summary: Not forcing alignment of arrays in structs with -fsection-anchors
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: enhancement
Priority: P3
Component: middle-end
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: powerpc-linux
GCC host triplet: powerpc-linux
GCC target triplet: powerpc-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28628
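To make the point about struct fields concrete, here is a small sketch. It is not one of the section-anchors testcases; the struct layout, the names, and the 16-byte target alignment are assumptions chosen for illustration, and the offsets in the comments assume a 4-byte int. Aligning the start of the struct aligns the array at offset 0, but not the array that follows a 4-byte field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical global struct with two array fields. */
struct samples {
    int in[16];   /* offset 0                          */
    int tag;      /* offset 64 (with 4-byte int)       */
    int out[16];  /* offset 68, not a multiple of 16   */
};

/* Request 16-byte alignment for the start of the struct (C11 _Alignas). */
_Alignas(16) struct samples g_samples;

/* Returns nonzero if p is a multiple of align. */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0);
    return ((uintptr_t)p % align) == 0;
}
```

With this layout, is_aligned(g_samples.in, 16) holds while is_aligned(g_samples.out, 16) does not, which matches the observation that aligning the struct helps some, but not necessarily all, of its array fields.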
[Bug middle-end/27770] [4.2 Regression] wrong code in spec tests for -ftree-vectorize -maltivec
--- Comment #19 from dorit at il dot ibm dot com 2006-07-23 19:03 --- > The fix we've agreed is best in principle is to speculatively increase > the DECL_ALIGN of vectorisable variables before compiling functions. > Dorit says that there is a patch related to this on the autovect branch, > which I'll look at when I get back from Ottawa. > Richard Turns out the patch I was thinking about is only for the rs6000 port: http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00266.html so that's not much help. Do we want to implement this as a separate pass? at which point of the compilation? (doing it during ipa might be a problem if ipa is not enabled?) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27770
[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name with vectorizer
--- Comment #13 from dorit at il dot ibm dot com 2006-03-01 12:35 --- So I'll submit the patch to gcc-patches for approval. Can someone please check if this patch actually solves this PR? -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197
[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name with vectorizer
--- Comment #11 from dorit at il dot ibm dot com 2006-02-28 08:26 ---
Created an attachment (id=10935) --> (http://gcc.gnu.org/bugzilla/attachment.cgi?id=10935&action=view)
tentative patch

I get a similar error message when trying to bootstrap mainline with vectorization enabled:

/home/dorit/mainline_svn/build2/./prev-gcc/xgcc -B/home/dorit/mainline_svn/build2/./prev-gcc/ -B/home/dorit/mainline_svn2/ppc64-yellowdog-linux/bin/ -c -g -O2 -ftree-vectorize -maltivec -DIN_GCC -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -Wold-style-definition -Wmissing-format-attribute -Werror -fno-common -DHAVE_CONFIG_H -I. -I. -I../../gcc/gcc -I../../gcc/gcc/. -I../../gcc/gcc/../include -I../../gcc/gcc/../libcpp/include -I../../gcc/gcc/../libdecnumber -I../libdecnumber ../../gcc/gcc/recog.c -o recog.o
../../gcc/gcc/recog.c: In function ‘constrain_operands’:
../../gcc/gcc/recog.c:2270: internal compiler error: tree check: expected ssa_name, have struct_field_tag in verify_ssa, at tree-ssa.c:735
make[3]: *** [recog.o] Error 1
make[3]: Leaving directory `/home/dorit/mainline_svn/build2/gcc'
make[2]: *** [all-stage2-gcc] Error 2
make[2]: Leaving directory `/home/dorit/mainline_svn/build2'
make[1]: *** [stage2-bubble] Error 2
make[1]: Leaving directory `/home/dorit/mainline_svn/build2'
make: *** [bootstrap] Error 2

Following Zdenek's observations, I tried the attached patch. It solves this failure above in recog.c, but it fails bootstrap with vectorization enabled later on. (It does pass regular bootstrap on ppc-linux). So this patch needs to be further examined, but I wonder if it fixes this PR (I can't reproduce it)?

About the patch: new_type_alias() originally looked like this:

  TAG <- new tag for ptr;
  if (var has subvars){
    foreach subvar
      add the subvar as may-alias of TAG.
  }
  else{
    get the may-aliases of var;
    if (|may-aliases| == 1)
      set the (single) may-alias of var as the new tag of ptr;
    else if (|may-aliases| == 0)
      add var as may-alias of the TAG;
    else /* |may-aliases| > 1 */
      add the may-aliases of var as may-aliases of TAG;
  }

What I did is basically factored out the 'else' part into a separate function, and called that function also in the 'if' part, for each subvar; this way, we don't add the subvar as may-alias of TAG if the subvar itself has may-aliases, but add its may-aliases instead:

new version of new_type_alias():

  TAG <- new tag for ptr;
  if (var has subvars){
    foreach subvar
      add_may_aliases_for_new_tag (TAG, subvar)
  }
  else{
    add_may_aliases_for_new_tag (TAG, var)
  }

  add_may_aliases_for_new_tag (TAG, var)
  {
    get the may-aliases of var;
    if (|may-aliases| == 1)
      set the (single) may-alias of var as the new tag of ptr;
    else if (|may-aliases| == 0)
      add var as may-alias of the TAG;
    else /* |may-aliases| > 1 */
      add the may-aliases of var as may-aliases of TAG;
  }

Makes sense to anyone?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197
[Bug tree-optimization/26419] -ftree-vectorizer-verbose=n documentation is terse
--- Comment #3 from dorit at il dot ibm dot com 2006-02-26 11:05 --- patch: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01905.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26419
[Bug tree-optimization/26420] -ftree-vectorizer-verbose=1 prints unvectorized loops information
--- Comment #2 from dorit at il dot ibm dot com 2006-02-26 11:01 ---
For -ftree-vectorizer-verbose=1 the vectorizer reports each loop that got vectorized, and also the total number of loops that got vectorized, even if that number is zero. If preferable, we can report that 0 loops got vectorized only under -ftree-vectorizer-verbose=2, or higher. The patch below makes the "vectorized 0 loops" be reported for verbosity level 2 and higher. Shall I suggest the patch for mainline?

Index: tree-vectorizer.c
===
*** tree-vectorizer.c (revision 111450)
--- tree-vectorizer.c (working copy)
*** vectorize_loops (struct loops *loops)
*** 2047,2053
        num_vectorized_loops++;
      }

!   if (vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
      fprintf (vect_dump, "vectorized %u loops in function.\n",
               num_vectorized_loops);

--- 2047,2058
        num_vectorized_loops++;
      }

!   if (num_vectorized_loops > 0
!       && vect_print_dump_info (REPORT_VECTORIZED_LOOPS))
!     fprintf (vect_dump, "vectorized %u loops in function.\n",
!              num_vectorized_loops);
!   else if (num_vectorized_loops == 0
!            && vect_print_dump_info (REPORT_UNVECTORIZED_LOOPS))
      fprintf (vect_dump, "vectorized %u loops in function.\n",
               num_vectorized_loops);

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26420
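The reporting rule proposed in the patch can be summarized as a small standalone predicate. This is only a model of the decision, not the vectorizer's code: the enum values are stand-ins for the vectorizer's verbosity levels (assumed here to be 1 and 2, following the description above), and should_print_summary is a hypothetical name.

```c
#include <assert.h>

/* Stand-ins for the vectorizer's verbosity levels (values assumed). */
enum verbosity {
    REPORT_NONE = 0,
    REPORT_VECTORIZED_LOOPS = 1,
    REPORT_UNVECTORIZED_LOOPS = 2
};

/* Returns 1 if the "vectorized N loops in function" line should be
   printed under the proposed rule, 0 otherwise. */
int should_print_summary(unsigned num_vectorized_loops, int verbosity_level)
{
    assert(verbosity_level >= 0);
    if (num_vectorized_loops > 0)
        return verbosity_level >= REPORT_VECTORIZED_LOOPS;
    /* Zero loops vectorized: only report at level 2 and higher. */
    return verbosity_level >= REPORT_UNVECTORIZED_LOOPS;
}
```

Under this rule, a function with no vectorized loops prints nothing at verbosity 1, which is the behavior change the PR asks for.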
[Bug tree-optimization/26360] [4.2 Regression] Autovectorization of char -> int loop gets ICE
--- Comment #3 from dorit at il dot ibm dot com 2006-02-21 22:02 --- patch: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01713.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26360
[Bug tree-optimization/26359] [4.2 Regression] Over optimization of loop when using -ftree-vectorize
--- Comment #3 from dorit at il dot ibm dot com 2006-02-21 22:01 --- patch: http://gcc.gnu.org/ml/gcc-patches/2006-02/msg01710.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26359
[Bug tree-optimization/26362] ICE on the autovect-branch (gfortran example)
--- Comment #2 from dorit at il dot ibm dot com 2006-02-20 17:09 ---
Actually there's this patch by rth that seems to fix this ICE; it's from a while back, I don't think it was fully tested at the time, and I'm not sure it provides all the missing bits/fixes for SSE support.

=== targhooks.c ==
--- targhooks.c (revision 108004)
+++ targhooks.c (local)
@@ -448,7 +448,8 @@
   tree type;
   enum machine_mode mode;
   block_stmt_iterator bsi;
-  tree th, tl, result, x;
+  tree t1, t2, result, x;
+  int i, n;

   /* If the first argument is a type, just check if support is available.
      Return a non NULL value if supported, NULL_TREE otherwise.
@@ -472,31 +473,37 @@
     return NULL;

   bsi = bsi_for_stmt (stmt);
-
-  th = make_rename_temp (type, NULL);
-  x = build2 (VEC_INTERLEAVE_HIGH_EXPR, type, vec1, vec2);
-  x = build2 (MODIFY_EXPR, type, th, x);
-  th = make_ssa_name (th, x);
-  TREE_OPERAND (x, 0) = th;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);

-  tl = make_rename_temp (type, NULL);
-  x = build2 (VEC_INTERLEAVE_LOW_EXPR, type, vec1, vec2);
-  x = build2 (MODIFY_EXPR, type, tl, x);
-  tl = make_ssa_name (tl, x);
-  TREE_OPERAND (x, 0) = tl;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+  n = exact_log2 (GET_MODE_NUNITS (mode)) - 1;
+  for (i = 0; i < n; ++i)
+    {
+      t1 = create_tmp_var (type, NULL);
+      add_referenced_tmp_var (t1);
+      x = build2 (VEC_INTERLEAVE_HIGH_EXPR, type, vec1, vec2);
+      x = build2 (MODIFY_EXPR, type, t1, x);
+      t1 = make_ssa_name (t1, x);
+      TREE_OPERAND (x, 0) = t1;
+      bsi_insert_before (&bsi, x, BSI_SAME_STMT);

-  result = make_rename_temp (type, NULL);
-  /* ??? Endianness issues? */
+      t2 = create_tmp_var (type, NULL);
+      add_referenced_tmp_var (t2);
+      x = build2 (VEC_INTERLEAVE_LOW_EXPR, type, vec1, vec2);
+      x = build2 (MODIFY_EXPR, type, t2, x);
+      t2 = make_ssa_name (t2, x);
+      TREE_OPERAND (x, 0) = t2;
+      bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+
+      if (BYTES_BIG_ENDIAN)
+        vec1 = t1, vec2 = t2;
+      else
+        vec1 = t2, vec2 = t1;
+    }
+
   x = build2 (odd_p ? VEC_INTERLEAVE_HIGH_EXPR : VEC_INTERLEAVE_LOW_EXPR,
-              type, th, tl);
-  x = build2 (MODIFY_EXPR, type, result, x);
-  result = make_ssa_name (result, x);
-  TREE_OPERAND (x, 0) = result;
-  bsi_insert_before (&bsi, x, BSI_SAME_STMT);
+              type, vec1, vec2);
+  x = build2 (MODIFY_EXPR, type, dest, x);

-  return result;
+  return x;
 }

 tree

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26362
[Bug tree-optimization/26362] ICE on the autovect-branch (gfortran example)
--- Comment #1 from dorit at il dot ibm dot com 2006-02-20 16:45 ---
Looks like the vectorizer detects a strided access in this testcase. Strided accesses are not entirely supported for SSE right now (work in progress...), but the support is enabled, so currently all strided testcases break on SSE.

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |rth at gcc dot gnu dot org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26362
[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name, at tree-into-ssa.c:466
--- Comment #10 from dorit at il dot ibm dot com 2006-02-19 16:10 ---
So maybe if an SFT has may-aliases then new_type_alias should add the may-aliases of the SFT as may-aliases of the new tag, instead of adding the SFT as a may-alias of the new tag?

There's a comment in new_type_alias that's quite worrying:
"/* The following is based on code in add_stmt_operand to ensure that the same defs/uses/vdefs/vuses will be found"
So this code depends on code in add_stmt_operand, which may have changed by now...

Related or not, there's another bug in new_type_alias in PR26359.

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |victork at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197
[Bug tree-optimization/26359] [4.2 Regression] Over optimization of loop when using -ftree-vectorize
--- Comment #2 from dorit at il dot ibm dot com 2006-02-19 15:34 ---
The problem is that during dce the call to is_hidden_global_store returns false because the tag is not marked as global/static. This seems to fix it:

Index: tree-ssa-alias.c
===
*** tree-ssa-alias.c (revision 110911)
--- tree-ssa-alias.c (working copy)
*** new_type_alias (tree ptr, tree var)
*** 2638,2643
--- 2638,2651
            add_may_alias (tag, al);
        }
      }
+
+   /* CHECKME:
+   DECL_CONTEXT (tag) = DECL_CONTEXT (var);
+   TREE_PUBLIC (tag) = TREE_PUBLIC (var);
+   TREE_READONLY (tag) = TREE_READONLY (var);
+   */
+   MTAG_GLOBAL (tag) = DECL_EXTERNAL (var);
+   TREE_STATIC (tag) = TREE_STATIC (var);
  }

but I don't know if it's the right thing to do in the general case.

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |dorit at il dot ibm dot com,
                   |                            |victork at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26359
[Bug tree-optimization/26360] [4.2 Regression] Autovectorization of char -> int loop gets ICE
--- Comment #2 from dorit at il dot ibm dot com 2006-02-19 08:50 ---
This happens because we actually rely on dce taking place after the vectorizer to clean up dead code. When we detect a pattern (widening-summation in this case) we create a "dummy" stmt ("pattern-stmt") that represents the pattern and that will be vectorized instead of the original sequence of stmts (which in this case involves type promotions etc). The def of that "pattern-stmt" is not connected to any use. This "pattern-stmt" is never meant to remain in the code in its scalar form (if the loop is vectorized, there will be a vectorized form of that stmt in the loop, but the scalar "pattern-stmt" will always remain dead).

So, there are two ways to handle this - either
(1) have a "special" dce pass after the vectorizer that is not disabled by -fno-tree-dce if -ftree-vectorize is on, or
(2) have the vectorizer clean up these pattern-stmts itself when it's done with the loop; the vectorizer actually does scan the loop after it's done (in order to free various data structures), so it basically wouldn't cost anything to do this extra cleanup; the question is - wouldn't it be nicer if a pass could rely on dce taking place right after it, instead of trying to do some of the job itself?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26360
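For readers unfamiliar with the term, a widening summation at the source level looks roughly like the loop below. This is a generic sketch, not the testcase from this PR, and sum_bytes is a made-up name. Each char element is promoted to int before the addition, and it is this promote-then-add sequence that the pattern recognizer replaces with a single pattern-stmt.

```c
#include <assert.h>
#include <stddef.h>

/* Sums narrow (unsigned char) elements into a wide (int) accumulator. */
int sum_bytes(const unsigned char *data, size_t n)
{
    assert(data != NULL || n == 0);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += data[i];   /* data[i] is promoted to int, then added */
    return acc;
}
```

Because the accumulator is wider than the elements, the sum can exceed the range of unsigned char without wrapping, which is why the widening has to be part of the vectorized operation.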
[Bug tree-optimization/26197] [4.2 regression] ICE in is_old_name, at tree-into-ssa.c:466
--- Comment #6 from dorit at il dot ibm dot com 2006-02-13 16:23 ---
(In reply to comment #5)
> Probably related to
> http://gcc.gnu.org/ml/gcc-patches/2006-01/msg00446.html

Would you expect then that calling mark_new_vars_to_rename, like you did in your patch, will fix this problem?

I wasn't able to reproduce this error on powerpc-linux and i686-pc-linux-gnu. I do realize that there's a problem with the setting of virtual operands in the vectorizer. The vectorizer is overly conservative when setting aliasing information for vector pointers that access struct fields, and this may be responsible for the problem. I will try to look into this issue. In the meantime, could someone who can reproduce this problem try out the mark_new_vars_to_rename patch that Zdenek suggested in the link?

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=26197
[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail
--- Comment #7 from dorit at il dot ibm dot com 2006-02-08 14:19 --- (In reply to comment #5) Will take care of that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918
[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail
--- Comment #6 from dorit at il dot ibm dot com 2006-02-08 14:17 --- (In reply to comment #4) > ... This happens > because the IA-64 port defines the widen_ssumv4hi3 pattern. The IA-64 port is > the only one that defines this pattern, and hence is probably the only port > "broken" here. All others will presumably fail to vectorize this loop. that's correct. it's actually a combination of being able to support widen_ssumv4hi3 and (non widening) multiplication of shorts. looks like we need to split these loops into separate testcases, and for this particular loop expect vectorization if vect_widen_sum and vect_short_mult (new keyword) are supported. > and the testcase fails because we only expected 1 loop to be vectorized. > I think the only thing wrong here is that the dg-final tests in the testcase > are not precise enough to handle this case. indeed. Will take care of that. -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918
[Bug tree-optimization/25918] gcc.dg/vect/vect-reduc-dot-s16.c scan-tree-dump-times vectorized 1 loops 1 and gcc.dg/vect/vect-reduc-pattern-2.c scan-tree-dump-times vectorized 2 loops 1 fail
--- Comment #1 from dorit at il dot ibm dot com 2006-01-26 09:07 ---
Can you please send the dump files generated by -fdump-tree-vect-details?

reduc-dot-s16.c needs the sdot_prodv4hi pattern, which is implemented for ia64, so I'd expect one loop to be vectorized. I wonder what's the problem there.

In vect-reduc-pattern-2.c - does the vectorizer report vectorizing one loop? The one loop (that sums shorts into an int accumulator) needs the widen_ssumv4hi3 pattern to be vectorized, which is implemented for ia64. Does that loop get vectorized? The second loop however (that sums chars into an int accumulator) cannot be vectorized on ia64 because the mode of the result of the widen_ssumv8qi3 pattern as implemented on ia64 is short, not int. If this is indeed the reason for the failure we'd probably want to introduce finer keywords to represent the available widening support (in target-supports.exp we currently have just a "vect_widen_sum" keyword, which does not distinguish between char-to-short summation and char-to-int summation).

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |dorit at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25918
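For reference, the kind of loop that a signed dot-product pattern such as sdot_prodv4hi targets has roughly the shape below. This is a generic sketch rather than the contents of vect-reduc-dot-s16.c, and dot_s16 is a made-up name. The two short operands are promoted to int, multiplied, and accumulated into an int.

```c
#include <assert.h>
#include <stddef.h>

/* Dot product of two short arrays accumulated in an int. */
int dot_s16(const short *x, const short *y, size_t n)
{
    assert((x != NULL && y != NULL) || n == 0);
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += x[i] * y[i];   /* short * short is computed in int */
    return acc;
}
```

The result type matters here in the same way as in the comment above: a target pattern that produces a result narrower than the accumulator in the source loop cannot be used to vectorize it.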
[Bug tree-optimization/25911] [4.2 Regression] ice in vect_recog_dot_prod_pattern
--- Comment #5 from dorit at il dot ibm dot com 2006-01-24 09:10 ---
Patch:

Index: tree-vect-patterns.c
===
--- tree-vect-patterns.c (revision 109954)
+++ tree-vect-patterns.c (working copy)
@@ -243,7 +243,8 @@
   gcc_assert (stmt);
   stmt_vinfo = vinfo_for_stmt (stmt);
   gcc_assert (stmt_vinfo);
-  gcc_assert (STMT_VINFO_DEF_TYPE (stmt_vinfo) == vect_loop_def);
+  if (STMT_VINFO_DEF_TYPE (stmt_vinfo) != vect_loop_def)
+    return NULL;
   expr = TREE_OPERAND (stmt, 1);
   if (TREE_CODE (expr) != MULT_EXPR)
     return NULL;

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25911
[Bug tree-optimization/25809] New: missed PRE optimization - move "invariant casts" out of loops
In testcases that have reduction, like gcc.dg/vect/vect-reduc-2char.c and gcc.dg/vect-reduc-2short.c, the following casts appear:

  signed char sdiff;
  unsigned char ux, udiff;
  sdiff_0 = ...
  loop:
    # sdiff_41 = PHI <...>;
    ...
    ux_36 = ...
    udiff_37 = (unsigned char) sdiff_41;
    udiff_38 = ux_36 + udiff_37;
    sdiff_39 = (signed char) udiff_38;
  end_loop

although these casts could be taken out of the loop altogether, i.e., the code could be transformed into something like the following:

  signed char sdiff;
  unsigned char ux, udiff;
  sdiff_0 = ...
  udiff_1 = (unsigned char) sdiff_0;
  loop:
    # udiff_3 = PHI <...>;
    ...
    ux_36 = ...
    udiff_2 = ux_36 + udiff_3;
  end_loop
  sdiff_39 = (signed char) udiff_2;

See this discussion thread: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01827.html

--
Summary: missed PRE optimization - move "invariant casts" out of loops
Product: gcc
Version: 4.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: dorit at il dot ibm dot com
GCC build triplet: ppc64-yellowdog-linux
GCC host triplet: ppc64-yellowdog-linux
GCC target triplet: ppc64-yellowdog-linux

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25809
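The two forms in the report can be written as C functions to check that they compute the same value. This is a hand-written sketch, not the vect-reduc testcases: the function names are made up, and converting an out-of-range value to signed char is implementation-defined in C (GCC wraps modulo 256, which is what the equivalence relies on).

```c
#include <assert.h>
#include <stddef.h>

/* Form 1: casts between signed and unsigned char on every iteration. */
signed char reduce_with_casts(const unsigned char *x, size_t n,
                              signed char init)
{
    assert(x != NULL || n == 0);
    signed char sdiff = init;
    for (size_t i = 0; i < n; i++) {
        unsigned char udiff = (unsigned char)sdiff;
        udiff = (unsigned char)(x[i] + udiff);
        sdiff = (signed char)udiff;
    }
    return sdiff;
}

/* Form 2: one cast before the loop and one after it. */
signed char reduce_casts_hoisted(const unsigned char *x, size_t n,
                                 signed char init)
{
    assert(x != NULL || n == 0);
    unsigned char udiff = (unsigned char)init;
    for (size_t i = 0; i < n; i++)
        udiff = (unsigned char)(x[i] + udiff);
    return (signed char)udiff;
}
```

In the second form the loop body is a plain unsigned char reduction, which is the shape the vectorizer's reduction detection can handle directly.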
[Bug libfortran/21468] vectorizing libfortran
--- Comment #10 from dorit at il dot ibm dot com 2006-01-08 13:49 ---
> Reopening since many of the intrinsics could still vectorize better.

It would help if you list the specific functions that you expect to get vectorized. As far as dotprod is concerned - if it's operating on floats, you need to use -ffast-math or -funsafe-math-optimizations to enable vectorization. If it's dotprod of integers - probably the recent patches I sent to support reduction patterns (http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01896.html) would be required (this functionality is present in autovect; you can try to see if it's vectorized any better with autovect-branch).

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |dorit at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21468
[Bug testsuite/25590] FAIL: gcc.dg/tree-ssa/gen-vect-11.c scan-tree-dump-times vectorized 1 loops 1
--- Comment #7 from dorit at il dot ibm dot com 2006-01-04 07:36 --- (sorry, didn't notice it was already diagnosed as such) -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25590
[Bug testsuite/25590] FAIL: gcc.dg/tree-ssa/gen-vect-11.c scan-tree-dump-times vectorized 1 loops 1
--- Comment #6 from dorit at il dot ibm dot com 2006-01-04 07:33 ---
Maybe related to:

2005-12-26  Kazu Hirata  <[EMAIL PROTECTED]>

        PR tree-optimization/25125
        * convert.c (convert_to_integer): Don't narrow the type of a
        PLUS_EXPR or MINUS_EXPR if !flag_wrapv and the unwidened type is
        signed.

(indeed this testcase fails vectorization due to a cast to unsigned char). If that's the case, it should probably be xfailed and the PR should be "missed optimization".

-- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25590
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #3 from dorit at il dot ibm dot com 2005-12-15 12:50 --- related discussion: http://gcc.gnu.org/ml/gcc/2005-12/msg00390.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
[Bug target/25413] wrong alignment or incorrect address computation in vectorized code on Pentium 4 SSE
--- Comment #2 from dorit at il dot ibm dot com 2005-12-15 12:41 ---
The problem is that the vectorizer applies loop-peeling in order to align the data reference *(m->c+i), and peeling only works correctly if the data is naturally aligned (aligned on its type size). This is what the vectorizer currently blindly assumes, but on the Pentium4 doubles are not necessarily 64bit aligned.

Coincidentally, Devang and I discussed this issue last week, and Devang actually committed a patch to apple-ppc branch that works around the problem (http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=108214). Devang's patch however will not fix this PR - the patch he committed disables vectorization if the vectorizer was able to compute the misalignment, and discovered that it doesn't evenly divide by the type size. In this testcase the misalignment is unknown at compile time.

To fix this problem we need to disable loop-peeling in the vectorizer if we can't prove that the data is naturally aligned. Alternatively, if we can't prove either way we can peel the loop but control the number of iterations it will execute using a runtime test (i.e. have the prolog loop iterate the entire loop-count if at runtime we discover that the data is not naturally aligned).

--
dorit at il dot ibm dot com changed:

           What    |Removed                     |Added
                 CC|                            |dorit at il dot ibm dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=25413
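The dependence of peeling on natural alignment can be shown with a little address arithmetic. The helper below is a hypothetical illustration, not vectorizer code; peel_count and its parameters are made-up names. It computes how many leading scalar iterations are needed before the address becomes vector-aligned, and reports -1 when no number of iterations can get there because the start address is not a multiple of the element size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of leading elements to peel so that addr becomes a multiple of
   vec_size, stepping by elem_size bytes per iteration. Returns -1 if the
   address is not naturally aligned, in which case stepping by elem_size
   never reaches a vec_size boundary. */
long peel_count(uintptr_t addr, size_t elem_size, size_t vec_size)
{
    assert(elem_size != 0 && vec_size % elem_size == 0);
    if (addr % elem_size != 0)
        return -1;   /* not naturally aligned: peeling cannot help */
    return (long)(((vec_size - addr % vec_size) % vec_size) / elem_size);
}
```

For 8-byte doubles and 16-byte vectors, an array starting at 0x1008 needs one peeled iteration, while an array starting at 0x1004 (only 4-byte aligned, as can happen on the Pentium 4) can never be brought to a 16-byte boundary by peeling whole elements. That second case is where a runtime test, as suggested above, would have to fall back to running the whole loop in the scalar prolog.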
[Bug target/24378] [4.1/4.2 Regression] gcc.dg/vect/pr24300.c (test for excess errors) fails
--- Comment #9 from dorit at il dot ibm dot com 2005-12-14 15:38 --- Thanks for testing the patch. I finally submitted it: http://gcc.gnu.org/ml/gcc-patches/2005-12/msg01071.html -- http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24378