[Bug tree-optimization/48172] [4.5/4.6/4.7 Regression] incorrect vectorization of loop in GCC 4.5.* with -O3
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #14 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-13 08:31:28 UTC ---
Author: rguenth
Date: Fri May 13 08:31:18 2011
New Revision: 173725

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173725
Log:
2011-05-13  Richard Guenther  <rguent...@suse.de>

    PR tree-optimization/48172
    * tree-vect-loop-manip.c (vect_vfa_segment_size): Avoid
    multiplying by number of iterations for equal step.
    (vect_create_cond_for_alias_checks): Likewise.

Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/tree-vect-loop-manip.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #8 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-12 10:40:02 UTC ---
Created attachment 24236
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=24236
patch

Patch I'm going to test.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #9 from Ira Rosen <irar at il dot ibm.com> 2011-05-12 11:48:56 UTC ---
(In reply to comment #8)
> Created attachment 24236 [details]
> patch
>
> Patch I'm going to test.

So, segment_length = scalar_step * vf * scalar_niters? I think we don't need
vf here.
Also, why not do that only for different steps?

Thanks,
Ira
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #10 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-12 12:14:48 UTC ---
Author: rguenth
Date: Thu May 12 12:14:45 2011
New Revision: 173703

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=173703
Log:
2011-05-12  Richard Guenther  <rguent...@suse.de>

    PR tree-optimization/48172
    * tree-vect-loop-manip.c (vect_vfa_segment_size): Do not exclude
    the number of iterations from the segment size calculation.
    (vect_create_cond_for_alias_checks): Adjust.

    * gcc.dg/vect/pr48172.c: New testcase.

Added:
    trunk/gcc/testsuite/gcc.dg/vect/pr48172.c
Modified:
    trunk/gcc/ChangeLog
    trunk/gcc/testsuite/ChangeLog
    trunk/gcc/tree-vect-loop-manip.c
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #12 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-12 12:46:10 UTC ---
Like this?

Index: gcc/tree-vect-loop-manip.c
===================================================================
--- gcc/tree-vect-loop-manip.c  (revision 173703)
+++ gcc/tree-vect-loop-manip.c  (working copy)
@@ -2353,23 +2353,19 @@ vect_create_cond_for_align_checks (loop_

    Input:
    DR: The data reference.
-   VECT_FACTOR: vectorization factor.
-   SCALAR_LOOP_NITERS: number of iterations.
+   LENGTH_FACTOR: segment length to consider.

    Return an expression whose value is the size of segment which will be
    accessed by DR.  */

 static tree
-vect_vfa_segment_size (struct data_reference *dr, int vect_factor,
+vect_vfa_segment_size (struct data_reference *dr, tree length_factor,
                        tree scalar_loop_niters)
 {
   tree segment_length;
   segment_length = size_binop (MULT_EXPR,
                                fold_convert (sizetype, DR_STEP (dr)),
-                               size_int (vect_factor));
-  segment_length = size_binop (MULT_EXPR,
-                               segment_length,
-                               fold_convert (sizetype, scalar_loop_niters));
+                               fold_convert (sizetype, length_factor));
   if (vect_supportable_dr_alignment (dr, false)
         == dr_explicit_realign_optimized)
     {
@@ -2465,10 +2461,12 @@ vect_create_cond_for_alias_checks (loop_
         vect_create_addr_base_for_vector_ref (stmt_b, cond_expr_stmt_list,
                                               NULL_TREE, loop);

-      segment_length_a = vect_vfa_segment_size (dr_a, vect_factor,
-                                                scalar_loop_iters);
-      segment_length_b = vect_vfa_segment_size (dr_b, vect_factor,
-                                                scalar_loop_iters);
+      if (!operand_equal_p (DR_STEP (dr_a), DR_STEP (dr_b), 0))
+        length_factor = scalar_loop_iters;
+      else
+        length_factor = size_int (vect_factor);
+      segment_length_a = vect_vfa_segment_size (dr_a, length_factor);
+      segment_length_b = vect_vfa_segment_size (dr_b, length_factor);

       if (vect_print_dump_info (REPORT_DR_DETAILS))
         {

I also think that the re-alignment adjustment needs to be multiplied by
DR_STEP (maybe we only support it for DR_STEP == 1 at the moment).
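
For readers who want the idea of the patch in comment #12 without the GCC tree
machinery, here is a minimal standalone sketch. It is not the GCC code:
struct toy_dr, toy_length_factor and toy_segment_size are hypothetical names,
steps are plain byte counts, and the realignment adjustment is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a data reference: only the byte step per
   scalar iteration is modelled.  */
struct toy_dr {
    size_t step;
};

/* Choose what the step gets multiplied by.  With equal steps the two
   accesses keep a constant distance, so one vector iteration (vf scalar
   iterations) suffices; with different steps the whole iteration space
   has to be covered.  */
static size_t toy_length_factor(const struct toy_dr *a, const struct toy_dr *b,
                                size_t vf, size_t niters)
{
    assert(vf > 0);
    return a->step == b->step ? vf : niters;
}

/* Segment size in bytes: step * length_factor.  */
static size_t toy_segment_size(const struct toy_dr *dr, size_t length_factor)
{
    return dr->step * length_factor;
}
```

Assuming 4-byte unsigned int, the testcase from comment #2 has a read with
step 8 and a write with step 4 over 513 iterations, so the sketch picks 513
as the factor and yields segments of 4104 and 2052 bytes. With equal steps and
vf = 4 it falls back to 16 bytes, the size that appears as "+ 16" in the alias
check quoted in comment #2.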
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #11 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-05-12 12:38:15 UTC ---
(In reply to comment #9)
> (In reply to comment #8)
> > Created attachment 24236 [details]
> > patch
> >
> > Patch I'm going to test.
>
> So, segment_length = scalar_step * vf * scalar_niters? I think we don't need
> vf here.

Hm, right.  I'll prepare a followup.

> Also, why not do that only for different steps?

We don't know this at this point.  Maybe we can change the structure
of the code somewhat.  I'll have a look.

> Thanks,
> Ira
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #13 from Ira Rosen <irar at il dot ibm.com> 2011-05-12 13:02:39 UTC ---
(In reply to comment #12)
> Like this?

Yes, looks good to me.

> I also think that the re-alignment adjustment needs to be multiplied by
> DR_STEP (maybe we only support it for DR_STEP == 1 at the moment).

The realignment adjustment is for the case when we load two consecutive
aligned vectors and extract the relevant elements from them (in Altivec): for
a[1:4] we load a[0:3] and a[4:7]. So, the adjustment adds one more vector size
to cover that additional loaded vector. I don't see why it needs to be
multiplied by DR_STEP.

Thanks,
Ira
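
The realigning load described in comment #13 can be emulated in plain C to
show why one extra vector is touched. This is only an illustration under
assumptions: a vector is taken to hold 4 ints, toy_realign_load is a
hypothetical name, and memcpy stands in for the aligned vector loads and the
permute that real hardware would use.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_VEC_ELEMS 4  /* assumed vector width in elements */

/* Emulate a realigning load of TOY_VEC_ELEMS ints starting at a[start]:
   fetch the two consecutive aligned vectors beginning at the aligned
   index at or below start, then pick the wanted elements out of their
   concatenation.  The caller must make sure that 2 * TOY_VEC_ELEMS
   elements from that aligned index are readable.  */
static void toy_realign_load(const int *a, size_t start,
                             int out[TOY_VEC_ELEMS])
{
    size_t lo = start - start % TOY_VEC_ELEMS;  /* aligned start index */
    int both[2 * TOY_VEC_ELEMS];

    assert(a != NULL && out != NULL);
    memcpy(both, a + lo, sizeof both);          /* two aligned vector loads */
    memcpy(out, both + (start - lo), TOY_VEC_ELEMS * sizeof out[0]);
}
```

Calling toy_realign_load(a, 1, out) reads a[0] through a[7] although only a[1]
through a[4] are requested, which is the additional vector that the segment
size adjustment accounts for.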
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|4.5.3                       |4.5.4

--- Comment #7 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-04-28 14:51:31 UTC ---
GCC 4.5.3 is being released, adjusting target milestone.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|P3                          |P2
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

PcX <xunxun1982 at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |xunxun1982 at gmail dot com

--- Comment #6 from PcX <xunxun1982 at gmail dot com> 2011-04-08 00:15:36 UTC ---
Using mingw gcc4.5.2, the situation is not the same.

I found that when I use -O3, the result is
  pass.
When I use -O3 -march=native, the result is
  COMPILER BUG: array[1025] should be 98177 but is 0.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

Ira Rosen <irar at il dot ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |irar at il dot ibm.com

--- Comment #5 from Ira Rosen <irar at il dot ibm.com> 2011-03-19 17:11:46 UTC ---
(In reply to comment #4)
> In particular for all tests the segment size we use for the alias tests is
> not enough for data-refs with differing DR_STEP.  It would need to take the
> number of iterations into account.

Right, instead of checking
  init_addr1 + step1 * vf < init_addr2
we should check something like
  init_addr1 + step1 * vf + scalar_niters * (step1 - step2) < init_addr2
(or the other direction).
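
The two checks from comment #5 can be tried out with concrete numbers. The
sketch below transcribes them into standalone C; the function names are
hypothetical, all quantities are byte offsets held in long, and only the one
direction written out in the comment is covered.

```c
#include <assert.h>
#include <stdbool.h>

/* The check as generated before the fix: only one vector iteration
   (vf scalar iterations) of the first access is covered.  */
static bool toy_disjoint_vf_only(long init_addr1, long step1, long vf,
                                 long init_addr2)
{
    assert(vf > 0);
    return init_addr1 + step1 * vf < init_addr2;
}

/* The suggested form: additionally account for how far the first access
   gains on the second over scalar_niters iterations.  */
static bool toy_disjoint_with_niters(long init_addr1, long step1, long step2,
                                     long vf, long scalar_niters,
                                     long init_addr2)
{
    assert(vf > 0);
    return init_addr1 + step1 * vf + scalar_niters * (step1 - step2)
           < init_addr2;
}
```

Plugging in the shortened testcase (4-byte elements assumed: read at offset 0
with step 8, write at offset 2056 with step 4, vf = 4, 513 iterations), the
first check evaluates 32 < 2056 and reports the accesses as disjoint, while
the second evaluates 2084 < 2056 and does not, so the scalar version of the
loop would be taken.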
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
      Known to work|                            |4.4.4
           Keywords|                            |wrong-code
   Last reconfirmed|                            |2011.03.18 09:29:27
     Ever Confirmed|0                           |1
            Summary|incorrect vectorization of  |[4.5/4.6/4.7 Regression]
                   |loop in GCC 4.5.* with -O3  |incorrect vectorization of
                   |                            |loop in GCC 4.5.* with -O3
   Target Milestone|---                         |4.5.3
      Known to fail|                            |4.5.0, 4.5.2, 4.6.0, 4.7.0

--- Comment #1 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-18 09:29:27 UTC ---
Not vectorized on the 4.4 branch because of

t.c:23: note: not vectorized: unsupported unaligned store.
t.c:16: note: vect_model_induction_cost: inside_cost = 6, outside_cost = 4 .
t.c:16: note: not vectorized: relevant stmt not supported: i.16_8 = (uint32_t) i_60;

Confirmed.  It's not the conversion but the unaligned store it seems, also
fails with 4.5.0.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #2 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-18 09:48:39 UTC ---
(compute_affine_dependence
  (stmt_a =
D.3677_12 = array[D.3676_11];
)
  (stmt_b =
array[D.3675_10] = D.3680_16;
)
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {0, +, 2}_2)
  (chrec_b = {514, +, 1}_2)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [257 + 1 * x_1]
)
  (overlaps_b = [0 + 2 * x_1]
)
)
)
  (overlap_iterations_a = [257 + 1 * x_1]
)
  (overlap_iterations_b = [0 + 2 * x_1]
)
)
(Dependence relation cannot be represented by distance vector.)
)
(compute_affine_dependence
  (stmt_a =
D.3679_15 = array[D.3678_14];
)
  (stmt_b =
array[D.3675_10] = D.3680_16;
)
(subscript_dependence_tester
(analyze_overlapping_iterations
  (chrec_a = {1, +, 2}_2)
  (chrec_b = {514, +, 1}_2)
(analyze_siv_subscript
(analyze_subscript_affine_affine
  (overlaps_a = [257 + 1 * x_1]
)
  (overlaps_b = [1 + 2 * x_1]
)
)
)
  (overlap_iterations_a = [257 + 1 * x_1]
)
  (overlap_iterations_b = [1 + 2 * x_1]
)
)
(Dependence relation cannot be represented by distance vector.)
)
)

...

t.c:23: note: versioning for alias required: bad dist vector for array[D.3676_11] and array[D.3675_10]
t.c:23: note: mark for run-time aliasing test between array[D.3676_11] and array[D.3675_10]
t.c:23: note: versioning for alias required: bad dist vector for array[D.3678_14] and array[D.3675_10]
t.c:23: note: mark for run-time aliasing test between array[D.3678_14] and array[D.3675_10]

and the alias check looks like

  vect_parray.14_32 = &array;
  vect_parray.17_31 = &array[514];
  D.3707_29 = vect_parray.14_32 + 32;
  D.3708_28 = D.3707_29 < vect_parray.17_31;
  D.3709_63 = vect_parray.17_31 + 16;
  D.3710_64 = D.3709_63 < vect_parray.14_32;
  D.3711_65 = D.3708_28 || D.3710_64;
  D.3713_78 = !D.3711_65;
  if (D.3713_78 != 0)
    goto <bb 12>;
  else
    goto <bb 9>;

which doesn't at all test something sensible.

Shortened non-runtime testcase:

#define ASIZE 1028
#define HALF (ASIZE/2)
unsigned int array[ASIZE];
void foo(void)
{
  int i;
  for (i = 0; i < HALF-1; i++)
    array[HALF+i] = array[2*i] + array[2*i + 1];
}
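
To turn the shortened testcase above into something that can be checked at run
time, one option is to compare the loop against a copy that cannot be
vectorized. The sketch below is not the testcase from the original report nor
gcc.dg/vect/pr48172.c: foo_reference and first_mismatch are hypothetical
helpers, the initial data is arbitrary, and the volatile pointer is simply one
way to force scalar accesses in program order.

```c
#include <assert.h>

#define ASIZE 1028
#define HALF (ASIZE/2)

static unsigned int array[ASIZE];

/* The loop from the shortened testcase; the compiler may vectorize it.  */
static void foo(void)
{
    int i;
    for (i = 0; i < HALF-1; i++)
        array[HALF+i] = array[2*i] + array[2*i + 1];
}

/* Same computation through a volatile pointer, so every load and store
   is performed one element at a time in program order.  */
static void foo_reference(unsigned int *buf)
{
    volatile unsigned int *p = buf;
    int i;

    assert(buf);
    for (i = 0; i < HALF-1; i++)
        p[HALF+i] = p[2*i] + p[2*i + 1];
}

/* Returns -1 if both versions agree, else the first differing index.  */
static int first_mismatch(void)
{
    static unsigned int expected[ASIZE];
    int i;

    for (i = 0; i < ASIZE; i++)
        array[i] = expected[i] = (unsigned int) i;  /* arbitrary input */
    foo();
    foo_reference(expected);
    for (i = 0; i < ASIZE; i++)
        if (array[i] != expected[i])
            return i;
    return -1;
}
```

A correctly compiled program returns -1 from first_mismatch(); a miscompiled
vectorized loop would be expected to show up as a non-negative index.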
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

Richard Guenther <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
                 CC|                            |irar at gcc dot gnu.org
         AssignedTo|unassigned at gcc dot       |rguenth at gcc dot gnu.org
                   |gnu.org                     |

--- Comment #3 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-18 10:00:34 UTC ---
Versioning for alias only seems to consider the case that DR_STEP is the
same for all DRs, right?

Index: gcc/tree-vect-data-refs.c
===================================================================
--- gcc/tree-vect-data-refs.c   (revision 171097)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -528,6 +528,14 @@ vect_mark_for_runtime_alias_test (ddr_p
       print_generic_expr (vect_dump, DR_REF (DDR_B (ddr)), TDF_SLIM);
     }

+  if (!operand_equal_p (DR_STEP (DDR_A (ddr)), DR_STEP (DDR_B (ddr)), 0))
+    {
+      if (vect_print_dump_info (REPORT_DR_DETAILS))
+        fprintf (vect_dump, "versioning not supported for accesses with "
+                 "different step.");
+      return false;
+    }
+
   if (optimize_loop_nest_for_size_p (loop))
     {
       if (vect_print_dump_info (REPORT_DR_DETAILS))

fixes it for me.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48172

--- Comment #4 from Richard Guenther <rguenth at gcc dot gnu.org> 2011-03-18 11:57:11 UTC ---
The patch FAILs

FAIL: gcc.dg/vect/pr37539.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/pr43432.c scan-tree-dump-times vect "vectorized 1 loops" 1
FAIL: gcc.dg/vect/vect-multitypes-11.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/vect-multitypes-12.c scan-tree-dump-times vect "vectorized 1 loops" 2
FAIL: gcc.dg/vect/vect-multitypes-16.c scan-tree-dump-times vect "vectorized 1 loops" 1

on x86_64.  I can't see how the alias check is ok for pr37539.c given
that we could call ayuv2yuyv_ref with d[0], d[4].  Similar for pr43432.c.

For vect-multitypes-11.c type-based aliasing should handle the case
instead of

/space/rguenther/src/svn/trunk/gcc/testsuite/gcc.dg/vect/vect-multitypes-11.c:14: note: versioning for alias required: can't determine dependence between x[i_16] and MEM[(int *)D.3264_7]

not sure why this doesn't happen.  For -fno-strict-aliasing the runtime
test looks bogus as well.

vect-multitypes-12.c and vect-multitypes-16.c look similar (but as
character types are involved TBAA doesn't help and the vectorization
does not appear to be safe).

In particular for all tests the segment size we use for the alias tests
is not enough for data-refs with differing DR_STEP.  It would need to
take the number of iterations into account.