https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631
--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On December 8, 2017 4:56:12 PM GMT+01:00, "jakub at gcc dot gnu.org" <gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631
>
>Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>                 CC|                            |rguenth at gcc dot gnu.org
>
>--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>More complete testcase:
>
>int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };
>
>__attribute__((noipa)) void
>foo ()
>{
>  int k, r = -1;
>  for (k = 0; k < 8; k++)
>    if (v[k] == 77)
>      r = k;
>  if (r != 0)
>    __builtin_abort ();
>}
>
>__attribute__((noipa)) void
>bar ()
>{
>  int k, r = 4;
>  for (k = 0; k < 8; k++)
>    if (v[k] == 79)
>      r = k;
>  if (r != 2)
>    __builtin_abort ();
>}
>
>int
>main ()
>{
>  foo ();
>  bar ();
>  return 0;
>}
>
>The conditional reduction handling is buggy.
>In foo we emit:
>  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
>  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
>  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
>
>  <bb 3> [local count: 119292720]:
>...
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
>
>vect_cst__30, which seems to be the initial value of the reduction var r
>as a vector, is unused.
>The problem is that by starting with a zero vector for vect_r_3.1_24
>there is no difference between a condition match on the first iteration
>and no match at all; both result in REDUC_MAX of 0, and the emitted code
>assumes REDUC_MAX of 0 means no match.
>
>In this case (if the first iteration iterator is constant and bigger
>than the minimum value of the type), we could just initialize with a
>vector containing any value smaller than the first iteration IV and
>adjust
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
>to
>  stmp_r_3.7_33 = stmp_r_3.7_32 == the_chosen_value ? -1 : stmp_r_3.7_32;
>or, specially in the case when the reduction var is previously
>initialized to a value smaller than the minimum, we could build a vector
>of those values and avoid the COND_EXPR on the REDUC_MAX value.
>
>Now, in case the first iteration iterator is constant, but is the
>minimum value, we can't use this trick.  Perhaps we could in that case
>just bias it by one, say if the reduction is with unsigned type emit
>e.g.:
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_34 = stmp_r_3.7_32 - 1;
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <original_r_value> : stmp_r_3.7_34;
>
>For the non-constant IV first value we actually emit really weird code:
>int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };
>
>__attribute__((noipa)) void
>foo (int *v, int f)
>{
>  int k, r = -1;
>  for (k = f; k < f + 8; k++)
>    if (v[k] == 77)
>      r = k;
>  if (r != 0)
>    __builtin_abort ();
>}
>
>__attribute__((noipa)) void
>bar (int *v, int f)
>{
>  int k, r = 4;
>  for (k = f; k < f + 8; k++)
>    if (v[k] == 79)
>      r = k;
>  if (r != 2)
>    __builtin_abort ();
>}
>
>int
>main ()
>{
>  foo (v, 0);
>  bar (v, 0);
>  return 0;
>}
>
>where we emit 2 VEC_COND_EXPRs and 2 REDUC_MAX.  While that testcase
>passes, I'm not really sure if it is correct generally, and furthermore,
>it seems unnecessarily complicated to me.  Can't we just emit what we'd
>emit for unsigned conditional reduction with first iteration 1, and only
>after the vectorized loop adjust it?
>So, say for the foo in the second case, emit:
>
>  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
>  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
>
>  <bb 3> [local count: 119292720]:
>...
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_34 = f_9(D) + (stmp_r_3.7_32 - 1) * step;
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <r_value_before_loop> : stmp_r_3.7_34;
>where _22, _24, _29 would be all in vectors of unsigned_type_for (r)?
>Or for signed start with { min, min, ... } as condition never seen
>value, and { min+1, min+2, min+3, ... } vector as the initial _22 value?

There's a dup of this (the existing vect.exp execution failure), and there is an approved patch for it.