[Bug tree-optimization/80631] [6/7/8 Regression] Compiling with -O3 -mavx2 gives wrong code

rguenther at suse dot de Fri, 08 Dec 2017 09:23:42 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631


--- Comment #4 from rguenther at suse dot de <rguenther at suse dot de> ---
On December 8, 2017 4:56:12 PM GMT+01:00, "jakub at gcc dot gnu.org"
<gcc-bugzi...@gcc.gnu.org> wrote:
>https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80631
>
>Jakub Jelinek <jakub at gcc dot gnu.org> changed:
>
>           What    |Removed                     |Added
>----------------------------------------------------------------------------
>                 CC|                            |rguenth at gcc dot gnu.org
>
>--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
>More complete testcase:
>
>int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };
>
>__attribute__((noipa)) void
>foo ()
>{
>  int k, r = -1;
>  for (k = 0; k < 8; k++)
>    if (v[k] == 77)
>      r = k;
>  if (r != 0)
>    __builtin_abort ();
>}
>
>__attribute__((noipa)) void
>bar ()
>{
>  int k, r = 4;
>  for (k = 0; k < 8; k++)
>    if (v[k] == 79)
>      r = k;
>  if (r != 2)
>    __builtin_abort ();
>}
>
>int
>main ()
>{
>  foo ();
>  bar ();
>  return 0;
>}
>
>The conditional reduction handling is buggy.
>In foo we emit:
>  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
>  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
>  vect_cst__30 = { -1, -1, -1, -1, -1, -1, -1, -1 };
>
>  <bb 3> [local count: 119292720]:
>...
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 0, 1, 2, 3, 4, 5, 6, 7 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
>
>vect_cst__30, which seems to be the initial value of the reduction var r as a
>vector, is unused.
>The problem is that by starting with a zero vector for vect_r_3.1_24 there is
>no difference between a condition match on the first iteration and no match at
>all: both result in a REDUC_MAX of 0, and the emitted code assumes that a
>REDUC_MAX of 0 means no match.
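
The ambiguity described above can be reproduced without a vector unit. What follows is a minimal sketch in plain C, not GCC's actual output: it models the eight lanes with arrays, and the names last_match_scalar, last_match_zero_init and LANES are made up for illustration.

```c
#include <assert.h>

#define LANES 8  /* illustrative: eight int lanes, as with AVX2 */

/* Scalar reference: index of the last element equal to key, else init. */
int last_match_scalar(const int *v, int n, int key, int init)
{
    int r = init;
    for (int k = 0; k < n; k++)
        if (v[k] == key)
            r = k;
    return r;
}

/* Hypothetical lane-by-lane model of the scheme in the dump: zero
   accumulator, IV vector starting at { 0, 1, ..., 7 }, and an epilogue
   that reads a REDUC_MAX of 0 as "no match". */
int last_match_zero_init(const int *v, int n, int key, int init)
{
    int iv[LANES], acc[LANES];

    assert(n > 0 && n % LANES == 0);  /* no scalar tail in this sketch */
    for (int l = 0; l < LANES; l++) {
        iv[l] = l;
        acc[l] = 0;
    }
    for (int base = 0; base < n; base += LANES)
        for (int l = 0; l < LANES; l++) {
            if (v[base + l] == key)   /* models VEC_COND_EXPR */
                acc[l] = iv[l];
            iv[l] += LANES;           /* IV vector += { 8, 8, ... } */
        }

    int max = acc[0];                 /* models REDUC_MAX */
    for (int l = 1; l < LANES; l++)
        if (acc[l] > max)
            max = acc[l];
    return max == 0 ? init : max;     /* a match at index 0 is lost here */
}
```

With v = { 77, 1, 79, 3, 4, 5, 6, 7 }, last_match_scalar (v, 8, 77, -1) returns 0 while last_match_zero_init (v, 8, 77, -1) returns -1, which corresponds to the abort in foo; the search for 79 returns 2 in both.
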
>
>In this case (if the first iteration iterator is constant and bigger than the
>minimum value of the type), it would be enough to initialize with a vector
>containing any value smaller than the first iteration IV and adjust:
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? -1 : stmp_r_3.7_32;
>to
>  stmp_r_3.7_33 = stmp_r_3.7_32 == the_chosen_value ? -1 : stmp_r_3.7_32;
>Or, specially in the case when the reduction var is previously initialized to
>a value smaller than the minimum, we could build a vector of those values and
>avoid the COND_EXPR on the REDUC_MAX value.
>
>Now, in case the first iteration iterator is constant, but is the minimum
>value, we can't use this trick.  Perhaps we could in that case just bias it
>by one, say if the reduction is with unsigned type emit e.g.:
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_34 = stmp_r_3.7_32 - 1;
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <original_r_value> : stmp_r_3.7_34;
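
The bias-by-one idea can be checked with a small scalar model. This is only a sketch under the assumptions stated in the proposal (unsigned lanes, IV vector starting at { 1, ..., 8 }); last_match_biased and LANES are hypothetical names, and the arrays stand in for vector registers.

```c
#include <assert.h>

#define LANES 8  /* illustrative: eight lanes */

/* Hypothetical model of the biased scheme: lanes record index + 1, so 0 is
   free to mean "condition never seen", and the epilogue subtracts the bias. */
int last_match_biased(const int *v, int n, int key, int init)
{
    unsigned iv[LANES], acc[LANES];

    assert(n > 0 && n % LANES == 0);  /* no scalar tail in this sketch */
    for (int l = 0; l < LANES; l++) {
        iv[l] = (unsigned) l + 1u;    /* { 1, 2, ..., 8 } */
        acc[l] = 0u;                  /* 0 = no match in this lane */
    }
    for (int base = 0; base < n; base += LANES)
        for (int l = 0; l < LANES; l++) {
            if (v[base + l] == key)   /* models VEC_COND_EXPR */
                acc[l] = iv[l];
            iv[l] += LANES;
        }

    unsigned max = acc[0];            /* models REDUC_MAX */
    for (int l = 1; l < LANES; l++)
        if (acc[l] > max)
            max = acc[l];

    /* 0 now unambiguously means "no match"; otherwise undo the bias. */
    return max == 0u ? init : (int) (max - 1u);
}
```

For v = { 77, 1, 79, 3, 4, 5, 6, 7 } this returns 0 for key 77 with init -1 and 2 for key 79 with init 4, matching what foo and bar expect, and it returns init when the key is absent.
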
>
>For the non-constant IV first value we actually emit really weird code:
>int v[8] = { 77, 1, 79, 3, 4, 5, 6, 7 };
>
>__attribute__((noipa)) void
>foo (int *v, int f)
>{
>  int k, r = -1;
>  for (k = f; k < f + 8; k++)
>    if (v[k] == 77)
>      r = k;
>  if (r != 0)
>    __builtin_abort ();
>}
>
>__attribute__((noipa)) void
>bar (int *v, int f)
>{
>  int k, r = 4;
>  for (k = f; k < f + 8; k++)
>    if (v[k] == 79)
>      r = k;
>  if (r != 2)
>    __builtin_abort ();
>}
>
>int
>main ()
>{
>  foo (v, 0);
>  bar (v, 0);
>  return 0;
>}
>
>where we emit 2 VEC_COND_EXPRs and 2 REDUC_MAX.  While that testcase passes,
>I'm not really sure if it is correct generally, and furthermore, it seems
>unnecessarily complicated to me.  Can't we just emit what we'd emit for an
>unsigned conditional reduction with first iteration 1, and only adjust it
>after the vectorized loop?
>So, say for the foo in the second case, emit:
>
>  vect_cst__21 = { 8, 8, 8, 8, 8, 8, 8, 8 };
>  vect_cst__28 = { 77, 77, 77, 77, 77, 77, 77, 77 };
>
>  <bb 3> [local count: 119292720]:
>...
>  # vect_vec_iv_.0_22 = PHI <vect_vec_iv_.0_23(9), { 1, 2, 3, 4, 5, 6, 7, 8 }(2)>
>  # vect_r_3.1_24 = PHI <vect_r_3.6_29(9), { 0, 0, 0, 0, 0, 0, 0, 0 }(2)>
>  # vectp_v.2_25 = PHI <vectp_v.2_26(9), &v(2)>
>...
>  vect_vec_iv_.0_23 = vect_vec_iv_.0_22 + vect_cst__21;
>  vect__1.4_27 = MEM[(int *)vectp_v.2_25];
>  vect_r_3.6_29 = VEC_COND_EXPR <vect__1.4_27 == vect_cst__28, vect_vec_iv_.0_22, vect_r_3.1_24>;
>...
>  <bb 18> [local count: 119292720]:
>  # vect_r_3.6_31 = PHI <vect_r_3.6_29(3)>
>  stmp_r_3.7_32 = REDUC_MAX (vect_r_3.6_31);
>  stmp_r_3.7_34 = f_9(D) + (stmp_r_3.7_32 - 1) * step;
>  stmp_r_3.7_33 = stmp_r_3.7_32 == 0 ? <r_value_before_loop> : stmp_r_3.7_34;
>where _22, _24, _29 would all be in vectors of unsigned_type_for (r)?
>Or, for signed, start with { min, min, ... } as the condition-never-seen
>value and a { min+1, min+2, min+3, ... } vector as the initial _22 value?
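
The epilogue adjustment suggested for a non-constant start value, f + (max - 1) * step, can be modelled the same way. A minimal sketch, assuming unsigned 1-based position counters in the lanes; last_match_from, LANES and the explicit step parameter are hypothetical and only serve the illustration.

```c
#include <assert.h>

#define LANES 8  /* illustrative: eight lanes */

/* Hypothetical model of
     for (k = f, i = 0; i < n; i++, k += step) if (v[k] == key) r = k;
   The lanes track the 1-based iteration number rather than k itself, and
   k is reconstructed only after the loop. */
int last_match_from(const int *v, int f, int step, int n, int key, int init)
{
    unsigned pos[LANES], acc[LANES];

    assert(n > 0 && n % LANES == 0);  /* no scalar tail in this sketch */
    for (int l = 0; l < LANES; l++) {
        pos[l] = (unsigned) l + 1u;   /* { 1, 2, ..., 8 } */
        acc[l] = 0u;                  /* 0 = condition never seen */
    }
    for (int base = 0; base < n; base += LANES)
        for (int l = 0; l < LANES; l++) {
            if (v[f + (base + l) * step] == key)
                acc[l] = pos[l];      /* models VEC_COND_EXPR */
            pos[l] += LANES;
        }

    unsigned max = acc[0];            /* models REDUC_MAX */
    for (int l = 1; l < LANES; l++)
        if (acc[l] > max)
            max = acc[l];

    if (max == 0u)
        return init;                  /* r value before the loop */
    return f + (int) (max - 1u) * step;
}
```

Because the lanes never hold k itself, the same loop body works for any start value f, and the "no match" encoding does not depend on the range of k.
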

There's a dup for this (the existing vect.exp execute fail) and there is an
approved patch for it.

