https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68483

--- Comment #4 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Ah, no, the problem is not on the backend side, but during veclower2 pass.
Before that pass, after the replacement of the v >> 64 and v >> 32 shifts, we have:
  vect_sum_15.12_58 = VEC_PERM_EXPR <vect_sum_15.10_57, { 0, 0, 0, 0 }, { 2, 3,
4, 5 }>;
  vect_sum_15.12_59 = vect_sum_15.12_58 + vect_sum_15.10_57;
  vect_sum_15.12_60 = VEC_PERM_EXPR <vect_sum_15.12_59, { 0, 0, 0, 0 }, { 1, 2,
3, 4 }>;
  vect_sum_15.12_61 = vect_sum_15.12_60 + vect_sum_15.12_59;
but veclower2 for some reason decides to lower the latter VEC_PERM_EXPR into:
  _32 = BIT_FIELD_REF <vect_sum_15.12_59, 32, 32>;
  _17 = BIT_FIELD_REF <vect_sum_15.12_59, 32, 64>;
  _23 = BIT_FIELD_REF <vect_sum_15.12_59, 32, 96>;
  vect_sum_15.12_60 = {_32, _17, _23, 0};
The first VEC_PERM_EXPR is kept and generates efficient code.  If I manually
disable the lowering in the debugger, the code regression is gone.
