https://gcc.gnu.org/bugzilla/show_bug.cgi?id=121049
--- Comment #15 from Richard Biener <rguenth at gcc dot gnu.org> ---
I think the issue is that we do
_79 = _78 > { 0, 1, 2, 3, 4, 5, 6, 7 };
vect__12.20_57 = .MASK_LOAD (vectp_mon_lengths.19_51, 256B, _79, { 0, 0, 0, 0, 0, 0, 0, 0 });
vect_patt_18.21_58 = WIDEN_MULT_EVEN_EXPR <vect__12.20_57, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
vect_patt_18.21_59 = WIDEN_MULT_ODD_EXPR <vect__12.20_57, { 2, 2, 2, 2, 2, 2, 2, 2 }>;
_63 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(_79);
vect_value_4.23_64 = .COND_ADD (_63, vect_patt_18.21_58, _60, _60);
_65 = VIEW_CONVERT_EXPR<unsigned char>(_79);
_66 = _65 >> 4;
_67 = VIEW_CONVERT_EXPR<vector(4) <signed-boolean:1>>(_66);
vect_value_4.23_68 = .COND_ADD (_67, vect_patt_18.21_59, vect_value_4.23_64, vect_value_4.23_64);
so we use an even/odd widening multiply for the reduction - which is ultimately
OK, but not when we do loop masking, since then we mask the wrong elements.
We'd need a lo/hi widening multiply, or alternatively do an even/odd extract of
the loop mask instead of taking the lo/hi halves when distributing it.
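A standalone sketch of the mismatch, purely for illustration (this is not GCC code; the lane counts and a 5-active-lane loop mask are assumed for the example): WIDEN_MULT_EVEN/ODD put the products of lanes 0,2,4,6 in the first result vector and lanes 1,3,5,7 in the second, but the COND_ADDs above feed them the lo and hi halves of the loop mask, which correspond to lanes 0-3 and 4-7.

```python
lanes = list(range(8))
mask = [True] * 5 + [False] * 3   # loop mask: first 5 lanes active

# Even/odd widening mult: result vector 0 holds products of lanes 0,2,4,6,
# result vector 1 holds products of lanes 1,3,5,7.
even_lanes = lanes[0::2]          # [0, 2, 4, 6]
odd_lanes = lanes[1::2]           # [1, 3, 5, 7]

# Lo/hi split of the loop mask, as the two COND_ADDs use it - wrong pairing:
mask_lo = mask[:4]                # applied to vector 0, i.e. lanes 0,2,4,6
mask_hi = mask[4:]                # applied to vector 1, i.e. lanes 1,3,5,7

got = sorted(l for l, m in zip(even_lanes + odd_lanes, mask_lo + mask_hi) if m)
want = [l for l, m in zip(lanes, mask) if m]
print(got)    # [0, 1, 2, 4, 6] - lane 6 wrongly accumulated, lane 3 dropped
print(want)   # [0, 1, 2, 3, 4]

# The suggested fix: even/odd extract of the loop mask, matching the
# even/odd distribution of the products.
mask_even = mask[0::2]            # lanes 0,2,4,6
mask_odd = mask[1::2]             # lanes 1,3,5,7
fixed = sorted(l for l, m in zip(even_lanes + odd_lanes, mask_even + mask_odd) if m)
print(fixed)  # [0, 1, 2, 3, 4] - matches the loop mask
```

The same reasoning shows why a lo/hi widening multiply would also work: then the existing lo/hi mask split would line up with the product distribution.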