https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111648
--- Comment #4 from prathamesh3492 at gcc dot gnu.org ---
(In reply to prathamesh3492 from comment #3)
> Created attachment 56037 [details]
> Untested fix
>
> The issue is that when a1 is a multiple of the vector length, we end up
> creating the following encoding in the result:
> { base_elem, arg[0], arg[1], ... }, where arg is the chosen input vector,
> which is incorrect.
>
> For the above case, the vectorizer pass creates
> VEC_PERM_EXPR<arg0, arg1, sel> where:
> arg0: { -16, -9, -10, -11 }
> arg1: { -12, -5, -6, -7 }
> sel = { 3, 4, 5, 6 }
>
> arg0, arg1 and sel are encoded with npatterns = 1 and nelts_per_pattern = 3.
> Since a1 = 4 and arg_len = 4, it ended up creating the result with the
> following encoding:
> res = { arg0[3], arg1[0], arg1[1] } // npatterns = 1, nelts_per_pattern = 3
>     = { -11, -12, -5 }
>
> So for res[4], it used S = (-5) - (-12) = 7

Typo: I meant res[3], not res[4]. Sorry.

> and hence computed it as -5 + 7 = 2 instead of arg1[2], i.e., -6,
> which is the difference we see in the output at -O0 vs -O2.
>
> The patch tweaks the constraints in valid_mask_for_fold_vec_perm_cst_p to
> punt if a1 is a multiple of the vector length, so that a1 ... ae only selects
> from the stepped part of the input vector, which seems to fix this issue.
> I will run a proper bootstrap+test and post it upstream.
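
For illustration only, the arithmetic above can be replayed outside the compiler. The Python sketch below is not GCC code: decode_stepped, vec_perm and a1_is_multiple_of_len are hypothetical helpers, and the model assumes a single pattern (npatterns = 1) with nelts_per_pattern = 3, as in the example. It expands the encoded vectors, applies the permutation lane by lane, and compares that with what the stepped encoding { -11, -12, -5 } extrapolates to.

```python
def decode_stepped(encoded, length):
    """Expand a single-pattern stepped encoding {base, e1, e2} to `length` lanes.

    Simplified model (assumption): npatterns = 1, nelts_per_pattern = 3,
    so every lane after the base continues with step e2 - e1.
    """
    base, e1, e2 = encoded
    step = e2 - e1
    return [base] + [e1 + i * step for i in range(length - 1)]


def vec_perm(arg0, arg1, sel):
    """Lane-by-lane permutation: index into the concatenation of arg0 and arg1."""
    both = arg0 + arg1
    return [both[i] for i in sel]


def a1_is_multiple_of_len(sel, arg_len):
    """Hypothetical check mirroring the condition the patch is said to punt on.

    With npatterns = 1, a1 is the second element of the selector.
    """
    return sel[1] % arg_len == 0


ARG_LEN = 4
arg0 = decode_stepped([-16, -9, -10], ARG_LEN)  # { -16, -9, -10, -11 }
arg1 = decode_stepped([-12, -5, -6], ARG_LEN)   # { -12, -5, -6, -7 }
sel = decode_stepped([3, 4, 5], ARG_LEN)        # { 3, 4, 5, 6 }

# What the permutation should produce, computed lane by lane.
expected = vec_perm(arg0, arg1, sel)

# What results from encoding only the first three lanes and extrapolating:
# the step is taken across the arg0/arg1 boundary, S = (-5) - (-12) = 7.
folded = decode_stepped(expected[:3], ARG_LEN)

print("expected:", expected)
print("folded:  ", folded)
print("a1 multiple of arg_len:", a1_is_multiple_of_len(sel, ARG_LEN))
```

Under these assumptions the last lane comes out as 2 in the extrapolated result and -6 in the lane-by-lane result, matching the -O2 vs -O0 difference described above, and the selector is flagged because a1 = 4 is a multiple of arg_len = 4.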