https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111754
--- Comment #7 from prathamesh3492 at gcc dot gnu.org ---
(In reply to Richard Biener from comment #5)
> It seems we have VECTOR_CST_NELTS_PER_PATTERN ({ 9.0e+0, 0.0, 0.0, 0.0 })
> 2 and VECTOR_CST_NPATTERNS == 1.  And the selector { 1, 0, 1, 2 } has
> npatterns == 1 and nelts-per-pattern == 3.
> 
>   /* (1) If SEL is a suitable mask as determined by
>      valid_mask_for_fold_vec_perm_cst_p, then:
>      res_npatterns = max of npatterns between ARG0, ARG1, and SEL
>      res_nelts_per_pattern = max of nelts_per_pattern between
>                              ARG0, ARG1 and SEL.
>      (2) If SEL is not a suitable mask, and TYPE is VLS then:
>      res_npatterns = nelts in result vector.
>      res_nelts_per_pattern = 1.
>      This exception is made so that VLS ARG0, ARG1 and SEL work as before.
>   */
>   if (valid_mask_for_fold_vec_perm_cst_p (arg0, arg1, sel, reason))
>     {
>       res_npatterns
>         = std::max (VECTOR_CST_NPATTERNS (arg0),
>                     std::max (VECTOR_CST_NPATTERNS (arg1),
>                               sel.encoding ().npatterns ()));
> 
>       res_nelts_per_pattern
>         = std::max (VECTOR_CST_NELTS_PER_PATTERN (arg0),
>                     std::max (VECTOR_CST_NELTS_PER_PATTERN (arg1),
>                               sel.encoding ().nelts_per_pattern ()));
> 
>       res_nelts = res_npatterns * res_nelts_per_pattern;
> 
> this seems to be a case that doesn't fit, so the fix needs to be to
> valid_mask_for_fold_vec_perm_cst_p which really looks a bit
> unwieldly.

valid_mask_for_fold_vec_perm_cst_p returns incorrectly true here, which is
being addressed in PR111648 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-October/631926.html

Even if the vectors had integral element type:
arg0 = arg1 = (v4si){ 9, 0, 0, 0 } // encoded as {9, 0, ...}
and sel = { 1, 0, 1, 2 } // encoded as {1, 0, 1, ...}

The pattern in sel {1, 0, 1, ...} would choose elements from arg0, and res
would have incorrect encoding with step = -9:
res = { arg0[1], arg0[0], arg0[1], ... } = { 0, 9, 0, ... }

And res[3] will be incorrectly computed as -9 instead of arg0[2].
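
To make the arithmetic above concrete, here is a rough standalone sketch of the stepped-encoding extrapolation. It is not GCC code: `Encoded`, `decode_elt` and `fold_encoded_only` are hypothetical names invented for illustration, elements are plain integers, and the fold assumes every selector index picks from arg0. Only the npatterns / nelts_per_pattern terminology and the max-of-encodings choice are taken from the snippet quoted above.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model of a VECTOR_CST-style encoding.
// Assumption: integer elements only.
struct Encoded {
  std::vector<long> elts;      // npatterns * nelts_per_pattern encoded values
  unsigned npatterns;
  unsigned nelts_per_pattern;  // 1 (duplicate), 2 (base + repeat) or 3 (stepped)
};

// Derive element I from the encoding, extrapolating past the encoded part.
long decode_elt(const Encoded &v, unsigned i) {
  assert(v.elts.size() == std::size_t(v.npatterns) * v.nelts_per_pattern);
  unsigned pattern = i % v.npatterns;
  unsigned count = i / v.npatterns;
  if (count < v.nelts_per_pattern)
    return v.elts[i];  // element is stored explicitly
  long final_elt = v.elts[(v.nelts_per_pattern - 1) * v.npatterns + pattern];
  if (v.nelts_per_pattern < 3)
    return final_elt;  // the last encoded element of the pattern repeats
  // Stepped pattern: continue the series defined by the last two elements.
  long prev = v.elts[v.npatterns + pattern];
  long step = final_elt - prev;
  return final_elt + long(count - 2) * step;
}

// Permute only the encoded elements, choosing the result encoding as the max
// of the input encodings. Assumption: all selector indices select from arg0.
Encoded fold_encoded_only(const Encoded &arg0, const Encoded &sel) {
  Encoded res;
  res.npatterns = std::max(arg0.npatterns, sel.npatterns);
  res.nelts_per_pattern =
      std::max(arg0.nelts_per_pattern, sel.nelts_per_pattern);
  unsigned n = res.npatterns * res.nelts_per_pattern;
  for (unsigned i = 0; i < n; ++i)
    res.elts.push_back(decode_elt(arg0, unsigned(decode_elt(sel, i))));
  return res;
}
```

With arg0 encoded as {9, 0} (npatterns 1, nelts_per_pattern 2) and sel encoded as {1, 0, 1} (npatterns 1, nelts_per_pattern 3), the folded encoding is {0, 9, 0}, so decode_elt on the result at index 3 extrapolates 0 + (0 - 9) = -9, whereas indexing directly, arg0[sel[3]] = arg0[2], gives 0.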
However, for floating element types, even if encoding is correct, I assume it
will still ICE when trying to derive elements not present in encoding since
poly_int_cst can only deal with integral elements?
> 
> An assert that res_nelts is power-of-two would be nice to add.

Sorry, I don't understand. res_nelts for VLA need not be power of 2, since
res_nelts_per_pattern can be 3. The encoding for res is chosen to be max of
npatterns and max of nelts_per_pattern between arg0, arg1, and sel.

Thanks,
Prathamesh