Richard Biener <rguent...@suse.de> writes: > This adds, incrementally ontop of moving VEC_PERM_EXPR folding > to match.pd (but not incremental patch - sorry...), folding > of single-element insert permutations to BIT_INSERT_EXPR. > > Things are quite awkward with the new poly-int vec-perm stuff > so effectively this doesn't work for SVE and is still very > ugly. I wonder how to make it generic enough so SVE would > be happy and / or how to make the code prettier. > > I also can't find a helper to read a specific vector element > from a VECTOR_CST/CONSTRUCTOR (can I even do that "generally" > with a poly-int index?!), but there surely must be one.
Yeah, would be nice to have a helper that handles both VECTOR_CST and CONSTRUCTOR, even if just for constant indices. CONSTRUCTORs are still very much fixed-length, so it wouldn't really make sense to fold a poly_int index at compile time. The only case we could handle is when the index is known to be beyond the last nonzero element. We could fold some poly_int VECTOR_CST_ELT indices to poly_ints, but it'd depend on the values involved. Indexing specific poly_int elements of a single vector is a bit dubious in length-agnostic code though. Not saying it'll never be useful, but it's certainly not a native SVE operation. So I think even for SVE, constant VECTOR_CST/CONSTRUCTOR elements are the only interesting case for now. > [...] > @@ -11774,61 +11777,7 @@ fold_ternary_loc (location_t loc, enum t > poly_uint64 nelts = TYPE_VECTOR_SUBPARTS (type); > bool single_arg = (op0 == op1); > vec_perm_indices sel (builder, single_arg ? 1 : 2, nelts); > - > - /* Check for cases that fold to OP0 or OP1 in their original > - element order. */ > - if (sel.series_p (0, 1, 0, 1)) > - return op0; > - if (sel.series_p (0, 1, nelts, 1)) > - return op1; > - > - if (!single_arg) > - { > - if (sel.all_from_input_p (0)) > - op1 = op0; > - else if (sel.all_from_input_p (1)) > - { > - op0 = op1; > - sel.rotate_inputs (1); > - } > - } Since this isn't an incremental patch... :-) One of the holes of the current code is that we still allow two permute indices for the same permutation, e.g. { 4, 1, 2, 3 } and { 0, 5, 6, 7 }. IMO we should canonicalize it so that the first index selects from the first vector. So maybe the above should be: if (!single_arg) { if (known_ge (sel[0], nunits)) { std::swap (op0, op1); sel.rotate_inputs (1); } if (sel.all_from_input_p (0)) op1 = op0; } > [...] > + /* See if the permutation is performing a single element > + insert from a CONSTRUCTOR or constant and use a BIT_INSERT_EXPR > + in that case. */ > + unsigned HOST_WIDE_INT cnelts; > + if ((TREE_CODE (cop0) == VECTOR_CST > + || TREE_CODE (cop0) == CONSTRUCTOR > + || TREE_CODE (cop1) == VECTOR_CST > + || TREE_CODE (cop1) == CONSTRUCTOR) > + && nelts.is_constant (&cnelts)) > + { > + unsigned first = 0, first_oo = 0, first_i; > + unsigned second = 0, second_oo = 0, second_i; > + HOST_WIDE_INT idx; > + for (unsigned HOST_WIDE_INT i = 0; i < cnelts; ++i) > + if (!sel[i].is_constant (&idx)) > + { > + first = second = 0; > + break; > + } > + else if ((unsigned HOST_WIDE_INT)idx < cnelts) > + { > + first_i = i; > + first++; > + first_oo += (unsigned HOST_WIDE_INT)idx != i; > + } > + else > + { > + second_i = i; > + second++; > + second_oo += (unsigned HOST_WIDE_INT)idx != i + cnelts; > + } This won't handle the case in which the inserted element comes from the same vector. If we add the extra canonicalization above, we'd only ever be inserting into the second vector at element 0. The test for that would be: if (sel.series_p (1, 1, nelts + 1, 1)) // insert sel[0] into index 0 of the second vector I think the SVE-friendly way of doing the check for the first vector would be: unsigned int encoded_nelts = sel.encoding ().encoded_nelts (); unsigned int i = 0; for (; i < encoded_nelts; ++i) if (maybe_ne (sel[i], i)) break; if (i < encoded_nelts && sel.series_p (i + 1, 1, i + 1, 1)) // insert sel[i] into index i of the first vector Although that's two searches, one of them will halt very early. The SVE-friendly way of getting the VECTOR_CST/CONSTRUCTOR element would be: poly_uint64 idx = sel[i]; unsigned HOST_WIDE_INT const_idx; if (known_le (idx, nelts) && idx.is_constant (&const_idx)) // const_idx from first vector else if (known_ge (idx, nelts) && (idx - nelts).is_constant (&const_idx)) // const_idx from second vector else // bail out ...which is a bit ugly, I admit. Thanks, Richard