https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125303
--- Comment #7 from Tamar Christina <tnfchris at gcc dot gnu.org> --- (In reply to [email protected] from comment #6) > On Mon, 18 May 2026, tnfchris at gcc dot gnu.org wrote: > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=125303 > > > > --- Comment #5 from Tamar Christina <tnfchris at gcc dot gnu.org> --- > > (In reply to Richard Biener from comment #2) > >> (In reply to Drea Pinski from comment #1) > >>> Confirmed. > >>> > >>> The difference comes from who does the expansion/combining. > >>> > >>> So vec_shuf is handled by vectorizer SLP pass. > >>> > >>> While vec_xor_shuf is handled by forwprop1 and veclowering pass. > >>> > >>> The veclowering pass does not handle: > >>> _15 = VEC_PERM_EXPR <_2, _2, { 4, 0, 5, 1, 6, 2, 7, 3 }>; > >>> > >>> in a partial wise and only handles scalar wise at this stage. Nobody has > >>> improved it yet to try to use partial vector sizes. > >> > >> The idea is that the re-vectorization patches from Tamar should address > >> this > >> (we'll have to trigger extra BB SLP seeds of course) > > > > I've been looking into this today to see what's required here, as my current > > patches > > allow growing of the VF but not shrinking. The extra seed is easy enough to > > do: > > > > @@ -10000,6 +10003,24 @@ vect_slp_check_for_roots (bb_vec_info bb_vinfo) > > } > > } > > } > > +#if 1 > > + else if (gimple_vdef (assign) > > + && VECTOR_TYPE_P (TREE_TYPE (rhs)) > > + && ((op = optab_for_tree_code (code, TREE_TYPE (rhs), > > optab_default)) == unknown_optab > > + || !can_implement_p (op, TYPE_MODE (TREE_TYPE (rhs))))) > > + { > > + vec<stmt_vec_info> roots = vNULL; > > + auto stmt_vinfo = bb_vinfo->lookup_stmt (assign); > > + roots.safe_push (stmt_vinfo); > > + vec<stmt_vec_info> stmts; > > + auto num_entries = TYPE_VECTOR_SUBPARTS (TREE_TYPE > > (rhs)).to_constant > > (); > > + stmts.create (num_entries); > > + for (int i = 0; i < num_entries; i++) > > + stmts.quick_push (stmt_vinfo); > > + bb_vinfo->roots.safe_push (slp_root (slp_inst_kind_store, > > + stmts, roots)); > > + } > > +#endif > > } > > } > > > > Which triggers a change in data_ref analysis since we don't allow loads > > groups > > with zero step during BB vectorization. > > Updating that hits the largest road block and as expected that's build_slp > > analysis. > > > > Because the group size is larger than any supported vector type it fails. > > I changed it to pick the largest possible vectype and then if analysis > > succeeds > > would set unroll fact. > > I'm a bit confused, it should be fine to pick a smaller vector type if the > group size is a multiple of it. The unroll factor (VF) is always 1 in > BB vectorization, but yes, "ncopies" can be > 1 iff the vector type is > smaller than the group size. > Yeah I came to the same conclusion and did that below. Which I think failed because I hacked it in and the query for vectype was inconsistent. > > > > However there are way too many things doing analysis as we expected so it > > still > > fails: > > > > note: === vect_analyze_data_ref_accesses === > > note: === vect_determine_precisions === > > note: === vect_pattern_recog === > > note: === vect_analyze_slp === > > note: Starting SLP discovery for > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: *x_5(D) = _15; > > note: starting SLP discovery for node 0x5eba780 > > note: get vectype for scalar type (group size 8): unsigned int > > note: vectype: vector(4) unsigned int > > note: get vectype for smallest scalar type: vec256 > > note: nunits vectype: vector(4) unsigned int > > note: nunits = 4 > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: Build SLP for *x_5(D) = _15; > > note: SLP discovery for node 0x5eba780 failed > > note: SLP discovery failed > > > > So I'm not sure what the cleanest solution is he.. hmm maybe I'm looking at > > this wrong... > > > > Since the tree is supposed to already be vectorized, perhaps I should > > instead > > build the initial > > groups using the largest supported group size and then set the unroll based > > on > > that.. > > I think the group size has to match the vector type used in the IL > (or be larger if we combine two vector typed operations). But as I didn't > yet see your re-vectorization patches I can only guess what you did to > representation in the SLP graph ... > Yeah the design is actually quite simple, which is why I was checking if it would adapt to the above. I'm cleaning it up now and will send an RFC end of next week (bank holidays this and next week). I needed to get enough done to see how to change if-convert etc so I can explain the scope we need. I think I have enough now. > > create lanes according to the vector type in the IL. Rely on > "ncopies" for code generation. It should work the same as you'd > have scalar code with 16 lanes but vector type with 4 lanes. > > > Note that trying to change unrolling so it matches the old VF seems to > > trigger > > node splitting: > > > > note: SLP discovery succeeded but node needs splitting > > > > which then ICEs. But haven't looked into that yet in case this is a bad > > approach. > > That said - I think we can followup with the details but should make > sure the design of re-vectorization is sound to handle this case since > IMO it's trivially the "same". > Ack, I'll send up a write up and RFC next week, I think I know how things should be handled. Well everything but data-ref analysis which I think needs a target hook for describing intrinsics loads. Thanks, Tamar > Richard.
