Richard Biener <rguent...@suse.de> writes: >> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo, >> > > &tree_size, bst_map); >> > > if (node != NULL) >> > > { >> > > + /* Temporarily allow add_stmt calls again. */ >> > > + vinfo->stmt_vec_info_ro = false; >> > > + >> > > + /* See if any patterns can be found in the constructed SLP tree >> > > + before we do any analysis on it. */ >> > > + vect_match_slp_patterns (node, vinfo, group_size, &max_nunits, >> > > + matches, &npermutes, &tree_size, >> > > + bst_map); >> > > + >> > > + /* After this no more add_stmt calls are allowed. */ >> > > + vinfo->stmt_vec_info_ro = true; >> > > + >> > > >> > > I think this is a bit early to match patterns - I'd defer it to the >> > > point where all entries into the same SLP subgraph are analyzed, thus >> > > somewhere at the end of vect_analyze_slp loop over all instances and >> > > match patterns? That way phases are more clearly separated. >> > >> > That would probably work, my only worry is that the SLP analysis itself may >> > fail and bail out at >> > >> > /* If the loads and stores can be handled with load/store-lane >> > instructions do not generate this SLP instance. */ >> > if (is_a <loop_vec_info> (vinfo) >> > && loads_permuted >> > && dr && vect_store_lanes_supported (vectype, group_size, >> > false)) > > Ah, that piece of code. Yeah, I'm repeatedly running into it as well - > it's a bad hack that stands in the way all the time :/
At one point I was wondering about trying to drop the above, vectorise with and without SLP, and then compare their costs, like for VECT_COMPARE_COSTS. But that seemed like a dead end with the move to doing everything on the SLP representation. > I guess we should try moving this upward like to > vect_analyze_loop_2 right before > > /* Check the SLP opportunities in the loop, analyze and build SLP trees. > */ > ok = vect_analyze_slp (loop_vinfo, *n_stmts); > if (!ok) > return ok; > > and there check whether all grouped loads and stores can be handled > with store-/load-lanes (and there are no SLP reduction chains?) in > which case do not try to attempt SLP at all. Because the testcases > this check was supposed to change were all-load/store-lane or > all SLP so the mixed case is probably not worth special casing. > > Since load-/store-lanes is an arm speciality I tried to only touch > this fragile part with a ten-foot pole ;) CCing Richard, if he > acks the above I can produce a patch. Yeah, sounds good to me. Probably also sorth checking whether the likely_max iteration count is high enough to support group_size vectors, if we have enough information to guess that. We could also get the gen* machinery to emit a macro that is true if at least one load/store-lane pattern is defined, so that we can skip the code for non-Arm targets. I can do that as a follow-up. Thanks, Richard