Richard Biener <rguent...@suse.de> writes:
>> > > @@ -2192,6 +2378,17 @@ vect_analyze_slp_instance (vec_info *vinfo,
>> > >                               &tree_size, bst_map);
>> > >    if (node != NULL)
>> > >      {
>> > > +      /* Temporarily allow add_stmt calls again.  */
>> > > +      vinfo->stmt_vec_info_ro = false;
>> > > +
>> > > +      /* See if any patterns can be found in the constructed SLP tree
>> > > +        before we do any analysis on it.  */
>> > > +      vect_match_slp_patterns (node, vinfo, group_size, &max_nunits,
>> > > +                              matches, &npermutes, &tree_size,
>> > > + bst_map);
>> > > +
>> > > +      /* After this no more add_stmt calls are allowed.  */
>> > > +      vinfo->stmt_vec_info_ro = true;
>> > > +
>> > >
>> > > I think this is a bit early to match patterns - I'd defer it to the
>> > > point where all entries into the same SLP subgraph are analyzed, thus
>> > > somewhere at the end of vect_analyze_slp loop over all instances and
>> > > match patterns?  That way phases are more clearly separated.
>> > 
>> > That would probably work, my only worry is that the SLP analysis itself may
>> > fail and bail out at
>> > 
>> >      /* If the loads and stores can be handled with load/store-lane
>> >         instructions do not generate this SLP instance.  */
>> >      if (is_a <loop_vec_info> (vinfo)
>> >          && loads_permuted
>> >          && dr && vect_store_lanes_supported (vectype, group_size,
>> > false))
>
> Ah, that piece of code.  Yeah, I'm repeatedly running into it as well - 
> it's a bad hack that stands in the way all the time :/

At one point I was wondering about trying to drop the above, vectorise with
and without SLP, and then compare their costs, like for VECT_COMPARE_COSTS.
But that seemed like a dead end with the move to doing everything on the
SLP representation.

> I guess we should try moving this upward like to
> vect_analyze_loop_2 right before
>
>   /* Check the SLP opportunities in the loop, analyze and build SLP trees.  
> */
>   ok = vect_analyze_slp (loop_vinfo, *n_stmts);
>   if (!ok)
>     return ok;
>
> and there check whether all grouped loads and stores can be handled
> with store-/load-lanes (and there are no SLP reduction chains?) in
> which case do not try to attempt SLP at all.  Because the testcases
> this check was supposed to change were all-load/store-lane or
> all SLP so the mixed case is probably not worth special casing.
>
> Since load-/store-lanes is an arm speciality I tried to only touch
> this fragile part with a ten-foot pole ;)  CCing Richard, if he
> acks the above I can produce a patch.

Yeah, sounds good to me.  Probably also sorth checking whether the
likely_max iteration count is high enough to support group_size
vectors, if we have enough information to guess that.

We could also get the gen* machinery to emit a macro that is true if at
least one load/store-lane pattern is defined, so that we can skip the
code for non-Arm targets.  I can do that as a follow-up.

Thanks,
Richard

Reply via email to