On Fri, 11 Sep 2020, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > This tries to improve BB vectorization dumps by providing more
> > precise locations.  Currently the vect_location is simply the
> > very last stmt in a basic-block that has a location.  So for
> >
> > double a[4], b[4];
> > int x[4], y[4];
> > void foo()
> > {
> >   a[0] = b[0]; // line 5
> >   a[1] = b[1];
> >   a[2] = b[2];
> >   a[3] = b[3];
> >   x[0] = y[0]; // line 9
> >   x[1] = y[1];
> >   x[2] = y[2];
> >   x[3] = y[3];
> > } // line 13
> >
> > we show the user with -O3 -fopt-info-vec
> >
> > t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
> >
> > while with the patch we point to both independently vectorized
> > opportunities:
> >
> > t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
> > t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
> >
> > There's the possibility that the location regresses in case the
> > root stmt in the SLP instance has no location.  For an SLP subgraph
> > with multiple entries the location also chooses one entry at random;
> > I'm not sure in which case we want to dump both.
> >
> > Still, as the plan is to extend the basic-block vectorization
> > scope from a single basic-block to multiple ones, this is a first
> > step towards preserving something sensible.
> >
> > Implementation-wise this makes both costing and code-generation
> > happen on the subgraphs as analyzed.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Richard - is iteration over vector modes for BB vectorization
> > still important now that we have related_vector_type and thus
> > no longer only consider a fixed size?  If so it will probably
> > make sense to somehow still iterate even if there was some
> > SLP subgraph vectorized?  It also looks like BB vectorization
> > was never updated to consider multiple modes based on cost,
> > it will still pick the first opportunity.  For BB vectorization
> > we also have the code that re-tries SLP discovery with
> > splitting the store group.  So what's your overall thoughts to
> > this?
> 
> I think there might be different answers for “in principle” and
> “in practice”. :-)
> 
> In principle, there's no one right answer to (say) “what vector mode
> should I use for 4 32-bit integers?”.  If the block is only operating on
> that type, then VNx4SI is the right choice for 128-bit SVE.  But if the
> block is mostly operating on 4 64-bit integers and just converting to
> 32-bit integers for a small region, then it might be better to use
> 2 VNx2SIs instead (paired with 2 VNx2DIs).
> 
> In practice, one situation in which the current loop might be needed
> is pattern statements.  There we assign a vector type during pattern
> recognition, based only on the element type.  So in that situation,
> the first pass (with the autodetected base vector mode) will not take
> the number of scalar stmts into account.

Ah, indeed.  So currently the per-BB decision is probably not too
bad, but when it becomes a per-function decision we will need to do
something about this, I guess.

> Also, although SLP currently only operates on full vectors,
> I was hoping we would eventually support predication for SLP too.
> At that point, the number of scalar statements wouldn't directly
> determine the number of vector lanes.
> 
> On the cost thing: it would be better to try all four and pick the one
> with the lowest cost, but given your in-progress changes, it seemed like
> a dead end to do that with the current code.
>
> It sounded like the general direction here was to build an SLP graph
> and “solve” the vector type assignment problem in a more global way,
> once we have a view of the entire graph.  Is that right?  If so,
> then at that point we might be able to do something more intelligent
> than just iterate over all the options.  (Although at the same time,
> iterating over all the options on a fixed (sub?)graph would be cheaper
> than what we do now.)

Yes, in the end we should build the SLP tree without having
assigned a vector type, which also means doing pattern detection
on the SLP tree after the SLP build (which makes that a viable
change only after we have gotten rid of the non-SLP paths).

But yeah, I forgot about the early vector type assignment during
pattern recog ... I did try to get rid of vect_update_shared_vectype
some months ago (the only remaining user of STMT_VINFO_NUM_SLP_USES),
but failed miserably even though we now have
SLP_TREE_VECTYPE - a vector type per SLP node (see
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547981.html).

So yeah, pattern recog ... a convenient but also quite iffy feature :/

Richard.
