On Fri, 11 Sep 2020, Richard Sandiford wrote:

> Richard Biener <rguent...@suse.de> writes:
> > This tries to improve BB vectorization dumps by providing more
> > precise locations.  Currently the vect_location is simply the
> > very last stmt in a basic-block that has a location.  So for
> >
> >   double a[4], b[4];
> >   int x[4], y[4];
> >   void foo()
> >   {
> >     a[0] = b[0]; // line 5
> >     a[1] = b[1];
> >     a[2] = b[2];
> >     a[3] = b[3];
> >     x[0] = y[0]; // line 9
> >     x[1] = y[1];
> >     x[2] = y[2];
> >     x[3] = y[3];
> >   } // line 13
> >
> > we show the user with -O3 -fopt-info-vec
> >
> >   t.c:13:1: optimized: basic block part vectorized using 16 byte vectors
> >
> > while with the patch we point to both independently vectorized
> > opportunities:
> >
> >   t.c:5:8: optimized: basic block part vectorized using 16 byte vectors
> >   t.c:9:8: optimized: basic block part vectorized using 16 byte vectors
> >
> > There's the possibility that the location regresses in case the
> > root stmt in the SLP instance has no location.  For a SLP subgraph
> > with multiple entries the location also chooses one entry at random;
> > not sure in which case we want to dump both.
> >
> > Still, as the plan is to extend the basic-block vectorization
> > scope from single basic-block to multiple ones, this is a first
> > step to preserve something sensible.
> >
> > Implementation-wise this makes both costing and code-generation
> > happen on the subgraphs as analyzed.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
> >
> > Richard - is iteration over vector modes for BB vectorization
> > still important now that we have related_vector_type and thus
> > no longer only consider a fixed size?  If so it will probably
> > make sense to somehow still iterate even if there was some
> > SLP subgraph vectorized?  It also looks like BB vectorization
> > was never updated to consider multiple modes based on cost;
> > it will still pick the first opportunity.
> > For BB vectorization we also have the code that re-tries SLP
> > discovery with splitting the store group.  So what are your
> > overall thoughts on this?
>
> I think there might be different answers for “in principle” and
> “in practice”. :-)
>
> In principle, there's no one right answer to (say) “what vector mode
> should I use for 4 32-bit integers?”.  If the block is only operating
> on that type, then VNx4SI is the right choice for 128-bit SVE.  But if
> the block is mostly operating on 4 64-bit integers and just converting
> to 32-bit integers for a small region, then it might be better to use
> 2 VNx2SIs instead (paired with 2 VNx2DIs).
>
> In practice, one situation in which the current loop might be needed
> is pattern statements.  There we assign a vector type during pattern
> recognition, based only on the element type.  So in that situation,
> the first pass (with the autodetected base vector mode) will not take
> the number of scalar stmts into account.
Ah, indeed.  So currently the per-BB decision is probably not too bad,
but when it becomes a per-function decision we need to do something
about this, I guess.

> Also, although SLP currently only operates on full vectors,
> I was hoping we would eventually support predication for SLP too.
> At that point, the number of scalar statements wouldn't directly
> determine the number of vector lanes.
>
> On the cost thing: it would be better to try all four and pick the
> one with the lowest cost, but given your in-progress changes, it
> seemed like a dead end to do that with the current code.
>
> It sounded like the general direction here was to build an SLP graph
> and “solve” the vector type assignment problem in a more global way,
> once we have a view of the entire graph.  Is that right?  If so,
> then at that point we might be able to do something more intelligent
> than just iterate over all the options.  (Although at the same time,
> iterating over all the options on a fixed (sub?)graph would be cheaper
> than what we do now.)

Yes, in the end we should have the SLP tree built without having
assigned a vector type, which also means doing pattern detection on
the SLP tree after the SLP build (which makes that a viable change
only after we get rid of the non-SLP paths).  But yeah, I forgot
about the early vector type assignment during pattern recog ...

I did try to get rid of vect_update_shared_vectype some months ago
(the only remaining user of STMT_VINFO_NUM_SLP_USES), but somehow
miserably failed, even though we now have SLP_TREE_VECTYPE - a vector
type per SLP node (see
https://gcc.gnu.org/pipermail/gcc-patches/2020-June/547981.html).

So yeah, pattern recog ... a convenient but also quite iffy feature :/

Richard.