On Tue, 26 May 2020, Kewen.Lin wrote:

> Hi all,
>
> This patch set adds support for vector load/store with length, Power
> ISA 3.0 brings instructions lxvl/stxvl to perform vector load/store with
> length, it's good to be exploited for those cases we don't have enough
> stuffs to fill in the whole vector like epilogues.
>
> This support mainly refers to the handlings for fully-predicated loop
> but it also covers the epilogue usage.  Now it supports two modes
> controlled by parameter vect-with-length-scope, it can support any
> loops fully with length or just for those cases with small iteration
> counts less than VF like epilogue, for now I don't have ready env to
> benchmark it, but based on the current inefficient length generation,
> I don't think it's a good idea to adopt vector with length for any loops.
> For the main loop which used to be vectorized, it increases register
> pressure and introduces extra computation for length, the pro for icache
> seems not comparable.  But I think it might be a good idea to keep this
> parameter there for functionality testing, further benchmarking and other
> ports' potential future supports.
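
For readers unfamiliar with the idea, the length-controlled accesses described in the quoted text can be emulated in scalar C. This is only a minimal sketch, not the patch's implementation: the names VF, vec4d, load_with_len, store_with_len and add_arrays are hypothetical, the length is counted in elements for simplicity, and zero-filling the lanes beyond the length is an assumption of the sketch. It shows the "any loop fully with length" shape, where every iteration passes min(remaining, VF) as the length and no scalar epilogue is needed.

```c
#include <stddef.h>
#include <string.h>

#define VF 4  /* hypothetical vectorization factor: 4 doubles per vector */

/* Stand-in for a hardware vector register. */
typedef struct { double lane[VF]; } vec4d;

/* Emulated load with length: read the first len lanes from p.
   Lanes at index >= len are zeroed (an assumption of this sketch). */
static vec4d load_with_len(const double *p, size_t len)
{
    vec4d v;
    memset(&v, 0, sizeof v);
    memcpy(v.lane, p, len * sizeof(double));
    return v;
}

/* Emulated store with length: write only the first len lanes to p. */
static void store_with_len(double *p, vec4d v, size_t len)
{
    memcpy(p, v.lane, len * sizeof(double));
}

/* a[i] += b[i] for i in [0, n), with every access length-controlled.
   The counter `remaining` decrements to zero and bounds the length. */
void add_arrays(double *a, const double *b, size_t n)
{
    size_t remaining = n;
    while (remaining > 0) {
        size_t len = remaining < VF ? remaining : VF;
        vec4d va = load_with_len(a, len);
        vec4d vb = load_with_len(b, len);
        for (size_t i = 0; i < VF; i++)   /* full-width vector add */
            va.lane[i] += vb.lane[i];
        store_with_len(a, va, len);       /* lanes >= len never reach memory */
        a += len;
        b += len;
        remaining -= len;
    }
}
```

In the epilogue-only mode the main loop would keep ordinary full-width accesses and only the final partial iteration would use the length-controlled forms.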

Can you explain in more detail what "vector load/store with length" does?
Is that a "simplified" masked load/store which instead of masking
arbitrary elements (and need a mask computed in the first place), masks
elements > N (the length operand)?  Thus assuming a lane IV decrementing
to zero that IV would be the natural argument for the length operand?
If that's correct, what data are the remaining lanes filled with?

From a look at the series description below you seem to add a new way
of doing loads for this.  Did you review other ISAs (those I'm not
familiar with myself too much are SVE, RISC-V and GCN) in GCC whether
they have similar support and whether your approach can be supported
there?  ISTR SVE must have some similar support - what's the reason
you do not piggy-back on that?

I think a load like I described above might be represented as

  _1 = __VIEW_CONVERT <v4df_t> (__MEM <double[n_2]> ((double *)p_3));

not sure if that actually works out though.  But given it seems it
is a contiguous load we shouldn't need an internal function here?
[there's a possible size mismatch in the __VIEW_CONVERT above, I guess
on RTL you end up with a paradoxical subreg - or an UNSPEC]

That said, I'm not very happy seeing yet another way of doing loads
[for fully predicated loops].  I'd rather like to see a single
representation on GIMPLE at least.

Will dig into the patch once the actual workings of those load/store
with length is confirmed.

I don't spot tree-vect-slp.c being changed - maybe that's not necessary
for SLP operation, but please do not introduce new vectorizer features
without supporting SLP operation at this point.

Thanks,
Richard.

> As we don't have any benchmarking, this support isn't enabled by default
> for any particular cpus, all testings are with explicit parameter setting.
>
> Bootstrapped on powerpc64le-linux-gnu P9 with all vect-with-length-scope
> settings (0/1/2).
> Regress-test passed with vector-with-length-scope 0,
> for the other twos, several vector related cases need to be updated, no
> remarkable failures found.  BTW, P9 is the one which supports the
> functionality but not ready to evaluate the performance.
>
> Here still are many things to be supported or improved, not limited to:
>   - reduction/live-out support
>   - Cost model adding/tweaking
>   - IFN gimple folding
>   - Some unnecessary ops improvements eg: vector_size check
>   - Some possible refactoring
> I'll support/post the patches gradually.
>
> Any comments are highly appreciated.
>
> BR,
> Kewen
> -----
>
> Patch set outline:
>   [PATCH 1/7] ifn/optabs: Support vector load/store with length
>   [PATCH 2/7] rs6000: lenload/lenstore optab support
>   [PATCH 3/7] vect: Factor out codes for niters smaller than vf check
>   [PATCH 4/7] hook/rs6000: Add vectorize length mode for vector with length
>   [PATCH 5/7] vect: Support vector load/store with length in vectorizer
>   [PATCH 6/7] ivopts: Add handlings for vector with length IFNs
>   [PATCH 7/7] rs6000/testsuite: Vector with length test cases
>
>  gcc/config/rs6000/rs6000.c | 3 +
>  gcc/config/rs6000/vsx.md | 30 ++++++++++
>  gcc/doc/invoke.texi | 7 +++
>  gcc/doc/md.texi | 16 ++++++
>  gcc/doc/tm.texi | 6 ++
>  gcc/doc/tm.texi.in | 2 +
>  gcc/internal-fn.c | 13 ++++-
>  gcc/internal-fn.def | 6 ++
>  gcc/optabs.def | 2 +
>  gcc/params.opt | 4 ++
>  gcc/target.def | 7 +++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-1.h | 18 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-2.h | 17 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-3.h | 31 +++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-4.h | 24 ++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-5.h | 29 ++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-6.h | 32 +++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-1.c | 15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-2.c | 15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-3.c | 18 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-4.c | 15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-5.c | 15 +++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-6.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-1.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-2.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-3.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-4.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-5.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-epil-run-6.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-1.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-2.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-3.c | 17 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-4.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-5.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-6.c | 16 ++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-1.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-2.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-3.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-4.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-5.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-full-run-6.c | 10 ++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-1.h | 34 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-2.h | 36 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-3.h | 34 ++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-4.h | 62 +++++++++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-5.h | 45 +++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length-run-6.h | 52 +++++++++++++++++
>  gcc/testsuite/gcc.target/powerpc/p9-vec-length.h | 14 +++++
>  gcc/tree-ssa-loop-ivopts.c | 4 ++
>  gcc/tree-vect-loop-manip.c | 268 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  gcc/tree-vect-loop.c | 272 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
>  gcc/tree-vect-stmts.c | 152 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  gcc/tree-vectorizer.h | 32 +++++++++++
>  53 files changed, 1545 insertions(+), 18 deletions(-)
>

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)