On Tue, May 30, 2023 at 11:17 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote:
>
> In the future, we will definitely mixing VLA and VLS-vlmin together in a
> codegen and it will not cause any issues.
> For VLS-vlmin, I prefer it is used in length style auto-vectorization (I am
> not sure since my SELECT_VL patch is not
> finished, I will check if can work when I am working in SELECT_VL patch).

For the future, would it then be good to have the vectorizer re-vectorize
loops with VLS vector uses to VLA style?  I think there's a PR with a draft
patch from a few years ago attached (from me) somewhere.  Currently the
vectorizer will give up when seeing vector operations in a loop, but
ideally those should simply be SLPed.

> >> In general I don't have a good overview of which optimizations we gain by
> >> such an approach or rather which ones are prevented by VLA altogether?
> These patches VLS modes can help for SLP auto-vectorization.
>
> ________________________________
> juzhe.zh...@rivai.ai
>
>
> From: Robin Dapp
> Date: 2023-05-30 17:05
> To: juzhe.zh...@rivai.ai; Richard Biener; Kito.cheng
> CC: rdapp.gcc; gcc-patches; palmer; kito.cheng; jeffreyalaw; pan2.li
> Subject: Re: [PATCH] RISC-V: Basic VLS code gen for RISC-V
> >>> but ideally the user would be able to specify -mrvv-size=32 for an
> >>> implementation with 32 byte vectors and then vector lowering would make
> >>> use
> >>> of vectors up to 32 bytes?
> >
> > Actually, we don't want to specify -mrvv-size = 32 to enable vectorization
> > on GNU vectors.
> > You can take a look this example:
> > https://godbolt.org/z/3jYqoM84h
> >
> > GCC need to specify the mrvv size to enable GNU vectors and the codegen
> > only can run on CPU with vector-length = 128bit.
> > However, LLVM doesn't need to specify the vector length, and the codegen
> > can run on any CPU with RVV vector-length >= 128 bits.
> >
> > This is what this patch want to do.
> >
> > Thanks.
> I think Richard's question was rather if it wasn't better to do it more
> generically and lower vectors to what either the current cpu or what the
> user specified rather than just 16-byte vectors (i.e. indeed a fixed
> vlmin and not a fixed vlmin == fixed vlmax).
>
> This patch assumes everything is fixed for optimization purposes and then
> switches over to variable-length when nothing can be changed anymore.
> That is, we would work on "vlmin"-sized chunks in a VLA fashion at runtime?
> We would need to make sure that no pass after reload makes use of VLA
> properties at all.
>
> In general I don't have a good overview of which optimizations we gain by
> such an approach or rather which ones are prevented by VLA altogether?
> What's the idea for the future?  Still use LEN_LOAD et al. (and masking)
> with "fixed vlmin"?  Wouldn't we select different IVs with this patch than
> what we would have for pure VLA?
>
> Regards
>  Robin
>
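
To illustrate the kind of loop mentioned at the top of this reply (a loop whose body already contains fixed-size vector operations), here is a minimal C sketch using the GNU vector extension. The type name `v4si` and the function names `add_v4si` and `add_scalar` are made up for this example and do not come from the patch or from the PR mentioned above. The sketch only shows the source-level shape of such code; it makes no claim about what any particular GCC version generates for it.

```c
#include <stddef.h>

/* 16-byte GNU vector holding four 32-bit ints (GCC/Clang vector extension).
   The name v4si is an assumption chosen for this example. */
typedef int v4si __attribute__((vector_size(16)));

/* A loop whose body is already written with fixed-size (VLS-style) vector
   operations.  This is the shape of loop the reply above refers to. */
void add_v4si(v4si *restrict dst, const v4si *restrict a,
              const v4si *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];   /* element-wise add of four ints at once */
}

/* The equivalent scalar loop, computing the same result on 4 * n ints. */
void add_scalar(int *restrict dst, const int *restrict a,
                const int *restrict b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Both functions compute the same element-wise sums; the difference is only whether the fixed 16-byte vector shape is already baked into the source.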