https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114686
JuzheZhong <juzhe.zhong at rivai dot ai> changed:

           What    |Removed |Added
----------------------------------------------------------------------------
                 CC|        |juzhe.zhong at rivai dot ai

--- Comment #2 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
CCing RISC-V folks who may be interested in it.

Yeah, I agree with making dynamic LMUL the default. I proposed that a long
time ago, but almost all the other RISC-V folks disagreed.

Here is data from Li Pan@intel:
https://github.com/Incarnation-p-lee/Incarnation-p-lee/blob/master/performance/coremark-pro/coremark-pro_in_k230_evb.png

It runs auto-vectorized coremark-pro on both LLVM and GCC across all LMUL
settings, and it turns out that dynamic LMUL is beneficial.

> The vrgather.vv instruction should be exempt from that, because an LMUL=8
> vrgather.vv is way more powerful than eight LMUL=1 vrgather.vv instructions,
> and thus disproportionately complex to implement. When you don't need to
> cross lanes, it's possible to unroll LMUL=1 vrgathers manually, instead of
> choosing a higher LMUL.

Agreed. For some instructions like vrgather, we shouldn't pick a large LMUL
even when the register pressure of the program is fine. We can treat a
large-LMUL vrgather as expensive in the dynamic LMUL cost model and optimize
this in GCC 15.

> vcompress.vm doesn't scale linearly with LMUL on the XuanTie chips either,
> but a better implementation is conceivable, because the work can be better
> distributed/subdivided. GCC currently doesn't seem to generate vcompress.vm
> via auto-vectorization anyway: https://godbolt.org/z/Mb5Kba865

GCC can generate vcompress in auto-vectorization; in your case GCC simply
failed to vectorize the loop, which we may optimize in GCC 15. Here are some
cases where GCC does generate vcompress:
https://godbolt.org/z/5GKh4eM7z
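(Editor's note: a hedged sketch of how the dynamic-LMUL setting discussed above is selected on the command line. The option spelling has changed between GCC releases and the exact names below are an assumption; check `gcc --help=target` and the GCC RISC-V options documentation for your toolchain.)

```shell
# Illustrative only: during GCC 14 development the knob was a --param
# (--param=riscv-autovec-lmul=dynamic); released GCC 14 spells it as a
# target option. Verify against your own toolchain before relying on it.
riscv64-unknown-elf-gcc -O3 -march=rv64gcv -mrvv-max-lmul=dynamic -S loop.c
```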