We used to apply -mrvv-max-lmul= to limit VLS code gen, the auto-vectorizer,
and builtin string function expansion. But I think the VLS code gen part doesn't
need this limit, since it only happens when the user explicitly writes vector
types.

For example, an int32x8_t under -mrvv-max-lmul=m1 with VLEN=128 would be split
into two int32x4_t operations, which generates more instructions and runs slower.
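For concreteness, here is a minimal sketch (not taken from the patch) of the
kind of explicit fixed-size vector code this refers to, using the GNU
vector_size extension; the typedef name is chosen for this example only:

    /* Hypothetical example of a user-written 256-bit VLS vector type.  */
    typedef int int32x8_t __attribute__ ((vector_size (32)));

    int32x8_t
    add8 (int32x8_t a, int32x8_t b)
    {
      /* With VLEN=128, a 256-bit value fits one LMUL=2 register group;
         capping the LMUL at m1 forces a split into two halves.  */
      return a + b;
    }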

In this patch, I changed -mrvv-max-lmul= to only affect auto-vectorization and
builtin string function expansion. The option's help text already says it only
controls the LMUL used by auto-vectorization, so I believe this change makes
sense :)
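As a hedged illustration (again not from the patch itself), these are the two
kinds of code that would still be limited by -mrvv-max-lmul= after the change:
an auto-vectorizable loop and an inline builtin memcpy expansion. Function
names are made up for the example:

    #include <string.h>

    /* Auto-vectorization candidate: the vectorizer's LMUL choice remains
       capped by -mrvv-max-lmul=.  */
    void
    saxpy (float *restrict y, const float *restrict x, float a, int n)
    {
      for (int i = 0; i < n; i++)
        y[i] += a * x[i];
    }

    /* Builtin string function expansion (inline memcpy) is likewise still
       subject to the limit.  */
    void
    copy64 (char *dst, const char *src)
    {
      memcpy (dst, src, 64);
    }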

This might have been discussed while I was away, so I haven't complained yet :)
To me the -mrvv-max-lmul option has always covered "everything", and IMHO the maximum LMUL should generally be tied to a microarchitecture.

Many of the higher-end cores won't favor LMUL > 1 and I'd find it surprising if we started emitting LMUL8 even for fixed vector sizes.

To play devil's advocate: if LMUL8 (or 4, 2) is faster, why don't we enable it unconditionally? Not that I think it's generally faster, but what's special about such a VLS example that doesn't hold for auto-vectorization?

Is the code for this example particularly bad for LMUL1 or is it optimal and LMUL8 is just faster on your uarchs?

--
Regards
Robin
