https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122486
--- Comment #5 from 孙东亚 <sundongya at nucleisys dot com> ---
(In reply to Robin Dapp from comment #4)
> > Would it be possible to skip such optimization/canonicalization, as Clang
> > already does? In practice, an exact match is sometimes essential for
> > programming specific hardware effectively.
>
> Could you share a code snippet or a specific problem you're trying to solve?
> The idea is that the compiler takes care of optimal vsetvl placement.
> It sounds as if you want to emit a particular SEW, LMUL (to be used by
> another instruction?) and not just compute the vector length?
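// size_t vl = __riscv_vsetvl_e32m4(4);  // intrinsic version: the emitted vtype may not be e32,m4
asm volatile("vsetvli zero,%0,e32,m4,ta,ma" : : "r"(vl));  // force vtype to e32,m4 explicitly
vint32m4_t v1 = __riscv_xl_vdscmul_vv_i32m4(vtmp, vtwd, vl);  // custom op, needs SEW=32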
As the snippet above shows, the vdscmul instruction needs a vtype that exactly
matches vint32m4_t (SEW=32, LMUL=4).
Our hardware implements this instruction for 32-bit elements only, so when the
vsetvli that GCC emits for the regular vsetvl intrinsic uses a different SEW,
vdscmul raises an illegal-instruction exception.
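Why the compiler can treat e32,m4 and, for example, e8,m1 as interchangeable while the hardware cannot: in RVV 1.0, VLMAX = LMUL * VLEN / SEW, so two vtype settings with the same SEW/LMUL ratio yield the same vl. The C sketch below models only that arithmetic, assuming that the canonicalization keeps the ratio and changes SEW/LMUL when only vl is needed. The `vtype_model` struct, VLEN = 128, and `hypothetical_vdscmul_accepts` (a stand-in for the 32-bit-only constraint) are illustrative assumptions, not part of GCC, the RVV intrinsics API, or the Zvw extension.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a vtype setting: SEW in bits, LMUL as num/den
 * (e.g. m4 = 4/1, mf2 = 1/2). Tail/mask policy bits are ignored. */
typedef struct {
    unsigned sew;
    unsigned lmul_num;
    unsigned lmul_den;
} vtype_model;

static const vtype_model E32M4 = {32, 4, 1}; /* e32,m4: what vint32m4_t needs */
static const vtype_model E8M1  = {8, 1, 1};  /* e8,m1: same SEW/LMUL ratio (8) */

/* RVV 1.0: VLMAX = LMUL * VLEN / SEW. */
static unsigned vlmax(unsigned vlen, vtype_model t) {
    assert(t.sew != 0 && t.lmul_den != 0);
    return vlen * t.lmul_num / (t.lmul_den * t.sew);
}

/* vl = min(AVL, VLMAX): one of the results the spec permits for vsetvli. */
static unsigned model_vsetvl(unsigned avl, unsigned vlen, vtype_model t) {
    unsigned max = vlmax(vlen, t);
    return avl < max ? avl : max;
}

/* Settings with equal SEW/LMUL ratios give equal VLMAX, hence equal vl.
 * Ratios are compared by cross-multiplying to avoid fractions. */
static bool same_ratio(vtype_model a, vtype_model b) {
    return a.sew * a.lmul_den * b.lmul_num == b.sew * b.lmul_den * a.lmul_num;
}

/* Hypothetical stand-in for the hardware constraint described above:
 * the custom vdscmul only accepts 32-bit elements. */
static bool hypothetical_vdscmul_accepts(vtype_model t) {
    return t.sew == 32;
}
```

With VLEN = 128 and AVL = 4, both settings give vl = 4, so from the point of view of "compute the vector length" they are equivalent; only e32,m4 passes the hypothetical vdscmul check, which is the mismatch that makes a ratio-preserving vsetvli fault on this hardware.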
The only workaround at the moment is inline assembly, which is far from
convenient.
We would therefore like a way to make GCC skip or bypass this
optimization/canonicalization, but we don't know how to achieve it.
For details on the vdscmul instruction, please refer to
https://gitee.com/XinShengTech_RISCV_OpenSource/riscv-vector4wireless-extension/blob/Zvw/src/chapter6.adoc.