https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122486

--- Comment #3 from 孙东亚 <sundongya at nucleisys dot com> ---
(In reply to Robin Dapp from comment #1)
> That's an optimization/canonicalization of SEW/LMUL, as 32/4 = 8/1.  The
> vsetvl intrinsics docs say:
> 
> 8.1. vsetvl
> The vsetvl intrinsics return the number of elements processed in a
> stripmining loop when provided with the element width and LMUL in the
> intrinsic suffix. This pseudo intrinsic is typically mapped to vsetvli or
> vsetivli instructions.
> 
> The implementation must respect the ratio between SEW and LMUL given to the
> intrinsic.  On the other hand, as mentioned in Section 5.2, the vsetvl
> intrinsics do not necessarily map to the emission of a vsetvli or vsetivli
> instruction of that exact SEW and LMUL provided.  The actual value written
> to the vl control register is an implementation defined behavior and
> typically not known until runtime

Would it be possible to skip this optimization/canonicalization, as Clang
already does? In practice, emitting exactly the SEW and LMUL given in the
intrinsic is sometimes essential for programming specific hardware effectively.
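
For reference, here is a minimal sketch of why e32/m4 and e8/m1 are treated
as interchangeable for the purpose of computing vl. It assumes integer LMUL
only and the common vl = min(AVL, VLMAX) behavior (the spec permits other
choices when AVL lies between VLMAX and 2*VLMAX). The helper names vlmax and
vl_for are hypothetical and not part of any intrinsic API; VLEN is whatever
the target provides.

```c
/* VLMAX = VLEN * LMUL / SEW (integer LMUL only in this sketch). */
static unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    return vlen_bits * lmul / sew_bits;
}

/* Assumed vl policy: min(AVL, VLMAX). Hardware may choose differently
   when VLMAX < AVL < 2*VLMAX, so treat this as an illustration only. */
static unsigned vl_for(unsigned avl, unsigned vlen_bits,
                       unsigned sew_bits, unsigned lmul)
{
    unsigned m = vlmax(vlen_bits, sew_bits, lmul);
    return avl < m ? avl : m;
}
```

With an assumed VLEN of 128, vlmax(128, 32, 4) and vlmax(128, 8, 1) are both
16, so the returned vl is the same for either encoding. What differs is the
vtype left in effect afterwards, which is the part that matters for the
hardware mentioned above.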
