https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122486
--- Comment #3 from 孙东亚 <sundongya at nucleisys dot com> ---
(In reply to Robin Dapp from comment #1)
> That's an optimization/canonicalization of SEW/LMUL, as 32/4 = 8/1. The
> vsetvl intrinsics docs say:
>
> 8.1. vsetvl
> The vsetvl intrinsics return the number of elements processed in a
> stripmining loop when provided with the element width and LMUL in the
> intrinsic suffix. This pseudo intrinsic is typically mapped to vsetvli or
> vsetivli instructions.
>
> The implementation must respect the ratio between SEW and LMUL given to the
> intrinsic. On the other hand, as mentioned in Section 5.2, the vsetvl
> intrinsics do not necessarily map to the emission a vsetvli or vsetivli
> instruction of that exact SEW and LMUL provided. The actual value written
> to the vl control register is an implementation defined behavior and
> typically not known until runtime

Would it be possible to skip such optimization/canonicalization, as Clang
already does? In practice, an exact match is sometimes essential for
programming specific hardware effectively.
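
For reference, the ratio argument in the quoted reply (32/4 = 8/1) can be
checked with plain integer arithmetic. The following is only a sketch: the
helper names vlmax and same_ratio are hypothetical and not part of the RVV
intrinsics API, VLEN = 128 is an assumed example value, and only integer
LMUL >= 1 is handled.

```c
#include <assert.h>

/* Hypothetical helper, not part of any RVV API:
   VLMAX = VLEN * LMUL / SEW, for integer LMUL >= 1 only. */
unsigned vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul)
{
    assert(sew_bits != 0 && lmul != 0);
    return vlen_bits * lmul / sew_bits;
}

/* Returns 1 when two (SEW, LMUL) pairs have the same SEW/LMUL ratio.
   Cross-multiplication avoids integer division. */
int same_ratio(unsigned sew_a, unsigned lmul_a,
               unsigned sew_b, unsigned lmul_b)
{
    return sew_a * lmul_b == sew_b * lmul_a;
}
```

With the assumed VLEN of 128, vlmax(128, 32, 4) and vlmax(128, 8, 1) both
evaluate to 16, so the two settings give the same VLMAX. What differs is the
SEW and LMUL encoded in vtype, which is the part that matters when an exact
match is required.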
