https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114109

--- Comment #3 from JuzheZhong <juzhe.zhong at rivai dot ai> ---
(In reply to Robin Dapp from comment #2)
> It is vectorized with a higher zvl, e.g. zvl512b, refer
> https://godbolt.org/z/vbfjYn5Kd.

OK, I see. But Clang generates many slide instructions, which are expensive on
real hardware.

vluxei64 is also expensive.

I am not sure which is better. It should be tested on real RISC-V hardware to
evaluate the performance, rather than judged only by SPIKE/QEMU dynamic
instruction counts.
