https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100267
--- Comment #5 from Hongtao.liu <crazylht at gmail dot com> ---
(In reply to Hongtao.liu from comment #4)
> (In reply to Hongtao.liu from comment #3)
> > After supporting v{,p}expand* variants without mask operands, the codegen
> > seems to be optimal
> >
>
> I was wrong; without a mask, it's just a simple move.
It is finally optimized to:
_Z16dummyf1_avx512x8PK11flow_avx512:
.LFB5665:
.cfi_startproc
movl (%rdi), %edx
movq 8(%rdi), %rax
vmovdqu (%rax,%rdx,8), %ymm0
vmovdqu 32(%rax,%rdx,8), %ymm1
vpaddq %ymm1, %ymm0, %ymm0
ret
I'm testing the patch.