https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96654

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org
             Blocks|                            |53947

--- Comment #3 from Richard Biener <rguenth at gcc dot gnu.org> ---
The pattern is exercised directly only by BB vectorization; loop vectorization
still uses a fixed vector size.  Still, the assembly shows basically the same
code when doing BB vectorization only:

f:
.LFB0:
        .cfi_startproc
        vmovupd (%rdi), %xmm1
        vinsertf128     $0x1, 16(%rdi), %ymm1, %ymm0
        vcvttpd2dqy     %ymm0, %xmm0
        vmovdqu %xmm0, (%rsi)
        vzeroupper
        ret

This is probably because of some tuning (splitting unaligned loads, not using
a memory operand for vcvttpd2dqy).

With -O3 -fno-tree-loop-vectorize -march=core-avx2 I get:

f:
.LFB0:
        .cfi_startproc
        vcvttpd2dqy     (%rdi), %xmm0
        vmovdqu %xmm0, (%rsi)
        ret
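
For readers without the testcase at hand, here is a minimal sketch of the kind
of source that could lead to the code above.  It is an assumption inferred from
the assembly (four doubles loaded from %rdi, truncated to 32-bit integers,
stored to %rsi), not the testcase attached to the PR; the parameter names and
the use of restrict are hypothetical.

```c
/* Hypothetical reconstruction inferred from the assembly above;
   this is NOT the testcase from the PR.  */
void
f (const double *restrict in, int *restrict out)
{
  /* Four straight-line double -> int truncating conversions.  With no
     loop involved, only BB (SLP) vectorization can combine them into a
     single packed conversion such as vcvttpd2dqy.  */
  out[0] = (int) in[0];
  out[1] = (int) in[1];
  out[2] = (int) in[2];
  out[3] = (int) in[3];
}
```

Compiling such a function with the flags quoted above plus -S should make it
possible to compare the output, though the exact instructions may differ
between GCC versions and tuning settings.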


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=53947
[Bug 53947] [meta-bug] vectorizer missed-optimizations
