https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354

            Bug ID: 111354
           Summary: [7/10/12 regression] The instructions of the DPDK demo
                    program are different and run time increases.
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: d_vampile at 163 dot com
  Target Milestone: ---

Created attachment 55863
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55863&action=edit
test case

Test platform: x86_64 (the CPU supports AVX2 and SSE4.2)
Tuning: default (-mtune=generic)
Compiler options:
gcc main.c -g -o main -O2 -msse4.2 -mavx2 -fno-inline

GCC 7.3.0 produces:
.L3:
        vmovdqu (%rsi), %xmm3
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -96(%rsi), %xmm2
        vinserti128     $0x1, -112(%rsi), %ymm3, %ymm3
        vmovdqu -64(%rsi), %xmm1
        vinserti128     $0x1, -80(%rsi), %ymm2, %ymm2
        vmovdqu -32(%rsi), %xmm0
        vinserti128     $0x1, -48(%rsi), %ymm1, %ymm1
        vinserti128     $0x1, -16(%rsi), %ymm0, %ymm0
        vmovups %xmm3, -128(%rdi)
        vextracti128    $0x1, %ymm3, -112(%rdi)
        vmovups %xmm2, -96(%rdi)
        vextracti128    $0x1, %ymm2, -80(%rdi)
        vmovups %xmm1, -64(%rdi)
        vextracti128    $0x1, %ymm1, -48(%rdi)
        vmovups %xmm0, -32(%rdi)
        vextracti128    $0x1, %ymm0, -16(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with GCC 7.3.0:
$ time ./main_gcc7.3 2000
start to run 2000.
end to run 2000.

real    6m30.461s
user    6m26.587s
sys     0m0.814s

GCC 10.3.0 produces:
.L3:
        vmovdqu (%rsi), %xmm4
        vmovdqu 32(%rsi), %xmm5
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -64(%rsi), %xmm6
        vmovdqu -32(%rsi), %xmm7
        vinserti128     $0x1, -112(%rsi), %ymm4, %ymm3
        vinserti128     $0x1, -80(%rsi), %ymm5, %ymm2
        vinserti128     $0x1, -48(%rsi), %ymm6, %ymm1
        vinserti128     $0x1, -16(%rsi), %ymm7, %ymm0
        vmovdqu %xmm3, -128(%rdi)
        vextracti128    $0x1, %ymm3, -112(%rdi)
        vextracti128    $0x1, %ymm2, -80(%rdi)
        vmovdqu %xmm2, -96(%rdi)
        vextracti128    $0x1, %ymm1, -48(%rdi)
        vextracti128    $0x1, %ymm0, -16(%rdi)
        vmovdqu %xmm1, -64(%rdi)
        vmovdqu %xmm0, -32(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with GCC 10.3.0:
$ time ./main_gcc10.3 2000
start to run 2000.
end to run 2000.

real    7m18.696s
user    7m13.912s
sys     0m1.098s


GCC 12.3.0 produces:
.L3:
        vmovdqu (%rsi), %ymm2
        vmovdqu 32(%rsi), %ymm1
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -64(%rsi), %ymm0
        vmovdqu -32(%rsi), %ymm3
        vmovdqu %ymm2, -128(%rdi)
        vmovdqu %ymm3, -32(%rdi)
        vmovdqu %ymm1, -96(%rdi)
        vmovdqu %ymm0, -64(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with GCC 12.3.0:
$ time ./main_gcc12.3 2000
start to run 2000.
end to run 2000.

real    10m1.303s
user    9m52.527s
sys     0m2.253s

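Why does the code generated by GCC 12.3.0 look simpler, yet take
significantly longer to run in the same test environment with the same
compilation options? Measured by wall-clock time, the GCC 10.3.0 binary is
about 12% slower than the GCC 7.3.0 binary, and the GCC 12.3.0 binary is
about 54% slower.

What causes these three GCC versions to generate different instructions for
the same source?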
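
The attachment is not reproduced in this message. As a rough illustration
only: each iteration of the loop in the listings above moves 128 bytes
through four 32-byte unaligned loads and four 32-byte unaligned stores, which
could be written with AVX intrinsics roughly as below. The names
copy128_avx2, copy128_blocks and HAVE_X86_AVX2_PATH are hypothetical and are
not taken from the attachment or from DPDK. The runtime CPU check and the
memcpy fallback are there only so that the sketch also runs on machines
without AVX2.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_X86_AVX2_PATH 1   /* hypothetical macro, for this sketch only */
#endif

#ifdef HAVE_X86_AVX2_PATH
/* Assumed shape of the hot loop: four 32-byte unaligned loads followed by
 * four 32-byte unaligned stores per 128-byte block. Any trailing bytes
 * (n % 128) are left untouched by this helper. */
__attribute__((target("avx2")))
static void copy128_avx2(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n >= 128) {
        __m256i a = _mm256_loadu_si256((const __m256i *)(src + 0));
        __m256i b = _mm256_loadu_si256((const __m256i *)(src + 32));
        __m256i c = _mm256_loadu_si256((const __m256i *)(src + 64));
        __m256i d = _mm256_loadu_si256((const __m256i *)(src + 96));
        _mm256_storeu_si256((__m256i *)(dst + 0), a);
        _mm256_storeu_si256((__m256i *)(dst + 32), b);
        _mm256_storeu_si256((__m256i *)(dst + 64), c);
        _mm256_storeu_si256((__m256i *)(dst + 96), d);
        src += 128;
        dst += 128;
        n -= 128;
    }
}
#endif

/* Copies n bytes from src to dst; n must be a multiple of 128.
 * Uses the AVX2 loop when the CPU supports it, plain memcpy otherwise. */
void copy128_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    assert(n % 128 == 0);
#ifdef HAVE_X86_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        copy128_avx2(dst, src, n);
        return;
    }
#endif
    memcpy(dst, src, n);
}
```

Compiling a file containing this with the options above plus -S should make
it possible to compare the loop body across compiler versions. GCC also
documents -mavx256-split-unaligned-load and -mavx256-split-unaligned-store,
which control whether 32-byte unaligned accesses are split into 16-byte
halves. The vinserti128/vextracti128 pairs in the 7.3.0 and 10.3.0 listings
look like that split form, but whether a change in the default tuning
explains the 12.3.0 output is an assumption that has not been verified here.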
