https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111354
            Bug ID: 111354
           Summary: [7/10/12 regression] The instructions of the DPDK demo
                    program are different and run time increases.
           Product: gcc
           Version: 10.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: d_vampile at 163 dot com
  Target Milestone: ---

Created attachment 55863
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55863&action=edit
test case

Test platform: x86_64
The test platform supports avx2 and sse4.2
Default mtune=generic
Compiler Options: gcc main.c -g -o main -O2 -msse4.2 -mavx2 -fno-inline

GCC 7.3.0 produces:
.L3:
        vmovdqu (%rsi), %xmm3
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -96(%rsi), %xmm2
        vinserti128     $0x1, -112(%rsi), %ymm3, %ymm3
        vmovdqu -64(%rsi), %xmm1
        vinserti128     $0x1, -80(%rsi), %ymm2, %ymm2
        vmovdqu -32(%rsi), %xmm0
        vinserti128     $0x1, -48(%rsi), %ymm1, %ymm1
        vinserti128     $0x1, -16(%rsi), %ymm0, %ymm0
        vmovups %xmm3, -128(%rdi)
        vextracti128    $0x1, %ymm3, -112(%rdi)
        vmovups %xmm2, -96(%rdi)
        vextracti128    $0x1, %ymm2, -80(%rdi)
        vmovups %xmm1, -64(%rdi)
        vextracti128    $0x1, %ymm1, -48(%rdi)
        vmovups %xmm0, -32(%rdi)
        vextracti128    $0x1, %ymm0, -16(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with gcc7.3.0:
$ time ./main_gcc7.3 2000
start to run 2000.
end to run 2000.

real    6m30.461s
user    6m26.587s
sys     0m0.814s

GCC 10.3.0 produces:
.L3:
        vmovdqu (%rsi), %xmm4
        vmovdqu 32(%rsi), %xmm5
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -64(%rsi), %xmm6
        vmovdqu -32(%rsi), %xmm7
        vinserti128     $0x1, -112(%rsi), %ymm4, %ymm3
        vinserti128     $0x1, -80(%rsi), %ymm5, %ymm2
        vinserti128     $0x1, -48(%rsi), %ymm6, %ymm1
        vinserti128     $0x1, -16(%rsi), %ymm7, %ymm0
        vmovdqu %xmm3, -128(%rdi)
        vextracti128    $0x1, %ymm3, -112(%rdi)
        vextracti128    $0x1, %ymm2, -80(%rdi)
        vmovdqu %xmm2, -96(%rdi)
        vextracti128    $0x1, %ymm1, -48(%rdi)
        vextracti128    $0x1, %ymm0, -16(%rdi)
        vmovdqu %xmm1, -64(%rdi)
        vmovdqu %xmm0, -32(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with gcc10.3.0:
$ time ./main_gcc10.3 2000
start to run 2000.
end to run 2000.

real    7m18.696s
user    7m13.912s
sys     0m1.098s

GCC 12.3.0 produces:
.L3:
        vmovdqu (%rsi), %ymm2
        vmovdqu 32(%rsi), %ymm1
        subq    $-128, %rdi
        subq    $-128, %rsi
        vmovdqu -64(%rsi), %ymm0
        vmovdqu -32(%rsi), %ymm3
        vmovdqu %ymm2, -128(%rdi)
        vmovdqu %ymm3, -32(%rdi)
        vmovdqu %ymm1, -96(%rdi)
        vmovdqu %ymm0, -64(%rdi)
        cmpq    %rax, %rdi
        jne     .L3
        vzeroupper

Runtime with gcc12.3.0:
$ time ./main_gcc12.3 2000
start to run 2000.
end to run 2000.

real    10m1.303s
user    9m52.527s
sys     0m2.253s

Why are the instructions generated by gcc12 simpler, yet the run time is
significantly longer with the same test environment and compilation options?
What is the reason for the different instructions generated by these three
versions of gcc?
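
For readers without the attachment: in all three listings the loop moves 128 bytes per iteration (both pointers advance by 128) through 256-bit registers. The following is only a minimal sketch of C source that could plausibly compile to a loop of that shape, assuming AVX unaligned load/store intrinsics in the style of DPDK's rte_memcpy. The function name `copy_128byte_blocks` is hypothetical and is not taken from the attached test case, and the `target("avx2")` attribute (GCC/Clang, x86 only) is an assumption standing in for `-mavx2` on the command line.

```c
#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, NOT the code from attachment 55863.
 * Copies n bytes from src to dst in whole 128-byte blocks using four
 * 256-bit unaligned loads followed by four 256-bit unaligned stores.
 * A tail shorter than 128 bytes is left untouched.
 */
__attribute__((target("avx2")))
void copy_128byte_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
    while (n >= 128) {
        /* Load four 32-byte chunks; no alignment is assumed. */
        __m256i y0 = _mm256_loadu_si256((const __m256i *)(src + 0));
        __m256i y1 = _mm256_loadu_si256((const __m256i *)(src + 32));
        __m256i y2 = _mm256_loadu_si256((const __m256i *)(src + 64));
        __m256i y3 = _mm256_loadu_si256((const __m256i *)(src + 96));

        /* Store the four chunks to the destination. */
        _mm256_storeu_si256((__m256i *)(dst + 0), y0);
        _mm256_storeu_si256((__m256i *)(dst + 32), y1);
        _mm256_storeu_si256((__m256i *)(dst + 64), y2);
        _mm256_storeu_si256((__m256i *)(dst + 96), y3);

        src += 128;
        dst += 128;
        n -= 128;
    }
}
```

Compiling a function like this on its own with `-O2 -mavx2 -S` under each compiler version is one way to compare the generated loop in isolation from the rest of the program.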