https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64410

            Bug ID: 64410
           Summary: gcc 25% slower than clang 3.5 for adding complex
                    numbers
           Product: gcc
           Version: 4.9.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: conradsand.arma at gmail dot com

Created attachment 34336
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=34336&action=edit
cxaddspeed.cpp

gcc 4.9.2 has worse performance than clang 3.5 when dealing with complex
numbers.

Attached is a simple program that adds two vectors of complex numbers.
Both binaries were compiled with -O3 on x86-64 (i7), Fedora 21, using
gcc 4.9.2 and clang 3.5.0.
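
For readers without the attachment at hand, the kind of kernel being timed
can be sketched roughly as follows. This is not the attached cxaddspeed.cpp;
the names cx and add_vectors are hypothetical, and the guess that the two
command-line arguments are the vector length and the repetition count is an
assumption.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<double> cx;  // hypothetical alias, not from the attachment

// Element-wise sum out[i] = a[i] + b[i].
// A stand-in for the benchmark's inner loop, under the assumptions above.
void add_vectors(const std::vector<cx>& a,
                 const std::vector<cx>& b,
                 std::vector<cx>& out)
{
    assert(a.size() == b.size() && out.size() == a.size());

    const std::size_t n = a.size();
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];  // one complex add = two double adds
}
```

A benchmark driver would presumably call add_vectors repeatedly on vectors
of fixed length and time the whole run.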

$ time ./cxaddspeed_gcc 5000 1000000
5.364u 0.002s 0:05.36 100.0%

$ time ./cxaddspeed_clang 5000 1000000
4.417u 0.001s 0:04.41 100.0%

i.e. the gcc build takes about 21% more user time (5.364 s vs. 4.417 s).


inner loop produced by gcc:
.L52:
    movsd    (%r15,%rax), %xmm1
    movsd    8(%r15,%rax), %xmm0
    addsd    0(%rbp,%rax), %xmm1
    addsd    8(%rbp,%rax), %xmm0
    movsd    %xmm1, (%rbx,%rax)
    movsd    %xmm0, 8(%rbx,%rax)
    addq    $16, %rax
    cmpq    %rsi, %rax
    jne    .L52

inner loop produced by clang:
.LBB0_145:
    movupd    -16(%rbx), %xmm0
    movupd    -16(%rax), %xmm1
    addpd    %xmm0, %xmm1
    movupd    %xmm1, -16(%rdi)
    movupd    (%rbx), %xmm0
    movupd    (%rax), %xmm1
    addpd    %xmm0, %xmm1
    movupd    %xmm1, (%rdi)
    addq    $2, %rbp
    addq    $32, %rbx
    addq    $32, %rax
    addq    $32, %rdi
    addl    $-2, %ecx
    jne    .LBB0_145
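
The two listings differ in how each complex number is handled: the gcc loop
adds the real and imaginary parts with separate scalar movsd/addsd
instructions, one complex number per iteration, while the clang loop loads
each complex number as one 128-bit pair, adds it with a single addpd, and
processes two complex numbers per iteration. As a rough illustration of the
packed form, here is a sketch using SSE2 intrinsics. The function name
add_packed is hypothetical, the loop is not unrolled, and this is not what
either compiler emits verbatim. A plain scalar loop is used when SSE2 is not
available.

```cpp
#include <complex>
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2: _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd
#endif

// Hypothetical helper: out[i] = a[i] + b[i] for n complex doubles.
void add_packed(const std::complex<double>* a,
                const std::complex<double>* b,
                std::complex<double>* out,
                std::size_t n)
{
    // An array of std::complex<double> may be accessed as an array of
    // doubles laid out as {re0, im0, re1, im1, ...} (guaranteed since C++11).
    const double* pa = reinterpret_cast<const double*>(a);
    const double* pb = reinterpret_cast<const double*>(b);
    double*       po = reinterpret_cast<double*>(out);

#if defined(__SSE2__)
    for (std::size_t i = 0; i < n; ++i) {
        // Unaligned 128-bit loads, matching the movupd in the clang listing.
        __m128d va = _mm_loadu_pd(pa + 2 * i);
        __m128d vb = _mm_loadu_pd(pb + 2 * i);
        // One packed add covers both the real and the imaginary part.
        _mm_storeu_pd(po + 2 * i, _mm_add_pd(va, vb));
    }
#else
    // Portable fallback: add the 2*n doubles one at a time.
    for (std::size_t i = 0; i < 2 * n; ++i)
        po[i] = pa[i] + pb[i];
#endif
}
```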
