Since "complex" effectively turns any basic type into a two-element vector
type, complex addition and subtraction could use SSE instructions to operate
on the real and imaginary parts simultaneously. (This applies only to addition
and subtraction.)

Code:

#include <complex.h>

typedef float complex ss1;
typedef float ss2 __attribute__((vector_size(sizeof(ss1))));

ss1 add1(ss1 a, ss1 b) { return a + b; }
ss2 add2(ss2 a, ss2 b) { return a + b; }

Produces:

add1:
        movq    %xmm0, -8(%rsp)
        movq    %xmm1, -16(%rsp)
        movss   -4(%rsp), %xmm0
        movss   -8(%rsp), %xmm1
        addss   -12(%rsp), %xmm0
        addss   -16(%rsp), %xmm1
        movss   %xmm0, -20(%rsp)
        movss   %xmm1, -24(%rsp)
        movq    -24(%rsp), %xmm0
        ret
add2:
        movlps  %xmm0, -16(%rsp)
        movlps  %xmm1, -24(%rsp)
        movaps  -24(%rsp), %xmm0
        addps   -16(%rsp), %xmm0
        movaps  %xmm0, -56(%rsp)
        movlps  -56(%rsp), %xmm0
        ret
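
For comparison, here is a rough hand-written sketch of the operation being
asked for, using SSE intrinsics from <xmmintrin.h>. The helper name add_sse is
hypothetical and not part of the test case above. It assumes an x86 target
with SSE enabled and relies on float complex having the same layout as
float[2]. It only illustrates that a single packed add (addps) covers both
parts. It is not a claim about what code GCC should emit.

```c
#include <assert.h>
#include <complex.h>
#include <string.h>
#include <xmmintrin.h>

/* Hypothetical helper (not from the original report): add two
   float complex values with one packed SSE addition. */
static float complex add_sse(float complex a, float complex b)
{
    float in_a[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float in_b[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    float out[4];
    float complex r;

    /* Sanity check: a float complex is laid out as {real, imag}. */
    assert(sizeof a == 2 * sizeof(float));

    /* Copy real and imaginary parts into lanes 0 and 1;
       lanes 2 and 3 stay zero and are ignored afterwards. */
    memcpy(in_a, &a, sizeof a);
    memcpy(in_b, &b, sizeof b);

    /* One packed add handles both parts at once. */
    __m128 vr = _mm_add_ps(_mm_loadu_ps(in_a), _mm_loadu_ps(in_b));

    /* Store all four lanes, then keep only the low two. */
    _mm_storeu_ps(out, vr);
    memcpy(&r, out, sizeof r);
    return r;
}
```

Calling add_sse(1.0f + 2.0f * I, 3.0f + 4.5f * I) should give the same value
as add1 does for the same arguments. The copies through temporary arrays are
there only to keep the sketch free of aliasing and alignment concerns. An
optimizing compiler would ideally keep everything in registers.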

Command line:
    gcc -msse -O3 -S test2.c
    (The results are the same with -ffast-math.)
Architecture:
CPU=AMD Athlon(tm) 64 X2 Dual Core Processor 4600+
CPU features=fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt lm 3dnowext 3dnow
pni lahf_lm cmp_legacy

GCC is:
Target: x86_64-linux-gnu
Configured with: ../src/configure -v
--enable-languages=c,c++,fortran,objc,obj-c++,treelang --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--program-suffix=-4.1 --enable-__cxa_atexit --enable-clocale=gnu
--enable-libstdcxx-debug --enable-mpfr --enable-checking=release
x86_64-linux-gnu
Thread model: posix
gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)


-- 
           Summary: C complex numbers, amd64 SSE, missed optimization
                    opportunity
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: rtl-optimization
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: bisqwit at iki dot fi


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485
