Re: GHC vs. GCC on raw vector addition

Simon Marlow Thu, 19 Jan 2006 03:28:09 -0800

John Meacham wrote:

> On Wed, Jan 18, 2006 at 08:54:43PM +0300, Bulat Ziganshin wrote:
>
> > sorry, with the "gcc -O3 -ffast-math -fstrict-aliasing -funroll-loops"
> > the C version is 50 times faster than best Haskell one... it's the
> > loop from C version:
>
> I believe something similar to what I noted here is the culprit:
> http://www.haskell.org//pipermail/glasgow-haskell-users/2005-October/009174.html
>
> it is fixable, but not without modifying ghc.

Ah, I see what you mean by indirect jumps. Those indirect jumps go away if you compile with -optc-O2 or -fasm; they're droppings left by inadequacies in gcc's standard -O optimisation.


Actually, -fasm does better by one instruction than gcc on this example:

.globl Test_zdwfac_info
Test_zdwfac_info:
        movq (%rbp),%rax
        cmpq $1,%rax
        jne .LcmO
        movq 8(%rbp),%r13
        addq $16,%rbp
        jmp *(%rbp)
.LcmO:
        leaq -1(%rax),%rcx
        imulq 8(%rbp),%rax
        movq %rax,8(%rbp)
        movq %rcx,(%rbp)
        jmp Test_zdwfac_info

vs. gcc -O2:

Test_zdwfac_info:
.text
        .align 8
        movq    (%rbp), %rdx
        cmpq    $1, %rdx
        je      .L6
.L3:
        movq    8(%rbp), %rax
        imulq   %rdx, %rax
        decq    %rdx
        movq    %rdx, (%rbp)
        movq    %rax, 8(%rbp)
        jmp     Test_zdwfac_info
        .p2align 4,,7
.L6:
        movq    8(%rbp), %r13
        addq    $16, %rbp
        jmp     *(%rbp)

We should probably reverse the sense of that branch, like gcc does. The memory accesses are still there, of course. Hopefully someday I'll get around to trying to use more registers on x86_64 again.
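
For anyone reading without the linked message to hand, here is a rough sketch of the kind of source that yields a worker like Test_zdwfac_info (the z-encoded name of $wfac in a module Test). It is a reconstruction from the assembly above, not necessarily the exact program from the earlier thread: the argument order and the base case of 1 are read off the two stack slots and the cmpq $1, and the name of the accumulator is made up.

```haskell
-- Hypothetical reconstruction; assumed to live in a module named Test.
-- After worker/wrapper, GHC can unbox both arguments, leaving a worker
-- ($wfac) whose two machine words sit at (%rbp) and 8(%rbp).
fac :: Int -> Int -> Int
fac 1 acc = acc                     -- the "cmpq $1" test: return the accumulator
fac n acc = fac (n - 1) (n * acc)   -- the imulq, the decrement, and the tail call

-- Convenience wrapper (also an assumption): start the accumulator at 1.
factorial :: Int -> Int
factorial n = fac n 1
```

The tail call is what becomes the final "jmp Test_zdwfac_info" in both listings; the two stores before it write the new n and the new accumulator back to the stack slots, which are the memory accesses mentioned above.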


Cheers,
        Simon
_______________________________________________
Glasgow-haskell-users mailing list
[email protected]
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users
