Actually, to be completely correct, the access is the movl 2 instructions
above the "-->". The "-->" points to the accumulation in the local
variable k.

Any ideas how I can further narrow this 20x gap and improve GHC's
observed relative array access performance ?

int main () {
  int i, k = 0;
  int a [16];
  for (i=0; i<100000000; i++) {
    k += a [i % 16];
  }
  return k;
}

Thanks!
Lex

--
Lex Stein                     http://www.eecs.harvard.edu/~stein/
[EMAIL PROTECTED]        TEL: 617-233-0246


On Thu, 26 Jun 2003, Lex Stein wrote:

>
> You must be using my old example, with gcc -O3 my new example does access
> the array and is 20x faster than the ghc -O code. Scroll down for
> "-->" that I inserted beside the instruction to see the array access.
>
>       .file   "arrayloop.c"
>       .version        "01.01"
> gcc2_compiled.:
> .text
>       .p2align 2,0x90
> .globl main
>               .type            main,@function
> main:
>       pushl %ebp
>       movl %esp,%ebp
>       subl $80,%esp
>       pushl %esi
>       pushl %ebx
>       xorl %ebx,%ebx
>       xorl %ecx,%ecx
>       leal -64(%ebp),%esi
>       .p2align 2,0x90
> .L6:
>       movl %ecx,%edx
>       testl %ecx,%ecx
>       jge .L7
>       leal 15(%ecx),%edx
> .L7:
>       andl $-16,%edx
>       movl %ecx,%eax
>       subl %edx,%eax
> -->   addl (%esi,%eax,4),%ebx
>       incl %ecx
>       cmpl $99999999,%ecx
>       jle .L6
>       movl %ebx,%eax
>       popl %ebx
>       popl %esi
>       leave
>       ret
> .Lfe1:
>               .size            main,.Lfe1-main
>       .ident  "GCC: (GNU) c 2.95.4 20020320 [FreeBSD]"
>
> On Thu, 26 Jun 2003, Hal Daume wrote:
>
> > This has been hinted to before: gcc gets rid of the loop.
> >
> > bash-2.05b$ gcc -O3 arr.c -S -o arr.s
> > bash-2.05b$ cat arr.s
> >     .file   "arr.c"
> >     .def    ___main;        .scl    2;      .type   32;     .endef
> >     .text
> >     .align 2
> >     .align 16
> > .globl _main
> >     .def    _main;  .scl    2;      .type   32;     .endef
> > _main:
> >     pushl   %ebp
> >     movl    %esp, %ebp
> >     subl    $72, %esp
> >     xorl    %eax, %eax
> >     andl    $-16, %esp
> >     call    __alloca
> >     call    ___main
> >     xorl    %eax, %eax
> >     .align 16
> > L7:
> >     incl    %eax
> >     cmpl    $100000000, %eax
> >     jle     L7
> >     leave
> >     ret
> >
> > the inside of the loop is exactly:
> >
> > L7:
> >     incl    %eax
> >     cmpl    $100000000, %eax
> >     jle     L7
> >
> > which doesn't read the array!  essentially you are being unfare to ghc
> > by making it seq the array element, but not doing the same to GCC.  even
> > with -O0, gcc gets rid of the loop body.
> >
> > one solution: add in the assembly to actually read the array (i'm too
> > lazy and busy to do this).
> >
> >  - hal
> >
> >
> > --
> >  Hal Daume III                                   | [EMAIL PROTECTED]
> >  "Arrest this man, he talks in maths."           | www.isi.edu/~hdaume
> >
> >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Lex Stein
> > > Sent: Thursday, June 26, 2003 3:46 PM
> > > To: Sven Panne
> > > Cc: [EMAIL PROTECTED]
> > > Subject: Re: haskell array access
> > >
> > >
> > >
> > > Great, thanks. Those suggestions narrow the gap from GHC -O being 330x
> > > slower than GCC -O3 to it being 20x slower. Here are the new results:
> > >
> > > gcc -O3      0.54s
> > > ocamlopt     1.11s
> > > ghc -O      10.76s
> > > ocamlc      14.10s
> > >
> > > GHC is still pretty slow for native x86 instruction code. Is
> > > there any way
> > > to further explain the performance gap ? (new code below)
> > >
> > > Thanks!
> > > Lex
> > >
> > > Haskell (GHC) source:
> > >
> > > import Array
> > > k :: Array Int Int
> > > k = Array.array (0,15) [(i, i+1) | i <- [0 .. 15] ]
> > > acc :: Int -> Int
> > > acc 0 = 0
> > > acc n = seq (k Array.! (n `mod` 16)) (acc (n-1))
> > > main = do
> > >     print (acc 100000000)
> > >
> > > Caml test source:
> > >
> > > let a = Array.make 16 0;;
> > > let rec do1 i z =
> > >     if (i>100000000) then z else
> > >         do1 (i+1) (z + a.(i mod 16));;
> > > do1 0 0;;
> > >
> > > C (gcc) test source:
> > >
> > > int main () {
> > >     int i, k = 0;
> > >     int a [16];
> > >     for (i=0; i<100000000; i++) {
> > >         k += a [i % 16];
> > >     }
> > >     return (k);
> > > }
> > >
> > >
> > > On Fri, 27 Jun 2003, Sven Panne wrote:
> > >
> > > > Well, part of the answer is definitely that the Haskell
> > > program is the
> > > > *only* one which really uses the array elements. :-) I
> > > guess that the
> > > > compilers for the other languages simply remove the array
> > > access from
> > > > the generated code (gcc definitely does, only an empty loop
> > > remains).
> > > >
> > > > Another reason is that the type signatures are missing, so
> > > Integer is
> > > > used instead of Int, which is a bit unfair. Adding
> > > >
> > > >     k :: Array Int Int
> > > >     acc :: Int -> Int
> > > >
> > > > cuts down the time by more than factor 5 on my machine.
> > > >
> > > > Cheers,
> > > >     S.
> > > >
> > > >
> > > > _______________________________________________
> > > > Haskell-Cafe mailing list
> > > > [EMAIL PROTECTED]
> > > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > > >
> > > _______________________________________________
> > > Haskell-Cafe mailing list
> > > [EMAIL PROTECTED]
> > > http://www.haskell.org/mailman/listinfo/haskell-cafe
> > >
> > _______________________________________________
> > Haskell-Cafe mailing list
> > [EMAIL PROTECTED]
> > http://www.haskell.org/mailman/listinfo/haskell-cafe
> >
> _______________________________________________
> Haskell-Cafe mailing list
> [EMAIL PROTECTED]
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>
_______________________________________________
Haskell-Cafe mailing list
[EMAIL PROTECTED]
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to