On 2010-03-30, at 4:40 AM, Derick Eddington wrote:

> On Tue, 2010-03-30 at 01:22 -0700, Derick Eddington wrote:
>> On Mon, 2010-03-29 at 13:39 -0400, Marc Feeley wrote:
>>> What settings are normally used for benchmarking ikarus (for example
>>> the benchmarks that come with the distribution)?
>> 
>> I'm not sure what settings are normally used for benchmarking Ikarus,
> 
> I just discovered cp0-effort-limit and that the file benchmarks/bench.ss
> has:
> 
>        (optimize-level 2)
>        ;(cp0-effort-limit 100)
>        ;(cp0-size-limit 10)
>        ;(optimizer-output #t)
> 
> -- 
> : Derick
> ----------------------------------------------------------------

Thanks for the information.  However, none of those settings affects the 
performance of the small programs I've tried (ack, fib, tak).  I do get a 10% 
to 20% speed improvement by using the unsafe fixnum arithmetic functions in 
the library (ikarus system $fx), like this:

(import (ikarus system $fx))
(define-syntax fx+ (syntax-rules () ((fx+ x ...) ($fx+ x ...))))
(define-syntax fx- (syntax-rules () ((fx- x ...) ($fx- x ...))))
(define-syntax fx= (syntax-rules () ((fx= x ...) ($fx= x ...))))
(define-syntax fx< (syntax-rules () ((fx< x ...) ($fx< x ...))))
...etc

Even with the unsafe fixnum arithmetic functions, Ikarus is consistently 10% 
to 35% slower than Larceny on these benchmarks.  I've looked at the x86 
assembly code produced by Ikarus, using (assembler-output #t), and I don't see 
any remaining type checks or overflow checks, so Ikarus' poorer performance on 
these benchmarks seems to be a consequence of its management of the stack 
(function calls and parameters) and registers.

I would also like to measure the size of the generated machine code.  Is there 
a way to get at that information?  Ikarus' code is rather verbose.  For 
example, the x86 code for

(define (fib x)
  (if (fx< x 2)
      x
      (fx+ (fib (fx- x 1))
           (fib (fx- x 2)))))

is

    (name (fib . #f))
    (label L8)
    (cmpl -8 %eax)
    (jne (label L11))
    (label L9)
    (cmpl (disp %esi 32) %esp)
    (jb (label ERROR))
    (label L12)
    (addl 8 (disp %esi 72))
    (je (label ERROR))
    (label L13)
    (movl (disp -8 %esp) %eax)
    (cmpl 16 %eax)
    (jge (label L14))
    (movl (disp -8 %esp) %eax)
    (ret)
    (label L14)
    (movl (disp -8 %esp) %eax)
    (movl %eax (disp -24 %esp))
    (subl 8 (disp -24 %esp))
    (movl (obj #["closure" #["code-loc" L8] () #t]) %eax)
    (movl %eax %edi)
    (movl -8 %eax)
    (subl 8 %esp)
    (jmp (label L15))
    (byte-vector #(2))
    (int 16)
    (current-frame-offset)
    (label-address SL_multiple_values_error_rp)
    (pad 10 (label L15) (call (label L9)))
    (addl 8 %esp)
    (movl %eax (disp -16 %esp))
    (movl (disp -8 %esp) %eax)
    (movl %eax (disp -32 %esp))
    (subl 16 (disp -32 %esp))
    (movl (obj #["closure" #["code-loc" L8] () #t]) %eax)
    (movl %eax %edi)
    (movl -8 %eax)
    (subl 16 %esp)
    (jmp (label L16))
    (byte-vector #(4))
    (int 24)
    (current-frame-offset)
    (label-address SL_multiple_values_error_rp)
    (pad 10 (label L16) (call (label L9)))
    (addl 16 %esp)
    (movl %eax %edi)
    (movl (disp -16 %esp) %eax)
    (addl %edi %eax)
    (ret)
    (label L11)
    (jmp (label SL_invalid_args))
    (nop)
    (label ERROR)
    (movl (foreign-label "ik_stack_overflow") %eax)
    (movl %eax %edi)
    (movl 0 %eax)
    (movl (foreign-label "ik_foreign_call") %ebx)
    (subl 8 %esp)
    (jmp (label L17))
    (byte-vector #(2))
    (int 16)
    (current-frame-offset)
    (label-address SL_multiple_ignore_error_rp)
    (pad 10 (label L17) (call %ebx))
    (addl 8 %esp)
    (jmp (label L12))
    (label ERROR)
    (movl (obj $do-event) %eax)
    (movl (disp %eax 19) %eax)
    (movl %eax %edi)
    (movl 0 %eax)
    (subl 8 %esp)
    (jmp (label L18))
    (byte-vector #(2))
    (int 16)
    (current-frame-offset)
    (label-address SL_multiple_ignore_error_rp)
    (pad 10 (label L18) (call (disp -3 %edi)))
    (addl 8 %esp)
    (jmp (label L13))

There seems to be quite a lot of overhead, in both code size and execution 
time, to handle multiple return values.  Can support for multiple values be 
turned off?  For reference, a C program computing (fib 40) with the same 
doubly recursive algorithm, compiled with gcc -O3, runs about 3 times faster 
than the Ikarus version.
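
For concreteness, here is a minimal sketch of what such a C version might look 
like.  It is not the exact source that was timed; the use of plain int is an 
assumption, chosen because the result of (fib 40) fits in 32 bits.

```c
/* Doubly recursive Fibonacci, mirroring the Scheme definition above.
   Assumption: plain int arithmetic, no overflow checking, much like
   the unsafe $fx operations on the Ikarus side. */
int fib(int x)
{
    if (x < 2)
        return x;                   /* base cases: fib(0) = 0, fib(1) = 1 */
    return fib(x - 1) + fib(x - 2); /* two non-tail recursive calls */
}
```

A timing run would call fib(40) from main and print the result, so that the 
compiler cannot discard the computation.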

Marc
