On 2010-03-30, at 4:40 AM, Derick Eddington wrote:
> On Tue, 2010-03-30 at 01:22 -0700, Derick Eddington wrote:
>> On Mon, 2010-03-29 at 13:39 -0400, Marc Feeley wrote:
>>> What settings are normally used for benchmarking ikarus (for example
>>> the benchmarks that come with the distribution)?
>>
>> I'm not sure what settings are normally used for benchmarking Ikarus,
>
> I just discovered cp0-effort-limit and that the file benchmarks/bench.ss
> has:
>
> (optimize-level 2)
> ;(cp0-effort-limit 100)
> ;(cp0-size-limit 10)
> ;(optimizer-output #t)
>
> --
> : Derick
> ----------------------------------------------------------------

Thanks for the information. However, none of those settings affect the
performance of the small programs I've tried (ack, fib, tak). I do get a 10% to
20% speed improvement by using the unsafe fixnum arithmetic functions in the
library (ikarus system $fx), like this:

(import (ikarus system $fx))
(define-syntax fx+ (syntax-rules () ((fx+ x ...) ($fx+ x ...))))
(define-syntax fx- (syntax-rules () ((fx- x ...) ($fx- x ...))))
(define-syntax fx= (syntax-rules () ((fx= x ...) ($fx= x ...))))
(define-syntax fx< (syntax-rules () ((fx< x ...) ($fx< x ...))))
...etc

Even with the unsafe fixnum arithmetic functions, Ikarus is consistently slower
than Larceny on these benchmarks (from 10% to 35% slower). I've looked at the
x86 assembly code produced by Ikarus, by using (assembler-output #t), and I
don't see any remaining type checks or overflow checks, so Ikarus' poorer
performance on these benchmarks seems to be a consequence of how it manages
the stack (function calls and parameters) and registers.

I would also like to measure the size of the generated machine code. Is there
a way to get at that information? Ikarus' code is rather verbose. For example,
the x86 code for
(define (fib x)
  (if (fx< x 2)
      x
      (fx+ (fib (fx- x 1))
           (fib (fx- x 2)))))
is
(name (fib . #f))
(label L8)
(cmpl -8 %eax)
(jne (label L11))
(label L9)
(cmpl (disp %esi 32) %esp)
(jb (label ERROR))
(label L12)
(addl 8 (disp %esi 72))
(je (label ERROR))
(label L13)
(movl (disp -8 %esp) %eax)
(cmpl 16 %eax)
(jge (label L14))
(movl (disp -8 %esp) %eax)
(ret)
(label L14)
(movl (disp -8 %esp) %eax)
(movl %eax (disp -24 %esp))
(subl 8 (disp -24 %esp))
(movl (obj #["closure" #["code-loc" L8] () #t]) %eax)
(movl %eax %edi)
(movl -8 %eax)
(subl 8 %esp)
(jmp (label L15))
(byte-vector #(2))
(int 16)
(current-frame-offset)
(label-address SL_multiple_values_error_rp)
(pad 10 (label L15) (call (label L9)))
(addl 8 %esp)
(movl %eax (disp -16 %esp))
(movl (disp -8 %esp) %eax)
(movl %eax (disp -32 %esp))
(subl 16 (disp -32 %esp))
(movl (obj #["closure" #["code-loc" L8] () #t]) %eax)
(movl %eax %edi)
(movl -8 %eax)
(subl 16 %esp)
(jmp (label L16))
(byte-vector #(4))
(int 24)
(current-frame-offset)
(label-address SL_multiple_values_error_rp)
(pad 10 (label L16) (call (label L9)))
(addl 16 %esp)
(movl %eax %edi)
(movl (disp -16 %esp) %eax)
(addl %edi %eax)
(ret)
(label L11)
(jmp (label SL_invalid_args))
(nop)
(label ERROR)
(movl (foreign-label "ik_stack_overflow") %eax)
(movl %eax %edi)
(movl 0 %eax)
(movl (foreign-label "ik_foreign_call") %ebx)
(subl 8 %esp)
(jmp (label L17))
(byte-vector #(2))
(int 16)
(current-frame-offset)
(label-address SL_multiple_ignore_error_rp)
(pad 10 (label L17) (call %ebx))
(addl 8 %esp)
(jmp (label L12))
(label ERROR)
(movl (obj $do-event) %eax)
(movl (disp %eax 19) %eax)
(movl %eax %edi)
(movl 0 %eax)
(subl 8 %esp)
(jmp (label L18))
(byte-vector #(2))
(int 16)
(current-frame-offset)
(label-address SL_multiple_ignore_error_rp)
(pad 10 (label L18) (call (disp -3 %edi)))
(addl 8 %esp)
(jmp (label L13))

There seems to be quite a lot of overhead, in both code size and execution
time, for handling multiple return values. Can support for multiple values be
turned off?

For reference, a C program computing (fib 40) with the same doubly recursive
algorithm, compiled with gcc -O3, runs about 3 times faster than the Ikarus
version.
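
The C version could look roughly like the sketch below. This is not the exact
program that was timed; the function name and the use of long for the argument
and result are assumptions made for illustration.

```c
/* Doubly recursive Fibonacci, mirroring the Scheme definition of fib.
   Assumption: the name fib and the long type are illustrative choices,
   not taken from the program that was actually measured. */
long fib(long x)
{
    if (x < 2)
        return x;                    /* base cases: fib(0) = 0, fib(1) = 1 */
    return fib(x - 1) + fib(x - 2);  /* two recursive calls, no memoization */
}
```

Calling fib(40) from a small main function and compiling with gcc -O3 would
correspond to the setup described; the result fits comfortably in a long.
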
Marc