On Fri, May 13, 2011 at 10:00 AM, OGrandeDiEnne <[email protected]> wrote:
>
> let rec visit tree =
>   match tree with
>   | Leaf v -> v
>   | Node (v, t1, t2) ->
>       let v1 = visit t1
>       and v2 = visit t2 in
>       v + v1 + v2
>
>
> let rec visit_tail tree k =
>   match tree with
>   | Leaf v -> k v
>   | Node (v, t1, t2) ->
>       visit_tail t1 (fun v1 ->
>         visit_tail t2 (fun v2 -> k (v + v1 + v2)))
>
> You said that
>> This will always be of the form of some sort of stack or queue,
>> whether explicit or implicit (implied by non-tail recursion).
>
> Here the queue is encoded in the chain of anonymous functions.
>
> How are the two versions actually compiled? Is the second faster than
> the first?
I don't believe that OCaml performs any optimization of anonymous
functions, which means that every call of visit_tail on a Node will
allocate a new closure on the heap. visit, on the other hand, simply
places values on the stack, which *should* be faster. (I say "should"
because I do not know the internals of OCaml's GC, but allocating
values on the stack is generally very, very fast.)
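For reference, here is a self-contained version of both functions that compiles as-is. The tree type definition is my assumption, since the original post didn't include it:

```ocaml
(* Assumed tree type -- not shown in the original post. *)
type tree =
  | Leaf of int
  | Node of int * tree * tree

(* Direct version: pending additions live on the call stack. *)
let rec visit tree =
  match tree with
  | Leaf v -> v
  | Node (v, t1, t2) ->
      let v1 = visit t1
      and v2 = visit t2 in
      v + v1 + v2

(* CPS version: each pending addition becomes a heap-allocated closure. *)
let rec visit_tail tree k =
  match tree with
  | Leaf v -> k v
  | Node (v, t1, t2) ->
      visit_tail t1 (fun v1 -> visit_tail t2 (fun v2 -> k (v + v1 + v2)))

let () =
  let t = Node (1, Leaf 2, Node (3, Leaf 4, Leaf 5)) in
  assert (visit t = 15);
  assert (visit_tail t (fun x -> x) = 15)
```

Both compute the same sum; the only difference is where the pending work is stored.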
Compiling the above with ocamlopt -S produces the following output on
x86-64 for visit:
camlStack__visit_63:
subq $24, %rsp
.L106:
movzbq -8(%rax), %rbx
testq %rbx, %rbx
je .L105
movq %rax, 0(%rsp)
movq 8(%rax), %rax
call camlStack__visit_63@PLT
.L107:
movq %rax, 8(%rsp)
movq 0(%rsp), %rax
movq 16(%rax), %rax
call camlStack__visit_63@PLT
.L108:
movq 0(%rsp), %rbx
movq (%rbx), %rdi
movq 8(%rsp), %rbx
addq %rbx, %rdi
leaq -2(%rdi, %rax), %rax
addq $24, %rsp
ret
.L105:
movq (%rax), %rax
addq $24, %rsp
ret
Note that this is entirely self-contained: ordinary call instructions
are used for the recursion. Here is the code for visit_tail:
camlStack__visit_tail_71:
subq $8, %rsp
.L110:
movq %rax, %rdx
movzbq -8(%rdx), %rax
testq %rax, %rax
je .L109
.L111: subq $48, %r15
movq caml_young_limit@GOTPCREL(%rip), %rax
cmpq (%rax), %r15
jb .L112
leaq 8(%r15), %rsi
movq $5367, -8(%rsi)
movq camlStack__fun_98@GOTPCREL(%rip), %rax
movq %rax, (%rsi)
movq $3, 8(%rsi)
movq %rdi, 16(%rsi)
movq %rdx, 24(%rsi)
movq %rbx, 32(%rsi)
movq 8(%rdx), %rax
movq %rsi, %rbx
jmp .L110
.align 4
.L109:
movq (%rdx), %rax
movq (%rbx), %rdi
addq $8, %rsp
jmp *%rdi
.L112: call caml_call_gc@PLT
.L113: jmp .L111
Note the heap allocation (the bump of %r15 and the check against
caml_young_limit), the subsequent creation of a closure (whose body is
at camlStack__fun_98; not shown), and the out-of-line call to the GC
at .L112. The actual tail call (the jmp *%rdi through the closure's
code pointer) is dwarfed by the explosion in complexity elsewhere.
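As the quoted text points out, the pending work can also be kept in an explicit data structure rather than on the call stack or in a chain of closures. A minimal sketch of that third option (the tree type and the name visit_stack are my assumptions, not from the original post):

```ocaml
(* Assumed tree type -- not shown in the original post. *)
type tree =
  | Leaf of int
  | Node of int * tree * tree

(* Tail-recursive sum with an explicit work list.  The accumulator
   replaces the chain of pending additions, so no closures are built. *)
let visit_stack tree =
  let rec go acc stack =
    match stack with
    | [] -> acc
    | Leaf v :: rest -> go (acc + v) rest
    | Node (v, t1, t2) :: rest -> go (acc + v) (t1 :: t2 :: rest)
  in
  go 0 [tree]

let () =
  let t = Node (1, Leaf 2, Node (3, Leaf 4, Leaf 5)) in
  assert (visit_stack t = 15)
```

This version allocates only list cells on the minor heap, which are cheaper than closures; the traversal order differs from the original, which is harmless for a commutative sum.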
The OCaml compiler is decidedly *not* smart; it performs the fewest
high-level optimizations of any compiler I know of. (I don't think it
even lifts invariant computations out of loops.) However, this means
it is usually very predictable. In particular, simple-looking code
generally leads to simple-looking (and efficient) assembly, as in the
example above.
If you want to check out a "smart" compiler, the Mercury [1] compiler
performs dozens (if not hundreds) of high- and low-level
optimizations, one of which is accumulator introduction (rewriting
certain functions to use tail calls). In my experience, this makes
predicting the compiler's output much more difficult, though it does
lead to performance gains in certain areas.
- Chris
[1] http://www.mercury.csse.unimelb.edu.au/