Here are the numbers for Racket, Typed Racket, Gambit, and Larceny on
32-bit, and for all but Larceny on 64-bit.
Overall, we're competitive, but we're losing pretty hard on deriv.
Vincent
32-bit:

benchmark | fastest  | gambit | larceny | racket | typed-racket |
cpstack   |  5715 ms |   1.13 |       1 |   1.46 |         1.44 |
dderiv    |  6200 ms |   1.48 |       1 |   1.49 |         1.51 |
deriv     |  3096 ms |      1 |    1.34 |   3.08 |         3.13 |
div       |  7434 ms |      1 |    2.16 |   1.16 |         1.11 |
fft       |  5903 ms |      1 |    2.01 |   1.49 |         1.11 |
graphs    |  7656 ms |   1.47 |    1.20 |   1.02 |            1 |
lattice2  |  5863 ms |   2.07 |       1 |   1.50 |         1.41 |
maze2     |  7678 ms |   3.64 |    3.58 |   1.00 |            1 |
mazefun   | 11693 ms |   1.52 |       1 |   1.08 |         1.06 |
nfa       |  6419 ms |   1.26 |       1 |   1.23 |         1.03 |
nqueens   |  6835 ms |   1.04 |       1 |   1.04 |         1.07 |
paraffins |  6950 ms |      1 |    1.95 |   1.43 |         1.22 |
tak       |  7669 ms |   1.54 |    1.56 |   1.00 |            1 |
takl      |  7325 ms |   2.32 |    1.64 |   1.35 |            1 |
triangle  |  8960 ms |   1.00 |    1.10 |      1 |         1.09 |
64-bit:

benchmark | fastest  | gambit | racket | typed-racket |
cpstack   |  5332 ms |      1 |   1.50 |         1.47 |
dderiv    |  5981 ms |      1 |   1.33 |         1.30 |
deriv     |  3064 ms |      1 |   2.37 |         2.32 |
div       |  7014 ms |      1 |   1.58 |         1.56 |
fft       |  3830 ms |      1 |   1.47 |         1.16 |
graphs    |  6794 ms |   1.22 |   1.01 |            1 |
lattice2  |  7250 ms |   1.31 |   1.06 |            1 |
maze2     |  7280 ms |   2.46 |   1.04 |            1 |
mazefun   | 12295 ms |   1.10 |   1.09 |            1 |
nfa       |  6794 ms |      1 |   1.39 |         1.15 |
nqueens   |  5651 ms |      1 |   1.21 |         1.09 |
paraffins |  8679 ms |      1 |   1.52 |         1.20 |
tak       |  7916 ms |      1 |   1.02 |         1.00 |
takl      |  8252 ms |   1.62 |   1.13 |            1 |
triangle  |  6862 ms |      1 |   1.21 |         1.27 |
At Sun, 24 Apr 2011 22:09:18 -0400,
Vincent St-Amour wrote:
>
> These are impressive speedups!
>
> Given how close we were to the fastest Scheme compilers on some of
> these, that may be enough to give us the lead.
>
> I'll run the benchmarks on different implementations tomorrow.
>
> Vincent
>
>
> At Sun, 24 Apr 2011 17:11:21 -0600,
> Matthew Flatt wrote:
> >
> > The `assoc' example helped focus my attention on a long-unsolved issue
> > with JIT-generated code, where non-tail calls from JIT-generated code
> > to other JIT-generated code seemed more expensive than they should be.
> > This effect showed up in `assq' and `assoc' through a high relative
> > cost for calling `assq' or `assoc' on a short list (compared to calling
> > the C implementation).
> >
> > This time, I finally saw what I've been missing: It's crucial to pair
> > `call' and `ret' instructions on x86. That won't be news to compiler
> > writers; it's a basic fact that I missed along the way.
> >
> > When the JIT generates a non-tail call to other code that it
> > generates, it sets up the called procedure's frame directly (because
> > various computed values are more readily available before jumping to
> > the called procedure). After setting up the frame --- including a
> > return address --- the target code was reached using `jmp'. Later, the
> > `ret' to return from the non-tail call would confuse the processor and
> > cause stalls, because the `ret' wasn't matched with its `call'.
> > It's easy enough to put the return address in place using `call' when
> > setting up a frame, which exposes the right nesting to the processor.
> >
> > The enclosed table shows the effect on traditional Scheme
> > microbenchmarks. Improvements of 20% are common, and several improve by
> > 50% or more. It's difficult to say which real code will benefit, but I
> > think the improvement is likely to be useful.
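The call/ret pairing issue described in the quoted message can be sketched in x86-64 assembly. This is an illustration only: the labels and frame details are hypothetical, not Racket's actual JIT output. Modern x86 processors keep an internal return-address predictor (a return stack buffer) that is pushed on `call` and popped on `ret`; a `ret` with no matching `call` mispredicts.

```asm
; Before: the return address is stored by hand and the target is
; reached with `jmp`. The return-address predictor never sees a
; matching `call`, so the eventual `ret` mispredicts and stalls.
        lea   rax, [rip + after_call]  ; compute the return address
        push  rax                      ; place it in the new frame
        jmp   target_proc              ; unpaired jump
after_call:
        ; ... continue after the call ...

; After: a plain `call` pushes the same return address but also
; updates the predictor, so target_proc's `ret` is correctly paired.
        call  target_proc
        ; ... continue after the call ...
```

Both sequences leave the same return address on the stack; only the second exposes the call/return nesting to the processor, which is the fix the message describes.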
_________________________________________________
For list-related administrative tasks:
http://lists.racket-lang.org/listinfo/dev