On Fri, Nov 23, 2007 at 12:30:01AM +0100, Benjamin Teuber wrote:
> AFAIK, CMUCL is the fastest free lisp available. But I would rather
> stick with its offspring, SBCL, which might be a bit slower, but it is
> being worked on actively, it is quite portable
> (http://sbcl.sourceforge.net/platform-table.html) and it supports
> native threads (CMUCL just has green threads). Of course it's not easy
> to compare speed of different languages, but e.g.
> http://shootout.alioth.debian.org/gp4/benchmark.php?test=all&lang=all
> might give you an idea (oh shame, Java seems faster there).

There are two kinds of compiler speed. 

The first kind of speed is how fast the compiled program runs after it
has been compiled. SBCL didn't lose much of that speed in the fork
from CMU CL, and since then it has been at least as actively developed
as CMU CL has been, so it generally hasn't fallen behind. While
it's probably not hard to find benchmarks where CMU CL beats SBCL, I
believe it's fair to say the speed of SBCL's compiled code is
basically at least competitive with that of CMU CL's compiled code.

(And generally I'd expect CMU CL and SBCL to produce the fastest
compiled code, OpenMCL to be next, and CLISP to be significantly
slower on any code where its lack of native compilation (only byte
code) creates a bottleneck.)

The second kind of speed is how fast the compiler translates source
code into compiled programs. SBCL was originally notoriously slow at
that compared to CMU CL (which was, incidentally, my fault personally,
due to choices I made in the fork from CMU CL to SBCL). These days,
though, SBCL's speed at that is more nearly competitive with that of
CMU CL (mostly thanks to work of others, not me). For some benchmarks
on compilation speed in bad-old-days SBCL, reasonably-recent SBCL, and
CMU CL, see
  
http://jsnell.iki.fi/blog/archive/2007-02-07-compilation-speed-in-sbcl-over-the-years.html
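
If you want to see the difference yourself, both kinds are easy to
measure at the REPL. A rough sketch (fib.lisp and FIB are made-up
names, standing in for whatever file and function you care about):

  ;; Compiler speed: how long COMPILE-FILE takes on the source.
  (defvar *fasl* (time (compile-file "fib.lisp")))

  ;; Compiled-code speed: load the resulting fasl, then time a call
  ;; to the compiled function it contains.
  (load *fasl*)
  (time (fib 30))

The first TIME measures the compiler itself; the second measures the
code it produced.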


> On Nov 22, 2007 10:54 PM, Don Dailey <[EMAIL PROTECTED]> wrote:
> > What is the state of common lisp these days?    What is fastest compiler
> > for X86 and how does it compare to C code in performance?

I'll punt on "state of CL," and answer the other two questions
separately.


Re: What is fastest CL compiler for X86?

Here's my rough guess; take it for what it's worth. (I know a lot
about SBCL and CMU CL and Lisp in general, to the point where I'm
likely biased. And I have only ever put serious effort into optimizing
performance of one sizable piece of CL code, and that was in CMU CL
back in the last millennium.)

In many cases the free compilers SBCL and/or CMU CL will be fastest.
They are sufficiently closely related that if one is the fastest, it's
fairly likely that the other one will be second fastest. (And the
commercial "scieneer" compiler is also related, but I haven't heard
much about that for years.)

In many cases the commercial compiler Allegro CL will be fastest. In
particular, Allegro apparently compiles Common Lisp Object System code
using a rather different scheme than SBCL and CMU CL do, one which is
supposed to be generally more clever. (SBCL and CMU CL use the old
public reference implementation of CLOS-in-CL plus years of tweaking;
Allegro CL uses a CLOS-in-native-code which was apparently written
independently from scratch.) If you write an arbitrary representative
piece of code where object allocation and/or method dispatch are the
bottleneck, I'd expect Allegro to be faster.
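
For concreteness, here's the shape of code I have in mind (just a toy
sketch with made-up classes, not a real benchmark), where
generic-function dispatch is most of the work:

  (defclass stone () ())
  (defclass black-stone (stone) ())
  (defclass white-stone (stone) ())

  (defgeneric score (stone))
  (defmethod score ((s black-stone)) 1)
  (defmethod score ((s white-stone)) -1)

  ;; In a loop like this nearly all the time goes into method dispatch
  ;; and object allocation, so the quality of the CLOS implementation
  ;; (PCL-derived in SBCL/CMU CL, Allegro's own in Allegro CL) matters
  ;; far more than the quality of the main compiler.
  (defun sum-scores (stones)
    (reduce #'+ stones :key #'score))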

I really don't know much about the rest of Allegro's general compiler
design or how heavily it's been tweaked in practice, so I can't guess
where else its performance might be better. I'd be surprised, though,
if it's generally much better. (As an SBCL fan, I was tickled to run
across an Allegro advocate listing many bullet points in its favor ---
various impressive libraries, for example --- but not general
performance. :-)

Probably in a few cases the free CLISP byte-compiler will be fastest.
It labors under the handicap of compiling to byte code rather than
native code, but in a calculation which spends most of its time in
libraries, that
handicap can be overcome by better library performance. And CLISP's
huge-integer library routines, in particular, are supposed to be very
good in some ways, so it wouldn't surprise me if CLISP could win in
various number-theory-related calculations (like public-key crypto).
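
Here's a sketch of the kind of number-theory code I mean (plain
square-and-multiply modular exponentiation, nothing CLISP-specific):

  ;; Modular exponentiation on huge integers, as in RSA-style crypto.
  ;; Essentially all the run time is multiplication and MOD on
  ;; bignums, so the quality of the bignum library dominates.
  (defun expt-mod (base exponent modulus)
    (let ((result 1))
      (loop while (> exponent 0)
            do (when (oddp exponent)
                 (setf result (mod (* result base) modulus)))
               (setf base (mod (* base base) modulus))
               (setf exponent (ash exponent -1)))
      result))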

There may also be other subclasses of program, and contenders for
fastest-CL-on-X86, which I don't happen to know about.


Re: How does it (CL in general, and on X86 in particular) compare to C
in performance?

It seems to me that it is often pretty easy to get CL code within a
factor of three or so of the performance of C code. The main exception
in my experience is when you wander into a niche where CL's machine
model is a particularly poor fit to the underlying CPU capabilities.

For me, on the X86 target you asked about, by far the biggest such
niche is passing around globally-visible floating point numbers (as
opposed to numbers in private variables within a single function or
other block of code, where a smart compiler like CMUCL/SBCL can magic
the problem away). GCed dynamically typed languages generally
represent their globally-visible data with tagged machine words, and
there aren't enough bits in a single X86 machine word to store a
single precision floating point number plus a tag. The symptom of this
misfit is that unless you are very careful (even to the point of
sometimes writing tediously unidiomatic code) you can end up with a
tagged machine word which is a pointer to a heap-allocated floating
point number which needs to be GCed later. Anywhere this happens, your
performance goes completely to hell, because heap allocation/GC is
orders of magnitude more expensive than a floating point operation.
(Note the devil-in-the-details nature of this kind of misfit-to-the-CPU
problem: the tagged-floating-point problem can vanish completely in CL
on a 64-bit CPU like X86-64, because in a 64-bit machine word there's
plenty of space for a floating point number plus some tag bits.)

Even in such a niche you can often get to within a factor of three or
so of C performance, but it can be quite tedious to do so, and while
your code can remain portably correct, its good performance might be
unportably tied to a particular compiler.
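
To make that concrete, here's a minimal sketch of the two styles
(made-up function and variable names, and the declarations are the
kind that SBCL/CMU CL reward; other compilers may differ):

  ;; The careful style: floats live in a specialized array and in
  ;; declared locals, so a compiler like CMU CL/SBCL can keep them
  ;; unboxed all the way through.
  (defun scale! (v factor)
    (declare (type (simple-array double-float (*)) v)
             (type double-float factor)
             (optimize (speed 3) (safety 0)))
    (dotimes (i (length v) v)
      (setf (aref v i) (* factor (aref v i)))))

  ;; The naive style: a float stored in a generic place (a list, an
  ;; undeclared slot or special variable) generally ends up as a
  ;; heap-allocated box behind a tagged pointer, and the allocation
  ;; and GC cost swamps the arithmetic.
  (defvar *weights* (list 0.1d0 0.2d0 0.7d0))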

After you get within a factor of three-ish, getting closer to C
becomes much less predictable: it can be sensitive to all sorts of
things in your algorithm and its data structures, sometimes in deeply
obscure ways like the way that GC invariants constrain the register
allocation scheme used by an optimizing compiler. CL-PPCRE is a
well-known example of good CL performance, but I think I could cook up
bad examples faster than I could type. :-| If you really really need
impressive microefficiency, look at your algorithms, think carefully
about roughly what the language has to be compiled to, and seriously
consider lower-level languages.
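
If you do go down that road in CL, the usual first step is to ask the
compiler what it actually produced. A small sketch (the function is
made up, there's nothing special about it):

  ;; DISASSEMBLE is standard CL, and at high SPEED settings SBCL and
  ;; CMU CL also print "efficiency notes" pointing at the places where
  ;; they had to fall back to generic, slow code.
  (defun add-fixnums (a b)
    (declare (fixnum a b) (optimize (speed 3)))
    (the fixnum (+ a b)))

  (disassemble #'add-fixnums)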


> > I was either going to experiment with Forth or lisp in the near
> > future.   I will get around to both eventually.

You (or anyone coming to CL from a game-playing perspective) might
want to look at the Othello-playing program in Norvig's _Paradigms of
Artificial Intelligence Programming: Case Studies in Common Lisp_. The
book is noticeably old (notably in its idea of what the paradigms of
AI are), but it remains quite a good source for practical CL
programming examples and advice.

-- 
William Harold Newman <[EMAIL PROTECTED]>
PGP key fingerprint 85 CE 1C BA 79 8D 51 8C  B9 25 FB EE E0 C3 E5 7C
Ubi saeva indignatio ulterius cor lacerare nequit. -- Jonathan Swift's epitaph