Re: closures

Alexander Burger Mon, 27 Oct 2008 23:21:38 -0700

Hi Tomas,

> so what is the reason for optimizing picolisp for 32/64 bits
> specificly?  Is miniPicoLisp so inefficient?


To explain this, I'd like to refer to "doc/structures" in each model.
"doc/structures" is the foundation of each implementation, like a set of
axioms. Almost everything else follows from them.

In the following, I'll compare the pointer tag patterns in each
implementation. Note that in all cases the least significant bit is not
used as a tag bit; it is reserved as a mark bit for garbage collection.


The primary data types of picoLisp2 are:

      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx010 Number
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx100 Symbol
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxx000 Cell

You can see that a bitwise AND with 2 will indicate a number, 4 a
symbol, 6 an atom, and so on.
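
The checks above can be sketched as follows. This is only an
illustration in Python, not code from the picoLisp sources; the names
(NUM_TAG, is_num, ...) and the sample word values are made up for the
example.

```python
# Tag bits of picoLisp2 as listed above. Bit 0 is the GC mark bit and is
# never used as a tag. All names here are hypothetical.
NUM_TAG = 0b010                 # 2
SYM_TAG = 0b100                 # 4
ATOM_MASK = NUM_TAG | SYM_TAG   # 6

def is_num(word):
    return (word & NUM_TAG) != 0

def is_sym(word):
    return (word & SYM_TAG) != 0

def is_atom(word):
    return (word & ATOM_MASK) != 0

def is_cell(word):
    # Neither tag bit set: the word points to a cell.
    return (word & ATOM_MASK) == 0

# Arbitrary example words: some upper bits plus a tag.
number_ptr = 0x1000 | NUM_TAG
symbol_ptr = 0x2000 | SYM_TAG
cell_ptr = 0x3000
```

Each check is a single AND, and no check depends on the outcome of
another one.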


For miniPicoLisp it is (the number of 'x's is reduced):

         num      xxxxxx10
         sym      xxxxx100
         cell     xxxxx000

That is, a number is still indicated by an AND with 2, or an atom with
6. Note, however, that you cannot directly check for a symbol here,
because a number may also have bit three on. To determine if a given
datum is a symbol, it must first be asserted that it is not a number.

This design gives an additional bit for the number's value, at the
expense of a possibly more time-consuming check. The reason is that
miniPicoLisp has only short numbers, and especially with a 32 bit word
size each bit is very precious.
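
The two-step symbol check described above might look like this. Again a
Python sketch with made-up names, not the actual miniPicoLisp code. The
helper make_num assumes that a non-negative value simply occupies all
bits above the tag; that layout is read off the table, and sign handling
is left out.

```python
NUM_TAG = 0b010   # bit 1: number
SYM_BIT = 0b100   # bit 2: symbol tag, but only if the word is not a number

def is_num(word):
    return (word & NUM_TAG) != 0

def is_sym(word):
    # Bit 2 is only meaningful as a tag once numbers are ruled out,
    # because in a number it belongs to the value.
    return not is_num(word) and (word & SYM_BIT) != 0

def is_atom(word):
    return (word & (NUM_TAG | SYM_BIT)) != 0

def make_num(value):
    # Assumed layout: value in the bits above bit 1 (hypothetical helper).
    return (value << 2) | NUM_TAG

one = make_num(1)                  # lowest value bit lands on bit 2
naive_sym = (one & SYM_BIT) != 0   # a plain AND with 4 is fooled
```

For the number 1, the plain AND with 4 gives a non-zero result, while
is_sym correctly rejects it.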

Finally, for the encoding of symbol names, rather convoluted structures
are used which are too involved to describe in this mail. Perhaps an
intensive study of "doc/structures" and the sources could give some
insight. In picoLisp2, symbol names are simply combined big and short
numbers.


Now, for picoLisp3, which is guaranteed to have a word size of 64 bits:

   cnt   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS010
   big   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS100
   sym   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000
   cell  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000

We have an additional tag bit, and use it to differentiate between short
numbers and bignums. The 'S' bit in each type is the sign bit. So a
number can be identified by ANDing with 6, a symbol with 8, and so on.
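
A sketch of these checks in Python, with hypothetical names. Two points
are assumptions read off the table rather than stated in this mail: the
sign bit 'S' sits at the same position as the symbol tag, so the sketch
rules out numbers before testing for a symbol, and make_cnt stores the
magnitude in the upper 60 bits.

```python
CNT_TAG = 0b0010    # short number
BIG_TAG = 0b0100    # bignum
NUM_MASK = CNT_TAG | BIG_TAG   # 6
SYM_TAG = 0b1000    # 8
SIGN_BIT = 0b1000   # 'S' in cnt and big, same position as SYM_TAG

def is_cnt(word):
    return (word & CNT_TAG) != 0

def is_big(word):
    return (word & BIG_TAG) != 0

def is_num(word):
    return (word & NUM_MASK) != 0

def is_sym(word):
    # Numbers are excluded first, since a negative number has bit 3 set.
    return not is_num(word) and (word & SYM_TAG) != 0

def is_cell(word):
    return (word & (NUM_MASK | SYM_TAG)) == 0

def is_negative(word):
    return is_num(word) and (word & SIGN_BIT) != 0

def make_cnt(value):
    # Assumed layout: magnitude in the upper 60 bits, then sign, then tag.
    sign = SIGN_BIT if value < 0 else 0
    return (abs(value) << 4) | sign | CNT_TAG
```

With these definitions, make_cnt(-5) is recognized as a negative short
number and not as a symbol.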


In summary, picoLisp2 has only bignums, miniPicoLisp has only short
numbers, and picoLisp3 uses a combination of both.


Why is this so critical? For a 64bit system, it would be a huge waste of
space without a short number type. Take a list (1 2 3) as an example. If
there are only bignums as in picoLisp2, it would look like

      +-----+-----+     +-----+-----+     +-----+-----+
      |  |  |  ---+---> |  |  |  ---+---> |  |  |  /  |
      +--+--+-----+     +--+--+-----+     +--+--+-----+
         |                 |                 |
         V                 V                 V
      +-----+-----+     +-----+-----+     +-----+-----+
      |  1  |  /  |     |  2  |  /  |     |  3  |  /  |
      +-----+-----+     +-----+-----+     +-----+-----+

The list would occupy 6 cells in total, 48 bytes on a 32bit machine, but
96 bytes on a 64bit machine.

With short numbers, this is reduced to half:

      +-----+-----+     +-----+-----+     +-----+-----+
      |  1  |  ---+---->|  2  |  ---+---->|  3  |  /  |
      +-----+-----+     +-----+-----+     +-----+-----+
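
The byte counts can be checked with a small calculation. This is a
Python sketch with a hypothetical function name; it assumes that a cell
is two machine words and that every number in the list is small enough
to fit into one bignum cell or one short number.

```python
def list_bytes(n_items, word_bytes, short_numbers):
    """Bytes needed for a list of n_items small numbers."""
    cell_bytes = 2 * word_bytes   # a cell is two words (CAR and CDR)
    # Without short numbers, each item needs a list cell plus a bignum cell.
    cells = n_items if short_numbers else 2 * n_items
    return cells * cell_bytes

bignums_32 = list_bytes(3, 4, short_numbers=False)
bignums_64 = list_bytes(3, 8, short_numbers=False)
shorts_64 = list_bytes(3, 8, short_numbers=True)
```

Under these assumptions the 64bit list with short numbers takes the same
number of bytes as the 32bit list with bignums.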


The same applies to symbol names (and thus also to strings). In
picoLisp3, a symbol with a name of up to 7 characters fits into a single
cell:

            Symbol
            |
            V
      +-----+-----+
      |'abc'| VAL |
      +-----+-----+

The name "abc" is stored as a short number directly in the symbol's
tail. Otherwise (as in picoLisp2) it would look like

            Symbol
            |
            V
      +-----+-----+
      |  |  | VAL |
      +-----+-----+
         |
         V
      +-----+-----+
      |'abc'|  /  |
      +-----+-----+

Here, too, we would need twice as much space.


On picoLisp3, the two can even be combined. The whole name of a symbol
with up to 15 characters fits into a single cell: 8 into the bignum's
digit part, and 7 into the final short number:

            Symbol
            |
            V
      +-----+-----+
      |  |  | VAL |
      +-----+-----+
         |
         V
      +-----+-----+
      |  8  |  7  |
      +-----+-----+
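
The number of cells a name needs in addition to the symbol cell can be
estimated as follows. This is a Python sketch with a hypothetical
function name. It treats every character as one byte, and the
continuation beyond 15 characters (further bignum cells of 8 characters
each, ending in a short number with up to 7) is an extrapolation from
the two cases above, not something stated in this mail.

```python
SHORT_CHARS = 7   # characters that fit into a short number
DIGIT_CHARS = 8   # characters that fit into a bignum's digit part

def name_cells(n_chars):
    """Extra cells for a symbol name, not counting the symbol cell."""
    if n_chars <= SHORT_CHARS:
        return 0   # name sits directly in the symbol's tail
    rest = n_chars - SHORT_CHARS
    # Ceiling division: each bignum cell takes up to 8 more characters.
    return (rest + DIGIT_CHARS - 1) // DIGIT_CHARS

sizes = {n: name_cells(n) for n in (3, 7, 8, 15, 16)}
```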


So, to get to the point: The above implementation was chosen simply
to save space. In my experience, optimizing space consumption is far
more important than short-sighted code optimizations or structural
design decisions (e.g. for a compiler). If the code is slow, you might
have a system that is half as fast as the other. So what? But if you run
out of space because you'd need twice as much, performance will go down
dramatically because of cache misses, swapping and thrashing.



> Or what features of picoLisp cannot be implemented in miniPicoLisp?

Even if you decide to live with (62 bit) short numbers only, current C
compilers still do not allow a direct implementation of the mul/div
operation with an intermediate double-word result (as used in '*/'). So
you'd have to resort to half-word twiddling or assembly language here
too.
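
To illustrate the half-word approach, here is a Python sketch that
multiplies two unsigned 64 bit words using only intermediate values that
fit into 64 bits. The function names are hypothetical, and this is not
how any picoLisp version implements it. Only the multiplication is done
in half-words; for the division the sketch leans on Python's arbitrary
precision integers, and sign handling and rounding are left out.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def mul_64x64(a, b):
    """Return (hi, lo), the two 64 bit words of the product a * b."""
    a_lo, a_hi = a & MASK32, a >> 32
    b_lo, b_hi = b & MASK32, b >> 32
    # Four partial products, each below 2**64.
    ll = a_lo * b_lo
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    hh = a_hi * b_hi
    # Sum of the middle column; three 32 bit values cannot overflow 64 bits.
    mid = (ll >> 32) + (lh & MASK32) + (hl & MASK32)
    lo = ((mid & MASK32) << 32) | (ll & MASK32)
    hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32)
    return hi, lo

def mul_div(a, b, c):
    """(a * b) // c with a double-word intermediate product."""
    hi, lo = mul_64x64(a, b)
    # Stand-in for a double-word by single-word division.
    return ((hi << 64) | lo) // c

# Keeping only one word of the product loses the result entirely here.
truncated = ((2**40 * 2**40) & MASK64) // 2**30
full = mul_div(2**40, 2**40, 2**30)
```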

The implementation of 6-and-a-half bits for ASCII characters in
miniPicoLisp does not allow for UTF-8 support or external symbol
encodings.

Cheers,
- Alex