Hi Tomas, > so what is the reason for optimizing picolisp for 32/64 bits > specificly? Is miniPicoLisp so inefficient?
To explain this, I'd like to refer to "doc/structures" in each model. "doc/structures" is the foundation of each implementation, like a set of axioms. Almost everything else follows from them. In the following, I'll compare the pointer tag patterns in each implementation. Note that in all cases the least significant bit is not used as a tag bit; it is reserved as a mark bit for garbage collection. The primary data types of picoLisp2 are: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx010 Number xxxxxxxxxxxxxxxxxxxxxxxxxxxxx100 Symbol xxxxxxxxxxxxxxxxxxxxxxxxxxxxx000 Cell You can see that a bitwise AND with 2 will indicate a number, 4 a symbol, 6 an atom, and so on. For miniPicoLisp it is (the number of 'x's is reduced): num xxxxxx10 sym xxxxx100 cell xxxxx000 That is, a number is still indicated by an AND with 2, or an atom with 6. Note, however, that you cannot directly check for a symbol here, because a number may also have bit three on. To determine if a given datum is a symbol, it must first be asserted that it is not a number. This design gives an additional bit for the number's value, at the expense of a possibly more time-consuming check. The reason is that miniPicoLisp has only short numbers, and especially with a 32 bit word size each bit is very precious. Finally, for the encoding of symbol names, rather convoluted structures are used which are too involved to describe in this mail. Perhaps an intensive study of "doc/structures" and the sources could give some insight. In picoLisp2, symbol names are simply combined big and short numbers. Now, for picoLisp3, which is guaranteed to have a word size of 64 bits: cnt xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS010 big xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxS100 sym xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx1000 cell xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx0000 We have an additional tag bit, and use it to differentiate between short numbers and bignums. The 'S' bit in each type is the sign bit. So a number can be identified by ANDing with 6, a symbol with 8, and so on. In summary, picoLisp2 has only bignums, miniPicoLisp has only short numbers, and picoLisp2 uses a combination of both. Why is this so critical? For a 64bit system, it would be a huge waste of space without a short number type. Take a list (1 2 3) as an example. If there are only bignums as in picoLisp2, it would look like +-----+-----+ +-----+-----+ +-----+-----+ | | | ---+---> | | | ---+---> | | | / | +--+--+-----+ +--+--+-----+ +--+--+-----+ | | | V V V +-----+-----+ +-----+-----+ +-----+-----+ | 1 | / | | 2 | / | | 3 | / | +-----+-----+ +-----+-----+ +-----+-----+ The list would occupy 6 cells in total, 48 bytes on a 32bit machine, but 96 bytes on a 64bit machine. With short numbers, this is reduced to half: +-----+-----+ +-----+-----+ +-----+-----+ | 1 | ---+---->| 2 | ---+---->| 3 | / | +-----+-----+ +-----+-----+ +-----+-----+ The same applies to symbol names (and thus also to strings). In picoLisp3, a symbol with a name of up to 7 characters fits into a single cell: Symbol | V +-----+-----+ |'abc'| VAL | +-----+-----+ The name "abc" is stored as a short number directly in the symbol's tail. Otherwise (as in picoLisp2) it would look like Symbol | V +-----+-----+ | | | VAL | +-----+-----+ | V +-----+-----+ |'abc'| / | +-----+-----+ here, too, we would need twice as much space. On picoLisp3, this can even be combined. The whole name of a symbol with up to 15 characters fits into a single cell, 8 into the bignum's digit part, and 7 into the final short number: Symbol | V +-----+-----+ | | | VAL | +-----+-----+ | V +-----+-----+ | 8 | 7 | +-----+-----+ So, to bring it to the point: The above implementation was chosen simply to save space. To my experience, optimizing space consumptions is far more important than short sighted code optimizations or structural design decisions (e.g. for a compiler). If the code is slow, you might have a system that is half as fast as the other. So what? But if you run out of space because you'd need twice as much, performance will go down dramatically because of cache misses, swapping and trashing. > Or what features of picoLisp cannot be implemented in miniPicoLisp? Even if you decide to live with (62 bit) short numbers only, still the limitation of current C compilers does not allow to directly implement the mul/div operation with an intermediate double-word result (as used in '*/'). So you'd have to resort to half-word twiddling or assembly language here too. The implementation of 6-and-a-half bits for ASCII characters in miniPicoLisp does not allow for UTF-8 support or external symbol encodings. Cheers, - Alex -- UNSUBSCRIBE: mailto:[EMAIL PROTECTED]