Dan Sugalski <[EMAIL PROTECTED]> writes:

[...]

> No, it doesn't. It needs to preallocate a few entries in the TOC at
> the start of the chunk, but that's it. Not that much waste, even if
> some of the metadata's in the TOC.
> 
> 
> The point is to have a file format that does what we need it
> to--present executable bytecode data to the interpreter--as fast as
> possible. Everything else is secondary to that. The rest of the
> metadata's needed, so it's there, but access to it doesn't need to be
> as fast.
> 
> 
> Yes, it'll only be a few microseconds, but it's a few microseconds
> versus a few bytes in the file. Disk space is less dear than cycles.

[...]

Last weekend I tried to implement something like this. It took several
attemps to get the segemented bytecode, the preallocated toc-entries
and the need for speed together.

First idea was to have a global list of segments each with its own
TOC. But this has the problem, that the fixup between the diffrent
segments needs an extra indirection, and it limits the extenability to
adding chunks to a segment. A global table with a fixed number of
chunks per segments violates the rule of no arbitary limits. An other
problem is that that unpacking should be as seldom as possible before
the first bytecode is executed; other segements should only loaded on
demand.

My proposed solution is this:
There exists only on global TOC. This consits of a list of fixed sized
items one for each chunk of data. If no wordsizetransformation or
endianization are necessary this can be memmapped and each item can be
accessed by array lookup.

Each chunk is a binary blob of data in the packfile identified by a
(offset,size) pair in the TOC. Chunks don't overlap. They also take a
type field wich identifies the way to unpack the data.

Chunks can be grouped together. Therefore each directory item has a
skip field. The following items belong to its group. The group-head is
responsible for handling its children. The bytecode can have its
children at fixed positon, e.g CONSTANT 1, FIXUP 2, LINE_INFO 3, ...

Here is an example

items: 7
TOC:
    item1:
         offset: chunk1
         size:
         type: BYTECODE
         skip: 4
    item2:
         offset: chunk2
         size:
         type: CONSTANT
         skip: 1
    item3:
         offset: chunk3
         size:
         type: FIXUP
         skip: 1
    item4:
         offset: chunk4
         size:
         type: LINE_INFO
         skip: 1
    item5:
         offset: chunk5
         size:
         type: BYTECODE
         skip: 2
    item6:
         offset: chunk6
         size:
         type: CONSTANT
         skip: 1
    item7:
         offset: chunk7
         size:
         type: foobar
         skip: 1

chunk1:
    ...
chunk2:
    ...
....

The first bytecode chunk can be found very fast. The TOC is memmapped,
and the chunks are searched for the first top level bytecode
segment. In a typical packfile this will be the first chunk. 
If its turns out that we need _many_ segments, and they need to be
accessed in random order, it is possible to add a sibling table for
fast access of the individual bytecode segements. This is possible
without changing the general bytecode format, and without decreasing
performance.

One side note: It might be nice to have individual version numbers for
each chunk of data. A change of core.ops does not change the packfile
format, only the data in the bytecode section is invalid. But this
might go to far.

Comments?
b.
-- 
Juergen Boemmels                        [EMAIL PROTECTED]
Fachbereich Physik                      Tel: ++49-(0)631-205-2817
Universitaet Kaiserslautern             Fax: ++49-(0)631-205-3906
PGP Key fingerprint = 9F 56 54 3D 45 C1 32 6F  23 F6 C7 2F 85 93 DD 47

Reply via email to