Dan Sugalski:
# Huh? No, you misunderstand. Each chunk of the bytecode has a separate 
# TOC for stuff like this. The full identifier would be 
# file/chunk/entry, which should be reasonably guaranteed to be unique. 
# When the compiler's emitting code to reference a piece of binary data 
# (which is essentially a big binary string constant, but I realize 
# that having it in separate segments is terribly useful) it can turn 
# any human-readable identifier into the internal identifier the engine 
# needs to look up the actual data.
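
A minimal Python sketch of the file/chunk/entry scheme described above. It assumes each chunk keeps its TOC as a simple name-to-entry map that the compiler fills as it interns binary constants. `ConstantId`, `Chunk`, `intern`, and `resolve` are hypothetical names for illustration, not Parrot's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ConstantId:
    """Hypothetical internal identifier: which file, which chunk, which TOC entry."""
    file: int
    chunk: int
    entry: int


@dataclass
class Chunk:
    """One bytecode chunk with its own table of contents (hypothetical layout)."""
    toc: dict[str, int] = field(default_factory=dict)  # human-readable name -> entry number
    data: list[bytes] = field(default_factory=list)    # entry number -> binary constant

    def intern(self, name: str, blob: bytes) -> int:
        """Store `blob` under `name` if the name is new; return its entry number either way."""
        if name not in self.toc:
            self.toc[name] = len(self.data)
            self.data.append(blob)
        return self.toc[name]


def resolve(file_no: int, chunk_no: int, chunk: Chunk, name: str) -> ConstantId:
    """Turn a human-readable name into the file/chunk/entry triple the engine would use."""
    return ConstantId(file_no, chunk_no, chunk.toc[name])


chunk = Chunk()
chunk.intern("greeting", b"hello")
chunk.intern("logo", b"\x89PNG")
cid = resolve(0, 3, chunk, "logo")
print(cid)                    # ConstantId(file=0, chunk=3, entry=1)
print(chunk.data[cid.entry])  # b'\x89PNG'
```

The human-readable name only exists at compile time; the emitted code would carry just the three small integers.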

        DIRECTORY:
                SEG 1 OFFSET: 324
                SEG 2 OFFSET: 2496
                SEG 3 OFFSET: 32482
                ...

        SEG 1:
                TYPE: Line Locations
                LENGTH: 2070
                DATA: 101011101001...
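
The layout above can be sketched in Python as a directory of byte offsets followed by TYPE/LENGTH/DATA segments. The binary encoding (a little-endian 32-bit segment count, then 32-bit offsets, then 32-bit type codes and lengths) is an assumption for illustration, since the diagram doesn't specify field widths. `pack` and `read_segment` are hypothetical helpers.

```python
from __future__ import annotations

import struct

# Hypothetical encoding, little-endian 32-bit fields throughout:
#   directory: <count> <offset_1> ... <offset_n>
#   segment:   <type> <length> <data bytes>
HEADER = struct.Struct("<I")
SEG_HEADER = struct.Struct("<II")


def pack(segments: list[tuple[int, bytes]]) -> bytes:
    """Serialize (type, data) pairs into a directory followed by the segments."""
    dir_size = HEADER.size * (1 + len(segments))  # count word + one offset per segment
    offsets: list[int] = []
    body = bytearray()
    for seg_type, data in segments:
        offsets.append(dir_size + len(body))      # absolute offset from start of file
        body += SEG_HEADER.pack(seg_type, len(data)) + data
    directory = struct.pack(f"<{1 + len(offsets)}I", len(offsets), *offsets)
    return directory + bytes(body)


def read_segment(blob: bytes, number: int) -> tuple[int, bytes]:
    """Fetch segment `number` (1-based, as in the diagram) through the directory."""
    (count,) = HEADER.unpack_from(blob, 0)
    if not 1 <= number <= count:
        raise IndexError(f"no segment {number}")
    (offset,) = HEADER.unpack_from(blob, HEADER.size * number)  # offset_1 follows count
    seg_type, length = SEG_HEADER.unpack_from(blob, offset)
    start = offset + SEG_HEADER.size
    return seg_type, blob[start:start + length]


LINE_LOCATIONS = 1  # hypothetical type code
blob = pack([(LINE_LOCATIONS, b"\x01\x02"), (2, b"opcodes")])
print(read_segment(blob, 1))  # (1, b'\x01\x02')
```

Fetching a segment by number costs one directory read plus one jump to the offset, no matter how many segments the file holds.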

I was thinking in terms of what the TYPE: field stores; it seems you were
thinking about how to identify a particular segment.  Yeah, you can
probably get away with just numbering the segments, although that might
slow things down a bit when you're looking for a particular type of
segment, since a segment's number says nothing about its contents.  (In
foo.pbc, the line location segment might be 1, but in bar.pbc, it's 2.)
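
To make that trade-off concrete, here is a hypothetical Python sketch. Finding a segment by type means scanning every segment's TYPE field, but a loader could instead build a type-to-number index once and reuse it. The `Segment` class, the helper names, and the type strings are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    seg_type: str  # the TYPE: field, e.g. "Line Locations" (names are illustrative)
    data: bytes


def find_by_scan(segments: list[Segment], seg_type: str) -> int | None:
    """Check each segment's TYPE in turn; return the first match's 1-based number."""
    for number, seg in enumerate(segments, start=1):
        if seg.seg_type == seg_type:
            return number
    return None


def build_type_index(segments: list[Segment]) -> dict[str, int]:
    """Scan once at load time so later type lookups are a single dict access."""
    index: dict[str, int] = {}
    for number, seg in enumerate(segments, start=1):
        index.setdefault(seg.seg_type, number)  # keep the first segment of each type
    return index


# The same segment type can sit at a different number in each file.
foo_pbc = [Segment("Line Locations", b"..."), Segment("Bytecode", b"...")]
bar_pbc = [Segment("Bytecode", b"..."), Segment("Line Locations", b"...")]

print(find_by_scan(foo_pbc, "Line Locations"))      # 1
print(find_by_scan(bar_pbc, "Line Locations"))      # 2
print(build_type_index(bar_pbc)["Line Locations"])  # 2
```

With only a handful of segments per file the scan is cheap, so the slowdown is likely small either way.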

BTW, my father (also a programmer, although most of his work is on
database-driven programs) suggested a solution halfway between string
and number: hash the string and use the hash as the number.  With a good
hash function (say, MD5 with its four 32-bit chunks XORed together),
you can probably avoid collisions while still getting a compact numeric
identifier.
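
The hash-as-number idea, sketched in Python with the standard `hashlib` and `struct` modules. The 16-byte MD5 digest is split into four 32-bit words, which are XORed into one 32-bit identifier. The little-endian word order is an assumption; any fixed order works as long as every reader and writer agrees on it. `segment_id` and `build_table` are hypothetical names.

```python
from __future__ import annotations

import hashlib
import struct


def segment_id(name: str) -> int:
    """Hash a segment name to a 32-bit number: MD5, then XOR its four 32-bit chunks."""
    digest = hashlib.md5(name.encode("utf-8")).digest()  # 16 bytes = 128 bits
    a, b, c, d = struct.unpack("<4I", digest)            # four little-endian 32-bit words
    return a ^ b ^ c ^ d


def build_table(names: list[str]) -> dict[int, str]:
    """Map hashed IDs back to names, refusing to continue if two names collide."""
    table: dict[int, str] = {}
    for name in names:
        key = segment_id(name)
        if key in table and table[key] != name:
            raise ValueError(f"hash collision: {name!r} and {table[key]!r} -> {key:#010x}")
        table[key] = name
    return table


for name in ("Line Locations", "Bytecode", "Constants"):
    print(f"{name}: {segment_id(name):#010x}")
```

Because the result is only 32 bits, collisions become likely once there are tens of thousands of distinct names (the birthday bound). For a small, fixed set of segment types, a build-time check like `build_table` is a cheap safety net.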

--Brent Dax <[EMAIL PROTECTED]>
@roles=map {"Parrot $_"} qw(embedding regexen Configure)

Wire telegraph is a kind of a very, very long cat. You pull his tail in
New York and his head is meowing in Los Angeles. And radio operates
exactly the same way. The only difference is that there is no cat.
    --Albert Einstein (explaining radio)