William D Clinger scripsit:

> In the reference implementation, all of the Unicode tables
> add up to a little over 85000 bytes (on a 32-bit machine).

Impressive, and thanks for pointing me there. I see you mostly use inversion lists (a sorted vector of codepoints at which the value of a property changes), with associated values where required, and nicely fast-path ASCII and the BMP (plane 0).

Inversion lists are compact, but in most cases ICU uses tries, trading space for improved lookup speed. Details are at
http://macchiato.com/slides/Bits_of_Unicode.ppt and
http://icu-project.org/docs/papers/foldedtrie_iuc21.ppt .

Mozilla uses (unless it has changed) binary trees of SSGO records: Start/Size/Gap/Offset, where Offset is used for mappings, and is the delta between a codepoint and the codepoint it's mapped to. Gap is a flag that is set if this particular Start-Size range is gappy; that is, if it only includes every other codepoint. This comes up where Unicode encodes alternating upper and lower case letters. ASCII is fast-pathed.

-- 
John Cowan http://ccil.org/~cowan
Monday we watch-a Firefly's house, but he no come out. He wasn't home. Tuesday we go to the ball game, but he fool us. He no show up. Wednesday he go to the ball game, and we fool him. We no show up. Thursday was a double-header. Nobody show up. Friday it rained all day. There was no ball game, so we stayed home and we listened to it on-a the radio. --Chicolini
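
P.S. To make the two table layouts described above concrete, here is a minimal Python sketch. It is not taken from the reference implementation, ICU, or Mozilla. All names and sample data (ALPHA_INVERSIONS, CATEGORY_STARTS, SSGO_RECORDS, the "other" placeholder value) are invented for illustration. The SSGO part is only my reading of the Start/Size/Gap/Offset description: I assume a record covers Start through Start+Size-1, and I use a sorted list with binary search in place of a binary tree.

```python
from bisect import bisect_right

# Inversion list for a boolean property: sorted codepoints at which the
# property value flips. Illustrative data only (ASCII letters).
ALPHA_INVERSIONS = [0x41, 0x5B, 0x61, 0x7B]

def in_inversion_list(inversions, cp):
    """Return True if cp lies in an 'on' range of the inversion list."""
    # An odd number of change points at or below cp means the property is on.
    return bisect_right(inversions, cp) % 2 == 1

# Inversion list with associated values: VALUES[i] holds from STARTS[i]
# up to (not including) STARTS[i + 1]. "other" is a placeholder value.
CATEGORY_STARTS = [0x00, 0x30, 0x3A, 0x41, 0x5B, 0x61, 0x7B]
CATEGORY_VALUES = ["other", "Nd", "other", "Lu", "other", "Ll", "other"]

def lookup_value(starts, values, cp):
    """Return the value in force at cp (starts[0] must be 0)."""
    return values[bisect_right(starts, cp) - 1]

# Hypothetical SSGO records for an uppercase mapping, sorted by start:
# (start, size, gap, offset). Assumed range: start .. start + size - 1.
SSGO_RECORDS = [
    (0x0101, 47, True, -1),    # U+0101..U+012F, every other codepoint
    (0x0430, 32, False, -32),  # U+0430..U+044F, contiguous
]

def to_upper(records, cp):
    """Map cp through the records; return cp unchanged if none applies."""
    starts = [r[0] for r in records]
    i = bisect_right(starts, cp) - 1
    if i < 0:
        return cp
    start, size, gap, offset = records[i]
    if cp >= start + size:
        return cp  # past the end of the nearest range
    if gap and (cp - start) % 2 == 1:
        return cp  # gappy range: only every other codepoint is mapped
    return cp + offset
```

The gap flag is what lets one record cover a run such as U+0100..U+012F, where upper and lower case letters alternate, instead of needing one record per letter.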
