Duncan Coutts <[EMAIL PROTECTED]> writes:

> To get something really compact we could use an index composed of three
> unboxed Int arrays. 

To get something *really* compact, we could build a (kind of) suffix
array.  That is, we do a lexical sort of the lines, and store the
sorted offsets of the lines in an array.  You can then look up any
index by binary search using this array. That's 10-15 characters plus
one offset (4 bytes) per line, ~1.3 times the original file.

(There are also compressed suffix array structures, but I don't think
those will gain you anything if you don't store all the positions.)

-k
-- 
If I haven't seen further, it is by standing in the footprints of giants
_______________________________________________
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

Reply via email to