>  > The question that I haven't thought through is if it's possible
>  > to Huffman encode data without destroying the sort order, i.e.,
>  > if I have "aaaa", "aaab" and "aaac", does the Huffman encoded
>  > representation sort in the exact same order?  (Remember, the
>
>  Since the words are compressed according to frequencies, the order
> is not preserved. If compressing is done based on trigram frequencies
> for instance, the order of aaa and aab will depend on their frequency,
> not their lexicographic order. 

It seems to me that we could write a Huffman engine that *would*
preserve the sort order.  It wouldn't be the best possible encoding,
but it might still be relatively good.

The tradeoff would be that we wouldn't have to decompress keys
to do comparisons.  (We could do the best possible Huffman
encoding on the data items, and a Huffman encoding on the keys
that would preserve the sort order.)

Someone that knows Huffman encoding should do rough calculations
on the relative compression we'd expect in, say, English, with
the two engines.

Regards,
--keith

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Keith Bostic
Sleepycat Software Inc.         [EMAIL PROTECTED]
394 E. Riding Dr.               +1-978-287-4781
Carlisle, MA 01741-1601         http://www.sleepycat.com


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] containing the single word "unsubscribe" in
the SUBJECT of the message.

Reply via email to