For sorted lists of text, such as dictionaries, one quick-to-decode technique that saves a fair amount of space is to start each string with the number of leading bytes it shares with the previous string, and then append the remainder of the string.
In other words, the list of words
though thought thoughtful
would reduce to
0though 6t 7ful
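The scheme described above is commonly called front coding (or incremental encoding). Below is a minimal sketch of it in Python rather than Perl. It assumes ASCII input, so the character counts it produces equal byte counts. The function names are hypothetical and do not come from any CPAN module.

```python
from typing import List, Tuple


def shared_prefix_len(a: str, b: str) -> int:
    """Count the leading characters that a and b have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def front_encode(words: List[str]) -> List[Tuple[int, str]]:
    """Encode each word as (chars shared with previous word, remaining suffix).

    Assumes ASCII text so that character counts equal byte counts.
    """
    encoded = []
    prev = ""
    for word in words:
        n = shared_prefix_len(prev, word)
        encoded.append((n, word[n:]))
        prev = word
    return encoded


def front_decode(encoded: List[Tuple[int, str]]) -> List[str]:
    """Rebuild each word from n chars of the previous word plus its suffix."""
    words = []
    prev = ""
    for n, suffix in encoded:
        word = prev[:n] + suffix
        words.append(word)
        prev = word
    return words


words = ["though", "thought", "thoughtful"]
enc = front_encode(words)
print(" ".join(f"{n}{s}" for n, s in enc))  # 0though 6t 7ful
assert front_decode(enc) == words
```

The `0though 6t 7ful` text form is only for display. A real storage format would need a fixed-width length field or a separator, because a suffix that begins with a digit would otherwise be ambiguous.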
I seem to recall once stumbling across a Perl module that does this sort of thing, but I'm not hitting the right keywords in my searches to find it again. Or else I'm searching in the wrong places (CPAN, Google).
Anyone know where such a module might be hiding?
Hi Glenn,
I think the term you're thinking of is stemming. Maybe Lingua-Stem <http://search.cpan.org/dist/Lingua-Stem/> is what you're looking for?
Regards, Randy.
_______________________________________________ Perl-Win32-Users mailing list