On Thu, Jun 30, 2005 at 05:39:29AM -0600, Kevin Atkinson wrote:
>
> A few things to take into consideration.
>
> 1) To minimize the space used, the word list should be compressed with
> "prezip -s". (The "-s" re-sorts the word list using the "C" locale,
> which is needed for maximum compression with prezip.) It should then be
> further compressed with bzip2. You can decompress it by piping it
> through "bzcat | precat". To give you an idea of the sizes produced by
> the various methods, here are the file sizes for en-common.wl (.cwl is
> the word list compressed with prezip):
>
>   1224  en-common.wl
>    424  en-common.cwl
>    136  en-common.cwl.bz2
>    164  en-common.cwl.gz
>    432  en-common.wl.bz2
>    332  en-common.wl.gz
>
> Yes, bzip2 is WORSE than gzip on a sorted word list.
>
> Also, prezip and friends consist of an ANSI C program and some shell
> scripts, which can easily be split out into a separate package so that
> you can also use them with Ispell if desired.
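To illustrate why the "-s" re-sort matters, here is a minimal Python sketch of front coding (prefix compression). As I understand it, this is the general idea behind prezip: each word stores only how many characters it shares with the previous word, plus the rest. This is NOT prezip's actual on-disk format. The function names are hypothetical, and the one-byte prefix count and newline terminator used by `encoded_size` are assumptions made only for the illustration. Sorting by raw bytes (roughly what the "C" locale does) puts words with long common prefixes next to each other, which is what makes the encoding pay off.

```python
def c_locale_sort(words):
    """Sort by raw UTF-8 bytes, roughly what `LC_ALL=C sort` does."""
    return sorted(words, key=lambda w: w.encode("utf-8"))


def front_encode(words, max_shared=255):
    """Encode each word as (chars shared with previous word, remaining suffix).

    max_shared caps the count so it would fit in one byte; this is an
    assumption of the sketch, not prezip's real format.
    """
    encoded = []
    prev = ""
    for word in words:
        limit = min(len(prev), len(word), max_shared)
        shared = 0
        while shared < limit and prev[shared] == word[shared]:
            shared += 1
        encoded.append((shared, word[shared:]))
        prev = word
    return encoded


def front_decode(encoded):
    """Rebuild the original words from (shared, suffix) pairs."""
    words = []
    prev = ""
    for shared, suffix in encoded:
        word = prev[:shared] + suffix
        words.append(word)
        prev = word
    return words


def encoded_size(encoded):
    """Bytes needed, assuming 1 byte for the count plus a newline terminator."""
    return sum(1 + len(suffix.encode("utf-8")) + 1 for _, suffix in encoded)


if __name__ == "__main__":
    unsorted = ["bandit", "apple", "band", "applet",
                "banana", "apples", "bandana", "apply"]
    ordered = c_locale_sort(unsorted)
    assert front_decode(front_encode(ordered)) == ordered
    print("unsorted:", encoded_size(front_encode(unsorted)))  # 61
    print("sorted:  ", encoded_size(front_encode(ordered)))   # 36
```

With these eight sample words, the shuffled order gains nothing from prefix sharing. In sorted order, neighbours share stems such as "appl" and "band", so under this toy scheme the list shrinks from 61 to 36 bytes.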
Thanks for the hints, Kevin.

I assume this applies only to non-affix-compressed wordlists. I think we should also encourage affix compression when possible; hash sizes are much better.

Regarding bzip2, it implies adding another dependency. While most systems already have it installed, I personally prefer gzip, which must always be present, even if that means a larger size. All my recent tests used plain gzip and affix-compressed wordlists. Only the very first tests were done with gzipped raw (no prezip) wordlists. Since the system seems viable, I will add support for prezip/precat when the wordlist name is of the form .cwl.gz.

> 2) To avoid spitting out a bunch of warnings during compilation, you
> should clean it by piping it through "aspell clean strict". This will
> remove all problem words and affix flags that Aspell will complain about
> when compiling. The compiled dictionary should be the same with either
> the dirty or the clean word list. You can also use "aspell clean", but
> that handles some errors in a different way and the resulting compiled
> dictionary may be different.
>
> 3) Aspell by default performs a number of checks when creating a
> word list; some of these can be expensive. You can disable the expensive
> one with "--dont-validate-affixes". If you clean the word list first,
> this should be 100% safe. It should also be safe to use on a dirty word
> list, as the invalid affix flags don't cause a problem in the compiled
> word list. You may also consider using "--validate-words", but those
> checks are not very expensive.

This is very interesting info. I was worried about the flood of bug reports that would follow all those warnings about non-applicable affix flags.

Again, thanks for your feedback.

Cheers,

--
Agustin
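Agustin's plan to recognise .cwl.gz names can be combined with Kevin's "aspell clean strict" and "--dont-validate-affixes" suggestions into a single build pipeline, roughly as below. This minimal Python sketch only builds the commands and does not run them. Using `gzip -dc` in place of `bzcat` avoids the extra bzip2 dependency. Several pieces are assumptions, not taken from any existing packaging script: the helper names, the suffix rules (".cwl" marks prezip output, ".gz" or ".bz2" applied on top), and the final `aspell --lang=<lang> ... create master <output>` call reading the word list on stdin.

```python
import shlex
from typing import List


def decompress_stages(filename: str) -> List[List[str]]:
    """Commands that turn `filename` back into a plain word list on stdout.

    Hypothetical helper: assumes ".cwl" marks prezip output and that a
    trailing ".gz" or ".bz2" was applied on top of it.
    """
    stages: List[List[str]] = []
    inner = filename
    if inner.endswith(".gz"):
        stages.append(["gzip", "-dc", filename])
        inner = inner[: -len(".gz")]
    elif inner.endswith(".bz2"):
        stages.append(["bzcat", filename])
        inner = inner[: -len(".bz2")]
    if not stages:
        # Not gzip/bzip2 compressed: just stream the file.
        stages.append(["cat", filename])
    if inner.endswith(".cwl"):
        # precat undoes prezip, reading the previous stage's output.
        stages.append(["precat"])
    return stages


def build_stages(filename: str, lang: str, output: str) -> List[List[str]]:
    """Decompress, clean and compile a word list (hypothetical helper)."""
    stages = decompress_stages(filename)
    # Drop problem words and affix flags that Aspell would warn about.
    stages.append(["aspell", "clean", "strict"])
    # Assumed invocation: compile the cleaned list from stdin,
    # skipping the expensive affix validation.
    stages.append(["aspell", f"--lang={lang}", "--dont-validate-affixes",
                   "create", "master", output])
    return stages


def as_shell(stages: List[List[str]]) -> str:
    """Join the stages into one shell pipeline string."""
    return " | ".join(shlex.join(stage) for stage in stages)
```

For example, `as_shell(build_stages("es.cwl.gz", "es", "./es.rws"))` yields a string that could be handed to `sh -c`. Keeping each stage as an argument list makes it easy to add bzip2 support later without touching the cleaning and compiling steps.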