On Thu, Jun 30, 2005 at 11:27:48AM -0600, Kevin Atkinson wrote:
> > I assume this applies only to non affix compressed wordlists.
>
> No it also helps with affix compressed word lists.

Fine, today I added support for *.cwl.gz files to our aspell-autobuildhash
branch. This should work for prezip+gzip affix compressed wordlists as long
as they have that extension.

> > I think we
> > should also encourage affix compression when possible, hash sizes are much
> > better.
>
> If the language in question does not have any sort of
> soundslike data, then affix compression is a clear win; however, if
> the language uses phonetic soundslike data (such as German, French, and
> most importantly English) then the choice to enable affix compression
> in the compiled hash table is much more difficult. Aspell will support
> both affix compression and phonetic soundslike lookup but the results may
> not be as good. Thus I, in general, don't recommend it. You should use
> the settings in the official dictionary package.

This seems to be something to decide on a per-dict basis and to delegate to
the dict maintainers. For some dicts the replacements code might suffice, but
for others real phonetic code is needed. Not to mention sizes. When we played
with an aspell version of the Catalan dictionary, the hash file went over
100MB (yes, not a typo), but in my experiments with affix compression I think
it went below 4MB (I do not have the dict here). The choice at that time was
to severely strip down the dictionary so the hash was manageable (I think it
was ~10MB).

Brian, we should write something about this so dict maintainers have a clue.

> > Regarding bzip2, it implies adding another dependency. While most systems
> > already have it installed I personally prefer using gzip, which must always
> > be present, even if that implies a larger size.
>
> Well bzip2 is very common now and in fact my dictionary packages use bzip2
> and no one has complained yet. This sounds like a policy decision that
> should possibly be discussed with other Debian developers (or whoever the
> appropriate people are on policy decisions such as that).

But that is for building the hash, which is usually done by the dict package
maintainer. In our case the hash will be built by each user, so each user is
forced to have bzip2 installed. The vast majority will have it, but I am
rather reluctant to force that.

And from the other message,
------------------------

> > You may also consider using "--validate-words" but those checks are
> > not very expensive.
> That should be "--dont-validate-words".

Thanks, I only added "--dont-validate-affixes". I tested only gl-minimos and
ca, and the Catalan dict has some useless flags. Piping through
"aspell clean strict" gave me a lot of errors on the 'point in the middle'
char that is used in Catalan as part of words, and I think that I had errors
even on the flag slashes in the affix compressed ispell wordlist.

Cheers,

-- 
Agustin
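
For reference, the extension-based handling discussed above could be sketched roughly as follows. This is only an illustration, not the actual aspell-autobuildhash code: the function name `hash_build_pipeline` is hypothetical, and it assumes that `gzip`, aspell's `precat` helper and `aspell create master` are available, and that `precat` reads from standard input. It only builds the shell pipeline as a string; it does not run anything.

```python
import shlex


def hash_build_pipeline(wordlist: str, lang: str, out_hash: str) -> str:
    """Return a shell pipeline that would build an aspell hash from a wordlist.

    Hypothetical helper: the decompression step is chosen purely from the
    file extension, as described in the message above.
    """
    q = shlex.quote
    if wordlist.endswith(".cwl.gz"):
        # prezip+gzip wordlist: undo gzip first, then prezip.
        source = f"gzip -dc {q(wordlist)} | precat"
    elif wordlist.endswith(".cwl"):
        # prezip-only wordlist (assumes precat reads stdin).
        source = f"precat < {q(wordlist)}"
    else:
        raise ValueError(f"unsupported wordlist extension: {wordlist}")
    # --dont-validate-affixes is the only validation switch mentioned as
    # already added; --dont-validate-words could be appended the same way.
    return (
        f"{source} | aspell --lang={q(lang)} --dont-validate-affixes "
        f"create master {q(out_hash)}"
    )


if __name__ == "__main__":
    print(hash_build_pipeline("ca.cwl.gz", "ca", "./ca.rws"))
```

Keeping gzip as the only outer compressor means the user-side build needs nothing beyond aspell itself and a tool that is always installed.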