On Thu, Jun 30, 2005 at 11:27:48AM -0600, Kevin Atkinson wrote:
> > I assume this applies only to non affix compressed wordlists. 
> 
> No it also helps with affix compressed word lists.

Fine, today I added support for *.cwl.gz files to our aspell-autobuildhash
branch. This should work for prezip+gzip affix-compressed wordlists as
long as they have that extension.
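
A minimal sketch of how extension-based handling like this could look. It is
not the actual aspell-autobuildhash code: the function names are hypothetical,
and it assumes that "precat" reads prezip data on stdin and that the hash is
built with "aspell create master". It only builds the command string and does
not run anything.

```python
import shlex


def decompress_stages(path):
    """Return the shell stages that turn `path` into a plain wordlist.

    Hypothetical helper: the extension-to-tool mapping is an assumption.
    """
    quoted = shlex.quote(path)
    if path.endswith(".cwl.gz"):
        # prezip+gzip: undo the gzip layer first, then the prezip layer
        return ["gzip -dc " + quoted, "precat"]
    if path.endswith(".cwl"):
        # prezip only
        return ["precat < " + quoted]
    if path.endswith(".gz"):
        # gzip only
        return ["gzip -dc " + quoted]
    # plain, uncompressed wordlist
    return ["cat " + quoted]


def build_hash_command(path, lang, out):
    """Return one shell pipeline that feeds the wordlist to aspell."""
    stages = decompress_stages(path)
    stages.append(
        "aspell --lang=%s create master %s"
        % (shlex.quote(lang), shlex.quote(out))
    )
    return " | ".join(stages)


if __name__ == "__main__":
    # Example file names are illustrative only.
    print(build_hash_command("ca.cwl.gz", "ca", "./ca.rws"))
```

Choosing the stages from the file extension alone keeps the logic simple, but
it also means a wordlist with the wrong extension goes through the wrong
tools, hence the "as long as they have that extension" caveat.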

> 
> > I think we
> > should also encourage affix compression when possible, hash sizes are much
> > better.
> 
> If the language in question does not have any sort of 
> soundslike data, then affix compression is a clear win;  however, if 
> the language uses phonetic soundslike data (such as German, French, and 
> most importantly English) then the choice to enable affix compression 
> in the compiled hash table is much more difficult.  Aspell will support 
> both affix compression and phonetic soundslike lookup but the results may 
> not be as good.  Thus I, in general, don't recommend it.  You should use 
> the settings in the official dictionary package.

This seems to be something to decide on a per-dict basis and delegate to dict
maintainers. For some dicts the replacements code might suffice, but for
others real phonetic code is needed. Not to mention sizes. When we
played with an aspell version of the Catalan dictionary, the hash file went
over 100MB (yes, not a typo), but in my experiments with affix compression
I think it went below 4MB (I do not have the dict here). The choice at that
time was to severely strip down the dictionary so the hash was manageable
(I think it was ~10MB).

Brian, we should write something about this so dict maintainers have a clue.

> 
> > Regarding bzip2, it implies adding another dependency. While most systems
> > already have it installed I personally prefer using gzip, which must always
> > be present, even if that implies a larger size.
> 
> Well bzip2 is very common now and in fact my dictionary packages use bzip2
> and no one has complained yet.  This sounds like a policy decision that
> should possibly be discussed with other Debian developers (or whoever the
> appropriate people are on policy decisions such as that).

But that is for building the hash, which is usually done by the dict package
maintainer. In our case the hash will be built by each user, so each user is
forced to have bzip2 installed. The vast majority will have it, but I am
rather reluctant to force that.

And from the other message,
---------------------------

> > You may also consider using "--validate-words" but those checks are
> > not very expensive.

> That should be "--dont-validate-words".

Thanks,

I only added "--dont-validate-affixes". I tested only gl-minimos and ca,
and the Catalan dict has some useless flags. Piping through
"aspell clean strict" gave me a lot of errors on the
'point in the middle' character (the middle dot) that is used in Catalan
as part of words, and I think I had errors even on the flag slashes in
the affix-compressed ispell wordlist.

Cheers,

-- 
Agustin

