At 8:48 AM +0100 3/7/00, Martin Povolny wrote:
>For some languages, indexing over word roots makes much better sense than
>indexing over whole words; this is absolutely true for Czech.
>So we have experimented with lemma (commercial) and ajka (almost finished, GNU)
>lemmatization software to get word roots. Finally we took out part of ispell
>(access to its hash) and used that, because it can also be used with other
>languages (though it knows far fewer word forms than the other two).
We'd be interested to see how you've done this. As for ispell, I
don't know offhand how you write the affix files, but it's definitely
possible to add more word forms to it. I know the German ispell files
are quite complete.
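For anyone unfamiliar with the idea, here is a minimal sketch of dictionary-backed suffix stripping, which is roughly what "using ispell's hash to get word roots" amounts to: strip candidate suffixes and keep the first result that is a known root. The suffix rules, the function name `word_root`, and the English examples are hypothetical illustrations only. This is not ispell's code or its affix-file syntax, and real Czech morphology needs far richer rules.

```python
# Hypothetical suffix rules: (suffix to strip, replacement to append).
# Real ispell affix files use their own flag-based syntax; this is only a sketch.
SUFFIX_RULES = [
    ("ing", ""),
    ("ies", "y"),
    ("es", ""),
    ("s", ""),
    ("ed", ""),
]


def word_root(word, roots, rules=SUFFIX_RULES):
    """Return the root of `word` if it, or a suffix-stripped form, is in `roots`.

    Falls back to the lowercased word itself when no rule yields a known root,
    so unknown forms are still indexed as whole words.
    """
    w = word.lower()
    if w in roots:
        return w
    for suffix, replacement in rules:
        # Only strip if something is left of the word afterwards.
        if w.endswith(suffix) and len(w) > len(suffix):
            candidate = w[: -len(suffix)] + replacement
            if candidate in roots:  # hash lookup, as with ispell's dictionary
                return candidate
    return w


roots = {"index", "query", "word"}
print(word_root("indexes", roots))  # index
print(word_root("queries", roots))  # query
print(word_root("Words", roots))    # word
print(word_root("sorting", roots))  # sorting (no known root, kept whole)
```

The fallback to the whole word is a design choice: forms the dictionary doesn't know still get indexed, which matters when the root list is small.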
>At present we're trying to index our faculty's web, but it seems that the
>algorithm htdig uses to create the inverted file is too naive; it looks
>to me like it's trying to apply unix 'sort' to a 1GB file...
That is what it's doing. If you're concerned about it, I'd switch to
the 3.2 code, which builds the inverted index on-the-fly during
indexing.
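To make the difference concrete, here is a toy sketch contrasting the two strategies: collecting every (word, document) pair and sorting them all at the end, versus updating each word's postings as documents are indexed. The function names are hypothetical, and in-memory lists and dicts stand in for the on-disk files and database a real indexer would use; this is not ht://Dig's actual code.

```python
from collections import defaultdict


def inverted_index_sort_based(docs):
    """Batch approach: emit (word, doc_id) pairs, sort them all, then group."""
    pairs = []
    for doc_id, text in docs.items():
        for word in text.lower().split():
            pairs.append((word, doc_id))
    # On real data this is the step that becomes a huge external sort.
    pairs.sort()
    index = {}
    for word, doc_id in pairs:
        postings = index.setdefault(word, [])
        # Pairs are sorted, so duplicate (word, doc_id) entries are adjacent.
        if not postings or postings[-1] != doc_id:
            postings.append(doc_id)
    return index


def inverted_index_on_the_fly(docs):
    """Incremental approach: update each word's postings while indexing."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    # Only small per-word posting lists are ordered here; no global sort pass.
    return {word: sorted(ids) for word, ids in index.items()}


docs = {1: "the cat sat", 2: "the dog sat"}
print(inverted_index_sort_based(docs) == inverted_index_on_the_fly(docs))  # True
```

Both produce the same index; the difference is that the incremental version never has to hold or sort the full list of pairs, which is what hurts on a 1GB intermediate file.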
Cheers,
-Geoff
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.