At 8:48 AM +0100 3/7/00, Martin Povolny wrote:
>For some languages, indexing over word roots makes much better sense than
>over whole words; this is absolutely true for Czech.
>So we have experimented with lemma (commercial) and ajka (almost finished, GNU)
>lemmatization software to get word roots. Finally, we took out part of ispell
>(the access to its hash) and used that, because it can also be used with other
>languages (but it knows far fewer word forms than the other two).
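To make the idea of indexing by word root concrete, here is a minimal sketch, assuming a hypothetical hand-written table (LEMMAS) that maps Czech word forms to their root. In practice that mapping would come from a lemmatizer such as lemma or ajka, or from an ispell-style hash; the names root_of, build_index and search are illustrative only, not part of htdig.

```python
from collections import defaultdict

# Hypothetical lemma table: word form -> root (forms of "hrad", castle).
# A real system would get this from a lemmatizer or an ispell-style hash.
LEMMAS = {
    "hrad": "hrad",
    "hradu": "hrad",
    "hradem": "hrad",
    "hrady": "hrad",
}


def root_of(word: str) -> str:
    """Return the root of a word, or the lowercased word if it is unknown."""
    w = word.lower()
    return LEMMAS.get(w, w)


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each root to the set of document ids containing any of its forms."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[root_of(word)].add(doc_id)
    return dict(index)


def search(index: dict[str, set[str]], query_word: str) -> set[str]:
    """Look up a query word by its root, so any inflected form matches."""
    return index.get(root_of(query_word), set())


docs = {"a": "Navštívili jsme hrad", "b": "Bez hradu", "c": "Pod hradem"}
index = build_index(docs)
print(sorted(search(index, "hrady")))  # ['a', 'b', 'c']
```

A whole-word index would find none of these three documents for the query "hrady"; indexing by root finds all of them.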

We'd be interested to see how you've done this. As for ispell, I 
don't know offhand how you write the affix files, but it's definitely 
possible to add more word forms to it. I know the German ispell files 
are quite complete.

>At present we're trying to index our faculty's web, but it seems that
>the algorithm htdig uses to create the inverted file is too naive --
>it seems to me like it's trying to apply Unix 'sort' to a 1GB file...

That is what it's doing. If you're concerned about it, I'd switch to 
the 3.2 code, which builds the inverted index on-the-fly during 
indexing.
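The difference between the two approaches can be sketched with a toy in-memory example. It is an illustration of the general idea only, not ht://Dig's actual implementation: the first function collects every (word, document) pair and sorts them all at the end (the pair list is what grows into the huge file being sorted), while the second updates each word's posting list as documents are indexed, so no final sort is needed.

```python
from itertools import groupby


def index_by_sorting(docs):
    """Two-pass approach: collect all (word, doc_id) pairs, then sort and group.
    On a large site the pair list is what becomes a huge file to sort."""
    pairs = []
    for doc_id, text in docs:
        for word in text.lower().split():
            pairs.append((word, doc_id))
    pairs.sort()
    return {word: sorted({d for _, d in group})
            for word, group in groupby(pairs, key=lambda p: p[0])}


def index_on_the_fly(docs):
    """One-pass approach: append to each word's posting list as documents
    arrive. Assumes doc ids arrive in increasing order, so lists stay sorted."""
    index = {}
    for doc_id, text in docs:
        for word in text.lower().split():
            postings = index.setdefault(word, [])
            # Skip duplicates: all words of one document are seen together.
            if not postings or postings[-1] != doc_id:
                postings.append(doc_id)
    return index


docs = [(1, "ht dig indexes pages"), (2, "dig deeper"), (3, "pages pages")]
print(index_on_the_fly(docs)["pages"])  # [1, 3]
print(index_by_sorting(docs) == index_on_the_fly(docs))  # True
```

Both produce the same index; the on-the-fly version simply never materializes the full list of pairs.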

Cheers,
-Geoff


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 
