Greetings Neal, I'll leave this to you.
Indexing the stems is a good suggestion. It would certainly give faster searching. If it replaced the unstemmed inverted file then it would also save on storage requirements, but it would mean we couldn't search on the unstemmed version (if that is of concern). Alternatively, indexing both the stemmed and unstemmed versions may be a bit extravagant...

I have also been wondering if it is possible to turn off word-level indexing, to give (much) smaller inverted files if people don't need phrase searching. Does anybody know? That would be a compelling reason to store word attributes in a pure bit-map format, rather than using the more compact formats we were discussing recently.

Cheers,
Lachlan

On Tue, 3 Dec 2002 08:08, Neal Richter wrote:
> This is on my list of things to work on..
>
> An alternative is to have a separate word stemmer which
> stores the words in the index in stemmed form.
>
> The Porter Stemming algorithm is good for this, and I
> have code to do it.

-- 
Lachlan Andrew  Phone: +613 8344-3816  Fax: +613 8344-6678
Dept of Electrical and Electronic Engg  CRICOS Provider Code 00116K
University of Melbourne, Victoria, 3010 AUSTRALIA

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
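
To make the trade-offs in this message concrete, here is a rough Python sketch. It is not ht://Dig code. The names `toy_stem`, `build_index`, and `phrase_match` are made up for illustration, and `toy_stem` is a crude suffix stripper standing in for a real Porter stemmer. The sketch shows that stemmed keys merge word variants into one entry, and that dropping word-level positions shrinks each posting to a bare document set at the cost of phrase searching.

```python
from collections import defaultdict

def toy_stem(word):
    """Crude suffix stripper; a hypothetical stand-in for a real Porter stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def build_index(docs, stem=True, word_level=True):
    """Build an inverted file.

    With word_level=True each term maps to {doc_id: [positions]}.
    With word_level=False each term maps to a set of doc_ids only.
    """
    index = defaultdict(dict) if word_level else defaultdict(set)
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            term = toy_stem(word) if stem else word
            if word_level:
                index[term].setdefault(doc_id, []).append(pos)
            else:
                index[term].add(doc_id)
    return index

def phrase_match(index, phrase, stem=True):
    """Return doc_ids containing the phrase terms at consecutive positions.

    Requires an index built with word_level=True.
    """
    terms = [toy_stem(w) if stem else w for w in phrase.lower().split()]
    if not terms or any(t not in index for t in terms):
        return set()
    hits = set()
    for doc_id, starts in index[terms[0]].items():
        for start in starts:
            if all(start + i in index[t].get(doc_id, ())
                   for i, t in enumerate(terms)):
                hits.add(doc_id)
                break
    return hits

docs = {1: "indexing the stems", 2: "the stem is indexed"}
stemmed = build_index(docs, stem=True)
unstemmed = build_index(docs, stem=False)
doc_only = build_index(docs, stem=True, word_level=False)
```

In this toy data the stemmed index has fewer distinct terms than the unstemmed one, but it can no longer tell "stem" from "stems". The `doc_only` index can answer "which documents contain this term" but cannot support `phrase_match`, since no positions are stored.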
