Greetings Neal,

I'll leave this to you.

Indexing the stems is a good suggestion.  It would 
certainly give faster searching.  If it replaced the 
unstemmed inverted file then it would also save on storage 
requirements, but it would mean we couldn't search on the 
unstemmed version (if that is of concern).  Alternatively, 
indexing both the stemmed and unstemmed versions may be a 
bit extravagant...
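A minimal sketch of that storage trade-off, assuming a toy in-memory inverted file rather than the htdig code's actual data structures. `toy_stem`, `build_index` and the sample documents are hypothetical, and the stemmer is a deliberately crude suffix stripper, not Porter:

```python
from collections import defaultdict

# Hypothetical toy stemmer standing in for a real one. It just strips a few
# common English suffixes; it is NOT the Porter algorithm.
SUFFIXES = ("ions", "ion", "ing", "ed", "s")


def toy_stem(word: str) -> str:
    w = word.lower()
    for suffix in SUFFIXES:
        # Only strip if at least three characters of stem would remain.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def build_index(docs: dict[int, str], normalise) -> dict[str, set[int]]:
    """Inverted file: map each normalised term to the ids of docs containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[normalise(word)].add(doc_id)
    return dict(index)


docs = {
    1: "connected hosts",
    2: "connecting hosts",
    3: "connection refused",
}

unstemmed = build_index(docs, str.lower)  # one entry per surface form
stemmed = build_index(docs, toy_stem)     # surface forms share one entry

print(len(unstemmed), len(stemmed))  # 5 3
print(sorted(stemmed["connect"]))    # [1, 2, 3]
```

The stemmed file has fewer entries, and one lookup of "connect" finds all three documents. The cost is that an exact query for "connected" can no longer be told apart from "connecting" or "connection"; keeping both files means paying for the sum of their sizes.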

I have also been wondering if it is possible to turn off 
word-level indexing, to give (much) smaller inverted files 
if people don't need phrase searching.  Does anybody know? 
That would be a compelling reason to store word attributes 
in a pure bit-map format, rather than using the more 
compact formats we were discussing recently.
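To show why word-level indexing is what phrase searching needs, here is a rough sketch contrasting a positional index with a document-level index whose posting lists are plain bit-maps (a Python `int`, bit i set if document i contains the term). It is an illustration under those assumptions only: it does not model word attributes or the htdig on-disk format, and all names are hypothetical.

```python
from collections import defaultdict


def word_level_index(docs):
    """Word-level (positional) index: term -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, word in enumerate(text.lower().split()):
            index[word][doc_id].append(pos)
    return index


def doc_level_index(docs):
    """Document-level index: term -> bit-map with bit i set if doc i has the term."""
    index = defaultdict(int)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word] |= 1 << doc_id
    return index


def bitmap_to_ids(bits):
    """Decode a bit-map back into the set of document ids it represents."""
    return {i for i in range(bits.bit_length()) if (bits >> i) & 1}


def phrase_search(index, phrase):
    """Docs where the phrase's words appear at consecutive positions."""
    words = phrase.lower().split()
    if not words or words[0] not in index:
        return set()
    hits = set()
    for doc_id, starts in index[words[0]].items():
        for start in starts:
            # .get() avoids inserting empty entries into the defaultdicts.
            if all(start + k in index.get(w, {}).get(doc_id, [])
                   for k, w in enumerate(words[1:], start=1)):
                hits.add(doc_id)
                break
    return hits


docs = {0: "free text search", 1: "text search is free", 2: "text only"}
positional = word_level_index(docs)
bitmaps = doc_level_index(docs)

print(phrase_search(positional, "free text"))              # {0}
print(bitmap_to_ids(bitmaps["free"] & bitmaps["text"]))    # {0, 1}
```

The bit-map index answers "which documents contain both words" with a single AND, but it cannot rule out document 1, where the words are not adjacent. Only the positional index can.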

Cheers,
Lachlan

On Tue, 3 Dec 2002 08:08, Neal Richter wrote:
> This is on my list of things to work on.
>
> An alternative is to have a separate word stemmer which
> stores the words in the index in stemmed form.
>
> The Porter Stemming algorithm is good for this, and I
> have code to do it.
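For reference, a small fragment of the Porter algorithm as published by Martin Porter: the "measure" m of a stem (its [C](VC)^m[V] form) and step 1a, which handles plurals. This is not Neal's code and not the full stemmer; steps 1b through 5 are omitted, and input is assumed to be lowercase.

```python
def is_consonant(word: str, i: int) -> bool:
    """Porter's definition: not a/e/i/o/u, and not a 'y' preceded by a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem: str) -> int:
    """m in Porter's [C](VC)^m[V] form: the number of vowel-to-consonant transitions."""
    pattern = "".join("c" if is_consonant(stem, i) else "v" for i in range(len(stem)))
    return pattern.count("vc")


def step1a(word: str) -> str:
    """Step 1a of the Porter stemmer: plural suffix rules only."""
    if word.endswith("sses"):
        return word[:-2]   # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]   # ponies -> poni
    if word.endswith("ss"):
        return word        # caress -> caress
    if word.endswith("s"):
        return word[:-1]   # cats -> cat
    return word


print([step1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
print([measure(w) for w in ["tree", "trouble", "troubles"]])  # [0, 1, 2]
```

The later steps make suffix removal conditional on m (for example, only stripping when m > 0), which is what stops short words from being over-stemmed.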

-- 
Lachlan Andrew  Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg          CRICOS Provider Code
University of Melbourne, Victoria, 3010  AUSTRALIA      00116K


_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/htdig-dev
