Greetings Neal,

Your suggestion sounds good, especially steps 1 and 2...

I have some beginner's questions:

- Given the flag to disable stemming, what is the
  disadvantage of simply making it a three-value flag:
  index unstemmed words, index stems, index both?

- The format you describe sounds like a "half-inverted"
  file -- listing locations *within* a document by word, but
  listing *document* locations by document.  Is that
  correct?

- You said that the approach currently taken by fuzzy
  endings is uncharted waters.  I assume you are talking
  about the approach of simply creating a disjunction of
  the derived words.  What is hard to "get right" about
  that?  In terms of the documents returned, it sounds the
  same as what you have proposed.  In terms of
  implementation, it sounds like what 'fuzzy endings' does
  now, except for fixing the stemming.

- With stemming in general, what is done about negating
  affixes?  If I searched for 'mercy', I wouldn't want
  results about 'merciless' (although I would want results
  about 'merciful').
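
To make the first and third questions concrete, here is a minimal
sketch in Python of what I have in mind. It is not HtDig code: the
names IndexMode, STEMS, index_keys and derived_words are made up for
illustration, and the stem table is just the 'travel' example from
your mail standing in for a real stemmer.

```python
from enum import Enum

# Hypothetical stem table copied from the example in the proposal;
# a real stemmer would compute these instead of looking them up.
STEMS = {
    "traveling": "travel",
    "travel": "travel",
    "travels": "travel",
    "traveler": "travel",
    "traveled": "travel",
}


class IndexMode(Enum):
    """Assumed three-value replacement for an on/off stemming switch."""
    UNSTEMMED = 1  # index only the words as they appear
    STEMS = 2      # index only the stems
    BOTH = 3       # index both, marking stem rows with a flag


def index_keys(word, mode):
    """Return the (word, stem_flag) keys one occurrence is filed under."""
    keys = []
    if mode in (IndexMode.UNSTEMMED, IndexMode.BOTH):
        keys.append((word, 0))
    if mode in (IndexMode.STEMS, IndexMode.BOTH):
        keys.append((STEMS.get(word, word), 1))
    return keys


def derived_words(stem):
    """Query-time alternative: all known words sharing a stem,
    to be OR-ed together as a disjunction."""
    return sorted(w for w, s in STEMS.items() if s == stem)


print(index_keys("travels", IndexMode.BOTH))
print(derived_words("travel"))
```

With BOTH, each occurrence is filed twice (once per key), which is
the duplication you mention. The disjunction does the same grouping
at query time instead, and that is why the two look equivalent to me
in terms of the documents returned.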

Thanks!
Lachlan

On Sat, 7 Dec 2002 07:09, Neal Richter wrote:

>   I agree with Geoff in that we don't want to go with
> stemming exclusively..
>   Here's a proposal for 'intelligent stemming' in HtDig:
>
>   1.  Fix index efficiency.
>   2.  Add a configuration switch to disable stemming ;-)
>   3.  Implement the stemming algorithm to ADD additional
> rows to the index with stemmed versions of the words
> (with a row flag to signify this).
>    This system does add duplicate rows in a sense to the
> index.
>
> traveling -> travel
> travel -> travel
> travels -> travel
> traveler -> travel
> traveled -> travel
>
>    Document    Word          StemFlag  Locations
>
>    20          traveling     0         24  36 110
>    20          travel        0         52  98 220
>    20          travels       0         10  75 340
>    20          traveler      0         13  180
>    20          traveled      0         200
>    20          travel        1         10 13 24 36 52 75 98 110 180 200 220 340
>
> FEEDBACK PLEASE!!
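
For reference, this is how I read the table above, as a minimal
Python sketch. The layout (document first, then word and flag) is my
assumption about the "half-inverted" format, and STEM_OF and
add_stem_rows are hypothetical names, not anything in HtDig. The
StemFlag=1 row is simply the sorted merge of the location lists of
the words that share the stem.

```python
import heapq

# Hypothetical stem lookup taken from the mapping in the proposal.
STEM_OF = {
    "traveling": "travel",
    "travel": "travel",
    "travels": "travel",
    "traveler": "travel",
    "traveled": "travel",
}

# document -> (word, stem_flag) -> sorted word locations,
# using the StemFlag=0 rows of the example table.
index = {
    20: {
        ("traveling", 0): [24, 36, 110],
        ("travel", 0): [52, 98, 220],
        ("travels", 0): [10, 75, 340],
        ("traveler", 0): [13, 180],
        ("traveled", 0): [200],
    }
}


def add_stem_rows(doc_rows):
    """Add one StemFlag=1 row per stem by merging its words' locations."""
    by_stem = {}
    for (word, flag), locations in doc_rows.items():
        if flag == 0:
            by_stem.setdefault(STEM_OF.get(word, word), []).append(locations)
    # Each input list is already sorted, so a k-way merge keeps the order.
    for stem, lists in by_stem.items():
        doc_rows[(stem, 1)] = list(heapq.merge(*lists))


add_stem_rows(index[20])
print(index[20][("travel", 1)])
```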


-- 
Lachlan Andrew  Phone: +613 8344-3816 Fax: +613 8344-6678
Dept of Electrical and Electronic Engg          CRICOS Provider Code
University of Melbourne, Victoria, 3010  AUSTRALIA      00116K

