Hi Yousef

You are not doing anything wrong - it's just how the Porter stemmer works!

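The problem with Porter is that it tries to do everything in a purely algorithmic way: 
it strips suffixes by rule and makes no attempt to return a real word, so stems like 
"eleph" and "bui" are expected. It also doesn't cater for irregular conjugations etc.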

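Don't worry too much though. As long as you apply the same stemming to the query string 
as you did while indexing, you should be able to find what you are looking for. You can 
still have some issues with trailing wildcards, since a wildcard prefix is typically 
matched against the stemmed terms in the index without being stemmed itself.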
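
To show what I mean, here is a rough sketch in plain Java. It does not use Lucene at all: 
the stem() method is a hypothetical stand-in for PorterStemmer that only knows the two 
stems from your mail, and the "index" is just a map, so treat the names and structure as 
assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class StemDemo {
    // Hypothetical stand-in for PorterStemmer: only the two stems reported in this thread.
    static String stem(String word) {
        switch (word) {
            case "elephant": return "eleph";
            case "buying":   return "bui";
            default:         return word;
        }
    }

    // Toy inverted index: stemmed term -> ids of the documents that contain it.
    static final Map<String, Set<Integer>> index = new HashMap<>();

    static void add(int docId, String text) {
        for (String word : text.toLowerCase().split("\\s+")) {
            index.computeIfAbsent(stem(word), k -> new HashSet<>()).add(docId);
        }
    }

    // Term query: the query word is stemmed exactly as the text was at index time.
    static Set<Integer> search(String word) {
        return index.getOrDefault(stem(word.toLowerCase()), new HashSet<>());
    }

    // Trailing-wildcard query: the prefix is compared with the indexed terms as-is.
    static Set<Integer> prefixSearch(String prefix) {
        Set<Integer> hits = new HashSet<>();
        for (Map.Entry<String, Set<Integer>> e : index.entrySet()) {
            if (e.getKey().startsWith(prefix)) {
                hits.addAll(e.getValue());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        add(1, "buying an elephant");
        System.out.println(search("elephant"));      // [1]
        System.out.println(prefixSearch("elepha"));  // []
        System.out.println(prefixSearch("ele"));     // [1]
    }
}
```

The plain query works because both sides end up as "eleph". The prefix "elepha" finds 
nothing because it is longer than the stem that was actually stored.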

If you want a better stemmer, look for something that has a dictionary as well as 
algorithmic rules. A quick one that is readily available is Kstem which, while not 
perfect, I think is quite a bit better than Porter.
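
As a rough illustration of the general "dictionary plus rules" idea (this is NOT Kstem's 
actual algorithm, and the word lists and class name below are made up for the example), 
a stemmer can refuse to strip a suffix unless the result is a known word:

```java
import java.util.Map;
import java.util.Set;

class DictStemDemo {
    // Hypothetical mini-lexicon of known base forms.
    static final Set<String> LEXICON = Set.of("elephant", "buy");
    // Hypothetical table of irregular forms that no suffix rule would catch.
    static final Map<String, String> IRREGULAR = Map.of("bought", "buy");

    static String stem(String word) {
        // A word that is already in the lexicon is left alone.
        if (LEXICON.contains(word)) {
            return word;
        }
        // Irregular conjugations are looked up directly.
        String base = IRREGULAR.get(word);
        if (base != null) {
            return base;
        }
        // Suffix rule: strip "ing", but accept the result only if it is a known word.
        if (word.endsWith("ing")) {
            String candidate = word.substring(0, word.length() - 3);
            if (LEXICON.contains(candidate)) {
                return candidate;
            }
        }
        // Otherwise keep the word unchanged rather than guess.
        return word;
    }

    public static void main(String[] args) {
        System.out.println(stem("elephant")); // elephant
        System.out.println(stem("buying"));   // buy
        System.out.println(stem("bought"));   // buy
    }
}
```

With a check like this, "elephant" stays whole and "buying" and "bought" both come out as 
"buy", which is the kind of result a purely rule-based stemmer cannot give you.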

You can get the source code (Kstem.jar) from the following website:

http://ciir.cs.umass.edu/downloads/

For more info on Kstem see the paper by its designer Bob Krovetz at:

http://ciir.cs.umass.edu/pubfiles/ir-35.pdf

Cheers

Pete


----- Original Message ----- 
From: "Yousef Ourabi" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Saturday, November 06, 2004 1:13 AM
Subject: Stemming Oddness


> Hey,
> Thanks for everyone's reply to my last post. I have
> a question. I imported the PorterStemmer and when I
> did the following
> 
> PorterStemmer ps = new PorterStemmer();
> String r1 = ps.stem("elephant");
> r1 is 'eleph'
> 
> Also, buying stems to bui. Is this normal? Am I doing
> something wrong?
> 
> I am calling reset in between function calls.
> 
> Thanks,
> Yousef
> 