RE: Stemming

2005-01-24 Thread Kevin L. Cobb
Do stemming algorithms take into consideration abbreviations too? Some
examples:

mg = milligrams
US = United States
CD = compact disc
vcr = video cassette recorder

And, the next logical question, if stemming does not take care of
abbreviations, are there any solutions that include abbreviations inside
or outside of Lucene?

Thanks,

Kevin


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]





Re: Stemming

2005-01-24 Thread Erik Hatcher
On Jan 24, 2005, at 7:24 AM, Kevin L. Cobb wrote:
> Do stemming algorithms take into consideration abbreviations too?

No, they don't.  Adding abbreviations, aliases, synonyms, etc. is not
stemming.

> And, the next logical question, if stemming does not take care of
> abbreviations, are there any solutions that include abbreviations
> inside or outside of Lucene?

Nothing built into Lucene does this, but the infrastructure allows it
to be added in the form of a custom analysis step.  There are two basic
approaches: adding aliases at indexing time, or adding them at query
time by expanding the query.  I created some example analyzers in
Lucene in Action (grab the source code from the site linked below) that
demonstrate how this can be done using WordNet (and mock) synonym
lookup.  You could extrapolate this into looking up abbreviations and
adding them into the token stream.
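
A minimal sketch of the index-time approach, in plain Java rather than
against the Lucene API. The class name, the lookup table and the expand
method are assumptions made up for illustration (the table reuses the
abbreviations from the question). It ignores token positions, which a
real analysis step would probably have to handle.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Illustrative only: plain Java, not Lucene API. */
class AbbreviationExpander {
    // Hypothetical lookup table, keyed by lowercased abbreviation.
    static final Map<String, List<String>> ABBREVIATIONS = Map.of(
        "mg", List.of("milligrams"),
        "us", List.of("united", "states"),
        "cd", List.of("compact", "disc"),
        "vcr", List.of("video", "cassette", "recorder"));

    /** Emits each lowercased token, followed by its expansion if it is a known abbreviation. */
    static List<String> expand(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            String key = token.toLowerCase(Locale.ROOT);
            out.add(key);
            out.addAll(ABBREVIATIONS.getOrDefault(key, List.of()));
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints [take, 5, mg, milligrams, daily]
        System.out.println(expand(List.of("Take", "5", "mg", "daily")));
    }
}
```

Note that lowercasing makes "US" collide with the pronoun "us", so a real
lookup would likely need to be case-aware. The query-time variant would
apply the same lookup to the query terms instead of the document tokens.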

http://www.lucenebook.com/search?query=synonyms
Erik


Re: Stemming

2005-01-21 Thread Otis Gospodnetic
Hi Kevin,

Stemming is an optional operation and is done in the analysis step. 
Lucene comes with a Porter stemmer and a Filter that you can use in an
Analyzer:

./src/java/org/apache/lucene/analysis/PorterStemFilter.java
./src/java/org/apache/lucene/analysis/PorterStemmer.java

You can find more about it here:
http://www.lucenebook.com/search?query=stemming
You can also see mentions of SnowballAnalyzer in those search results,
and you can find an adapter for SnowballAnalyzers in Lucene Sandbox.

Otis

--- Kevin L. Cobb [EMAIL PROTECTED] wrote:

> I want to understand how Lucene uses stemming but can't find any
> documentation on the Lucene site. I'll continue to google but hope that
> this list can help narrow my search. I have several questions on the
> subject currently but hesitate to list them here since finding a good
> document on the subject may answer most of them.
>
> Thanks in advance for any pointers,
>
> Kevin





RE: Stemming

2005-01-21 Thread Kevin L. Cobb
OK, OK ... I'll buy the book. I guess it's about time since I am deeply
and forever in love with Lucene. Might as well take the final plunge.






Re: Stemming

2005-01-21 Thread Chris Lamprecht
Also if you can't wait, see page 2 of
http://www.onjava.com/pub/a/onjava/2003/01/15/lucene.html

or the LIA e-book ;)




Re: Stemming Oddness

2004-11-06 Thread Pete Lewis
Hi Yousef

You are not doing anything wrong - it's just how the Porter stemmer works!

The problem with Porter is that it tries to do everything in a purely algorithmic way,
which doesn't cater for irregular conjugations etc.

Don't worry too much though: as long as you do the same stemming on the query string
as you did while indexing, you should be able to find what you are looking for, though
you can have some issues with trailing wildcards.
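
To illustrate the point about stemming both sides, here is a minimal
sketch in plain Java. A toy suffix stripper stands in for Porter; its
rules, and all class and method names, are assumptions made up for this
example. It also shows one way the trailing-wildcard caveat can arise,
assuming the wildcard prefix is compared with the stored stems without
being stemmed itself.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Illustrative only: a toy suffix stripper, NOT the Porter algorithm. */
class StemBothSides {
    // Toy rules (assumed for this sketch): drop a trailing "ing", then turn
    // a trailing "y" into "i", which mimics "buying" -> "bui".
    static String toyStem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        if (w.length() > 5 && w.endsWith("ing")) {
            w = w.substring(0, w.length() - 3);
        }
        if (w.length() > 2 && w.endsWith("y")) {
            w = w.substring(0, w.length() - 1) + "i";
        }
        return w;
    }

    /** Builds the set of indexed terms by stemming every document word. */
    static Set<String> index(List<String> words) {
        Set<String> terms = new HashSet<>();
        for (String word : words) {
            terms.add(toyStem(word));
        }
        return terms;
    }

    /** Exact-term lookup: the query word goes through the same stemmer. */
    static boolean matchesTerm(Set<String> terms, String queryWord) {
        return terms.contains(toyStem(queryWord));
    }

    /** Prefix lookup: the raw prefix is compared with the stored stems as-is. */
    static boolean matchesPrefix(Set<String> terms, String rawPrefix) {
        for (String term : terms) {
            if (term.startsWith(rawPrefix)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> terms = index(List.of("buying", "elephant"));
        System.out.println(matchesTerm(terms, "buy"));    // true: "buy" also stems to "bui"
        System.out.println(matchesPrefix(terms, "buy"));  // false: stored stem is "bui"
    }
}
```

So an odd-looking stem is harmless for ordinary term queries, but a
query like buy* can miss a document whose term was stored as "bui".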

If you want a better stemmer, look for something that has a dictionary as well as
algorithmic rules. A quick one that is readily available is Kstem which, while not
perfect, I think is quite a bit better than Porter.

You can get the source code (Kstem.jar) from the following website:

http://ciir.cs.umass.edu/downloads/

For more info on Kstem see the paper by its designer Bob Krovetz at:

http://ciir.cs.umass.edu/pubfiles/ir-35.pdf

Cheers

Pete


- Original Message - 
From: Yousef Ourabi [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, November 06, 2004 1:13 AM
Subject: Stemming Oddness


> Hey,
> Thanks for everyone's reply to my last post, I have
> a question. I imported the PorterStemmer and when I
> did the following
>
> PorterStemmer ps = new PorterStemmer();
> String r1 = ps.stem("elephant");
> // r1 is "eleph"
>
> Also, "buying" stems to "bui". Is this normal? Am I doing
> something wrong?
>
> I am calling reset in between function calls.
>
> Thanks,
> Yousef
 

Re: stemming feature

2002-12-10 Thread Otis Gospodnetic
Check out PorterStemFilter class and Analyzer class.  Then look at some
Analyzer implementations and see how to implement your own
PorterAnalyzer.
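
The general shape of such an analyzer is a tokenizer followed by a
chain of filters, one of which stems. The sketch below shows only that
shape in plain Java: ToyAnalyzer, analyze and toyStem are stand-ins
invented for illustration, not the Lucene Analyzer or PorterStemFilter
classes, and the "stemmer" merely strips a trailing "s".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.function.UnaryOperator;

/** Illustrative only: stand-in types, not the Lucene Analyzer or filter classes. */
class ToyAnalyzer {
    private final List<UnaryOperator<String>> filters;

    ToyAnalyzer(List<UnaryOperator<String>> filters) {
        this.filters = filters;
    }

    /** Splits on runs of non-letters, then passes each token through every filter in order. */
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first element if the text starts with a separator
            }
            String token = raw;
            for (UnaryOperator<String> filter : filters) {
                token = filter.apply(token);
            }
            tokens.add(token);
        }
        return tokens;
    }

    // Placeholder for a real stemmer; it only strips a trailing "s".
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
            ? token.substring(0, token.length() - 1)
            : token;
    }

    public static void main(String[] args) {
        List<UnaryOperator<String>> filters = new ArrayList<>();
        filters.add(t -> t.toLowerCase(Locale.ROOT)); // lowercase first
        filters.add(ToyAnalyzer::toyStem);            // then stem
        // Prints [stemmer, reduce, word]
        System.out.println(new ToyAnalyzer(filters).analyze("Stemmers reduce Words"));
    }
}
```

The order of the filters matters: the stemming step here assumes its
input has already been lowercased.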

Otis

--- M Srinivas Rao [EMAIL PROTECTED] wrote:
> Hi all
>
> Does Lucene do stemming of a word?  If yes, can anyone
> say how to do it in Java using the Lucene API?
>
> Thanks
> rgds
> srinivas
 




Re: Stemming

2002-05-02 Thread Otis Gospodnetic

You could have a single index with both stemmed and non-stemmed terms,
using different field names for each and searching a different set of
fields depending on the type of search.
You'd also have to use 2 types of analyzers/filters, I think.
Roughly :)
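
A rough sketch of that idea, using an in-memory stand-in rather than a
real Lucene index. The class, the two field maps and the toy stemmer
(which only strips a trailing "s") are assumptions made up for
illustration. Each document is indexed twice, once per field, and the
search picks the field according to the user's stemming switch.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Illustrative only: an in-memory stand-in for one index with two fields per document. */
class DualFieldIndex {
    // Hypothetical fields; each map goes from term to the ids of matching documents.
    private final Map<String, Set<Integer>> exactField = new HashMap<>();
    private final Map<String, Set<Integer>> stemmedField = new HashMap<>();

    // Placeholder for a real stemmer; it only strips a trailing "s".
    static String toyStem(String token) {
        return token.length() > 3 && token.endsWith("s")
            ? token.substring(0, token.length() - 1)
            : token;
    }

    /** Indexes every word of the text under both fields. */
    void add(int docId, String text) {
        for (String raw : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (raw.isEmpty()) {
                continue;
            }
            exactField.computeIfAbsent(raw, k -> new HashSet<>()).add(docId);
            stemmedField.computeIfAbsent(toyStem(raw), k -> new HashSet<>()).add(docId);
        }
    }

    /** Searches the stemmed field with a stemmed term, or the exact field with the raw term. */
    Set<Integer> search(String word, boolean stemming) {
        String term = word.toLowerCase(Locale.ROOT);
        return stemming
            ? stemmedField.getOrDefault(toyStem(term), Set.of())
            : exactField.getOrDefault(term, Set.of());
    }

    public static void main(String[] args) {
        DualFieldIndex index = new DualFieldIndex();
        index.add(1, "red car");
        index.add(2, "red cars");
        System.out.println(index.search("cars", true));   // stemming on: documents 1 and 2
        System.out.println(index.search("cars", false));  // stemming off: document 2 only
    }
}
```

The cost is a larger index, since every term is stored under both
fields, but there is only one index to build and keep in sync.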

Otis


--- Joel Bernstein [EMAIL PROTECTED] wrote:
> In our search application the user can turn stemming off and on.
>
> With Lucene, will I have to maintain two sets of indexes to create
> this functionality, one stemming and one non-stemming index?
>
> Or is there a way to query a stemming index so that it does not
> return stems?
>
> Thanks,
> Joel

