RE: StrictAnalyzer

Doug Cutting Wed, 20 Feb 2002 09:31:17 -0800

> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
> 
> I know at least in my case, I have a much more extensive list of stop
> words and they are simply read from a file into an array and then passed
> to the existing class. Would this approach work in your case?


I think that serious applications will usually need to define an Analyzer
class, or at least parameterize an existing class, rather than just use
something as-is off the shelf.  They might want to analyze different fields
differently, or might want to use a particular stop list, or might care
about how particular acronyms are tokenized and normalized.

So we should not attempt to provide analyzers that make everyone happy: that
effort is destined to fail.  Rather, we should attempt to provide tools to
make it easy to create lots of different, useful analyzers.

I think the proposed StrictAnalyzer shows that the analyzer toolkit is good:
Alan was able to create the analyzer he needs with just a few lines of code,
mostly assembling existing bits and pieces.  It would be simpler yet if he
were able to extend StandardAnalyzer, providing just a different stop list.

So the action item I see is that StandardAnalyzer should be made non-final.
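
To illustrate why a non-final analyzer makes this kind of customization
cheap, here is a minimal, self-contained sketch.  It does not use Lucene's
classes: ToyAnalyzer and StrictToyAnalyzer are hypothetical stand-ins, and
both stop lists are made up for the example.  The point is only that a
subclass can supply a different stop list and inherit everything else.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an extensible analyzer; this is NOT Lucene's API.
class ToyAnalyzer {
    private final Set<String> stopWords;

    // Default stop list (illustrative only, not Lucene's default list).
    ToyAnalyzer() {
        this(new String[] {"a", "an", "and", "the", "of", "to"});
    }

    // Subclasses and callers can parameterize the analyzer with their own list.
    ToyAnalyzer(String[] stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    // Split on runs of non-alphanumeric characters, lowercase, drop stop words.
    List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.split("[^\\p{L}\\p{N}]+")) {
            if (raw.isEmpty()) {
                continue; // split() yields an empty first piece on a leading delimiter
            }
            String token = raw.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}

// Because ToyAnalyzer is not final, a stricter variant is a few lines long.
class StrictToyAnalyzer extends ToyAnalyzer {
    // Hypothetical larger stop list.
    static final String[] STRICT_STOP_WORDS =
        {"a", "an", "and", "the", "of", "to", "is", "in", "it"};

    StrictToyAnalyzer() {
        super(STRICT_STOP_WORDS);
    }
}
```

Calling new StrictToyAnalyzer().analyze(text) runs the same tokenizing and
lowercasing steps as the base class and differs only in which words are
dropped.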

We should not change the default stop lists in Lucene, since that would
break existing indexes when folks upgrade to a new version of Lucene.  A
library of file-based stop lists is a good idea, though.
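
A file-based stop list could be loaded along the following lines.  This is a
rough sketch under assumptions of my own: the StopListLoader name, the
one-word-per-line format, the lowercasing, and the "#" comment convention are
all hypothetical and not part of Lucene.  The result is a plain String array
of the kind the quoted message describes passing to an existing class.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper; the file format it accepts is an assumption.
class StopListLoader {
    // Reads one stop word per line, skipping blank lines and "#" comments.
    // The reader is closed when loading finishes.
    static String[] load(Reader source) throws IOException {
        List<String> words = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String word = line.trim().toLowerCase(Locale.ROOT);
                if (word.isEmpty() || word.startsWith("#")) {
                    continue;
                }
                words.add(word);
            }
        }
        return words.toArray(new String[0]);
    }
}
```

Taking a Reader rather than a file name keeps the helper usable with files,
classpath resources, or in-memory strings; for a file on disk the caller
could pass, for example, a java.io.FileReader.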

Doug
