> From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
>
> I know at least in my case, I have a much more extensive list of stop
> words and they are simply read from a file into an array and then passed
> to the existing class. Would this approach work in your case?
I think that serious applications will usually need to define an Analyzer class, or at least parameterize an existing class, rather than just use something as-is off the shelf. They might want to analyze different fields differently, use a particular stop list, or care about how particular acronyms are tokenized and normalized. So we should not attempt to provide analyzers that make everyone happy: that effort is destined to fail. Rather, we should attempt to provide tools that make it easy to create many different, useful analyzers.

I think the proposed StrictAnalyzer shows that the analyzer toolkit is good: Alan was able to create the analyzer he needs with just a few lines of code, mostly by assembling existing bits and pieces. It would be simpler still if he were able to extend StandardAnalyzer and provide just a different stop list. So the action item I see is that StandardAnalyzer should be made non-final.

We should not change the default stop lists in Lucene, since that would break existing indexes, which were built with the old lists, when folks upgrade to a new version of Lucene. A library of file-based stop lists is a good idea, though.

Doug
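As a rough illustration of the two ideas discussed here (a stop list read from a file into an array, and an analyzer subclass that changes nothing but its stop list), here is a small self-contained Java sketch. It deliberately does not use Lucene's API: `StopListLoader`, `ToyAnalyzer` and `StrictToyAnalyzer` are hypothetical names, the tokenizer is a crude regex split, and the one-word-per-line file format with `#` comments is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: reads a stop list with one word per line.
// Blank lines and lines starting with '#' are skipped (an assumed file convention).
final class StopListLoader {
    static String[] load(Reader in) {
        List<String> words = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(in)) {
            String line;
            while ((line = r.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) continue;
                words.add(line.toLowerCase(Locale.ROOT));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return words.toArray(new String[0]);
    }
}

// Toy analyzer (NOT Lucene's API): split on non-letter/non-digit runs,
// lowercase each token, then drop stop words.
class ToyAnalyzer {
    static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "the", "of"};
    private final Set<String> stopWords;

    ToyAnalyzer() { this(DEFAULT_STOP_WORDS); }

    // Parameterizing an existing class: pass in any stop list.
    ToyAnalyzer(String[] stopWords) {
        this.stopWords = new HashSet<>(Arrays.asList(stopWords));
    }

    List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String token : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty()) continue; // split yields "" when text starts with a separator
            String lower = token.toLowerCase(Locale.ROOT);
            if (!stopWords.contains(lower)) out.add(lower);
        }
        return out;
    }
}

// Extending a non-final analyzer: the subclass supplies only a different stop list.
class StrictToyAnalyzer extends ToyAnalyzer {
    // Hypothetical larger stop list, for illustration only.
    static final String[] STRICT_STOP_WORDS =
        {"a", "an", "and", "the", "of", "in", "on", "to", "is"};

    StrictToyAnalyzer() { super(STRICT_STOP_WORDS); }
}
```

To load a real file, something like `StopListLoader.load(Files.newBufferedReader(Paths.get("stopwords.txt")))` would work, where `stopwords.txt` is a hypothetical path. In Lucene itself the equivalent chain would presumably be assembled from existing pieces such as `StandardTokenizer`, `LowerCaseFilter` and `StopFilter`; the point of the sketch is only that a subclass need not touch tokenization to change its stop list.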
