On Wed, 20 Feb 2002, Otis Gospodnetic wrote:
> > (1) I rewrote StandardAnalyzer as StrictAnalyzer for the project I am working
> > on. StandardAnalyzer does not filter enough words for my liking.
> > Basically all I did was add to the STOP_WORDS array. The stop words I added
> > are based on the default values in SQL Server 2000's text indexing.
> > (Source code below)
>
> The change seems simple and looks fine to me. If nobody complains
> until tonight I'll commit it.

As Dmitry said, it seems to me that adding classes which differ from one
another only in their static data is poor software engineering practice,
and probably confusing to users. Since StopAnalyzer has a constructor that
allows users to specify their own array of stop words, I'm not sure what
the benefit of StrictAnalyzer is.
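To illustrate the design point: instead of a subclass that differs only in its static STOP_WORDS array, the stop list can be passed in as constructor data, which is what StopAnalyzer's String[] constructor already allows. The sketch below is self-contained and does not use Lucene. The class ConfigurableStopFilter, its tokenizing rule (lower-case, split on non-letters), and both word lists are hypothetical and only for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch (not Lucene's API): one analyzer-like class whose
 * stop word list is constructor data, rather than one subclass per list.
 */
public class ConfigurableStopFilter {
    // Illustrative default list; not Lucene's actual STOP_WORDS array.
    public static final String[] DEFAULT_STOP_WORDS = {"a", "an", "and", "the", "of", "to"};

    private final Set<String> stopWords;

    public ConfigurableStopFilter() {
        this(DEFAULT_STOP_WORDS);
    }

    public ConfigurableStopFilter(String[] stopWords) {
        Set<String> set = new HashSet<>();
        for (String w : stopWords) {
            set.add(w.toLowerCase(Locale.ROOT));
        }
        this.stopWords = Collections.unmodifiableSet(set);
    }

    /** Lower-cases the text, splits on runs of non-letters, and drops stop words. */
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // A "strict" variant is just a longer list passed to the same class
        // (illustrative words, not the SQL Server 2000 list).
        String[] strict = {"a", "an", "and", "the", "of", "to", "about", "after", "also"};
        ConfigurableStopFilter standard = new ConfigurableStopFilter();
        ConfigurableStopFilter strictFilter = new ConfigurableStopFilter(strict);
        String text = "The story about the fox and the hound";
        System.out.println(standard.analyze(text));     // [story, about, fox, hound]
        System.out.println(strictFilter.analyze(text)); // [story, fox, hound]
    }
}
```

With Lucene itself, the equivalent would presumably be passing the custom array to StopAnalyzer's constructor, e.g. `new StopAnalyzer(myStopWords)`, with no new class required.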
On the other hand, I do think that providing a repository of alternative
prefabricated stop word arrays would be useful to users. I suggest the
following:
(1) Create an area on the Lucene website for a repository of such
things. (Does Lucene have a 'contributions' FTP site?)
(2) Leave StopAnalyzer as is, to avoid confusing people upgrading to
the new version, but include a link in its documentation to the
aforementioned repository.
Joshua
[EMAIL PROTECTED] Per Obscurius...www.ics.uci.edu/~jmadden
Joshua Madden: Information Scientist, Musician, Philosopher-At-Tall
It's that moment of dawning comprehension that I live for--Bill Watterson
My opinions are too rational and insightful to be those of any organization.