Re: Analyzer thread safety; Stop words

Antony Bowesman Wed, 29 Nov 2006 19:10:04 -0800

Yonik Seeley wrote:

On 11/29/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:

That's true, but all the existing Analyzers allow the stop set to be configured
via the analyzer constructors, but in different ways.


But you can duplicate most Analyzers (all the ones in Lucene?) with a
chain of Tokenizers and TokenFilters (since that is how almost all of
them are implemented).  Most Analyzers are simply shortcuts to putting
together your own.
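
To make the "chain of Tokenizers and TokenFilters" idea concrete, here is a minimal, self-contained sketch. It deliberately does not use Lucene's classes: ChainSketch, tokenize, lowerCase, stopFilter and analyze are hypothetical names chosen for illustration, and plain string lists stand in for token streams. It only shows the shape of the idea, where the stop set is just an argument to one filter in the chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.function.UnaryOperator;

// Hypothetical stand-ins for illustration only; these are NOT Lucene classes.
final class ChainSketch {

    // "Tokenizer" step: split the text on runs of non-letter, non-digit characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // "TokenFilter" step: lower-case every token.
    static UnaryOperator<List<String>> lowerCase() {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        };
    }

    // "TokenFilter" step: drop tokens found in the caller-supplied stop set.
    static UnaryOperator<List<String>> stopFilter(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stopWords.contains(t)) {
                    out.add(t);
                }
            }
            return out;
        };
    }

    // The "Analyzer" is then just the tokenizer followed by the filters, in order.
    static List<String> analyze(String text, List<UnaryOperator<List<String>>> filters) {
        List<String> tokens = tokenize(text);
        for (UnaryOperator<List<String>> filter : filters) {
            tokens = filter.apply(tokens);
        }
        return tokens;
    }
}
```

In this sketch, calling analyze on "The Quick fox and the Hound" with lowerCase() followed by stopFilter over the set {"the", "and"} yields the tokens quick, fox, hound. The order of the filters matters: the stop filter only matches "The" because lower-casing runs first.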

Something seems confused to me. Although stop words are used by Filters, they are currently exposed via Analyzers, which is the granularity used at the IndexWriter/Parser levels. This is what contributors are writing, not Filters.

There are lots of analysis contributions which deal with stop words that are perfectly usable as is. They shouldn't need to be duplicated to be re-used, and if that's needed, it points to a deficiency in the design. If we all have to put together our own, again, doesn't this argue that there should be a standard way of doing it at the higher Analyzer level?

Sure, the Solr way of using configurable filters gives great flexibility, but your solrconfig.xml example, while showing how the GreekAnalyzer can be deployed, also highlights the problem: it does not seem to be possible to make use of the stopword Hashtable available to the GreekAnalyzer constructor.

It seems to me that Lucene would benefit if there were an Analyzer interface. On the other hand, maybe your TokenFilterFactory stuff would be useful as part of Lucene.
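
As a rough illustration of what I mean by an interface, something along these lines would give every analyzer the same way of exposing and replacing its stop set. This is purely a sketch: ConfigurableAnalyzer, SimpleStopAnalyzer, stopWords and withStopWords are made-up names, none of this is existing Lucene API, and plain strings stand in for token streams.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical interface for illustration; it is NOT part of Lucene.
interface ConfigurableAnalyzer {
    // Turn text into a list of terms.
    List<String> analyze(String text);

    // Expose the stop set currently in effect (read-only view).
    Set<String> stopWords();

    // Return a new analyzer of the same kind using a different stop set.
    ConfigurableAnalyzer withStopWords(Set<String> stopWords);
}

// Hypothetical implementation: lower-cases, splits on non-letters/digits, drops stop words.
final class SimpleStopAnalyzer implements ConfigurableAnalyzer {
    private final Set<String> stopWords;

    SimpleStopAnalyzer(Set<String> stopWords) {
        // Defensive copy, so later changes to the caller's set cannot affect this analyzer.
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    @Override
    public List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) {
                terms.add(t);
            }
        }
        return terms;
    }

    @Override
    public Set<String> stopWords() {
        return stopWords;
    }

    @Override
    public ConfigurableAnalyzer withStopWords(Set<String> newStopWords) {
        return new SimpleStopAnalyzer(newStopWords);
    }
}
```

Because the stop set in this sketch is copied once and never modified afterwards, a single instance can be shared between threads without locking, and changing the stop words means creating a new analyzer rather than mutating a shared one.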


Anyway, just my penny's worth.
Antony


