Yonik Seeley wrote:
On 11/29/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:

That's true, but all the existing Analyzers allow the stop set to be configured
via their constructors, each in a different way.

But you can duplicate most Analyzers (all the ones in Lucene?) with a
chain of Tokenizers and TokenFilters (since that is how almost all of
them are implemented).  Most Analyzers are simply shortcuts to putting
together your own.
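
To make the chaining idea concrete, here is a minimal sketch in plain Java. It deliberately does not use Lucene's Tokenizer or TokenFilter classes; ChainSketch and its methods are hypothetical names chosen for illustration only. It shows a tokenizer step followed by two filter steps, with the stop set supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical stand-in for an analyzer built as a chain of steps.
// None of these names are Lucene classes.
final class ChainSketch {
    // Tokenizer step: split on runs of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.split("[^\\p{L}]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Filter step: lower-case every token.
    static List<String> lowerCase(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    }

    // Filter step: drop tokens found in the caller-supplied stop set.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stopWords.contains(t)) out.add(t);
        }
        return out;
    }

    // The "analyzer" is simply the steps applied in order.
    static List<String> analyze(String text, Set<String> stopWords) {
        return removeStopWords(lowerCase(tokenize(text)), stopWords);
    }
}
```

In this sketch the stop set is an argument to the filter step, so whoever assembles the chain decides which stop words apply.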

Something seems confused to me. Although stop words are used by Filters, they are currently exposed via Analyzers, which is the granularity used at the IndexWriter/Parser levels. Analyzers are what contributors are writing, not Filters.

There are lots of analysis contributions that deal with stop words and are perfectly usable as is. They shouldn't need to be duplicated to be reused; if duplication is needed, it points to a deficiency in the design. If we all have to put together our own, doesn't this argue that there should be a standard way of doing it at the higher Analyzer level?

Sure, the Solr way of using configurable filters gives great flexibility. But while your solrconfig.xml example shows how the GreekAnalyzer can be deployed, it also highlights the problem: it does not seem to be possible to make use of the stopword Hashtable available to the GreekAnalyzer constructor.

It seems to me that Lucene would benefit if there were an Analyzer interface. On the other hand, maybe your TokenFilterFactory stuff would be useful as part of Lucene.
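
A rough sketch of what a common interface for stop word configuration could look like, in plain Java. This is an assumption for illustration, not an existing Lucene API: StopWordAnalyzer, SimpleStopAnalyzer, and their methods are hypothetical names, and the whitespace splitting stands in for real tokenization.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical interface, not part of Lucene: one standard way to
// read and replace the stop set of any analyzer that implements it.
interface StopWordAnalyzer {
    // Returns a new analyzer of the same kind using the given stop words.
    StopWordAnalyzer withStopWords(Set<String> stopWords);

    // Returns the stop words currently in effect (read-only).
    Set<String> stopWords();

    // Produces the tokens for the given text.
    List<String> analyze(String text);
}

// Hypothetical implementation: lower-cases, splits on whitespace,
// and drops stop words.
final class SimpleStopAnalyzer implements StopWordAnalyzer {
    private final Set<String> stopWords;

    SimpleStopAnalyzer(Set<String> stopWords) {
        // Defensive copy so later changes by the caller have no effect.
        this.stopWords = Collections.unmodifiableSet(new HashSet<>(stopWords));
    }

    @Override
    public StopWordAnalyzer withStopWords(Set<String> stopWords) {
        return new SimpleStopAnalyzer(stopWords);
    }

    @Override
    public Set<String> stopWords() {
        return stopWords;
    }

    @Override
    public List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty() && !stopWords.contains(t)) out.add(t);
        }
        return out;
    }
}
```

With something along these lines, configuration code could swap the stop set on any implementing analyzer without knowing which constructor variant that analyzer happens to offer.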

Anyway, just my penny's worth.
Antony

