Yonik Seeley wrote:
> On 11/29/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:
>> That's true, but all the existing Analyzers allow the stop set to be
>> configured via the analyzer constructors, but in different ways.
>
> But you can duplicate most Analyzers (all the ones in Lucene?) with a
> chain of Tokenizers and TokenFilters (since that is how almost all of
> them are implemented). Most Analyzers are simply shortcuts to putting
> together your own.
Something seems confused to me. Although stop words are used by Filters, they
are currently exposed via Analyzers, which is the granularity used at the
IndexWriter/Parser levels. Analyzers are what contributors are writing, not
Filters. There are lots of analysis contributions that deal with stop words and
are perfectly usable as is. They shouldn't need to be duplicated to be re-used,
and if that is needed, it points to a deficiency in the design. If we all have
to put together our own, doesn't that again argue for a standard way of
configuring stop words at the higher Analyzer level?
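To make the "chain" point concrete, here is a rough sketch of an analyzer as nothing more than a shortcut for a tokenizer followed by filters. It is plain Java with toy stand-ins; ToyChain and its methods are made up for illustration and are not Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy stand-ins, NOT Lucene classes: the "tokenizer" splits text into tokens
// and each "filter" maps one token list to another.
class ToyChain {
    // Tokenizer step: split on runs of whitespace.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Filter step: lower-case every token.
    static List<String> lowerCase(List<String> in) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            out.add(t.toLowerCase(Locale.ROOT));
        }
        return out;
    }

    // Filter step: drop tokens that appear in the stop set.
    static List<String> removeStopWords(List<String> in, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        for (String t : in) {
            if (!stopWords.contains(t)) {
                out.add(t);
            }
        }
        return out;
    }

    // The "analyzer" is only a shortcut for the chain above. The stop set is
    // a parameter of the filter step, which is where it is actually used.
    static List<String> analyze(String text, Set<String> stopWords) {
        return removeStopWords(lowerCase(whitespaceTokenize(text)), stopWords);
    }
}
```

The stop set only matters to one filter step, yet the caller sees it as an argument of the analyzer shortcut, which is the mismatch in granularity I mean.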
Sure, the Solr way of using configurable filters gives great flexibility.
Your solrconfig.xml example shows how the GreekAnalyzer can be deployed, but
it also highlights the problem: there seems to be no way to make use of the
stop word Hashtable that the GreekAnalyzer constructor accepts.
It seems to me that Lucene would benefit if there were an Analyzer interface.
On the other hand, maybe your TokenFilterFactory stuff would be useful as part
of Lucene.
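As a sketch of the kind of thing I have in mind, and nothing more: the interface below gives every analyzer the same way of receiving a stop set, instead of each one taking it through a differently shaped constructor. StopAwareAnalyzer and SimpleStopAnalyzer are hypothetical names invented for this example; neither exists in Lucene or Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical interface, not part of Lucene: one uniform way for callers
// (or a config-driven factory) to hand a stop set to any analyzer.
interface StopAwareAnalyzer {
    void setStopWords(Set<String> stopWords);

    List<String> analyze(String text);
}

// Toy implementation standing in for a contributed language analyzer.
class SimpleStopAnalyzer implements StopAwareAnalyzer {
    private Set<String> stopWords = new HashSet<>();

    @Override
    public void setStopWords(Set<String> stopWords) {
        // Defensive copy so later changes by the caller do not leak in.
        this.stopWords = new HashSet<>(stopWords);
    }

    @Override
    public List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        // Split on runs of non-letter characters, lower-case each term,
        // then drop anything in the stop set.
        for (String raw : text.split("[^\\p{L}]+")) {
            String term = raw.toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && !stopWords.contains(term)) {
                tokens.add(term);
            }
        }
        return tokens;
    }
}
```

With something like this, a generic deployment mechanism could instantiate any analyzer by class name and still pass it a stop set through setStopWords, without knowing the constructor signature of each contribution.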
Anyway, just my penny's worth.
Antony