Re: Analyzer thread safety; Stop words

Antony Bowesman Wed, 29 Nov 2006 22:25:07 -0800

Yonik Seeley wrote:

On 11/29/06, Antony Bowesman <[EMAIL PROTECTED]> wrote:

Yonik Seeley wrote:


The GreekAnalyzer is just an example of how you can use existing
Analyzers (as long as they have a default constructor), but it's not
the recommended approach.

TokenFilters are preferred over Analyzers... you can plug them
together in any way you see fit to solve your analysis problem. For
Solr, an added bonus of using chains of filters is that Solr can
"know" about the results after each filter and show you the results on
an analysis web page (very useful for debugging).

If I were to analyze greek text, I might do something like this:

<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
         <filter class="solr.SnowballPorterFilterFactory" language="Greek"/>
     </analyzer>
</fieldtype>

If you try to put everything in Analyzer constructors, you get
combinatorial explosion.
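For readers outside Solr, the filter-chain idea can be sketched in plain Java. This is a hypothetical illustration, not the Lucene or Solr API: each filter is modeled as a `UnaryOperator<List<String>>`, and the class name `FilterChainSketch`, its methods, and the three-word stop list are invented for the example. It mirrors only the tokenizer, lowercase, and stop stages of the config above, and it records the tokens after each stage, roughly what Solr's analysis page lets you inspect.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.function.UnaryOperator;

/** Hypothetical sketch, NOT the Lucene/Solr API: filters are list-to-list functions. */
public class FilterChainSketch {

    // Hypothetical tokenizer: split on runs of whitespace and drop empty pieces.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // Lowercase every token; Locale.ROOT avoids locale-dependent case mapping.
    static final UnaryOperator<List<String>> LOWERCASE = tokens -> {
        List<String> out = new ArrayList<>();
        for (String t : tokens) out.add(t.toLowerCase(Locale.ROOT));
        return out;
    };

    // Drop tokens present in the stop set (the set is assumed to be lowercase).
    static UnaryOperator<List<String>> stop(Set<String> stopWords) {
        return tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                if (!stopWords.contains(t)) out.add(t);
            }
            return out;
        };
    }

    // Run the chain in insertion order, recording the tokens after each named stage.
    static Map<String, List<String>> trace(String text,
                                           Map<String, UnaryOperator<List<String>>> chain) {
        Map<String, List<String>> stages = new LinkedHashMap<>();
        List<String> tokens = whitespaceTokenize(text);
        stages.put("tokenizer", tokens);
        for (Map.Entry<String, UnaryOperator<List<String>>> e : chain.entrySet()) {
            tokens = e.getValue().apply(tokens);
            stages.put(e.getKey(), tokens);
        }
        return stages;
    }

    // Example chain: lowercase first, so the stop set only needs lowercase entries.
    static Map<String, UnaryOperator<List<String>>> defaultChain() {
        Map<String, UnaryOperator<List<String>>> chain = new LinkedHashMap<>();
        chain.put("lowercase", LOWERCASE);
        chain.put("stop", stop(Set.of("the", "a", "of")));
        return chain;
    }

    // Final token stream after the whole example chain.
    public static List<String> analyze(String text) {
        return trace(text, defaultChain()).get("stop");
    }

    public static void main(String[] args) {
        trace("The Art of Analysis", defaultChain())
            .forEach((stage, tokens) -> System.out.println(stage + ": " + tokens));
    }
}
```

Running `main` prints the token list after the tokenizer, after lowercasing, and after stop-word removal. Adding, removing, or reordering a stage is a one-entry change to the map rather than another constructor for each combination of options. Order still matters: lowercasing before the stop filter is what lets the stop set contain only lowercase words, just as LowerCaseFilterFactory precedes StopFilterFactory in the config.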

I guess you would use methods rather than, as you say, getting into constructor hell. Anyway, I'll have a deeper look at the Solr stuff when I get to phase 2. Right now, I've gone as far with analysis as I need to, but I would like to get better configuration than I've currently got. I know it will come back to bite...


Thanks for your comments, Yonik
Antony


