RE: Language Specific Analyzer

Uwe Schindler Sat, 14 Nov 2015 09:11:19 -0800

Hi,

you cannot change the behavior of the predefined analyzers! But since Lucene 5 
there is no need to write your own subclass to define a custom analyzer. Just 
use CustomAnalyzer and define through its fluent builder API what your analysis 
chain should look like (see the example in the javadocs):


https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Please note: Language-specific stemmers will fail to work correctly if the 
terms still contain punctuation! Whether lowercasing is needed before stemming 
also depends on the stemmer.
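
To make this concrete, here is a minimal sketch of such a chain for German: 
whitespace tokenizer, then lowercasing, then a stemmer. The factory names 
"whitespace", "lowercase" and "germanlightstem" are the SPI names as I would 
expect them in Lucene 5.x (check the factories available in your version); the 
class name, the field name "body" and the sample sentence are made up for 
illustration. It needs lucene-core and lucene-analyzers-common on the 
classpath.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name; a sketch, not the only way to set this up.
public class CustomAnalyzerSketch {

    // Builds the chain: split on whitespace only, lowercase, then stem.
    // Assumption: these SPI names resolve in the Lucene version in use.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
            .withTokenizer("whitespace")       // keeps punctuation attached to terms
            .addTokenFilter("lowercase")       // lowercase before stemming
            .addTokenFilter("germanlightstem") // swap in the stemmer for your language
            .build();
    }

    // Runs the analyzer over the text and collects the resulting terms.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                        // required before incrementToken()
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Print the terms to see what the stemmer does with "Welt!" or "Hallo,".
        System.out.println(analyze("Hallo, Welt! Autos fahren."));
    }
}
```

Because the whitespace tokenizer leaves "Hallo," and "Welt!" as single terms, 
this chain runs straight into the caveat above: the stemmer sees the trailing 
punctuation. Printing the terms for a few sample sentences shows quickly 
whether the stemming is still acceptable for your use case.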

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: marco turchi [mailto:marco.tur...@gmail.com]
> Sent: Saturday, November 14, 2015 5:39 PM
> To: java-user@lucene.apache.org
> Subject: Language Specific Analyzer
> 
> Dear Users,
> I need to develop my language specific analyzer that:
> 1) does not remove punctuations
> 2) lowercases and stems each term in the text.
> 
> I have tried some of the pre-implemented language analyzers (e.g. the German
> and Italian analyzers), but they remove punctuation. I'm not sure, but
> probably what I need is the whitespace analyzer instead of the standard
> analyzer.
> 
> Is there a way to force each language specific analyzer to use the
> whitespace analyzer or, in general, not to remove punctuation?
> 
> Thanks a lot!
> Marco


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
