Hi, you cannot change the behavior of the predefined analyzers. But since Lucene 5 there is no need to write your own subclass to define a custom analyzer: just use CustomAnalyzer and define through its fluent builder API what your analysis chain should look like (see the example in the javadocs):
https://lucene.apache.org/core/5_3_1/analyzers-common/org/apache/lucene/analysis/custom/CustomAnalyzer.html

Please note: Language-specific stemmers will fail to work correctly if the terms still contain punctuation! Whether lowercasing is needed before stemming also depends on the stemmer.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: marco turchi [mailto:marco.tur...@gmail.com]
> Sent: Saturday, November 14, 2015 5:39 PM
> To: java-user@lucene.apache.org
> Subject: Language Specific Analyzer
>
> Dear Users,
> I need to develop my language specific analyzer that:
> 1) does not remove punctuation
> 2) lowercases and stems each term in the text.
>
> I have tried some of the pre-implemented language analyzers (e.g. the
> German and Italian analyzers), but they remove punctuation. I'm not sure,
> but probably what I need is the whitespace analyzer instead of the
> standard analyzer.
>
> Is there a way to force each language specific analyzer to use the
> whitespace analyzer or in general not to remove punctuation?
>
> Thanks a lot!
> Marco
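For illustration, here is a minimal sketch of the kind of CustomAnalyzer chain discussed above: a whitespace tokenizer (so punctuation stays attached to the terms), followed by lowercasing and a stemmer. It is only a sketch under assumptions, not taken from the javadocs: the class name, the field name "body" and the choice of the German light stemmer are made up for the example, and the factory names "whitespace", "lowercase" and "germanLightStem" are assumed to resolve in your Lucene version with the analyzers-common module on the classpath. As noted above, the stemmer may not handle terms that still carry punctuation, so check the output for your language.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical class name, used only for this sketch.
public class PunctuationKeepingAnalyzerSketch {

    // Builds the chain: whitespace tokenizer -> lowercase -> German light stemmer.
    // The factory names are assumptions; swap in the stemmer for your language.
    static Analyzer buildAnalyzer() throws IOException {
        return CustomAnalyzer.builder()
                .withTokenizer("whitespace")       // splits on whitespace only, keeps punctuation
                .addTokenFilter("lowercase")       // lowercases each term
                .addTokenFilter("germanLightStem") // stems; may misbehave on terms with punctuation
                .build();
    }

    // Runs the analyzer over the text and collects the resulting terms.
    static List<String> analyze(String text) throws IOException {
        List<String> terms = new ArrayList<>();
        try (Analyzer analyzer = buildAnalyzer();
             TokenStream ts = analyzer.tokenStream("body", text)) { // "body" is an arbitrary field name
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        }
        return terms;
    }

    public static void main(String[] args) throws IOException {
        // Print the terms so the effect of the chain can be inspected by eye.
        for (String t : analyze("Hallo, WELT!")) {
            System.out.println(t);
        }
    }
}
```

Printing the terms for a few sample sentences is a quick way to see whether the stemmer still does something useful when commas and periods remain attached to the words.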