Hi,

Solr has a WordDelimiterFilter, which was also ported to the Lucene analysis module in Lucene trunk (4.0). In 3.x you can still add solr.jar to your classpath and use WordDelimiterFilterFactory to produce one (WordDelimiterFilter itself is package-private).
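
If it helps to pin down the two string rules from the question before wiring anything into an analyzer, here is a minimal sketch in plain java.util.regex. It is not Lucene or Solr API and not how WordDelimiterFilter is implemented. The class name, the method names and the regex are assumptions made for illustration (the regex mentioned in the question was not posted). Digits are kept because the "foo=bar*45;" example keeps them, although the description says [a-zA-Z] only.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class: illustrates the string rules only, no Lucene types involved.
public class IdentifierSplitSketch {

    // Assumed regex: a zero-width boundary between a lowercase and an uppercase
    // letter ("foo|Bar"), or between an uppercase run and a capitalized word
    // ("ABC|Company").
    private static final String CAMEL_BOUNDARY =
            "(?<=[a-z])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])";

    // Replaces every run of characters other than ASCII letters and digits with
    // a single space, preserving case, then trims the ends.
    static String removePunctuation(String text) {
        return text.replaceAll("[^A-Za-z0-9]+", " ").trim();
    }

    // Splits one word at camelCase boundaries, preserving case.
    static List<String> splitIdentifier(String word) {
        return Arrays.asList(word.split(CAMEL_BOUNDARY));
    }

    public static void main(String[] args) {
        System.out.println(removePunctuation("foo=bar*45;")); // foo bar 45
        System.out.println(splitIdentifier("fooBar"));        // [foo, Bar]
        System.out.println(splitIdentifier("ABCCompany"));    // [ABC, Company]
    }
}
```

Both rules depend on case, so whatever applies them inside the analyzer has to run before anything lowercases the text.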

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: stephen.warner.tho...@gmail.com
> [mailto:stephen.warner.tho...@gmail.com] On Behalf Of Stephen Thomas
> Sent: Tuesday, November 29, 2011 5:20 PM
> To: java-user@lucene.apache.org
> Subject: Custom Filter for Splitting CamelCase?
>
> List,
>
> I have written my own CustomAnalyzer, as follows:
>
> public TokenStream tokenStream(String fieldName, Reader reader) {
>
>     // TODO: add calls to RemovePuncation, and SplitIdentifiers here
>
>     // First, convert to lower case
>     TokenStream out = new LowerCaseTokenizer(reader);
>
>     if (this.doStopping){
>         out = new StopFilter(true, out, customStopSet);
>     }
>
>     if (this.doStemming){
>         out = new PorterStemFilter(out);
>     }
>
>     return out;
> }
>
> What I need to do is write two custom filters that do the following:
>
> - RemovePuncation() removes all characters except [a-zA-Z], preserving case.
>   E.g.,
>
>   "foo=bar*45;" ==> "foo bar 45"
>   "fooBar" ==> "fooBar"
>   "\"stho...@cs.queensu.ca\"" ==> "sthomas cs queensu ca"
>
> - SplitIdentifers() breaks up words based on camelCase notation:
>
>   "fooBar" ==> "foo Bar"
>   "ABCCompany" ==> "ABC Company"
>
>   (I have the regex for this.)
>
>   Note this step must be performed before LowerCaseTokenizer, because we
>   need case information to do the splitting.
>
> How can I write custom filters, and how do I call them before
> LowerCaseTokenizer()?
>
> Thanks in advance,
> Steve
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org