Re: Custom indexing
The standard analyzer/tokenizer should do a decent job of splitting on dot, hyphen, and underscore, in addition to whitespace and other punctuation. Can you post some specific test cases you are concerned with? (You should always run some test cases.)

-- Jack Krupansky

On Tue, Apr 12, 2016 at 10:35 AM, Ahmet Arslan wrote:
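Following Jack's advice to run test cases, here is a minimal plain-Java sketch of table-driven tokenization checks: each input string is paired with the tokens you expect to search for, and any mismatch is reported. `TokenizerCases` and `punctuationSplit` are hypothetical names. `punctuationSplit` is only a stand-in that splits on whitespace and ASCII punctuation, which is the behavior described above. It is not Lucene's StandardAnalyzer. To check the real analyzer, pass in a function that collects the terms your analyzer actually produces. Whether StandardAnalyzer splits "." and "_" this way in your Lucene version is exactly what such cases should confirm.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

public class TokenizerCases {

    // Hypothetical stand-in tokenizer: splits on runs of whitespace and ASCII
    // punctuation (the POSIX Punct class includes '.', '_' and '-'). Not Lucene code.
    static List<String> punctuationSplit(String text) {
        // Drop leading delimiters so split() does not yield an empty first token.
        String trimmed = text.replaceAll("^[\\s\\p{Punct}]+", "");
        if (trimmed.isEmpty()) {
            return List.of();
        }
        return Arrays.asList(trimmed.split("[\\s\\p{Punct}]+"));
    }

    // Runs every case; returns one message per mismatch (empty list = all pass).
    static List<String> failures(Map<String, List<String>> cases,
                                 Function<String, List<String>> tokenizer) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, List<String>> c : cases.entrySet()) {
            List<String> actual = tokenizer.apply(c.getKey());
            if (!actual.equals(c.getValue())) {
                messages.add(c.getKey() + ": expected " + c.getValue() + " but got " + actual);
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Inputs and the tokens you expect to be able to search for.
        Map<String, List<String>> cases = new LinkedHashMap<>();
        cases.put("report_2016.final.pdf", List.of("report", "2016", "final", "pdf"));
        cases.put("my-file name", List.of("my", "file", "name"));
        List<String> bad = failures(cases, TokenizerCases::punctuationSplit);
        System.out.println(bad.isEmpty() ? "all cases pass" : String.join("\n", bad));
    }
}
```

Keeping the tokenizer behind a `Function` means the same table of cases can be rerun unchanged after a Lucene upgrade or an analyzer change.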
Re: Custom indexing
Hi Chamarty,

Well, there are a lot of options here.

1) Use LetterTokenizer
2) Use WordDelimiterFilter combined with WhitespaceTokenizer
3) Use MappingCharFilter to replace those characters with spaces
...

Ahmet

On Tuesday, April 12, 2016 3:58 PM, PrasannaKumar Chamarty wrote:
> Hi,
> What is the best way (in terms of maintenance required with new Lucene releases) to allow splitting of words on "." and "_" for indexing? Thank you.
>
> -
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
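To make the difference between options 1 and 3 concrete, the sketch below imitates both in plain Java. It uses no Lucene classes, and the method names are hypothetical. `letterTokens` mimics a LetterTokenizer: tokens are maximal runs of letters, so every non-letter, digits included, ends a token and is discarded. `mappedWhitespaceTokens` mimics mapping "." and "_" to spaces with a character filter and then splitting on whitespace, so all other characters survive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SplitOptions {

    // Option 1 (LetterTokenizer-style): a token is a maximal run of letters;
    // any non-letter, digits included, ends the current token and is dropped.
    static List<String> letterTokens(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(c);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // Option 3 (char mapping + whitespace split): replace '.' and '_' with
    // spaces first, then split on runs of whitespace; other chars are kept.
    static List<String> mappedWhitespaceTokens(String text) {
        String mapped = text.replace('.', ' ').replace('_', ' ').trim();
        if (mapped.isEmpty()) {
            return new ArrayList<>();
        }
        return new ArrayList<>(Arrays.asList(mapped.split("\\s+")));
    }

    public static void main(String[] args) {
        String sample = "log4j_v2.conf";
        System.out.println(letterTokens(sample));           // [log, j, v, conf]
        System.out.println(mappedWhitespaceTokens(sample)); // [log4j, v2, conf]
    }
}
```

On an identifier such as `log4j_v2.conf`, the letter-based approach loses the digits, while the mapping approach splits only where "." and "_" occur. The mapping approach is usually closer to what the original question asks for.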
Jackrabbit - Custom indexing
Hi,

What is the best way (in terms of the maintenance required with new Lucene releases) to allow words to be split into tokens on "." and "_" for indexing? Please note that I am using Lucene through Jackrabbit. Jackrabbit's search configuration can be found at http://wiki.apache.org/jackrabbit/Search. The default analyzer is org.apache.lucene.analysis.standard.StandardAnalyzer.

If writing a custom analyzer is the only option, how can I do that without maintenance overhead with each new Lucene release?

Thank you.
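If a custom analyzer does turn out to be necessary, one way to keep the custom part small is to confine it to a character-level step. That is the idea behind the MappingCharFilter option in Ahmet's reply. The sketch below is plain `java.io`, not Lucene's MappingCharFilter. The class name `DotUnderscoreToSpaceReader` is hypothetical. It wraps any `Reader` and turns "." and "_" into spaces as characters are read. Wiring it into an Analyzer, and registering that analyzer in Jackrabbit's search configuration, depends on the Lucene and Jackrabbit versions in use and is not shown.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical Reader wrapper: turns '.' and '_' into spaces as text is read. */
public class DotUnderscoreToSpaceReader extends FilterReader {

    public DotUnderscoreToSpaceReader(Reader in) {
        super(in);
    }

    // One-for-one mapping; -1 (end of stream) passes through unchanged.
    private static int map(int c) {
        return (c == '.' || c == '_') ? ' ' : c;
    }

    @Override
    public int read() throws IOException {
        return map(super.read());
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs.
        for (int i = off; i < off + n; i++) {
            buf[i] = (char) map(buf[i]);
        }
        return n;
    }

    // Convenience for demos/tests: reads the whole string through the filter.
    static String filterAll(String text) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new DotUnderscoreToSpaceReader(new StringReader(text))) {
            char[] buf = new char[64];
            int n;
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for a StringReader
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(filterAll("my_node.title")); // my node title
    }
}
```

Each character is replaced one-for-one, so the filtered text has the same length as the input. Character offsets, which are used for highlighting for example, therefore still line up with the original text.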