RE: Custom indexing

Uwe Schindler Tue, 19 Apr 2016 01:06:34 -0700

Hi,
> The main use case is searching in file names, for example lucene.txt,
> lucene_new.txt and lucene_1_new.txt. If I search for 'lucene', I need to get
> all 3 files; with 'new' I need to get the last two files. Please note that the
> Standard analyzer/tokenizer of Lucene 3.6 does not give us these results,
> because it does not split tokens on "." and "_". Are you referring to versions
> later than 3.6?


Hi, StandardTokenizer in 3.6 is the old, non-Unicode-compliant "classic"
tokenizer. In Lucene 4+ it is called ClassicTokenizer because some users still
rely on it, but new code should use the new StandardTokenizer. From Lucene 4 on,
StandardTokenizer implements the Word Break rules from the Unicode Text
Segmentation algorithm, as specified in Unicode Standard Annex #29.

This one is not available in such old Lucene versions, sorry. Your only options
are LetterTokenizer or writing your own tokenizer.
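For illustration, here is a minimal plain-Java sketch (no Lucene dependency; the class and method names FileNameTokens, tokenize and search are hypothetical) that emulates what LetterTokenizer followed by lowercasing would do to the file names from the question. A token is a maximal run of letters as defined by Character.isLetter, so ".", "_" and digits all act as separators. It is not Lucene code and ignores details such as the maximum token length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Plain-Java emulation (NOT Lucene code) of LetterTokenizer plus lowercasing:
 * every non-letter character (".", "_", digits, ...) ends the current token.
 */
public class FileNameTokens {

    /** Splits text into lowercased maximal runs of letters. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);          // still inside a token
            } else if (current.length() > 0) {
                tokens.add(current.toString().toLowerCase(Locale.ROOT));
                current.setLength(0);                 // separator closes the token
            }
            i += Character.charCount(cp);             // handle surrogate pairs
        }
        if (current.length() > 0) {
            tokens.add(current.toString().toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    /** Hypothetical helper: file names whose tokens contain the query term. */
    static List<String> search(List<String> fileNames, String term) {
        String needle = term.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String name : fileNames) {
            if (tokenize(name).contains(needle)) {
                hits.add(name);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("lucene.txt", "lucene_new.txt", "lucene_1_new.txt");
        System.out.println(tokenize("lucene_1_new.txt")); // [lucene, new, txt]
        System.out.println(search(files, "lucene"));      // [lucene.txt, lucene_new.txt, lucene_1_new.txt]
        System.out.println(search(files, "new"));         // [lucene_new.txt, lucene_1_new.txt]
    }
}
```

Note that digits are dropped entirely with this approach, so "1" is not searchable. If digits matter, the "write your own" route would need a tokenizer that keeps them, e.g. the equivalent of changing the Character.isLetter check above to Character.isLetterOrDigit.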

Uwe

