Hi,

> The main use case is searching in file names. For example, lucene.txt,
> lucene_new.txt, lucene_1_new.txt. If I use 'lucene', I need to get all 3
> files. with 'new' I need to get last two files. Please note that Standard
> analyzer/tokenizer of lucene 3.6 is not giving us the results with
> tokenization of "." and "_".

Are you referring to later versions than 3.6?
Hi,

StandardTokenizer in 3.6 is the old, non-Unicode-compliant tokenizer. In Lucene 4+ that old implementation is called "ClassicTokenizer"; it is kept because some users still rely on it, but newer code should use the new StandardTokenizer. From Lucene 4 on, StandardTokenizer implements the Word Break rules from the Unicode Text Segmentation algorithm, as specified in Unicode Standard Annex #29. That implementation is not available in such old Lucene versions, sorry. Your only options are to use LetterTokenizer or to write your own tokenizer.

Uwe
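
To illustrate the LetterTokenizer suggestion, here is a minimal sketch in plain Java (no Lucene dependency) of the kind of splitting it performs: tokens are maximal runs of letters, so ".", "_" and digits all act as separators. The class and method names (`FileNameTokens`, `tokenize`, `matches`) are hypothetical and not part of Lucene; a real custom tokenizer would have to be written against the Lucene 3.6 analysis API instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone helper, not a Lucene class. It only mimics
// letter-based splitting to show which tokens the file names would yield.
class FileNameTokens {

    // Splits text into maximal runs of letters (per Character.isLetter).
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.isLetter(cp)) {
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                // A non-letter ends the current token.
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    // True if the search term equals one of the file name's tokens.
    static boolean matches(String fileName, String term) {
        return tokenize(fileName).contains(term);
    }

    public static void main(String[] args) {
        String[] files = {"lucene.txt", "lucene_new.txt", "lucene_1_new.txt"};
        for (String term : new String[] {"lucene", "new"}) {
            for (String file : files) {
                if (matches(file, term)) {
                    System.out.println(term + " -> " + file);
                }
            }
        }
    }
}
```

With this splitting, 'lucene' matches all three file names and 'new' matches only the last two. Two side effects are worth knowing: digits are dropped, so "1" is not searchable, and the extension "txt" becomes a token of its own.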