Unless you explicitly enable automatic phrase queries, the Lucene query parser ORs the sub-terms together when a whitespace-delimited term analyzes into a sequence of terms.
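
A minimal sketch of the difference, assuming Lucene 5.2 with lucene-queryparser and lucene-analyzers-common on the classpath. The class name AutoPhraseDemo, the field name "body", and the use of StandardAnalyzer (which splits wi-fi into wi and fi) are my own choices for illustration, not taken from your setup:

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

public class AutoPhraseDemo {

    // Hypothetical field name, used only for this sketch.
    static final String FIELD = "body";

    // Parses the text with automatic phrase queries switched on or off.
    static Query parse(String text, boolean autoPhrase) throws ParseException {
        // StandardAnalyzer stands in for any analyzer that turns one
        // whitespace-delimited term into several tokens.
        try (Analyzer analyzer = new StandardAnalyzer()) {
            QueryParser parser = new QueryParser(FIELD, analyzer);
            parser.setAutoGeneratePhraseQueries(autoPhrase);
            return parser.parse(text);
        }
    }

    public static void main(String[] args) throws ParseException {
        // Default setting: the sub-terms are combined as optional clauses.
        System.out.println(parse("wi-fi", false));
        // Automatic phrase queries: the sub-terms form one phrase query.
        System.out.println(parse("wi-fi", true));
    }
}
```

Printing both queries like this, with your own analyzer swapped in, is a quick way to see what the parser actually builds from a hyphenated term.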

See:
https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)

-- Jack Krupansky

On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com> wrote:

> Hi all,
>
> I'm new to Lucene and tried to write my own analyzer to support
> hyphenated words like wi-fi, jean-pierre, etc.
> For our customer it is important to find the word
> - wi-fi by wi, fi, wifi, wi-fi
> - jean-pierre by jean, pierre, jean-pierre, jean-*
>
> The analyzer:
>
> public class SupportHyphenatedWordsAnalyzer extends Analyzer {
>
>   protected NormalizeCharMap charConvertMap;
>
>   public SupportHyphenatedWordsAnalyzer() {
>     initCharConvertMap();
>   }
>
>   protected void initCharConvertMap() {
>     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>     builder.add("\"", "");
>     charConvertMap = builder.build();
>   }
>
>   @Override
>   protected TokenStreamComponents createComponents(final String fieldName) {
>
>     final Tokenizer src = new WhitespaceTokenizer();
>
>     TokenStream tok = new WordDelimiterFilter(src,
>         WordDelimiterFilter.PRESERVE_ORIGINAL
>             | WordDelimiterFilter.GENERATE_WORD_PARTS
>             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
>             | WordDelimiterFilter.CATENATE_WORDS,
>         null);
>     tok = new LowerCaseFilter(tok);
>     tok = new LengthFilter(tok, 1, 255);
>     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>
>     return new TokenStreamComponents(src, tok);
>   }
>
>   @Override
>   protected Reader initReader(String fieldName, Reader reader) {
>     return new MappingCharFilter(charConvertMap, reader);
>   }
> }
>
> The analyzer seems to work except for exact phrase match queries.
>
> E.g. the following words are indexed:
>
> FD-A320-REC-SIM-1
> FD-A320-REC-SIM-10
> FD-A320-REC-SIM-11
> MIA-FD-A320-REC-SIM-1
> SIN-FD-A320-REC-SIM-1
>
> The (exact) query "FD-A320-REC-SIM-1" returns
>
> FD-A320-REC-SIM-1
> MIA-FD-A320-REC-SIM-1
> SIN-FD-A320-REC-SIM-1
>
> For our customer this is wrong, because this exact phrase match
> query should only return the single entry FD-A320-REC-SIM-1.
>
> Do you have any ideas or tips on how we have to change our current
> analyzer to support this requirement?
>
> Thanks and kind regards
> Diego