Hey Jack,

Reading the doc:

"Set to true if phrase queries will be automatically generated when the analyzer returns more than one term from whitespace delimited text. NOTE: this behavior may not be suitable for all languages.
Set to false if phrase queries should only be generated when surrounded by double quotes."

In this user's case, I guess he's likely to use double quotes. The only problem he sees so far is that the phrase query still uses the query-time analyser to split the tokens. First we need feedback from him, but I guess he would like the phrase query not to tokenise the text within the double quotes. In that case we should find a way.
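Just to make the behaviour concrete, here is a rough sketch of what the flag changes in the classic QueryParser (the field name "code" and the StandardAnalyzer are only placeholders for illustration, not his actual setup):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;

public class AutoPhraseSketch {

  public static void main(String[] args) throws ParseException {
    // StandardAnalyzer splits the single whitespace-delimited token "wi-fi"
    // into the two terms "wi" and "fi" at query time.
    QueryParser parser = new QueryParser("code", new StandardAnalyzer());

    System.out.println(parser.parse("wi-fi"));
    // -> code:wi code:fi   (default: an OR over the sub-terms)

    parser.setAutoGeneratePhraseQueries(true);
    System.out.println(parser.parse("wi-fi"));
    // -> code:"wi fi"      (flag on: a PhraseQuery over the sub-terms)

    System.out.println(parser.parse("\"wi-fi\""));
    // -> code:"wi fi"      (explicit double quotes: the text inside is still analysed)
  }
}

As the last line shows, even an explicit double-quoted phrase is still run through the query-time analyser, which is exactly the part he may want to avoid.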
Cheers

2015-07-21 13:12 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:

> If you don't explicitly enable automatic phrase queries, the Lucene query
> parser will assume an OR operator on the sub-terms when a whitespace-delimited
> term analyzes into a sequence of terms.
>
> See:
> https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
>
> -- Jack Krupansky
>
> On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com> wrote:
>
> > Hi all,
> >
> > I'm new to Lucene and tried to write my own analyzer to support
> > hyphenated words like wi-fi, jean-pierre, etc.
> > For our customer it is important to find the word
> > - wi-fi by wi, fi, wifi, wi-fi
> > - jean-pierre by jean, pierre, jean-pierre, jean-*
> >
> > The analyzer:
> >
> > import java.io.Reader;
> >
> > import org.apache.lucene.analysis.Analyzer;
> > import org.apache.lucene.analysis.TokenStream;
> > import org.apache.lucene.analysis.Tokenizer;
> > import org.apache.lucene.analysis.charfilter.MappingCharFilter;
> > import org.apache.lucene.analysis.charfilter.NormalizeCharMap;
> > import org.apache.lucene.analysis.core.LowerCaseFilter;
> > import org.apache.lucene.analysis.core.StopAnalyzer;
> > import org.apache.lucene.analysis.core.StopFilter;
> > import org.apache.lucene.analysis.core.WhitespaceTokenizer;
> > import org.apache.lucene.analysis.miscellaneous.LengthFilter;
> > import org.apache.lucene.analysis.miscellaneous.WordDelimiterFilter;
> >
> > public class SupportHyphenatedWordsAnalyzer extends Analyzer {
> >
> >   protected NormalizeCharMap charConvertMap;
> >
> >   public SupportHyphenatedWordsAnalyzer() {
> >     initCharConvertMap();
> >   }
> >
> >   // Strip double quotes from the input before tokenization.
> >   protected void initCharConvertMap() {
> >     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
> >     builder.add("\"", "");
> >     charConvertMap = builder.build();
> >   }
> >
> >   @Override
> >   protected TokenStreamComponents createComponents(final String fieldName) {
> >     final Tokenizer src = new WhitespaceTokenizer();
> >
> >     // Split on intra-word delimiters (e.g. hyphens), keep the original token
> >     // and also index the catenated word parts.
> >     TokenStream tok = new WordDelimiterFilter(src,
> >         WordDelimiterFilter.PRESERVE_ORIGINAL
> >             | WordDelimiterFilter.GENERATE_WORD_PARTS
> >             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
> >             | WordDelimiterFilter.CATENATE_WORDS,
> >         null);
> >     tok = new LowerCaseFilter(tok);
> >     tok = new LengthFilter(tok, 1, 255);
> >     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> >
> >     return new TokenStreamComponents(src, tok);
> >   }
> >
> >   @Override
> >   protected Reader initReader(String fieldName, Reader reader) {
> >     return new MappingCharFilter(charConvertMap, reader);
> >   }
> > }
> >
> > The analyzer seems to work except for exact phrase match queries.
> >
> > E.g. the following words are indexed:
> >
> > FD-A320-REC-SIM-1
> > FD-A320-REC-SIM-10
> > FD-A320-REC-SIM-11
> > MIA-FD-A320-REC-SIM-1
> > SIN-FD-A320-REC-SIM-1
> >
> > The (exact) query "FD-A320-REC-SIM-1" returns
> >
> > FD-A320-REC-SIM-1
> > MIA-FD-A320-REC-SIM-1
> > SIN-FD-A320-REC-SIM-1
> >
> > For our customer this is wrong, because this exact phrase match
> > query should only return the single entry FD-A320-REC-SIM-1.
> >
> > Do you have any ideas or tips on how we have to change our current
> > analyzer to support this requirement?
> >
> > Thanks and kind regards
> > Diego

--
--------------------------
Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England