To start with, Diego: how do you currently parse the user query to build the Lucene queries?

Cheers

2015-07-22 8:35 GMT+01:00 Diego Socaceti <socac...@gmail.com>:

> Hi Alessandro,
>
> yes, I want the user to be able to surround the query with "" to run the
> phrase query with a NOT tokenized phrase.
>
> What do I have to do?
>
> Thanks and Kind regards
>
> On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Hey Jack, reading the doc:
> >
> > "Set to true if phrase queries will be automatically generated when the
> > analyzer returns more than one term from whitespace delimited text. NOTE:
> > this behavior may not be suitable for all languages.
> >
> > Set to false if phrase queries should only be generated when surrounded by
> > double quotes."
> >
> > In the user's case, I guess he's likely to use double quotes.
> >
> > The only problem he sees so far is that the phrase query uses the
> > query-time analyser to actually split the tokens.
> >
> > First we need feedback from him, but I guess he would like the phrase
> > query to not tokenise the text within the double quotes.
> >
> > In that case we should find a way.
> >
> > Cheers
> >
> > 2015-07-21 13:12 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
> >
> > > If you don't explicitly enable automatic phrase queries, the Lucene query
> > > parser will assume an OR operator on the sub-terms when a white
> > > space-delimited term analyzes into a sequence of terms.
> > >
> > > See:
> > >
> > > https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
> > >
> > > -- Jack Krupansky
> > >
> > > On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com>
> > > wrote:
> > >
> > > > Hi all,
> > > >
> > > > I'm new to Lucene and tried to write my own analyzer to support
> > > > hyphenated words like wi-fi, jean-pierre, etc.
> > > > For our customer it is important to find the word
> > > > - wi-fi by wi, fi, wifi, wi-fi
> > > > - jean-pierre by jean, pierre, jean-pierre, jean-*
> > > >
> > > > The analyzer:
> > > >
> > > > public class SupportHyphenatedWordsAnalyzer extends Analyzer {
> > > >
> > > >   protected NormalizeCharMap charConvertMap;
> > > >
> > > >   // Constructor name must match the class name.
> > > >   public SupportHyphenatedWordsAnalyzer() {
> > > >     initCharConvertMap();
> > > >   }
> > > >
> > > >   // Strips double quotes from the input before tokenization.
> > > >   protected void initCharConvertMap() {
> > > >     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
> > > >     builder.add("\"", "");
> > > >     charConvertMap = builder.build();
> > > >   }
> > > >
> > > >   @Override
> > > >   protected TokenStreamComponents createComponents(final String fieldName) {
> > > >
> > > >     final Tokenizer src = new WhitespaceTokenizer();
> > > >
> > > >     // Keep the original token and also emit its parts and the
> > > >     // concatenation of its word parts.
> > > >     TokenStream tok = new WordDelimiterFilter(src,
> > > >         WordDelimiterFilter.PRESERVE_ORIGINAL
> > > >             | WordDelimiterFilter.GENERATE_WORD_PARTS
> > > >             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
> > > >             | WordDelimiterFilter.CATENATE_WORDS,
> > > >         null);
> > > >     tok = new LowerCaseFilter(tok);
> > > >     tok = new LengthFilter(tok, 1, 255);
> > > >     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> > > >
> > > >     return new TokenStreamComponents(src, tok);
> > > >   }
> > > >
> > > >   @Override
> > > >   protected Reader initReader(String fieldName, Reader reader) {
> > > >     return new MappingCharFilter(charConvertMap, reader);
> > > >   }
> > > > }
> > > >
> > > > The analyzer seems to work except for exact phrase match queries.
> > > >
> > > > e.g. the following words are indexed
> > > >
> > > > FD-A320-REC-SIM-1
> > > > FD-A320-REC-SIM-10
> > > > FD-A320-REC-SIM-11
> > > > MIA-FD-A320-REC-SIM-1
> > > > SIN-FD-A320-REC-SIM-1
> > > >
> > > > The (exact) query "FD-A320-REC-SIM-1" returns
> > > > FD-A320-REC-SIM-1
> > > > MIA-FD-A320-REC-SIM-1
> > > > SIN-FD-A320-REC-SIM-1
> > > >
> > > > For our customer this is wrong, because this exact phrase match
> > > > query should only return the single entry FD-A320-REC-SIM-1.
> > > >
> > > > Do you have any ideas or tips on how we have to change our current
> > > > analyzer to support this requirement?
> > > >
> > > > Thanks and Kind regards
> > > > Diego
> > > >
> > >
> >
> > --
> > --------------------------
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>
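
For reference, here is a rough sketch of why the quoted query in the thread also returns the MIA- and SIN- entries. It is plain Java with no Lucene dependency and only models the behaviour: it assumes the analyzer's word-part splitting can be approximated by lowercasing and splitting on hyphens, and that a phrase query matches when the query parts occur contiguously and in order. The class and method names (HyphenPhraseSketch, wordParts, phraseMatches, exactMatches, search) are made up for illustration; the real WordDelimiterFilter also emits the original and catenated tokens, which this model ignores.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified model of the behaviour discussed in the thread.
// It does not call Lucene; it only mimics the splitting and phrase semantics.
class HyphenPhraseSketch {

    // Assumption: word-part splitting is approximated by lowercasing
    // the token and splitting it on hyphens only.
    static List<String> wordParts(String token) {
        return Arrays.asList(token.toLowerCase(Locale.ROOT).split("-"));
    }

    // Phrase semantics on split tokens: the query parts must appear
    // contiguously and in order somewhere in the indexed parts.
    static boolean phraseMatches(String indexed, String query) {
        return Collections.indexOfSubList(wordParts(indexed), wordParts(query)) >= 0;
    }

    // "Not tokenized" semantics: compare the whole token, ignoring case.
    static boolean exactMatches(String indexed, String query) {
        return indexed.equalsIgnoreCase(query);
    }

    // Returns the indexed values that match under the chosen semantics.
    static List<String> search(List<String> indexed, String query, boolean exact) {
        List<String> hits = new ArrayList<>();
        for (String value : indexed) {
            boolean hit = exact ? exactMatches(value, query) : phraseMatches(value, query);
            if (hit) {
                hits.add(value);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> indexed = Arrays.asList(
            "FD-A320-REC-SIM-1",
            "FD-A320-REC-SIM-10",
            "FD-A320-REC-SIM-11",
            "MIA-FD-A320-REC-SIM-1",
            "SIN-FD-A320-REC-SIM-1");

        // Phrase over split parts: the MIA- and SIN- entries contain the
        // same part sequence, so they match as well.
        System.out.println(search(indexed, "FD-A320-REC-SIM-1", false));

        // Whole-token comparison: only the single identical entry matches.
        System.out.println(search(indexed, "FD-A320-REC-SIM-1", true));
    }
}
```

Under this model the first call returns the same three entries Diego reports, and the second returns only FD-A320-REC-SIM-1, which is the result he is asking for. How to get the second behaviour out of the Lucene query parser depends on how the user query is parsed, which is why that question comes first.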