Yes, what I meant is that you can still use your analyser when the query is not in quotes. When it is in quotes, you can build a TermQuery directly out of it. Of course, the real scenario may not be that simple. Do you think quoted and unquoted queries are two disjoint sets, i.e. a user asks EITHER for a quoted query OR for a classic query, never a mix of the two? In that scenario it will be simple.
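A minimal sketch of that simple, disjoint scenario, kept free of Lucene imports so it runs on its own. The class and method names (QuotedCriteriaRouter, plan, isExact) are hypothetical and not part of your code. It also assumes that the index-time chain keeps the whole token (PRESERVE_ORIGINAL) and lowercases it, so the quoted content is lowercased by hand before it would be wrapped in a TermQuery:

```java
import java.util.Locale;

/** Hypothetical helper: decides how one user criteria string should become a query. */
final class QuotedCriteriaRouter {

    enum Kind { EXACT_TERM, PARSED }

    /** Result of the routing decision: which path to take and with which text. */
    static final class Plan {
        final Kind kind;
        final String text;

        Plan(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    /** True when the whole input is wrapped in double quotes, e.g. "FD-A320-REC-SIM-1". */
    static boolean isExact(String userCriteria) {
        String s = userCriteria.trim();
        return s.length() >= 2 && s.startsWith("\"") && s.endsWith("\"");
    }

    static Plan plan(String userCriteria) {
        String s = userCriteria.trim();
        if (isExact(s)) {
            // Quoted: keep the content as ONE term and bypass the analyzer.
            // Assumption: the indexed original token was lowercased, so lowercase here too.
            // With Lucene this text would go into: new TermQuery(new Term(fieldName, text))
            String inner = s.substring(1, s.length() - 1).trim().toLowerCase(Locale.ROOT);
            return new Plan(Kind.EXACT_TERM, inner);
        }
        // Unquoted: hand the text to the QueryParser + analyzer path, as the existing code does.
        return new Plan(Kind.PARSED, s);
    }

    public static void main(String[] args) {
        Plan exact = plan("\"FD-A320-REC-SIM-1\"");
        System.out.println(exact.kind + " " + exact.text);   // EXACT_TERM fd-a320-rec-sim-1

        Plan classic = plan("jean-*");
        System.out.println(classic.kind + " " + classic.text); // PARSED jean-*
    }
}
```

With this split, only the EXACT_TERM branch skips tokenization, so "FD-A320-REC-SIM-1" would be looked up as the single term fd-a320-rec-sim-1 and should no longer match MIA-FD-A320-REC-SIM-1, provided the assumptions about the index hold.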
In the case of a mix, we should take a closer look at the Lucene query parser code and see how it handles the tokenization of content within quotes.

Cheers

2015-07-22 11:32 GMT+01:00 Diego Socaceti <socac...@gmail.com>:

> sorry little code refactoring typo: curTokenProcessed should be
> userCriteriaProcessed
>
> ...
>
> public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
> public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
>
> ...
>
> if (isExactCriteriaString(userCriteria)) {
>   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
>       escape(userCriteria.substring(1, userCriteria.length() - 1)));
>   userCriteriaProcessed = userCriteriaEscaped;
> } else {
>   userCriteriaProcessed = escape(userCriteria);
>
>   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
>     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
>   }
> }
>
>
> String queryStr = "";
>
> for (String fieldName : fields) {
>   String escapedFieldName = escape(fieldName);
>   queryStr += String.format("%s:%s ", escapedFieldName,
>       userCriteriaProcessed);
> }
>
> query = new QueryParser("", analyzer).parse(queryStr.trim());
>
> ...
>
> On Wed, Jul 22, 2015 at 12:27 PM, Diego Socaceti <socac...@gmail.com>
> wrote:
>
> > Hi Alessandro,
> >
> > sorry, that i forgot the important part. Here it is:
> >
> > ...
> >
> > public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
> > public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
> >
> > ...
> >
> > if (isExactCriteriaString(userCriteria)) {
> >   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
> >       escape(userCriteria.substring(1, userCriteria.length() - 1)));
> >   userCriteriaProcessed = userCriteriaEscaped;
> > } else {
> >   userCriteriaProcessed = escape(userCriteria);
> >
> >   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
> >     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
> >   }
> > }
> >
> >
> > String queryStr = "";
> >
> > for (String fieldName : fields) {
> >   String escapedFieldName = escape(fieldName);
> >   queryStr += String.format("%s:%s ", escapedFieldName,
> >       curTokenProcessed);
> > }
> >
> > query = new QueryParser("", analyzer).parse(queryStr.trim());
> >
> > ...
> >
> >
> > As far as i understand, my problem is that in my naive, query-syntax
> > based solution i have to use my analyzer, which means that the
> > userCriteria is always tokenized.
> >
> > You suggest to use the java query classes to build the query, because
> > then i can control if the userCriteria will be tokenized or not.
> > Did i get you right?
> >
> >
> > Thanks and Kind regards
> >
> > On Wed, Jul 22, 2015 at 11:44 AM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> >> I read briefly, correct me if I am wrong, but that is to parse the
> >> content within the quotes " .
> >> But we are still at a String level.
> >> I want to see how you build the phraseQuery :)
> >> Taking a look at the code, the PhraseQuery allows you to add as many
> >> terms as you want.
> >>
> >> What you need to do is to not tokenise the content within the quotes
> >> and actually create a TermQuery ( in your case you are not even using
> >> the feature offered by the phrase query regarding positions, you simply
> >> want to run a TermQuery) .
> >>
> >> So to clarify, you should parse the content within the quotes ( as you
> >> are doing), then build a TermQuery out of that String, not tokenized at
> >> all.
> >>
> >> Does this make sense to you ?
> >> Can I see what you do after identifying the content within the quotes ?
> >>
> >> Cheers
> >>
> >>
> >> 2015-07-22 10:20 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
> >>
> >> > Hi Alessandro,
> >> >
> >> > i guess code says more than words :)
> >> >
> >> > ...
> >> >
> >> > public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
> >> > public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
> >> >
> >> > ...
> >> >
> >> > if (isExactCriteriaString(userCriteria)) {
> >> >   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
> >> >       escape(userCriteria.substring(1, userCriteria.length() - 1)));
> >> >   userCriteriaProcessed = userCriteriaEscaped;
> >> > } else {
> >> >   userCriteriaProcessed = escape(userCriteria);
> >> >
> >> >   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
> >> >     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
> >> >   }
> >> > }
> >> >
> >> > ...
> >> >
> >> > public static String escape(String s) {
> >> >   String result = s;
> >> >
> >> >   if (s != null && !s.trim().isEmpty()) {
> >> >     String toEscape = s.trim();
> >> >
> >> >     if (toEscape.contains("*")) {
> >> >       StringBuilder sb = new StringBuilder();
> >> >
> >> >       for (int i = 0; i < toEscape.length(); i++) {
> >> >         char curChar = toEscape.charAt(i);
> >> >         if (curChar == '*')
> >> >           sb.append('*');
> >> >         else
> >> >           sb.append(QueryParser.escape(toEscape.substring(i, i + 1)));
> >> >       }
> >> >
> >> >       result = sb.toString();
> >> >     } else {
> >> >       result = QueryParser.escape(toEscape);
> >> >     }
> >> >   }
> >> >
> >> >   return result;
> >> > }
> >> >
> >> > ...
> >> >
> >> > Thanks and Kind regards
> >> >
> >> >
> >> >
> >> > On Wed, Jul 22, 2015 at 11:04 AM, Alessandro Benedetti <
> >> > benedetti.ale...@gmail.com> wrote:
> >> >
> >> > > As a start Diego, how do you currently parse the user query to
> >> > > build the Lucene queries ?
> >> > >
> >> > > Cheers
> >> > >
> >> > > 2015-07-22 8:35 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
> >> > >
> >> > > > Hi Alessandro,
> >> > > >
> >> > > > yes, i want the user to be able to surround the query with "" to
> >> > > > run the phrase query with a NOT tokenized phrase.
> >> > > >
> >> > > > What do i have to do?
> >> > > >
> >> > > > Thanks and Kind regards
> >> > > >
> >> > > > On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti <
> >> > > > benedetti.ale...@gmail.com> wrote:
> >> > > >
> >> > > > > Hey Jack, reading the doc :
> >> > > > >
> >> > > > > " Set to true if phrase queries will be automatically generated
> >> > > > > when the analyzer returns more than one term from whitespace
> >> > > > > delimited text. NOTE: this behavior may not be suitable for all
> >> > > > > languages.
> >> > > > >
> >> > > > > Set to false if phrase queries should only be generated when
> >> > > > > surrounded by double quotes."
> >> > > > >
> >> > > > >
> >> > > > > In the user case , i guess he's likely to use double quotes.
> >> > > > >
> >> > > > > The only problem he sees so far is that the phrase query uses
> >> > > > > the query time analyser to actually split the tokens.
> >> > > > >
> >> > > > > First we need a feedback from him, but I guess he would like to
> >> > > > > have the phrase query, to not tokenise the text within the
> >> > > > > double quotes.
> >> > > > >
> >> > > > > In the case we should find a way.
> >> > > > >
> >> > > > >
> >> > > > > Cheers
> >> > > > >
> >> > > > > 2015-07-21 13:12 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
> >> > > > >
> >> > > > > > If you don't explicitly enable automatic phrase queries, the
> >> > > > > > Lucene query parser will assume an OR operator on the
> >> > > > > > sub-terms when a white space-delimited term analyzes into a
> >> > > > > > sequence of terms.
> >> > > > > >
> >> > > > > > See:
> >> > > > > >
> >> > > > > > https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
> >> > > > > >
> >> > > > > >
> >> > > > > > -- Jack Krupansky
> >> > > > > >
> >> > > > > > On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com>
> >> > > > > > wrote:
> >> > > > > >
> >> > > > > > > Hi all,
> >> > > > > > >
> >> > > > > > > i'm new to lucene and tried to write my own analyzer to support
> >> > > > > > > hyphenated words like wi-fi, jean-pierre, etc.
> >> > > > > > > For our customer it is important to find the word
> >> > > > > > > - wi-fi by wi, fi, wifi, wi-fi
> >> > > > > > > - jean-pierre by jean, pierre, jean-pierre, jean-*
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > The analyzer:
> >> > > > > > > public class SupportHyphenatedWordsAnalyzer extends Analyzer {
> >> > > > > > >
> >> > > > > > >   protected NormalizeCharMap charConvertMap;
> >> > > > > > >
> >> > > > > > >   public SupportHyphenatedWordsAnalyzer() {
> >> > > > > > >     initCharConvertMap();
> >> > > > > > >   }
> >> > > > > > >
> >> > > > > > >   protected void initCharConvertMap() {
> >> > > > > > >     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
> >> > > > > > >     builder.add("\"", "");
> >> > > > > > >     charConvertMap = builder.build();
> >> > > > > > >   }
> >> > > > > > >
> >> > > > > > >   @Override
> >> > > > > > >   protected TokenStreamComponents createComponents(final String fieldName) {
> >> > > > > > >
> >> > > > > > >     final Tokenizer src = new WhitespaceTokenizer();
> >> > > > > > >
> >> > > > > > >     TokenStream tok = new WordDelimiterFilter(src,
> >> > > > > > >         WordDelimiterFilter.PRESERVE_ORIGINAL
> >> > > > > > >             | WordDelimiterFilter.GENERATE_WORD_PARTS
> >> > > > > > >             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
> >> > > > > > >             | WordDelimiterFilter.CATENATE_WORDS,
> >> > > > > > >         null);
> >> > > > > > >     tok = new LowerCaseFilter(tok);
> >> > > > > > >     tok = new LengthFilter(tok, 1, 255);
> >> > > > > > >     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> >> > > > > > >
> >> > > > > > >     return new TokenStreamComponents(src, tok);
> >> > > > > > >   }
> >> > > > > > >
> >> > > > > > >   @Override
> >> > > > > > >   protected Reader initReader(String fieldName, Reader reader) {
> >> > > > > > >     return new MappingCharFilter(charConvertMap, reader);
> >> > > > > > >   }
> >> > > > > > > }
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > The analyzer seems to work except for exact phrase match queries.
> >> > > > > > >
> >> > > > > > > e.g. the following words are indexed
> >> > > > > > >
> >> > > > > > > FD-A320-REC-SIM-1
> >> > > > > > > FD-A320-REC-SIM-10
> >> > > > > > > FD-A320-REC-SIM-11
> >> > > > > > > MIA-FD-A320-REC-SIM-1
> >> > > > > > > SIN-FD-A320-REC-SIM-1
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > The (exact) query "FD-A320-REC-SIM-1" returns
> >> > > > > > > FD-A320-REC-SIM-1
> >> > > > > > > MIA-FD-A320-REC-SIM-1
> >> > > > > > > SIN-FD-A320-REC-SIM-1
> >> > > > > > >
> >> > > > > > > for our customer this is wrong because this exact phrase match
> >> > > > > > > query should only return the single entry FD-A320-REC-SIM-1
> >> > > > > > >
> >> > > > > > > Do you have any ideas or tips, how we have to change our current
> >> > > > > > > analyzer to support this requirement???
> >> > > > > > >
> >> > > > > > >
> >> > > > > > > Thanks and Kind regards
> >> > > > > > > Diego
> >> > > > >
> >> > > > > --
> >> > > > > --------------------------
> >> > > > >
> >> > > > > Benedetti Alessandro
> >> > > > > Visiting card - http://about.me/alessandro_benedetti
> >> > > > > Blog - http://alexbenedetti.blogspot.co.uk
> >> > > > >
> >> > > > > "Tyger, tyger burning bright
> >> > > > > In the forests of the night,
> >> > > > > What immortal hand or eye
> >> > > > > Could frame thy fearful symmetry?"
> >> > > > >
> >> > > > > William Blake - Songs of Experience -1794 England

--
--------------------------

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England