Sorry, a small refactoring typo: curTokenProcessed should be userCriteriaProcessed.

...

public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
public static final String MULTIPLE_CHARACTER_WILDCARD = "*";

...

if (isExactCriteriaString(userCriteria)) {
  String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
      escape(userCriteria.substring(1, userCriteria.length() - 1)));
  userCriteriaProcessed = userCriteriaEscaped;
} else {
  userCriteriaProcessed = escape(userCriteria);

  if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
    userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
  }
}

String queryStr = "";

for (String fieldName : fields) {
  String escapedFieldName = escape(fieldName);
  queryStr += String.format("%s:%s ", escapedFieldName, userCriteriaProcessed);
}

query = new QueryParser("", analyzer).parse(queryStr.trim());

...

On Wed, Jul 22, 2015 at 12:27 PM, Diego Socaceti <socac...@gmail.com> wrote:

> Hi Alessandro,
>
> sorry that I forgot the important part. Here it is:
>
> ...
>
> public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
> public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
>
> ...
>
> if (isExactCriteriaString(userCriteria)) {
>   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
>       escape(userCriteria.substring(1, userCriteria.length() - 1)));
>   userCriteriaProcessed = userCriteriaEscaped;
> } else {
>   userCriteriaProcessed = escape(userCriteria);
>
>   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
>     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
>   }
> }
>
>
> String queryStr = "";
>
> for (String fieldName : fields) {
>   String escapedFieldName = escape(fieldName);
>   queryStr += String.format("%s:%s ", escapedFieldName,
>       curTokenProcessed);
> }
>
> query = new QueryParser("", analyzer).parse(queryStr.trim());
>
> ...
>
>
> As far as I understand, my problem is that in my (naive, query-syntax-based)
> solution I have to use my analyzer, which means that the userCriteria is
> always tokenized.
>
> You suggest using the Java query classes to build the query, because then
> I can control whether the userCriteria will be tokenized or not.
> Did I get you right?
>
>
> Thanks and kind regards
>
> On Wed, Jul 22, 2015 at 11:44 AM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
>> I read it briefly, correct me if I am wrong, but that is to parse the content
>> within the quotes ".
>> But we are still at the String level.
>> I want to see how you build the PhraseQuery :)
>> Looking at the code, PhraseQuery allows you to add as many terms as
>> you want.
>>
>> What you need to do is not tokenise the content within the quotes and
>> actually create a TermQuery (in your case you are not even using the
>> feature offered by the phrase query regarding positions, you simply want
>> to run a TermQuery).
>>
>> So to clarify, you should parse the content within the quotes (as you are
>> doing), then build a TermQuery out of that String, not tokenized at all.
>>
>> Does this make sense to you?
>> Can I see what you do after identifying the content within the quotes?
>>
>> Cheers
>>
>>
>> 2015-07-22 10:20 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
>>
>> > Hi Alessandro,
>> >
>> > I guess code says more than words :)
>> >
>> > ...
>> >
>> > public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
>> > public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
>> >
>> > ...
>> >
>> > if (isExactCriteriaString(userCriteria)) {
>> >   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
>> >       escape(userCriteria.substring(1, userCriteria.length() - 1)));
>> >   userCriteriaProcessed = userCriteriaEscaped;
>> > } else {
>> >   userCriteriaProcessed = escape(userCriteria);
>> >
>> >   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
>> >     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
>> >   }
>> > }
>> >
>> > ...
>> >
>> > public static String escape(String s) {
>> >   String result = s;
>> >
>> >   if (s != null && !s.trim().isEmpty()) {
>> >     String toEscape = s.trim();
>> >
>> >     if (toEscape.contains("*")) {
>> >       StringBuilder sb = new StringBuilder();
>> >
>> >       for (int i = 0; i < toEscape.length(); i++) {
>> >         char curChar = toEscape.charAt(i);
>> >         if (curChar == '*')
>> >           sb.append('*');
>> >         else
>> >           sb.append(QueryParser.escape(toEscape.substring(i, i + 1)));
>> >       }
>> >
>> >       result = sb.toString();
>> >     } else {
>> >       result = QueryParser.escape(toEscape);
>> >     }
>> >   }
>> >
>> >   return result;
>> > }
>> >
>> > ...
>> >
>> > Thanks and kind regards
>> >
>> >
>> >
>> > On Wed, Jul 22, 2015 at 11:04 AM, Alessandro Benedetti <
>> > benedetti.ale...@gmail.com> wrote:
>> >
>> > > To start, Diego, how do you currently parse the user query to build the
>> > > Lucene queries?
>> > >
>> > > Cheers
>> > >
>> > > 2015-07-22 8:35 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
>> > >
>> > > > Hi Alessandro,
>> > > >
>> > > > yes, I want the user to be able to surround the query with "" to run
>> > > > the phrase query with a NOT tokenized phrase.
>> > > >
>> > > > What do I have to do?
>> > > >
>> > > > Thanks and kind regards
>> > > >
>> > > > On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti <
>> > > > benedetti.ale...@gmail.com> wrote:
>> > > >
>> > > > > Hey Jack, reading the doc:
>> > > > >
>> > > > > "Set to true if phrase queries will be automatically generated when
>> > > > > the analyzer returns more than one term from whitespace delimited
>> > > > > text. NOTE: this behavior may not be suitable for all languages.
>> > > > >
>> > > > > Set to false if phrase queries should only be generated when
>> > > > > surrounded by double quotes."
>> > > > >
>> > > > >
>> > > > > In the user's case, I guess he's likely to use double quotes.
>> > > > >
>> > > > > The only problem he sees so far is that the phrase query uses the
>> > > > > query-time analyser to actually split the tokens.
>> > > > >
>> > > > > First we need feedback from him, but I guess he would like the
>> > > > > phrase query not to tokenise the text within the double quotes.
>> > > > >
>> > > > > In that case we should find a way.
>> > > > >
>> > > > >
>> > > > > Cheers
>> > > > >
>> > > > > 2015-07-21 13:12 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
>> > > > >
>> > > > > > If you don't explicitly enable automatic phrase queries, the Lucene
>> > > > > > query parser will assume an OR operator on the sub-terms when a
>> > > > > > whitespace-delimited term analyzes into a sequence of terms.
>> > > > > >
>> > > > > > See:
>> > > > > >
>> > > > > > https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
>> > > > > >
>> > > > > >
>> > > > > > -- Jack Krupansky
>> > > > > >
>> > > > > > On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi all,
>> > > > > > >
>> > > > > > > I'm new to Lucene and tried to write my own analyzer to support
>> > > > > > > hyphenated words like wi-fi, jean-pierre, etc.
>> > > > > > > For our customer it is important to find the word
>> > > > > > > - wi-fi by wi, fi, wifi, wi-fi
>> > > > > > > - jean-pierre by jean, pierre, jean-pierre, jean-*
>> > > > > > >
>> > > > > > >
>> > > > > > > The analyzer:
>> > > > > > > public class SupportHyphenatedWordsAnalyzer extends Analyzer {
>> > > > > > >
>> > > > > > >   protected NormalizeCharMap charConvertMap;
>> > > > > > >
>> > > > > > >   public SupportHyphenatedWordsAnalyzer() {
>> > > > > > >     initCharConvertMap();
>> > > > > > >   }
>> > > > > > >
>> > > > > > >   protected void initCharConvertMap() {
>> > > > > > >     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
>> > > > > > >     builder.add("\"", "");
>> > > > > > >     charConvertMap = builder.build();
>> > > > > > >   }
>> > > > > > >
>> > > > > > >   @Override
>> > > > > > >   protected TokenStreamComponents createComponents(final String fieldName) {
>> > > > > > >
>> > > > > > >     final Tokenizer src = new WhitespaceTokenizer();
>> > > > > > >
>> > > > > > >     TokenStream tok = new WordDelimiterFilter(src,
>> > > > > > >         WordDelimiterFilter.PRESERVE_ORIGINAL
>> > > > > > >             | WordDelimiterFilter.GENERATE_WORD_PARTS
>> > > > > > >             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
>> > > > > > >             | WordDelimiterFilter.CATENATE_WORDS,
>> > > > > > >         null);
>> > > > > > >     tok = new LowerCaseFilter(tok);
>> > > > > > >     tok = new LengthFilter(tok, 1, 255);
>> > > > > > >     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
>> > > > > > >
>> > > > > > >     return new TokenStreamComponents(src, tok);
>> > > > > > >   }
>> > > > > > >
>> > > > > > >   @Override
>> > > > > > >   protected Reader initReader(String fieldName, Reader reader) {
>> > > > > > >     return new MappingCharFilter(charConvertMap, reader);
>> > > > > > >   }
>> > > > > > > }
>> > > > > > >
>> > > > > > >
>> > > > > > > The analyzer seems to work except for exact phrase match
>> > > > > > > queries.
>> > > > > > >
>> > > > > > > e.g. the following words are indexed
>> > > > > > >
>> > > > > > > FD-A320-REC-SIM-1
>> > > > > > > FD-A320-REC-SIM-10
>> > > > > > > FD-A320-REC-SIM-11
>> > > > > > > MIA-FD-A320-REC-SIM-1
>> > > > > > > SIN-FD-A320-REC-SIM-1
>> > > > > > >
>> > > > > > >
>> > > > > > > The (exact) query "FD-A320-REC-SIM-1" returns
>> > > > > > > FD-A320-REC-SIM-1
>> > > > > > > MIA-FD-A320-REC-SIM-1
>> > > > > > > SIN-FD-A320-REC-SIM-1
>> > > > > > >
>> > > > > > > For our customer this is wrong, because this exact phrase match
>> > > > > > > query should only return the single entry FD-A320-REC-SIM-1.
>> > > > > > >
>> > > > > > > Do you have any ideas or tips on how we have to change our current
>> > > > > > > analyzer to support this requirement?
>> > > > > > >
>> > > > > > >
>> > > > > > > Thanks and kind regards
>> > > > > > > Diego
>> > > > > > >
>> > > > >
>> > > > > --
>> > > > > --------------------------
>> > > > >
>> > > > > Benedetti Alessandro
>> > > > > Visiting card - http://about.me/alessandro_benedetti
>> > > > > Blog - http://alexbenedetti.blogspot.co.uk
>> > > > >
>> > > > > "Tyger, tyger burning bright
>> > > > > In the forests of the night,
>> > > > > What immortal hand or eye
>> > > > > Could frame thy fearful symmetry?"
>> > > > >
>> > > > > William Blake - Songs of Experience -1794 England
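
To make the behaviour discussed in this thread concrete, here is a minimal sketch in plain Java, with no Lucene on the classpath. It assumes, as a deliberate simplification, that analysis only lower-cases a value and splits it on '-'; the real WordDelimiterFilter chain emits more tokens than that. The class and method names (PhraseVsExactSketch, wordParts, phraseMatch, exactMatch) are made up for illustration and are not Lucene API. The sketch shows why a phrase-style match on the word parts of "FD-A320-REC-SIM-1" also hits the MIA- and SIN- entries, while a comparison against the untokenized value, which is roughly what the TermQuery suggestion amounts to, does not.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simplified stand-in for the analysis discussed in the thread.
// Assumption: values are only lower-cased and split on '-'; this is
// NOT the full behaviour of WordDelimiterFilter.
final class PhraseVsExactSketch {

    // Hypothetical helper: the word parts of one whitespace-free value.
    static List<String> wordParts(String value) {
        return Arrays.asList(value.toLowerCase(Locale.ROOT).split("-"));
    }

    // Phrase-style match: the query's parts must appear as one
    // contiguous run somewhere in the indexed value's parts.
    static boolean phraseMatch(String query, String indexed) {
        return Collections.indexOfSubList(wordParts(indexed), wordParts(query)) >= 0;
    }

    // Exact match against the untokenized value. Ignoring case here is
    // an assumption made to mirror the LowerCaseFilter in the analyzer.
    static boolean exactMatch(String query, String indexed) {
        return query.equalsIgnoreCase(indexed);
    }

    public static void main(String[] args) {
        String query = "FD-A320-REC-SIM-1";
        List<String> indexed = Arrays.asList(
            "FD-A320-REC-SIM-1",
            "FD-A320-REC-SIM-10",
            "FD-A320-REC-SIM-11",
            "MIA-FD-A320-REC-SIM-1",
            "SIN-FD-A320-REC-SIM-1");

        // Print, per indexed value, whether each style of match hits.
        for (String value : indexed) {
            System.out.println(value
                + " phrase=" + phraseMatch(query, value)
                + " exact=" + exactMatch(query, value));
        }
    }
}
```

With the five values from the original question, the phrase-style check hits the same three entries reported there, and the exact check hits only FD-A320-REC-SIM-1. In Lucene itself the exact branch would presumably need a second, non-tokenized copy of the field to query against; that part is not shown here.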