Hi Alessandro, sorry, I forgot the important part. Here it is:

...

public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
public static final String MULTIPLE_CHARACTER_WILDCARD = "*";

...

if (isExactCriteriaString(userCriteria)) {
  String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
      escape(userCriteria.substring(1, userCriteria.length() - 1)));
  userCriteriaProcessed = userCriteriaEscaped;
} else {
  userCriteriaProcessed = escape(userCriteria);

  if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
    userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
  }
}

String queryStr = "";
for (String fieldName : fields) {
  String escapedFieldName = escape(fieldName);
  queryStr += String.format("%s:%s ", escapedFieldName, curTokenProcessed);
}
query = new QueryParser("", analyzer).parse(queryStr.trim());

...

As far as I understand, my problem is that in my (naive, query-syntax-based)
solution I have to use my analyzer, which means that the userCriteria is
always tokenized.

You suggest using the Java query classes to build the query, because then
I can control whether the userCriteria is tokenized or not.

Did I get you right?

Thanks and Kind regards

On Wed, Jul 22, 2015 at 11:44 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:

> I read briefly, correct me if I am wrong, but that is to parse the content
> within the quotes ".
> But we are still at a String level.
> I want to see how you build the PhraseQuery :)
> Taking a look at the code, the PhraseQuery allows you to add as many terms
> as you want.
>
> What you need to do is to not tokenise the content within the quotes and
> actually create a TermQuery (in your case you are not even using the
> feature offered by the phrase query regarding positions, you simply want to
> run a TermQuery).
>
> So to clarify, you should parse the content within the quotes (as you are
> doing), then build a TermQuery out of that String, not tokenized at all.
>
> Does this make sense to you?
> Can I see what you do after identifying the content within the quotes?
>
> Cheers
>
> 2015-07-22 10:20 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
>
> > Hi Alessandro,
> >
> > I guess code says more than words :)
> >
> > ...
> >
> > public static final String EXACT_SEARCH_FORMAT = "\"%s\"";
> > public static final String MULTIPLE_CHARACTER_WILDCARD = "*";
> >
> > ...
> >
> > if (isExactCriteriaString(userCriteria)) {
> >   String userCriteriaEscaped = String.format(EXACT_SEARCH_FORMAT,
> >       escape(userCriteria.substring(1, userCriteria.length() - 1)));
> >   userCriteriaProcessed = userCriteriaEscaped;
> > } else {
> >   userCriteriaProcessed = escape(userCriteria);
> >
> >   if (!userCriteria.endsWith(MULTIPLE_CHARACTER_WILDCARD)) {
> >     userCriteriaProcessed += MULTIPLE_CHARACTER_WILDCARD;
> >   }
> > }
> >
> > ...
> >
> > public static String escape(String s) {
> >   String result = s;
> >
> >   if (s != null && !s.trim().isEmpty()) {
> >     String toEscape = s.trim();
> >
> >     if (toEscape.contains("*")) {
> >       StringBuilder sb = new StringBuilder();
> >
> >       for (int i = 0; i < toEscape.length(); i++) {
> >         char curChar = toEscape.charAt(i);
> >         if (curChar == '*')
> >           sb.append('*');
> >         else
> >           sb.append(QueryParser.escape(toEscape.substring(i, i + 1)));
> >       }
> >
> >       result = sb.toString();
> >     } else {
> >       result = QueryParser.escape(toEscape);
> >     }
> >   }
> >
> >   return result;
> > }
> >
> > ...
> >
> > Thanks and Kind regards
> >
> > On Wed, Jul 22, 2015 at 11:04 AM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> >
> > > As a start Diego, how do you currently parse the user query to build the
> > > Lucene queries?
> > >
> > > Cheers
> > >
> > > 2015-07-22 8:35 GMT+01:00 Diego Socaceti <socac...@gmail.com>:
> > >
> > > > Hi Alessandro,
> > > >
> > > > yes, I want the user to be able to surround the query with "" to run the
> > > > phrase query with a NOT tokenized phrase.
> > > >
> > > > What do I have to do?
> > > >
> > > > Thanks and Kind regards
> > > >
> > > > On Tue, Jul 21, 2015 at 2:47 PM, Alessandro Benedetti <benedetti.ale...@gmail.com> wrote:
> > > >
> > > > > Hey Jack, reading the doc:
> > > > >
> > > > > "Set to true if phrase queries will be automatically generated when the
> > > > > analyzer returns more than one term from whitespace delimited text. NOTE:
> > > > > this behavior may not be suitable for all languages.
> > > > >
> > > > > Set to false if phrase queries should only be generated when surrounded by
> > > > > double quotes."
> > > > >
> > > > > In the user's case, I guess he's likely to use double quotes.
> > > > >
> > > > > The only problem he sees so far is that the phrase query uses the query
> > > > > time analyser to actually split the tokens.
> > > > >
> > > > > First we need feedback from him, but I guess he would like the
> > > > > phrase query to not tokenise the text within the double quotes.
> > > > >
> > > > > In that case we should find a way.
> > > > >
> > > > > Cheers
> > > > >
> > > > > 2015-07-21 13:12 GMT+01:00 Jack Krupansky <jack.krupan...@gmail.com>:
> > > > >
> > > > > > If you don't explicitly enable automatic phrase queries, the Lucene query
> > > > > > parser will assume an OR operator on the sub-terms when a white
> > > > > > space-delimited term analyzes into a sequence of terms.
> > > > > >
> > > > > > See:
> > > > > >
> > > > > > https://lucene.apache.org/core/5_2_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setAutoGeneratePhraseQueries(boolean)
> > > > > >
> > > > > > -- Jack Krupansky
> > > > > >
> > > > > > On Fri, Jul 17, 2015 at 4:41 AM, Diego Socaceti <socac...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I'm new to Lucene and tried to write my own analyzer to support
> > > > > > > hyphenated words like wi-fi, jean-pierre, etc.
> > > > > > > For our customer it is important to find the word
> > > > > > > - wi-fi by wi, fi, wifi, wi-fi
> > > > > > > - jean-pierre by jean, pierre, jean-pierre, jean-*
> > > > > > >
> > > > > > > The analyzer:
> > > > > > > public class SupportHyphenatedWordsAnalyzer extends Analyzer {
> > > > > > >
> > > > > > >   protected NormalizeCharMap charConvertMap;
> > > > > > >
> > > > > > >   // constructor name must match the class name to compile
> > > > > > >   public SupportHyphenatedWordsAnalyzer() {
> > > > > > >     initCharConvertMap();
> > > > > > >   }
> > > > > > >
> > > > > > >   protected void initCharConvertMap() {
> > > > > > >     NormalizeCharMap.Builder builder = new NormalizeCharMap.Builder();
> > > > > > >     builder.add("\"", "");
> > > > > > >     charConvertMap = builder.build();
> > > > > > >   }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   protected TokenStreamComponents createComponents(final String fieldName) {
> > > > > > >
> > > > > > >     final Tokenizer src = new WhitespaceTokenizer();
> > > > > > >
> > > > > > >     TokenStream tok = new WordDelimiterFilter(src,
> > > > > > >         WordDelimiterFilter.PRESERVE_ORIGINAL
> > > > > > >             | WordDelimiterFilter.GENERATE_WORD_PARTS
> > > > > > >             | WordDelimiterFilter.GENERATE_NUMBER_PARTS
> > > > > > >             | WordDelimiterFilter.CATENATE_WORDS,
> > > > > > >         null);
> > > > > > >     tok = new LowerCaseFilter(tok);
> > > > > > >     tok = new LengthFilter(tok, 1, 255);
> > > > > > >     tok = new StopFilter(tok, StopAnalyzer.ENGLISH_STOP_WORDS_SET);
> > > > > > >
> > > > > > >     return new TokenStreamComponents(src, tok);
> > > > > > >   }
> > > > > > >
> > > > > > >   @Override
> > > > > > >   protected Reader initReader(String fieldName, Reader reader) {
> > > > > > >     return new MappingCharFilter(charConvertMap, reader);
> > > > > > >   }
> > > > > > > }
> > > > > > >
> > > > > > > The analyzer seems to work except for exact phrase match queries.
> > > > > > >
> > > > > > > e.g. the following words are indexed
> > > > > > >
> > > > > > > FD-A320-REC-SIM-1
> > > > > > > FD-A320-REC-SIM-10
> > > > > > > FD-A320-REC-SIM-11
> > > > > > > MIA-FD-A320-REC-SIM-1
> > > > > > > SIN-FD-A320-REC-SIM-1
> > > > > > >
> > > > > > > The (exact) query "FD-A320-REC-SIM-1" returns
> > > > > > > FD-A320-REC-SIM-1
> > > > > > > MIA-FD-A320-REC-SIM-1
> > > > > > > SIN-FD-A320-REC-SIM-1
> > > > > > >
> > > > > > > for our customer this is wrong because this exact phrase match
> > > > > > > query should only return the single entry FD-A320-REC-SIM-1
> > > > > > >
> > > > > > > Do you have any ideas or tips, how we have to change our current
> > > > > > > analyzer to support this requirement???
> > > > > > >
> > > > > > > Thanks and Kind regards
> > > > > > > Diego
> > > > >
> > > > > --
> > > > > --------------------------
> > > > >
> > > > > Benedetti Alessandro
> > > > > Visiting card - http://about.me/alessandro_benedetti
> > > > > Blog - http://alexbenedetti.blogspot.co.uk
> > > > >
> > > > > "Tyger, tyger burning bright
> > > > > In the forests of the night,
> > > > > What immortal hand or eye
> > > > > Could frame thy fearful symmetry?"
> > > > >
> > > > > William Blake - Songs of Experience -1794 England
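
A minimal sketch of the first step suggested in this thread: decide up front
whether the user criteria is a quoted "exact" term, and if so keep the content
between the quotes as one untokenized string instead of sending it through the
query parser and analyzer. The names CriteriaClassifier, Kind, Criteria and
classify are hypothetical and do not come from Lucene or from the code posted
above. Lowercasing the exact term is an assumption, made so that it lines up
with the LowerCaseFilter in the index-time analyzer.

```java
import java.util.Locale;

// Hypothetical helper for illustration; not part of Lucene or of the code in this thread.
final class CriteriaClassifier {

    // EXACT: bypass the analyzer; PARSED: keep using the existing QueryParser path.
    enum Kind { EXACT, PARSED }

    // Which kind of query to build, plus the text to build it from.
    static final class Criteria {
        final Kind kind;
        final String text;

        Criteria(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    // True if the input is wrapped in double quotes with something in between.
    static boolean isExact(String s) {
        return s.length() > 2 && s.startsWith("\"") && s.endsWith("\"");
    }

    static Criteria classify(String userCriteria) {
        String trimmed = userCriteria.trim();
        if (isExact(trimmed)) {
            // Strip the surrounding quotes and keep the rest as ONE term.
            String inner = trimmed.substring(1, trimmed.length() - 1);
            // Assumption: the index lowercases tokens, so the term must match that.
            return new Criteria(Kind.EXACT, inner.toLowerCase(Locale.ROOT));
        }
        // Non-exact input: append a trailing wildcard if it is missing,
        // mirroring the behaviour of the code posted in this thread.
        String text = trimmed.endsWith("*") ? trimmed : trimmed + "*";
        return new Criteria(Kind.PARSED, text);
    }

    public static void main(String[] args) {
        Criteria c = classify("\"FD-A320-REC-SIM-1\"");
        System.out.println(c.kind + " " + c.text); // prints "EXACT fd-a320-rec-sim-1"
    }
}
```

For an EXACT result, the text could then be passed per field to Lucene's
TermQuery, e.g. new TermQuery(new Term(fieldName, criteria.text)), rather than
to QueryParser. This only has a chance of matching because the analyzer keeps
the whole token via PRESERVE_ORIGINAL; an exact term containing whitespace
would presumably not match a single indexed token, since the analyzer uses a
WhitespaceTokenizer. PARSED results would continue through the existing
escape and QueryParser code.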