[ https://issues.apache.org/jira/browse/LUCENE-949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Rowe updated LUCENE-949:
------------------------------
    Attachment: LUCENE-949.patch

Hi [~talli...@mitre.org],

Sorry it took so long. I've attached a patch based on your patch, with some fixes:

* Removed tabs.
* Restored the license header and class javadoc to {{AnalyzingQueryParser.java}} (your patch removed them for some reason?).
* Converted all code indentation to 2 spaces per level (you had a lot of 3-space-per-level indentation).
* Converted the {{wildcardPattern}} to allow anything to be escaped, not just backslashes and the wildcard chars '?' and '*'. Also removed the optional backslashes from group 2 (the actual wildcards): when iterating over {{wildcardPattern}} matches, your patch would throw away any number of real wildcards following an escaped wildcard. I added a test for this.
* When multiple output tokens are produced (and there should only be one), all of them are now reported in the exception message instead of just the first two.
* Removed all references to "chunklet" in favor of "output token"; this non-standard terminology made the code harder to read.
* Changed descriptions of multiple output tokens so that they are not necessarily described as the result of splitting (e.g. synonyms).
* In {{analyzeSingleChunk()}}, moved exception throwing to the source of problems.

I also added a {{CHANGES.txt}} entry.

Tim, let me know if you think my changes are okay - if so, I think it's ready to commit.

> AnalyzingQueryParser can't work with leading wildcards.
> -------------------------------------------------------
>
>                 Key: LUCENE-949
>                 URL: https://issues.apache.org/jira/browse/LUCENE-949
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>    Affects Versions: 2.2
>            Reporter: Stefan Klein
>         Attachments: LUCENE-949.patch, LUCENE-949.patch, LUCENE-949.patch
>
>
> The getWildcardQuery method in AnalyzingQueryParser.java needs the following
> changes to accept leading wildcards:
>
> protected Query getWildcardQuery(String field, String termStr) throws ParseException
> {
>     String useTermStr = termStr;
>     String leadingWildcard = null;
>     if ("*".equals(field))
>     {
>         if ("*".equals(useTermStr))
>             return new MatchAllDocsQuery();
>     }
>     boolean hasLeadingWildcard = useTermStr.startsWith("*") || useTermStr.startsWith("?");
>     if (!getAllowLeadingWildcard() && hasLeadingWildcard)
>         throw new ParseException("'*' or '?' not allowed as first character in WildcardQuery");
>     if (getLowercaseExpandedTerms())
>     {
>         useTermStr = useTermStr.toLowerCase();
>     }
>     if (hasLeadingWildcard)
>     {
>         leadingWildcard = useTermStr.substring(0, 1);
>         useTermStr = useTermStr.substring(1);
>     }
>     List tlist = new ArrayList();
>     List wlist = new ArrayList();
>     /*
>      * somewhat a hack: find/store wildcard chars in order to put them back
>      * after analyzing
>      */
>     boolean isWithinToken = (!useTermStr.startsWith("?") && !useTermStr.startsWith("*"));
>     isWithinToken = true;
>     StringBuffer tmpBuffer = new StringBuffer();
>     char[] chars = useTermStr.toCharArray();
>     for (int i = 0; i < useTermStr.length(); i++)
>     {
>         if (chars[i] == '?' || chars[i] == '*')
>         {
>             if (isWithinToken)
>             {
>                 tlist.add(tmpBuffer.toString());
>                 tmpBuffer.setLength(0);
>             }
>             isWithinToken = false;
>         }
>         else
>         {
>             if (!isWithinToken)
>             {
>                 wlist.add(tmpBuffer.toString());
>                 tmpBuffer.setLength(0);
>             }
>             isWithinToken = true;
>         }
>         tmpBuffer.append(chars[i]);
>     }
>     if (isWithinToken)
>     {
>         tlist.add(tmpBuffer.toString());
>     }
>     else
>     {
>         wlist.add(tmpBuffer.toString());
>     }
>     // get Analyzer from superclass and tokenize the term
>     TokenStream source = getAnalyzer().tokenStream(field, new StringReader(useTermStr));
>     org.apache.lucene.analysis.Token t;
>     int countTokens = 0;
>     while (true)
>     {
>         try
>         {
>             t = source.next();
>         }
>         catch (IOException e)
>         {
>             t = null;
>         }
>         if (t == null)
>         {
>             break;
>         }
>         if (!"".equals(t.termText()))
>         {
>             try
>             {
>                 tlist.set(countTokens++, t.termText());
>             }
>             catch (IndexOutOfBoundsException ioobe)
>             {
>                 countTokens = -1;
>             }
>         }
>     }
>     try
>     {
>         source.close();
>     }
>     catch (IOException e)
>     {
>         // ignore
>     }
>     if (countTokens != tlist.size())
>     {
>         /*
>          * this means that the analyzer used either added or consumed
>          * (common for a stemmer) tokens, and we can't build a WildcardQuery
>          */
>         throw new ParseException("Cannot build WildcardQuery with analyzer "
>                 + getAnalyzer().getClass() + " - tokens added or lost");
>     }
>     if (tlist.size() == 0)
>     {
>         return null;
>     }
>     else if (tlist.size() == 1)
>     {
>         if (wlist.size() == 1)
>         {
>             /*
>              * if wlist contains one wildcard, it must be at the end,
>              * because: 1) wildcards at 1st position of a term were
>              * truncated by QueryParser 2) if the wildcard was *not* at
>              * the end, there would be *two* or more tokens
>              */
>             StringBuffer sb = new StringBuffer();
>             if (hasLeadingWildcard)
>             {
>                 // adding leadingWildcard
>                 sb.append(leadingWildcard);
>             }
>             sb.append((String) tlist.get(0));
>             sb.append(wlist.get(0).toString());
>             return super.getWildcardQuery(field, sb.toString());
>         }
>         else if (wlist.size() == 0 && hasLeadingWildcard)
>         {
>             /*
>              * if wlist contains no wildcard, it must be at 1st position
>              */
>             StringBuffer sb = new StringBuffer();
>             // adding leadingWildcard
>             sb.append(leadingWildcard);
>             // wlist is empty in this branch, so only the token follows
>             sb.append((String) tlist.get(0));
>             return super.getWildcardQuery(field, sb.toString());
>         }
>         else
>         {
>             /*
>              * we should never get here! if so, this method was called with
>              * a termStr containing no wildcard ...
>              */
>             throw new IllegalArgumentException("getWildcardQuery called without wildcard");
>         }
>     }
>     else
>     {
>         /*
>          * the term was tokenized, let's rebuild to one token with wildcards
>          * put back in position
>          */
>         StringBuffer sb = new StringBuffer();
>         if (hasLeadingWildcard)
>         {
>             // adding leadingWildcard
>             sb.append(leadingWildcard);
>         }
>         for (int i = 0; i < tlist.size(); i++)
>         {
>             sb.append((String) tlist.get(i));
>             if (wlist != null && wlist.size() > i)
>             {
>                 sb.append((String) wlist.get(i));
>             }
>         }
>         return super.getWildcardQuery(field, sb.toString());
>     }
> }
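
For readers following the {{wildcardPattern}} discussion in the comment at the top of this message, here is a minimal sketch of the idea: one regex group consumes a backslash plus whatever character it escapes, and a second group matches a run of real wildcards. The patch itself is not reproduced in this message, so the regex, the class name WildcardSplitSketch and the split() helper are all assumptions made for illustration, not the code in LUCENE-949.patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; not the code from LUCENE-949.patch.
class WildcardSplitSketch {
  // Group 1: a backslash followed by any single escaped character.
  // Group 2: one or more unescaped wildcard characters ('?' or '*').
  // This regex is an assumption based on the description in the comment.
  static final Pattern WILDCARD_PATTERN = Pattern.compile("(\\\\.)|([?*]+)");

  /** Splits a term into text chunks and wildcard runs, in order of appearance. */
  static List<String> split(String term) {
    List<String> parts = new ArrayList<>();
    Matcher m = WILDCARD_PATTERN.matcher(term);
    int last = 0;
    while (m.find()) {
      if (m.group(1) != null) {
        continue; // escaped character: leave it inside the surrounding text chunk
      }
      if (m.start() > last) {
        parts.add(term.substring(last, m.start())); // text before the wildcard run
      }
      parts.add(m.group(2)); // the wildcard run itself
      last = m.end();
    }
    if (last < term.length()) {
      parts.add(term.substring(last)); // trailing text after the last wildcard run
    }
    return parts;
  }

  public static void main(String[] args) {
    // An escaped '?' followed by a real '*': the real wildcard must survive.
    System.out.println(split("a\\?*"));
    System.out.println(split("foo\\*??bar*"));
  }
}
```

Because the escaped character is consumed by group 1 before group 2 gets a chance to match, a real wildcard that directly follows an escaped one is still found as its own match, which is the case the added test in the patch is described as covering. In this sketch the text chunks would be the pieces handed to the analyzer, with the wildcard runs put back afterwards.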