[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900278#action_12900278 ]
Terje Eggestad commented on LUCENE-1486:
----------------------------------------

Hi,

I'm about to begin using the ComplexPhraseQueryParser with 3.0.2, as we need wildcards combined with phrases and proximity. Our customers have a habit of including '-' in phrases, which seems to trigger a bug.

If you add the following tests to the TestComplexPhraseQuery class:

    checkMatches("\"joe john nosuchword\"", "");
    checkMatches("\"joe-john-nosuchword\"", "");
    checkMatches("\"john-nosuchword smith\"", "");

AND add a rewrite() call in checkMatches() just after the parse:

    Query q = qp.parse(qString);
    IndexReader reader = searcher.getIndexReader(); // needed for rewrite
    q = q.rewrite(reader);

then the first two are OK and are rewritten to:

    spanNear([name:joe, name:john, name:nosuchword], 0, true)
    name:"joe john nosuchword"

The third one bombs out with:

    java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.PhraseQuery" found in phrase query string "john-nosuchword smith"
        at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:281)
        at org.apache.lucene.queryParser.TestComplexPhraseQuery.checkMatches(TestComplexPhraseQuery.java:120)
        . . .

I made a fix that *seems* to fix it, but I feel I'm on very shaky ground here.
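To see why only the third query fails, here is a small stdlib-only Java model (not Lucene code) of what appears to happen. It assumes the analyzer lowercases and splits words on any character that is not a letter or digit, which is consistent with "joe-john-nosuchword" coming back as three terms above. Under that assumption "john-nosuchword" yields two terms, so the parser builds a PhraseQuery for that clause instead of a TermQuery. The class and method names (HyphenClauseModel, analyze, clauseType, checkClauses) are hypothetical, and the clause check only reproduces the TermQuery branch visible in the stack trace; wildcard clauses, which the real rewrite() handles separately, are not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only model of the failure; this is NOT Lucene code.
class HyphenClauseModel {

    // Assumption: the analyzer lowercases and splits on any character that
    // is not a letter or digit, so "john-nosuchword" -> [john, nosuchword].
    static List<String> analyze(String word) {
        List<String> terms = new ArrayList<>();
        for (String t : word.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    // The query type the parser would build for one word of the phrase:
    // a single term gives a TermQuery, several terms give a PhraseQuery.
    static String clauseType(String word) {
        return analyze(word).size() > 1 ? "PhraseQuery" : "TermQuery";
    }

    // Mimics the unpatched per-clause loop for plain words: a TermQuery
    // clause is accepted, anything else raises IllegalArgumentException.
    static void checkClauses(String phraseContents) {
        String[] words = phraseContents.trim().split("\\s+");
        if (words.length < 2) {
            // Not modeled: the comment reports that a single hyphenated
            // word ("joe-john-nosuchword") is rewritten without error.
            return;
        }
        for (String word : words) {
            String type = clauseType(word);
            if (!type.equals("TermQuery")) {
                throw new IllegalArgumentException("Unknown query type \"" + type
                        + "\" found in phrase query string \"" + phraseContents + "\"");
            }
        }
    }
}
```

With this model, checkClauses("joe john nosuchword") returns normally, while checkClauses("john-nosuchword smith") throws an IllegalArgumentException naming PhraseQuery, the same pattern as the stack trace above.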
I've made so many debugging hacks around it that I can't make a proper patch, but I added this fix to ComplexPhraseQueryParser::rewrite(), just before the place where the exception is thrown:

    } else {
        if (qc instanceof TermQuery) {
            TermQuery tq = (TermQuery) qc;
            allSpanClauses[i] = new SpanTermQuery(tq.getTerm());
        // START FIX "A-B C" phrases
        } else if (qc instanceof PhraseQuery) {
            // The analyzer split one word (e.g. "john-nosuchword") into several
            // terms, so this clause is a PhraseQuery. Replace it with an
            // in-order, zero-slop SpanNearQuery over the same terms.
            PhraseQuery pq = (PhraseQuery) qc;
            Term[] subterms = pq.getTerms();
            SpanQuery[] clauses = new SpanQuery[subterms.length];
            for (int j = 0; j < subterms.length; j++) {
                clauses[j] = new SpanTermQuery(subterms[j]);
            }
            allSpanClauses[i] = new SpanNearQuery(clauses, 0, true);
        // END FIX
        } else {
            throw new IllegalArgumentException("Unknown query type \""
                    + qc.getClass().getName()
                    + "\" found in phrase query string \""
                    + phrasedQueryStringContents + "\"");
        }

> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: ComplexPhraseQueryParser.java,
>                      junit_complex_phrase_qp_07_21_2009.patch,
>                      junit_complex_phrase_qp_07_22_2009.patch,
>                      Lucene-1486 non default field.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch,
>                      TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax.
> Examples from the Junit test include:
>
>     checkMatches("\"j* smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
>     checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
>     checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works
>
>     checkBadQuery("\"jo* id:1 smith\""); // mixing fields in a phrase is bad
>     checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
>     checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
>
> Code plus Junit test to follow...
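Going back to the PhraseQuery branch added to ComplexPhraseQueryParser::rewrite() in the comment above, the sketch below is a stdlib-only Java model (not Lucene code) of the query shape that branch should produce. It assumes the analyzer lowercases and splits on any non-letter, non-digit character (consistent with "joe-john-nosuchword" becoming three terms in the comment), and that a phrase without negated clauses is wrapped in one in-order spanNear using the phrase slop. The class and method names are hypothetical, and the strings only imitate the spanNear([...], 0, true) form quoted in the comment.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, stdlib-only model of the patched rewrite(); NOT Lucene code.
// Output strings imitate the "spanNear([...], slop, inOrder)" form.
class PatchedRewriteModel {

    // Assumption: the analyzer lowercases and splits on non-letters/digits.
    static List<String> analyze(String word) {
        List<String> terms = new ArrayList<>();
        for (String t : word.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                terms.add(t);
            }
        }
        return terms;
    }

    static String spanNear(List<String> clauses, int slop, boolean inOrder) {
        return "spanNear([" + String.join(", ", clauses) + "], " + slop + ", " + inOrder + ")";
    }

    // Per-word conversion after the fix: one term becomes a span term
    // (the existing TermQuery branch); several terms become a zero-slop,
    // in-order spanNear over those terms (the added PhraseQuery branch).
    static String convertWord(String field, List<String> terms) {
        if (terms.size() == 1) {
            return field + ":" + terms.get(0);
        }
        List<String> spanTerms = new ArrayList<>();
        for (String t : terms) {
            spanTerms.add(field + ":" + t);
        }
        return spanNear(spanTerms, 0, true);
    }

    // Assumption: with no negated clauses, the whole phrase becomes an
    // in-order spanNear using the phrase's slop (0 for a plain "..." phrase).
    static String rewrite(String field, String contents, int slop) {
        List<String> clauses = new ArrayList<>();
        for (String word : contents.trim().split("\\s+")) {
            List<String> terms = analyze(word);
            if (!terms.isEmpty()) { // skip words that analyze to nothing
                clauses.add(convertWord(field, terms));
            }
        }
        return spanNear(clauses, slop, true);
    }
}
```

In this model, rewrite("name", "joe john nosuchword", 0) gives spanNear([name:joe, name:john, name:nosuchword], 0, true), the form reported in the comment, and rewrite("name", "john-nosuchword smith", 0) gives spanNear([spanNear([name:john, name:nosuchword], 0, true), name:smith], 0, true). Like the fix, the model uses only the terms and ignores the positions a PhraseQuery can carry (PhraseQuery.getPositions()); if the analyzer ever left a position gap inside a hyphenated word, for example after removing a stop word, a zero-slop in-order span would presumably no longer match, so that case may be worth a test.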