[ https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900278#action_12900278 ]

Terje Eggestad commented on LUCENE-1486:
----------------------------------------

Hi 

I'm about to begin using the ComplexPhraseQueryParser with 3.0.2, as we need
wildcards with phrases and proximity.

Our customers have a habit of including '-' in phrases, which seems to trigger a
bug:

If you add the following tests to the TestComplexPhraseQuery class:

                checkMatches("\"joe john nosuchword\"", "");  
                checkMatches("\"joe-john-nosuchword\"", "");  
                checkMatches("\"john-nosuchword smith\"", "");  

AND add a rewrite() in checkMatches() just after the parse:

                        Query q = qp.parse(qString);
                        IndexReader reader = searcher.getIndexReader();  // needed for rewrite
                        q = q.rewrite(reader);


The first two are OK, and are rewritten to, respectively:

spanNear([name:joe, name:john, name:nosuchword], 0, true)
name:"joe john nosuchword"


The third one bombs out with:

java.lang.IllegalArgumentException: Unknown query type "org.apache.lucene.search.PhraseQuery" found in phrase query string "john-nosuchword smith"
        at org.apache.lucene.queryParser.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:281)
        at org.apache.lucene.queryParser.TestComplexPhraseQuery.checkMatches(TestComplexPhraseQuery.java:120)
        ...
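As far as I can tell, the PhraseQuery comes from the analyzer: "john-nosuchword" is split into two tokens, and when a single word analyzes to several tokens the standard QueryParser builds a PhraseQuery for it, while "smith" stays a TermQuery. The toy model below reproduces only that clause classification. It is plain Java with no Lucene dependency; the class name, the methods and the simplified analyzer (lowercase, split on any non-alphanumeric ASCII character) are hypothetical, not Lucene's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/**
 * Toy model, NOT Lucene code: classifies the words of a phrase the way the
 * standard QueryParser does (one token -> TermQuery, several -> PhraseQuery).
 * The analyzer below is a hypothetical simplification that lowercases and
 * splits on any non-alphanumeric ASCII character.
 */
public class HyphenClauseDemo {

    /** Hypothetical simplified analyzer. */
    public static List<String> analyze(String word) {
        List<String> tokens = new ArrayList<>();
        for (String t : word.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    /** Describes the clause each whitespace-separated word of the phrase turns into. */
    public static List<String> clauseTypes(String phraseContents, String field) {
        List<String> clauses = new ArrayList<>();
        for (String word : phraseContents.trim().split("\\s+")) {
            List<String> tokens = analyze(word);
            if (tokens.size() == 1) {
                clauses.add("TermQuery(" + field + ":" + tokens.get(0) + ")");
            } else if (tokens.size() > 1) {
                // A hyphenated word analyzes to several tokens, hence a PhraseQuery clause.
                clauses.add("PhraseQuery(" + field + ":\"" + String.join(" ", tokens) + "\")");
            }
            // Words that analyze to no tokens produce no clause.
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(clauseTypes("john-nosuchword smith", "name"));
    }
}
```

Running main prints [PhraseQuery(name:"john nosuchword"), TermQuery(name:smith)]. For "joe-john-nosuchword" the same rule gives a single PhraseQuery covering the whole phrase, which is consistent with the name:"joe john nosuchword" rewrite shown above.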


I made a fix that *seems* to fix it, but I feel I'm on very shaky ground here.
I've made so many debugging hacks around the code that I can't make a proper
patch, but I added this fix to ComplexPhraseQueryParser::rewrite(), just before
the place where the exception is thrown:

        } else {
            if (qc instanceof TermQuery) {
                TermQuery tq = (TermQuery) qc;
                allSpanClauses[i] = new SpanTermQuery(tq.getTerm());

// START FIX "A-B C" phrases
            } else if (qc instanceof PhraseQuery) {
                // The analyzer splits a word like "A-B" into several tokens, so the
                // clause arrives as a PhraseQuery. Turn its terms into an exact
                // (slop 0), in-order SpanNearQuery.
                PhraseQuery pq = (PhraseQuery) qc;
                Term[] subterms = pq.getTerms();

                SpanQuery[] clauses = new SpanQuery[subterms.length];
                for (int j = 0; j < subterms.length; j++) {
                    clauses[j] = new SpanTermQuery(subterms[j]);
                }
                allSpanClauses[i] = new SpanNearQuery(clauses, 0, true);
// END FIX
            } else {
                throw new IllegalArgumentException("Unknown query type \""
                        + qc.getClass().getName()
                        + "\" found in phrase query string \""
                        + phrasedQueryStringContents + "\"");
            }
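To check the idea outside Lucene, here is a toy model of the clause-to-span mapping in rewrite(), with the PhraseQuery branch switchable on and off. TermClause, PhraseClause, SpanTerm and SpanNear are hypothetical stand-ins for TermQuery, PhraseQuery, SpanTermQuery and SpanNearQuery, printed in the same toString style as the spanNear output above. It also assumes the outer phrase has no ~slop, so the outer slop is 0.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model, NOT Lucene code: all types below are hypothetical stand-ins. */
public class PhraseClauseRewriteDemo {

    /** Stand-in for a parsed clause of the phrase (TermQuery or PhraseQuery). */
    public interface Clause {}

    /** Stand-in for TermQuery. */
    public static final class TermClause implements Clause {
        final String field, text;
        public TermClause(String field, String text) { this.field = field; this.text = text; }
    }

    /** Stand-in for the PhraseQuery produced by a hyphenated word. */
    public static final class PhraseClause implements Clause {
        final String field;
        final List<String> texts;
        public PhraseClause(String field, List<String> texts) { this.field = field; this.texts = texts; }
    }

    /** Stand-in for SpanQuery; toString() imitates Lucene's output format. */
    public interface Span {}

    public static final class SpanTerm implements Span {
        final String field, text;
        public SpanTerm(String field, String text) { this.field = field; this.text = text; }
        @Override public String toString() { return field + ":" + text; }
    }

    public static final class SpanNear implements Span {
        final List<Span> clauses;
        final int slop;
        final boolean inOrder;
        public SpanNear(List<Span> clauses, int slop, boolean inOrder) {
            this.clauses = clauses; this.slop = slop; this.inOrder = inOrder;
        }
        @Override public String toString() {
            return "spanNear(" + clauses + ", " + slop + ", " + inOrder + ")";
        }
    }

    /** Maps one clause to a span clause, mirroring the if/else chain in rewrite(). */
    static Span toSpan(Clause qc, boolean applyFix, String phrase) {
        if (qc instanceof TermClause) {
            TermClause tq = (TermClause) qc;
            return new SpanTerm(tq.field, tq.text);
        }
        if (applyFix && qc instanceof PhraseClause) {
            // The fix: every term of the phrase becomes a SpanTerm inside an
            // exact (slop 0), in-order SpanNear.
            PhraseClause pq = (PhraseClause) qc;
            List<Span> sub = new ArrayList<>();
            for (String t : pq.texts) {
                sub.add(new SpanTerm(pq.field, t));
            }
            return new SpanNear(sub, 0, true);
        }
        throw new IllegalArgumentException("Unknown query type \"" + qc.getClass().getSimpleName()
                + "\" found in phrase query string \"" + phrase + "\"");
    }

    /** Builds the outer SpanNear over all clauses (assumes no ~slop on the phrase). */
    public static Span rewrite(List<Clause> clauses, boolean applyFix, String phrase) {
        List<Span> all = new ArrayList<>();
        for (Clause c : clauses) {
            all.add(toSpan(c, applyFix, phrase));
        }
        return new SpanNear(all, 0, true);
    }
}
```

With the fix, the clauses of "john-nosuchword smith" become spanNear([spanNear([name:john, name:nosuchword], 0, true), name:smith], 0, true); without it, the same IllegalArgumentException is thrown. One thing neither the fix nor this sketch looks at is PhraseQuery.getPositions(): if the analyzer leaves position gaps (for example a stop filter removing a word inside "A-B-C"), a slop-0 SpanNearQuery over the remaining terms may not match the same documents as the original PhraseQuery.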






> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
>                 Key: LUCENE-1486
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1486
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: QueryParser
>    Affects Versions: 2.4
>            Reporter: Mark Harwood
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: ComplexPhraseQueryParser.java, junit_complex_phrase_qp_07_21_2009.patch,
>                      junit_complex_phrase_qp_07_22_2009.patch, Lucene-1486 non default field.patch,
>                      LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
>                      LUCENE-1486.patch, TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax, e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax. Examples from the Junit test include:
>               checkMatches("\"j*   smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
>               checkMatches("\"(jo* -john)  smith\"", "2"); // boolean logic works
>               checkMatches("\"jo*  smith\"~2", "1,2,3"); // position logic works.
>
>               checkBadQuery("\"jo*  id:1 smith\""); // mixing fields in a phrase is bad
>               checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
>               checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
> Code plus Junit test to follow...
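For readers new to the issue, the general idea behind the quoted examples can be sketched like this: each word of the phrase becomes either a single term or, for a wildcard, an OR over the matching terms in the index, and the whole phrase becomes an in-order near clause using the phrase's ~slop. The toy below handles only trailing-* wildcards (no fuzzies, negation or field checks); the class name, the term dictionary and the spanNear/spanOr strings are hypothetical illustrations, not the parser's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Toy model, NOT Lucene code: shows how "jo* smith"~2 could be expanded. */
public class WildcardPhraseDemo {

    /** Expands a trailing-* pattern against a (hypothetical) sorted term dictionary. */
    public static List<String> expand(String pattern, TreeSet<String> terms) {
        List<String> out = new ArrayList<>();
        if (!pattern.endsWith("*")) {
            out.add(pattern);
            return out;
        }
        String prefix = pattern.substring(0, pattern.length() - 1);
        for (String t : terms.tailSet(prefix, true)) {
            if (!t.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match the prefix
            }
            out.add(t);
        }
        return out;
    }

    /** Describes the span structure: term or OR-of-terms per word, near clause overall. */
    public static String describe(String phrase, int slop, String field, TreeSet<String> terms) {
        List<String> parts = new ArrayList<>();
        for (String word : phrase.trim().split("\\s+")) {
            List<String> spanTerms = new ArrayList<>();
            for (String t : expand(word, terms)) {
                spanTerms.add(field + ":" + t);
            }
            // An empty expansion yields spanOr([]), i.e. a clause that can match nothing.
            parts.add(spanTerms.size() == 1 ? spanTerms.get(0) : "spanOr(" + spanTerms + ")");
        }
        return "spanNear(" + parts + ", " + slop + ", true)";
    }
}
```

With a term dictionary of joe, john, smith and smyth, describe("jo* smith", 2, "name", terms) gives spanNear([spanOr([name:joe, name:john]), name:smith], 2, true).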

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

