[
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734015#action_12734015
]
Luis Alves edited comment on LUCENE-1486 at 7/22/09 7:57 AM:
-------------------------------------------------------------
I share the same opinion as Michael:
the implementation has a lot of undefined/undocumented behaviors,
simply because it reuses the queryparser to parse the text inside a phrase.
All of the Lucene syntax needs to be accounted for in this design, but that
does not seem to be the case.
There are problems like the ones Adriano described: a phrase inside a phrase,
and position reporting for errors.
I also have a lot of concerns about allowing the full Lucene syntax inside
phrases;
trying to restrict it by throwing exceptions for particular cases does
not seem like the best design.
Here is an example with OR, AND, and parentheses in a proximity search:
"(( jakarta OR green) AND (blue AND orange) AND black~0.5) apache"~10
What should a user expect from this query without looking at the code? I'm not
sure.
Does it even make sense to support this complex syntax? In my opinion, no.
I think we should define the subset of the language we want to support
inside phrases, with well-defined behavior.
If Mark describes all the syntax he wants to support inside phrases, I actually
don't mind implementing a new parser for this.
My view is that contrib is probably a better place for this code until we
figure out an implementation that does not impose as many restrictions on
changes to the original queryparser and that specifies a well-defined syntax
to be applied inside phrases.
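
To make the "well-defined subset" idea above concrete, here is a minimal sketch in plain Java. It is not the code attached to this issue and does not use any Lucene API. The subset it accepts (plain terms, optionally ending in `*` or `~`, with AND/OR/NOT rejected) is purely an assumption chosen for illustration, and the names `PhraseSubsetChecker` and `firstViolation` are hypothetical. It only shows how a whitelist check could report the position of the first unsupported token instead of relying on exceptions for particular cases.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical illustration only: not part of Lucene or of the attached patch.
final class PhraseSubsetChecker {

    // Assumed subset: letters/digits, optionally followed by one '*' or '~'.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\p{N}]+[*~]?");

    // Assumed reserved words that are not allowed inside a phrase in this sketch.
    private static final Set<String> RESERVED = Set.of("AND", "OR", "NOT");

    /**
     * Scans the text between the phrase quotes, token by token.
     * Returns the character offset of the first token outside the
     * assumed subset, or -1 if every token is allowed.
     */
    static int firstViolation(String phraseBody) {
        int i = 0;
        int n = phraseBody.length();
        while (i < n) {
            // Skip whitespace between tokens.
            while (i < n && Character.isWhitespace(phraseBody.charAt(i))) {
                i++;
            }
            if (i >= n) {
                break;
            }
            // Collect one whitespace-delimited token.
            int start = i;
            while (i < n && !Character.isWhitespace(phraseBody.charAt(i))) {
                i++;
            }
            String token = phraseBody.substring(start, i);
            if (RESERVED.contains(token) || !ALLOWED.matcher(token).matches()) {
                return start;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstViolation("j* smyth~"));      // prints -1
        System.out.println(firstViolation("jo* id:1 smith")); // prints 4
    }
}
```

With a check like this, the query above would be rejected at offset 0 (the opening parentheses), and the user would get a position rather than an unexplained result. Whether boolean logic such as `(jo* -john)` belongs in the subset is exactly the kind of decision that would need to be written down first.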
was (Author: lafa):
I share the same opinion as Michael:
the implementation has a lot of undefined/undocumented behaviors,
simply because it reuses the queryparser to parse the text inside a phrase.
All of the Lucene syntax needs to be accounted for in this design, but that
does not seem to be the case.
There are problems like the ones Adriano described: a phrase inside a phrase,
and position reporting for errors.
I also have a lot of concerns about allowing the full Lucene syntax inside
phrases;
trying to restrict it by throwing exceptions for particular cases does
not seem like the best design.
Here is an example with OR, AND, and parentheses in a proximity search:
"(( jakarta OR green) AND (blue AND orange) AND black~2) apache"~10
What should a user expect from this query without looking at the code? I'm not
sure.
Does it even make sense to support this complex syntax? In my opinion, no.
I think we should define the subset of the language we want to support
inside phrases, with well-defined behavior.
If Mark describes all the syntax he wants to support inside phrases, I actually
don't mind implementing a new parser for this.
My view is that contrib is probably a better place for this code until we
figure out an implementation that does not impose as many restrictions on
changes to the original queryparser and that specifies a well-defined syntax
to be applied inside phrases.
> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.4
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Fix For: 2.9
>
> Attachments: ComplexPhraseQueryParser.java,
> junit_complex_phrase_qp_07_21_2009.patch,
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch,
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax. Examples from the Junit test include:
> checkMatches("\"j* smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
> checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works.
>
> checkBadQuery("\"jo* id:1 smith\""); // mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
> Code plus Junit test to follow...