[
https://issues.apache.org/jira/browse/LUCENE-1486?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734241#action_12734241
]
Adriano Crestani commented on LUCENE-1486:
------------------------------------------
Hi Mark H.,
Thanks for the response, some comments inline:
{quote}
Correct, the "inner phrase" example was a term not a phrase. This is perhaps a
better example:
checkBadQuery("\"jo* \"percival smith\" \""); //phrases inside phrases is bad
{quote}
I think you did not get what I meant. Even in your new example there is no
inner phrase. It parses as: a phrase <"jo* ">, followed by a term <percival>,
followed by another term <smith>, and an empty phrase <" ">. So, with your
change, the junit passes, but for the wrong reason: it gets an exception
complaining about the empty phrase, not about an inner phrase (I still don't
see how you can type an inner phrase with the current syntax). I think it's
not a big deal; I'm just trying to understand, and to flag a test that is
probably wrong. I hope you understand what I mean. Let me know if I did not
make it clear.
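To illustrate the reading described above, here is a minimal sketch of a splitter that simply toggles "inside a phrase" at every double quote. It is a toy model, not the QueryParser.jj grammar; the class QuoteSplitSketch and its helper methods are hypothetical names used only for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteSplitSketch {

    /**
     * Splits a query string into phrases and bare terms, toggling
     * "inside a phrase" at every double quote. Hypothetical helper,
     * not Lucene code. Text after an unclosed quote is ignored.
     */
    static List<String> split(String query) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inPhrase = false;
        for (char c : query.toCharArray()) {
            if (c == '"') {
                if (inPhrase) {
                    // Closing quote: emit the phrase, even if it is blank.
                    tokens.add("PHRASE<" + current + ">");
                } else {
                    // Opening quote: flush the bare terms collected so far.
                    addTerms(tokens, current.toString());
                }
                current.setLength(0);
                inPhrase = !inPhrase;
            } else {
                current.append(c);
            }
        }
        if (!inPhrase) {
            addTerms(tokens, current.toString());
        }
        return tokens;
    }

    /** Adds each whitespace-separated word of text as a bare term. */
    static void addTerms(List<String> tokens, String text) {
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add("TERM<" + t + ">");
            }
        }
    }

    public static void main(String[] args) {
        // The query string from the junit: "jo* "percival smith" "
        System.out.println(split("\"jo* \"percival smith\" \""));
        // prints [PHRASE<jo* >, TERM<percival>, TERM<smith>, PHRASE< >]
    }
}
```

Under this model the quotes can never nest, so the only thing left for the parser to reject in that test is the blank phrase at the end.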
{quote}
The Junit is currently the main form of documentation
{quote}
But it is not the ideal place, because the source code (including the junit
code) is not shipped in the binary release. So the ideal place would be the
javadocs.
{quote}
* Wildcard/fuzzy/range clauses can be used to define a phrase element (as
opposed to simply single terms)
* Brackets are used to group/define the acceptable variations for a given
phrase element e.g. "(john OR jonathon) smith"
* "AND" is irrelevant - there is effectively an implied "AND_NEXT_TO"
binding all phrase elements
{quote}
Thanks, now it's clearer to me what is supported and what is not. I have some
questions:
I understand this implicit AND_NEXT_TO operator between the queries inside the
phrase. However, what happens if the user does not type any explicit boolean
operator between two terms inside parentheses: "(query parser) lucene"? Is the
operator between 'query' and 'parser' the implicit AND_NEXT_TO or the default
boolean operator (usually OR)?
What happens if I type "(query AND parser) lucene"? As I see it, it is:
"(query AND parser) AND_NEXT_TO lucene", which to me means: find any document
that contains the term 'query' and the term 'parser' at position x, and the
term 'lucene' at position x+1. Is this the expected behaviour?
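To make the two readings concrete, here is a minimal sketch of the positional interpretation I describe above. It is only my assumption about the semantics, not the confirmed behaviour of ComplexPhraseQueryParser. PhrasePositionSketch, Element, matches and index are hypothetical names, and a document is modelled as a plain term-to-positions map rather than a real Lucene index.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class PhrasePositionSketch {

    /** One phrase element: terms combined with OR (any) or AND (all). */
    static final class Element {
        final boolean requireAll;
        final List<String> terms;

        Element(boolean requireAll, String... terms) {
            this.requireAll = requireAll;
            this.terms = Arrays.asList(terms);
        }

        /** Checks this element against the terms found at one position. */
        boolean matchesAt(Map<String, Set<Integer>> positions, int pos) {
            int hits = 0;
            for (String term : terms) {
                if (positions.getOrDefault(term, Collections.<Integer>emptySet()).contains(pos)) {
                    hits++;
                }
            }
            return requireAll ? hits == terms.size() : hits > 0;
        }
    }

    /** True if, for some start position x, element i matches at x + i. */
    static boolean matches(Map<String, Set<Integer>> positions, int docLength, Element... phrase) {
        for (int start = 0; start + phrase.length <= docLength; start++) {
            boolean allMatch = true;
            for (int i = 0; i < phrase.length && allMatch; i++) {
                allMatch = phrase[i].matchesAt(positions, start + i);
            }
            if (allMatch) {
                return true;
            }
        }
        return false;
    }

    /** Builds a term -> positions map; tokens[i] holds every term at position i. */
    static Map<String, Set<Integer>> index(String[]... tokens) {
        Map<String, Set<Integer>> positions = new HashMap<>();
        for (int pos = 0; pos < tokens.length; pos++) {
            for (String term : tokens[pos]) {
                positions.computeIfAbsent(term, k -> new HashSet<>()).add(pos);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        // Document text: query parser lucene (one term per position)
        Map<String, Set<Integer>> plain =
            index(new String[] {"query"}, new String[] {"parser"}, new String[] {"lucene"});
        // Document where 'query' and 'parser' share position 0 (e.g. stacked tokens)
        Map<String, Set<Integer>> stacked =
            index(new String[] {"query", "parser"}, new String[] {"lucene"});
        Element lucene = new Element(false, "lucene");

        // "(query OR parser) lucene" on the plain document
        System.out.println(matches(plain, 3, new Element(false, "query", "parser"), lucene)); // true
        // "(query AND parser) lucene" on the plain document
        System.out.println(matches(plain, 3, new Element(true, "query", "parser"), lucene)); // false
        // "(query AND parser) lucene" when both terms share a position
        System.out.println(matches(stacked, 2, new Element(true, "query", "parser"), lucene)); // true
    }
}
```

If this reading is right, AND inside the parentheses can only match when two tokens share a position, which is why I am asking whether that is really the intended behaviour.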
{quote}
1) Keep in core and improve error reporting and documentation
2) Move into "contrib" as experimental
3) Retain in core but simplify it to support only the simplest syntax (as in my
Britney~ example)
4) Re-engineer the QueryParser.jj to support a formally defined syntax for
acceptable "within phrase" operators e.g. *, ~, ( )
{quote}
1 is good, but I would like 4 too. Documentation and throwing the right
exception are necessary. I just don't feel comfortable with the complex phrase
query parser relying on the main query parser syntax: any change to the main
one could easily break the complex phrase QP. Anyway, 4 may be done in the
future :)
Mark M.:
{quote}
With the new info from Mark H, how hard would it be to create a new imp for the
new parser that did a lot of this, in a more defined way? It seems you
basically just want to be able to use multiterm queries and group/or things,
right? We could even relax a little if we have to. This hasn't been released,
so there is still a lot of wiggle room I think. But there does have to be a
resolution with this and the new parser at some point either way.
{quote}
Yes, I am working on the new query parser code. I recently started to read and
understand how the ComplexPhraseQP works, so that I could reproduce its
behaviour using the new QP framework. I first tried to look at this QP as a
user and could not figure out what exactly I can and cannot do with it. I think
we are now hitting a big problem, which is related to documentation. That is
why I started raising these questions, because others could have the same
issues in the future.
So, yes, I can start coding an equivalent QP using the new QP framework. I'm
just questioning and trying to understand everything before I start any coding.
I don't wanna code anything that will throw ConcurrentModificationExceptions,
that's why I'm raising these issues now, before I start moving it to the new QP.
Best Regards,
Adriano Crestani Campos
> Wildcards, ORs etc inside Phrase queries
> ----------------------------------------
>
> Key: LUCENE-1486
> URL: https://issues.apache.org/jira/browse/LUCENE-1486
> Project: Lucene - Java
> Issue Type: Improvement
> Components: QueryParser
> Affects Versions: 2.4
> Reporter: Mark Harwood
> Assignee: Mark Harwood
> Priority: Minor
> Fix For: 2.9
>
> Attachments: ComplexPhraseQueryParser.java,
> junit_complex_phrase_qp_07_21_2009.patch,
> junit_complex_phrase_qp_07_22_2009.patch, LUCENE-1486.patch,
> LUCENE-1486.patch, LUCENE-1486.patch, LUCENE-1486.patch,
> TestComplexPhraseQuery.java
>
>
> An extension to the default QueryParser that overrides the parsing of
> PhraseQueries to allow more complex syntax e.g. wildcards in phrase queries.
> The implementation feels a little hacky - this is arguably better handled in
> QueryParser itself. This works as a proof of concept for much of the query
> parser syntax. Examples from the Junit test include:
> checkMatches("\"j* smyth~\"", "1,2"); // wildcards and fuzzies are OK in phrases
> checkMatches("\"(jo* -john) smith\"", "2"); // boolean logic works
> checkMatches("\"jo* smith\"~2", "1,2,3"); // position logic works
>
> checkBadQuery("\"jo* id:1 smith\""); // mixing fields in a phrase is bad
> checkBadQuery("\"jo* \"smith\" \""); // phrases inside phrases is bad
> checkBadQuery("\"jo* [sma TO smZ]\" \""); // range queries inside phrases not supported
> Code plus Junit test to follow...