[jira] [Commented] (LUCENE-8531) QueryBuilder hard-codes inOrder=true for generated sloppy span near queries

Jim Ferenczi (JIRA) Tue, 23 Oct 2018 10:33:06 -0700


    [ https://issues.apache.org/jira/browse/LUCENE-8531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16661015#comment-16661015 ]


Jim Ferenczi commented on LUCENE-8531:
--------------------------------------

> Can you explain, or point to docs that explain what you mean?

I am referring to the javadoc of PhraseQuery#getSlop, which explains how 
unordered terms can match:
{noformat}
* <p>The slop is an edit distance between respective positions of terms as
* defined in this {@link PhraseQuery} and the positions of terms in a
* document.
*
* <p>For instance, when searching for {@code "quick fox"}, it is expected that
* the difference between the positions of {@code fox} and {@code quick} is 1.
* So {@code "a quick brown fox"} would be at an edit distance of 1 since the
* difference of the positions of {@code fox} and {@code quick} is 2.
* Similarly, {@code "the fox is quick"} would be at an edit distance of 3
* since the difference of the positions of {@code fox} and {@code quick} is -2.
* The slop defines the maximum edit distance for a document to match.
*
* <p>More exact matches are scored higher than sloppier matches, thus search
* results are sorted by exactness.
*/{noformat}
This is different from an unordered span near query, which does not take the 
order of the query terms into account.
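To make the contrast concrete, here is a toy two-term model (query "quick fox") that computes the edit distance described in the javadoc above alongside the width an unordered span near query would see for the same document. It is a sketch, not Lucene code: the class and method names are hypothetical, each term is assumed to occur once at a single position, and the unordered width is modeled simply as the number of positions between the two terms, whichever comes first.

```java
/**
 * Toy model (query "quick fox"), not Lucene code: compares the PhraseQuery
 * edit distance from the PhraseQuery#getSlop javadoc with the gap an
 * unordered span near query would measure. Assumes each term occurs once,
 * at a single position.
 */
final class UnorderedVsPhrase {

    // Javadoc definition: the query expects fox - quick == 1, and the edit
    // distance is how far the observed difference is from that expectation.
    static int phraseDistance(int quick, int fox) {
        return Math.abs((fox - quick) - 1);
    }

    // Hypothetical unordered model: only the number of positions between the
    // two terms counts, not which one comes first.
    static int unorderedWidth(int quick, int fox) {
        return Math.abs(fox - quick) - 1;
    }

    static void report(String doc, int quick, int fox) {
        System.out.println(doc + ": phrase distance=" + phraseDistance(quick, fox)
            + ", unordered width=" + unorderedWidth(quick, fox));
    }

    public static void main(String[] args) {
        // "a quick brown fox": a=0, quick=1, brown=2, fox=3
        report("a quick brown fox", 1, 3);
        // "the fox is quick": the=0, fox=1, is=2, quick=3
        report("the fox is quick", 3, 1);
    }
}
```

Running it prints `phrase distance=1, unordered width=1` for "a quick brown fox" but `phrase distance=3, unordered width=1` for "the fox is quick": in this model an unordered span near with slop 1 already accepts the reversed document, while the phrase needs slop 3.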

This is also what is explained in the description of the issue:
{noformat}
unlike with (Multi)PhraseQuery-s, reordering edits are not allowed, so this is 
a kind of regression. {noformat}
 

> That said, there surely are potential use cases for the {{inOrder=true}} 
> behavior, which is supported by {{SpanNearQuery}} but not by 
> {{(Multi)PhraseQuery}}. Would it be worth opening a new issue to consider 
> introducing the ability to specifically request construction of 
> {{SpanNearQuery}} and/or {{inOrder=true}} behavior? The work that went into 
> building {{SpanNearQuery}} for phrases (commit 
> [96e8f0a0afe|https://github.com/apache/lucene-solr/commit/96e8f0a0afeb68e2d07ec1dda362894f0b94333d])
>  is still useful and relevant, even if the result isn't backward-compatible 
> for the case where {{slop > 0}}.

 

I think this is a specific need that can be handled in a custom QueryBuilder. 
The API explicitly says that it builds a phrase, so the default 
implementation should follow the semantics of a PhraseQuery. If we optimize 
with a SpanNearQuery instead, we need to ensure that it matches the same 
documents as the multi-phrase-query approach. That's not the case when slop 
is greater than 0, so I think we should keep the default behavior as is. You 
can still override QueryBuilder#analyzeGraphPhrase to apply different logic 
on your side if you want.
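The "same documents" requirement can be checked by brute force in a toy model. The sketch below (hypothetical class and method names, not Lucene code) enumerates every pair of positions for the query "quick fox", including coincident positions such as stacked synonym tokens, and reports the first pair where a sloppy phrase and an in-order span near disagree. The in-order rule used here (quick strictly before fox, gap at most slop) is a simplified reading of SpanNearQuery with inOrder=true, not its actual implementation.

```java
/**
 * Toy equivalence check (query "quick fox"), not Lucene code: does an
 * in-order span near accept exactly the position pairs that a sloppy phrase
 * accepts? Each term is assumed to occur once, at a single position in
 * [0, maxPos); both terms may share a position (stacked tokens).
 */
final class OrderedEquivalenceCheck {

    // Edit distance from the PhraseQuery#getSlop javadoc must be <= slop.
    static boolean phraseMatches(int slop, int quick, int fox) {
        return Math.abs((fox - quick) - 1) <= slop;
    }

    // Simplified inOrder=true rule: quick strictly before fox, and the number
    // of positions between them is at most slop.
    static boolean orderedSpanNearMatches(int slop, int quick, int fox) {
        return fox > quick && (fox - quick - 1) <= slop;
    }

    /** Returns {quick, fox} for the first disagreement found, or null if none. */
    static int[] firstDisagreement(int slop, int maxPos) {
        for (int quick = 0; quick < maxPos; quick++) {
            for (int fox = 0; fox < maxPos; fox++) {
                if (phraseMatches(slop, quick, fox) != orderedSpanNearMatches(slop, quick, fox)) {
                    return new int[] {quick, fox};
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        for (int slop = 0; slop <= 3; slop++) {
            int[] diff = firstDisagreement(slop, 10);
            System.out.println("slop=" + slop + " -> "
                + (diff == null ? "no disagreement" : "quick=" + diff[0] + ", fox=" + diff[1]));
        }
    }
}
```

In this model the output is `no disagreement` at slop 0 and `quick=0, fox=0` for every slop from 1 to 3: the stacked-token case is accepted by the phrase at edit distance 1 but rejected by the in-order rule. If coincident positions are excluded, the first divergence appears at slop 2 with reversed adjacent terms ("fox quick"), which the phrase accepts as a reordering edit.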

 

> QueryBuilder hard-codes inOrder=true for generated sloppy span near queries
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-8531
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8531
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: core/queryparser
>            Reporter: Steve Rowe
>            Assignee: Steve Rowe
>            Priority: Major
>             Fix For: 7.6, master (8.0)
>
>         Attachments: LUCENE-8531.patch
>
>
> QueryBuilder.analyzeGraphPhrase() generates SpanNearQuery-s with passed-in 
> phraseSlop, but hard-codes inOrder ctor param as true.
> Before multi-term synonym support and graph token streams introduced the 
> possibility of generating SpanNearQuery-s, QueryBuilder generated 
> (Multi)PhraseQuery-s, which always interpret slop as allowing reordering 
> edits.  Solr's eDismax query parser generates phrase queries when its 
> pf/pf2/pf3 params are specified, and when multi-term synonyms are used with a 
> graph-aware synonym filter, SpanNearQuery-s are generated that require 
> clauses to be in order; unlike with (Multi)PhraseQuery-s, reordering edits 
> are not allowed, so this is a kind of regression.  See SOLR-12243 for edismax 
> pf/pf2/pf3 context.  (Note that the patch on SOLR-12243 also addresses 
> another problem that blocks eDismax from generating queries *at all* under 
> the above-described circumstances.)
> I propose adding a new analyzeGraphPhrase() method that allows configuration 
> of inOrder, which would allow eDismax to specify inOrder=false.  The existing 
> analyzeGraphPhrase() method would remain with its hard-coded inOrder=true, so 
> existing client behavior would remain unchanged.



