[ 
https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800692#comment-16800692
 ] 

Alan Woodward commented on LUCENE-8477:
---------------------------------------

I've opened a PR to make discussing this easier, as it's grown to a fairly big 
change (although the public API is pretty much the same): 
https://github.com/apache/lucene-solr/pull/620

I agree that rewriting queries can be sub-optimal, but I think we still need to 
make it possible to get accurate hits. That is currently difficult to do at 
construction time: disjunctions can end up wrapped multiple times, and the 
implementing classes are all package-private, so you can't just use instanceof 
checks.
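To illustrate what "rewriting to match accurately" could involve, here is a toy 
sketch over a hypothetical query tree. The Node, Term, Block and Or types and the 
expand() method are invented for illustration and are not Lucene classes. It pulls 
every disjunction up to the top level, however deeply it is wrapped, so that 
((a OR (a b)) BLOCK (c)) becomes (a BLOCK c) OR ((a b) BLOCK c). Because these toy 
types are visible, plain instanceof checks work here, which is exactly what the 
package-private Lucene implementations prevent from outside the package. Needs 
Java 16+ (records, pattern matching for instanceof).

```java
import java.util.ArrayList;
import java.util.List;

public class PullUpDisjunctions {

    // Hypothetical query tree for illustration only; these are not Lucene classes.
    interface Node {}

    record Term(String text) implements Node {
        @Override public String toString() { return text; }
    }

    record Block(List<Node> subs) implements Node {
        @Override public String toString() { return "BLOCK" + subs; }
    }

    record Or(List<Node> subs) implements Node {
        @Override public String toString() { return "OR" + subs; }
    }

    // Returns disjunction-free alternatives whose union is equivalent to the input,
    // pulling every OR up to the top level however deeply it is wrapped.
    static List<Node> expand(Node node) {
        if (node instanceof Or disjunction) {
            List<Node> out = new ArrayList<>();
            for (Node sub : disjunction.subs()) {
                out.addAll(expand(sub));
            }
            return out;
        }
        if (node instanceof Block block) {
            // Cartesian product of the alternatives for each BLOCK member.
            List<List<Node>> combos = new ArrayList<>();
            combos.add(new ArrayList<>());
            for (Node sub : block.subs()) {
                List<Node> alternatives = expand(sub);
                List<List<Node>> next = new ArrayList<>();
                for (List<Node> prefix : combos) {
                    for (Node alt : alternatives) {
                        List<Node> combo = new ArrayList<>(prefix);
                        combo.add(alt);
                        next.add(combo);
                    }
                }
                combos = next;
            }
            List<Node> out = new ArrayList<>();
            for (List<Node> combo : combos) {
                out.add(new Block(combo));
            }
            return out;
        }
        return List.of(node); // a Term has no disjunctions to pull up
    }

    public static void main(String[] args) {
        Node a = new Term("a"), b = new Term("b"), c = new Term("c");
        // ((a OR (a b)) BLOCK (c)), with the single-term branch wrapped in an extra OR.
        Node query = new Block(List.of(
                new Or(List.of(new Or(List.of(a)), new Block(List.of(a, b)))), c));
        System.out.println(expand(query)); // [BLOCK[a, c], BLOCK[BLOCK[a, b], c]]
    }
}
```

The number of alternatives is the product of the disjunction sizes at each BLOCK 
position, so this kind of expansion can grow quickly on queries with several 
disjunctions.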

My suggestion is that we automatically rewrite things to match accurately, but 
add a flag to Intervals.or() that allows you to opt out of the rewriting if you 
want speed over accuracy, or if you know that the members of a disjunction 
won't overlap (for example, if you have no synonyms and so know that there are 
no stacked tokens).
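The trade-off behind the proposed opt-out flag can be sketched with a toy interval 
model, assuming an interval is an inclusive [start, end] span of positions and 
that BLOCK requires the left interval to end immediately before the right one 
starts. orThenBlock(boolean rewrite, ...) is a hypothetical stand-in and not the 
signature of Intervals.or(): with rewrite=true it evaluates (d1 BLOCK c) OR 
(d2 BLOCK c), and with rewrite=false it applies the minimal-interval OR first. 
Overlapping members give different answers; single-term members at distinct 
positions (no stacked tokens) cannot contain one another, so both modes agree. 
Needs Java 16+.

```java
import java.util.ArrayList;
import java.util.List;

public class OptOutFlagDemo {

    // Toy interval: inclusive [start, end] span of token positions.
    record Interval(int start, int end) {}

    // Minimal-interval union: drop any interval that strictly contains another one.
    static List<Interval> minimalUnion(List<List<Interval>> disjuncts) {
        List<Interval> all = new ArrayList<>();
        for (List<Interval> d : disjuncts) all.addAll(d);
        List<Interval> out = new ArrayList<>();
        for (Interval i : all) {
            boolean minimal = all.stream().noneMatch(j -> !j.equals(i)
                    && i.start() <= j.start() && j.end() <= i.end());
            if (minimal && !out.contains(i)) out.add(i);
        }
        return out;
    }

    // BLOCK(left, right): some left interval ends right before some right interval starts.
    static boolean block(List<Interval> left, List<Interval> right) {
        for (Interval l : left)
            for (Interval r : right)
                if (l.end() + 1 == r.start()) return true;
        return false;
    }

    // Hypothetical stand-in for the proposed flag (not Lucene's API):
    // rewrite=true  -> (d1 BLOCK next) OR (d2 BLOCK next) ...  accurate, more work
    // rewrite=false -> (minimal OR of d1, d2, ...) BLOCK next  faster, may miss hits
    static boolean orThenBlock(boolean rewrite, List<List<Interval>> disjuncts,
                               List<Interval> next) {
        if (rewrite) {
            for (List<Interval> d : disjuncts) {
                if (block(d, next)) return true;
            }
            return false;
        }
        return block(minimalUnion(disjuncts), next);
    }

    public static void main(String[] args) {
        List<Interval> c = List.of(new Interval(2, 2));

        // Document "a b c": overlapping disjuncts a=[0,0] and (a b)=[0,1].
        List<List<Interval>> overlapping = List.of(
                List.of(new Interval(0, 0)), List.of(new Interval(0, 1)));
        System.out.println(orThenBlock(true, overlapping, c));  // true
        System.out.println(orThenBlock(false, overlapping, c)); // false

        // Document "x y c": single-term disjuncts x=[0,0] and y=[1,1] don't overlap.
        List<List<Interval>> disjoint = List.of(
                List.of(new Interval(0, 0)), List.of(new Interval(1, 1)));
        System.out.println(orThenBlock(true, disjoint, c));  // true
        System.out.println(orThenBlock(false, disjoint, c)); // true
    }
}
```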

> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8477.patch, LUCENE-8477.patch, LUCENE-8477.patch, 
> LUCENE-8477.patch
>
>
> The current implementation of the disjunction interval produced by 
> {{Intervals.or}} is a direct implementation of the OR operator from the Vigna 
> paper.  This produces minimal intervals, meaning that (a) is preferred over 
> (a b), and (b) also over (a b).  This has advantages when it comes to 
> counting intervals for scoring, but also has drawbacks when it comes to 
> matching.  For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not 
> match the document (a b c), because (a) will be preferred over (a b), and (a 
> c) does not match.
> This ticket is to discuss the best way of dealing with disjunctions.
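The failing example from the description can be reproduced in a toy model, 
assuming an interval is an inclusive [start, end] span of token positions and that 
BLOCK requires the left interval to end immediately before the right one starts. 
The Interval, minimalOr and block names are illustrative and not Lucene's API, and 
the sketch materialises whole lists of intervals rather than iterating over them. 
Needs Java 16+ (records).

```java
import java.util.ArrayList;
import java.util.List;

public class MinimalOrDemo {

    // An interval is an inclusive [start, end] span of token positions.
    record Interval(int start, int end) {
        // True if this interval strictly contains the other one.
        boolean contains(Interval other) {
            return !equals(other) && start <= other.start() && other.end() <= end;
        }
    }

    // OR with minimal-interval semantics: keep only intervals that contain no other interval.
    static List<Interval> minimalOr(List<Interval> left, List<Interval> right) {
        List<Interval> all = new ArrayList<>(left);
        all.addAll(right);
        List<Interval> out = new ArrayList<>();
        for (Interval candidate : all) {
            boolean minimal = true;
            for (Interval other : all) {
                if (candidate.contains(other)) {
                    minimal = false;
                    break;
                }
            }
            if (minimal && !out.contains(candidate)) {
                out.add(candidate);
            }
        }
        return out;
    }

    // BLOCK(left, right): some left interval ends immediately before some right interval starts.
    static boolean block(List<Interval> left, List<Interval> right) {
        for (Interval l : left) {
            for (Interval r : right) {
                if (l.end() + 1 == r.start()) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Document "a b c": a at position 0, b at 1, c at 2.
        List<Interval> a = List.of(new Interval(0, 0));
        List<Interval> ab = List.of(new Interval(0, 1)); // phrase (a b)
        List<Interval> c = List.of(new Interval(2, 2));

        List<Interval> aOrAb = minimalOr(a, ab);
        System.out.println(aOrAb);           // [Interval[start=0, end=0]]
        System.out.println(block(aOrAb, c)); // false: (a c) is not a block
        System.out.println(block(ab, c));    // true: (a b c) would have matched
    }
}
```

Because (a b) strictly contains (a), the minimal OR discards it, and the only 
candidate left for the BLOCK is (a), which is not adjacent to (c).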



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
