[ https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Woodward updated LUCENE-8477:
----------------------------------
    Description: 
The current implementation of the disjunction interval produced by
{{Intervals.or}} is a direct implementation of the OR operator from the Vigna
paper.  It produces only minimal intervals: whenever one disjunct's interval
contains another's, the smaller one wins, so (a) is preferred over (a b), and
so is (b).  This is an advantage when counting intervals for scoring, but a
drawback for matching.  For example, a phrase query for
((a OR (a b)) BLOCK (c)) will not match the document (a b c): on this document
the disjunction yields (a) rather than (a b), and (a) is not immediately
followed by (c), so the block (a c) does not match.
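
The failure described above can be reproduced with a small toy model.  This is
only a sketch and does not use Lucene: the class MinimalDisjunctionSketch, the
Interval type, and the methods minimalOr and blockMatches are all hypothetical
names invented for illustration, and the "minimal" rule is modelled simply as
"drop any interval that properly contains another one".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these names exist in Lucene.
final class MinimalDisjunctionSketch {

    /** A closed interval of token positions [start, end]. */
    static final class Interval {
        final int start;
        final int end;

        Interval(int start, int end) {
            this.start = start;
            this.end = end;
        }

        /** True if this interval covers other and is not identical to it. */
        boolean properlyContains(Interval other) {
            return start <= other.start && other.end <= end
                && (start != other.start || end != other.end);
        }
    }

    /** Keeps only minimal intervals: drops any candidate that properly contains another. */
    static List<Interval> minimalOr(List<Interval> candidates) {
        List<Interval> minimal = new ArrayList<>();
        for (Interval candidate : candidates) {
            boolean containsSmaller = false;
            for (Interval other : candidates) {
                if (candidate.properlyContains(other)) {
                    containsSmaller = true;
                    break;
                }
            }
            if (!containsSmaller) {
                minimal.add(candidate);
            }
        }
        return minimal;
    }

    /** BLOCK: true if some left interval is immediately followed by a right interval. */
    static boolean blockMatches(List<Interval> left, List<Interval> right) {
        for (Interval l : left) {
            for (Interval r : right) {
                if (r.start == l.end + 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Document "a b c": a at position 0, b at 1, c at 2.
        // Disjuncts of (a OR (a b)): (a) = [0,0], (a b) = [0,1].
        List<Interval> disjuncts = Arrays.asList(new Interval(0, 0), new Interval(0, 1));
        List<Interval> c = Arrays.asList(new Interval(2, 2));

        // Minimal OR keeps only [0,0]; position 1 is b, not c, so BLOCK fails.
        System.out.println("minimal OR:    " + blockMatches(minimalOr(disjuncts), c));
        // Keeping every disjunct lets [0,1] be followed by c at position 2.
        System.out.println("all disjuncts: " + blockMatches(disjuncts, c));
    }
}
```

In this model the minimal disjunction discards (a b) before BLOCK ever sees it,
which is the behaviour this ticket questions; keeping the non-minimal
alternative is shown only for contrast, not as a proposed fix.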

This ticket is to discuss the best way of dealing with disjunctions.

  was:
The current implementation of the disjunction interval produced by 
{{Intervals.or}} is a direct implementation of the OR operator from the Vigna 
paper.  This produces minimal intervals, meaning that (a) is preferred over (a 
b), and (b) also over (a b).  This has advantages when it comes to counting 
intervals for scoring, but also has drawbacks when it comes to matching.  For 
example, a phrase query for ((a OR (a b)) NEAR (c)) will not match the document 
(a b c), because (a) will be preferred over (a b), and (a c) does not match.

This ticket is to discuss the best way of dealing with disjunctions.


> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major


