[ https://issues.apache.org/jira/browse/LUCENE-8477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16800692#comment-16800692 ]
Alan Woodward commented on LUCENE-8477:
---------------------------------------

I've opened a PR to make discussing this easier, as it's grown to a fairly big change (although the public API is pretty much the same): https://github.com/apache/lucene-solr/pull/620

I agree that rewriting queries can be sub-optimal, but I think we still need to make it possible to get accurate hits. That is currently difficult to do at construction time, because disjunctions can end up being wrapped multiple times, and the implementing classes are all package-private, so you can't just use instanceof checks.

My suggestion is that we automatically rewrite things to match accurately, but add a flag to Intervals.or() that allows you to opt out of the rewriting if you want speed above accuracy, or if you know that the members of a disjunction won't overlap (for example, if you have no synonyms and so know that there are no stacked tokens).

> Improve handling of inner disjunctions in intervals
> ---------------------------------------------------
>
>                 Key: LUCENE-8477
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8477
>             Project: Lucene - Core
>          Issue Type: New Feature
>            Reporter: Alan Woodward
>            Priority: Major
>         Attachments: LUCENE-8477.patch, LUCENE-8477.patch, LUCENE-8477.patch,
> LUCENE-8477.patch
>
>
> The current implementation of the disjunction interval produced by
> {{Intervals.or}} is a direct implementation of the OR operator from the Vigna
> paper. This produces minimal intervals, meaning that (a) is preferred over
> (a b), and (b) also over (a b). This has advantages when it comes to
> counting intervals for scoring, but also has drawbacks when it comes to
> matching. For example, a phrase query for ((a OR (a b)) BLOCK (c)) will not
> match the document (a b c), because (a) will be preferred over (a b), and (a
> c) does not match.
> This ticket is to discuss the best way of dealing with disjunctions.
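To make the matching problem in the description concrete, here is a small self-contained Java simulation. It is not Lucene's intervals API: `Interval`, `phrase`, `minimalOr` and `block` are hypothetical helpers written only for illustration. `minimalOr` keeps only minimal intervals, in the spirit of the Vigna OR operator, and `block` requires the right interval to start immediately after the left one ends. The "rewrite" shown (distributing BLOCK over each member of the disjunction, then OR-ing the results) is an assumed simplification of the idea in the comment, not necessarily what the PR implements.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of interval semantics over one document; NOT Lucene's API.
public class DisjunctionSketch {

    /** A closed interval of token positions [start, end]. */
    record Interval(int start, int end) {
        /** True if this interval strictly contains the other one. */
        boolean contains(Interval o) {
            return start <= o.start && o.end <= end && !equals(o);
        }
    }

    /** Every occurrence of a consecutive sequence of terms in the document. */
    static List<Interval> phrase(String[] doc, String... terms) {
        List<Interval> out = new ArrayList<>();
        for (int i = 0; i + terms.length <= doc.length; i++) {
            boolean ok = true;
            for (int j = 0; j < terms.length; j++) {
                if (!doc[i + j].equals(terms[j])) { ok = false; break; }
            }
            if (ok) out.add(new Interval(i, i + terms.length - 1));
        }
        return out;
    }

    /** Minimal-interval OR: drop any interval that strictly contains another one. */
    @SafeVarargs
    static List<Interval> minimalOr(List<Interval>... sources) {
        List<Interval> all = new ArrayList<>();
        for (List<Interval> s : sources) all.addAll(s);
        List<Interval> out = new ArrayList<>();
        for (Interval a : all) {
            boolean minimal = all.stream().noneMatch(a::contains);
            if (minimal && !out.contains(a)) out.add(a);
        }
        return out;
    }

    /** BLOCK: the right interval must start right after the left one ends. */
    static List<Interval> block(List<Interval> left, List<Interval> right) {
        List<Interval> out = new ArrayList<>();
        for (Interval l : left)
            for (Interval r : right)
                if (r.start() == l.end() + 1) out.add(new Interval(l.start(), r.end()));
        return out;
    }

    public static void main(String[] args) {
        String[] doc = {"a", "b", "c"};
        List<Interval> a = phrase(doc, "a");
        List<Interval> ab = phrase(doc, "a", "b");
        List<Interval> c = phrase(doc, "c");

        // ((a OR (a b)) BLOCK c): the OR keeps only (a) at [0,0], so nothing is adjacent to c.
        System.out.println("minimal:   " + block(minimalOr(a, ab), c));
        // Rewritten as ((a BLOCK c) OR ((a b) BLOCK c)): the second branch matches [0,2].
        System.out.println("rewritten: " + minimalOr(block(a, c), block(ab, c)));
    }
}
```

Run on the document (a b c), the minimal-interval query should produce an empty list, while the rewritten query should produce a single interval spanning positions 0 to 2. In this sketch the rewrite creates one BLOCK per disjunct, so nested disjunctions multiply the work, which is the kind of cost an opt-out flag on `Intervals.or()` would let callers skip when disjuncts cannot overlap.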
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)