[ 
https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427544#comment-17427544
 ] 

ASF subversion and git services commented on LUCENE-10140:
----------------------------------------------------------

Commit ca073c98fa83c3f8d73552e9078eff4f5fe36c19 in lucene's branch 
refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca073c9 ]

LUCENE-10140: Correct minimizing iterator sub-matches (#370)

Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the 
current match. This causes problems when we try to pull Matches
for these iterators, as their sub-iterators are now out of position.  We
have previously tried to deal with this by introducing caching iterators
that check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.

This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can
cache their positions and offsets.

Co-authored-by: Nikolay Khitrin <khit...@gmail.com>

> Minimizing intervals can give inaccurate positions for duplicate terms
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10140
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/queries
>            Reporter: Nikolay Khitrin
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Minimizing intervals (possibly only ORDERED and AT_LEAST, but this is
> unconfirmed) can move sub-iterators to a non-sub-match position *inside* the
> match window, but the CachingMatchesIterator logic relies on the heuristic
> that any position inside the matching interval is a sub-match.
> For example, ORDERED("a", "b", "a") over "a b a" highlights (reports as
> sub-matches) only "a <b>b</b> <b>a</b>", and ORDERED("a", "b", "a", "b", "a")
> highlights only "a b <b>a</b> <b>b</b> <b>a</b>".
> There appears to be no way to determine the right moment to cache from the
> caching iterator's perspective, so I propose adding an interface that allows
> minimizing IntervalIterators to notify sub-sources when they are positioned
> at sub-match positions.
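
The behaviour described above can be reproduced with a small toy model. The
sketch below is not Lucene code: MinimizingSketch, TermPositions, matchFound(),
windowHeuristic() and signalled() are hypothetical names invented for
illustration, and the minimizing loop is a simplification that assumes at least
two sub-iterators and only looks at the first match. It models
ORDERED("a", "b", "a") over "a b a" and compares the "any position inside the
window is a sub-match" heuristic with positions cached at the moment the parent
signals a prospective match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model only: none of these types are Lucene classes. */
final class MinimizingSketch {

    /** Hypothetical stand-in for a single term's position iterator. */
    static final class TermPositions {
        private final int[] positions;
        private int idx = -1;
        private int cached = -1;

        TermPositions(int... positions) {
            this.positions = positions;
        }

        /** Current position: -1 before the first next(), MAX_VALUE once exhausted. */
        int current() {
            if (idx < 0) return -1;
            return idx < positions.length ? positions[idx] : Integer.MAX_VALUE;
        }

        int next() {
            idx++;
            return current();
        }

        /** Called by the parent when a prospective match has been found. */
        void matchFound() {
            cached = current();
        }

        int cached() {
            return cached;
        }
    }

    /**
     * Finds the first ordered match, then advances the first sub-iterator to
     * try to shrink it. Returns {start, end} of the minimal match, or null.
     */
    static int[] firstMinimalMatch(TermPositions[] subs) {
        int prev = -1;
        for (TermPositions s : subs) {
            while (s.next() <= prev) {
                // skip positions that do not follow the previous sub-iterator
            }
            if (s.current() == Integer.MAX_VALUE) return null;
            prev = s.current();
        }
        int end = prev;
        int start;
        do {
            start = subs[0].current();
            for (TermPositions s : subs) s.matchFound(); // signal the children
        } while (subs[0].next() < subs[1].current());
        // subs[0] has now been moved past its sub-match position.
        return new int[] {start, end};
    }

    /** Old heuristic: any live position inside the window counts as a sub-match. */
    static List<Integer> windowHeuristic(TermPositions[] subs, int start, int end) {
        List<Integer> out = new ArrayList<>();
        for (TermPositions s : subs) {
            int p = s.current();
            if (p >= start && p <= end) out.add(p);
        }
        return out;
    }

    /** Positions cached when the parent signalled the match. */
    static List<Integer> signalled(TermPositions[] subs) {
        List<Integer> out = new ArrayList<>();
        for (TermPositions s : subs) out.add(s.cached());
        return out;
    }

    /** ORDERED("a", "b", "a") over the text "a b a" (positions 0, 1, 2). */
    static TermPositions[] example() {
        return new TermPositions[] {
            new TermPositions(0, 2), new TermPositions(1), new TermPositions(0, 2)
        };
    }

    public static void main(String[] args) {
        TermPositions[] subs = example();
        int[] m = firstMinimalMatch(subs);
        System.out.println("match            = " + Arrays.toString(m));
        System.out.println("window heuristic = " + windowHeuristic(subs, m[0], m[1]));
        System.out.println("signalled cache  = " + signalled(subs));
    }
}
```

In this model the match is [0, 2], the window heuristic reports positions
[2, 1, 2] (the first "a" is lost because its iterator was advanced to
position 2, which is still inside the window), and the signalled cache reports
[0, 1, 2].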



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
