[ https://issues.apache.org/jira/browse/LUCENE-10140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17427544#comment-17427544 ]
ASF subversion and git services commented on LUCENE-10140:
----------------------------------------------------------

Commit ca073c98fa83c3f8d73552e9078eff4f5fe36c19 in lucene's branch refs/heads/main from Alan Woodward
[ https://gitbox.apache.org/repos/asf?p=lucene.git;h=ca073c9 ]

LUCENE-10140: Correct minimizing iterator sub-matches (#370)

Some interval iterators will attempt to minimize themselves by moving
sub-iterators forward until they are no longer positioned within the
current match. This causes problems when we try to pull Matches for
these iterators, as their sub-iterators are now out of position. We have
previously tried to deal with this by introducing caching iterators that
check to see if they have been moved beyond the end of the current
interval, but this fails in cases where an interval can contain multiple
copies of a particular iterator.

This commit adds the ability for minimizing iterators to signal to their
children when a prospective match has been found, so that they can cache
their positions and offsets.

Co-authored-by: Nikolay Khitrin <khit...@gmail.com>

> Minimizing intervals can give inaccurate positions for duplicate terms
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-10140
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10140
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/queries
>            Reporter: Nikolay Khitrin
>            Priority: Minor
>             Fix For: main (9.0)
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> Minimizing intervals (maybe just ORDERED and AT_LEAST, but not sure) can move
> sub-iterators to non-sub-match positions *inside* the match window, but
> CachingMatchesIterator logic relies on the heuristic that any position inside
> the matching interval is a sub-match.
> For example: ORDERED("a", "b", "a") over "a b a" highlights (reports
> sub-matches) only "a <b>b</b> <b>a</b>", and ORDERED("a", "b", "a", "b", "a")
> highlights only "a b <b>a</b> <b>b</b> <b>a</b>".
> It looks like there is no way to determine the right moment to cache from the
> caching iterator's perspective, so I propose adding an interface that allows
> minimizing IntervalIterators to notify sub-sources positioned at sub-match
> positions.
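The failure mode described in the report can be replayed in a small, self-contained Java model. This is a toy sketch under stated assumptions, not Lucene's code: `TermIter`, `onMatch()`, `orderedMatch()`, `heuristicSubMatches()` and `notifiedSubMatches()` are hypothetical names invented for illustration, and the ordered matcher is a simplified greedy version without Lucene's full window minimization. It models ORDERED("a", "b", "a") over "a b a": once a match is found on positions [0, 2], the parent advances its first child (as a minimizing iterator does when looking for the next match), which lands that child on the second "a" at position 2, still inside the window.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy model (hypothetical, not Lucene's API) of sub-match reporting for a
 * minimizing ORDERED iterator whose query contains a duplicate term.
 */
public class MinimizingSubMatchDemo {

  static final int EXHAUSTED = Integer.MAX_VALUE;

  /** Hypothetical child iterator over the sorted positions of one term. */
  static final class TermIter {
    private final int[] positions;
    private int idx = 0;
    int previous = -1; // position held before the most recent move (heuristic cache)
    int snapshot = -1; // position captured when the parent signals a match (the fix)

    TermIter(int... positions) {
      this.positions = positions;
    }

    int pos() {
      return idx < positions.length ? positions[idx] : EXHAUSTED;
    }

    /** Moves to the first position strictly greater than target, if not already there. */
    void advancePast(int target) {
      int before = pos();
      boolean moved = false;
      while (idx < positions.length && positions[idx] <= target) {
        idx++;
        moved = true;
      }
      if (moved) {
        previous = before;
      }
    }

    /** Hypothetical hook: the minimizing parent calls this on every prospective match. */
    void onMatch() {
      snapshot = pos();
    }
  }

  /** Simplified greedy ordered match: each child must sit after the previous one. */
  static int[] orderedMatch(List<TermIter> children) {
    int last = -1;
    for (TermIter child : children) {
      child.advancePast(last);
      if (child.pos() == EXHAUSTED) {
        return null; // no match: some child ran out of positions
      }
      last = child.pos();
    }
    return new int[] {children.get(0).pos(), last}; // {start, end}
  }

  /** Modelled heuristic: trust the current position if inside the window, else the cached one. */
  static List<Integer> heuristicSubMatches(List<TermIter> children, int start, int end) {
    List<Integer> out = new ArrayList<>();
    for (TermIter child : children) {
      int p = child.pos();
      out.add(p >= start && p <= end ? p : child.previous);
    }
    return out;
  }

  /** Modelled fix: report the position each child cached when told about the match. */
  static List<Integer> notifiedSubMatches(List<TermIter> children) {
    List<Integer> out = new ArrayList<>();
    for (TermIter child : children) {
      out.add(child.snapshot);
    }
    return out;
  }

  /** Replays ORDERED("a", "b", "a") over "a b a"; returns {heuristic, notified}. */
  static List<List<Integer>> demo() {
    // "a" occurs at positions 0 and 2, "b" at 1; in this model each clause has its own iterator.
    List<TermIter> children = List.of(new TermIter(0, 2), new TermIter(1), new TermIter(0, 2));
    int[] match = orderedMatch(children); // window [0, 2]
    for (TermIter child : children) {
      child.onMatch(); // the parent signals the prospective match to every child
    }
    TermIter first = children.get(0);
    first.advancePast(first.pos()); // minimizing step: first "a" moves to 2, still inside [0, 2]
    return List.of(heuristicSubMatches(children, match[0], match[1]), notifiedSubMatches(children));
  }

  public static void main(String[] args) {
    List<List<Integer>> result = demo();
    System.out.println("heuristic: " + result.get(0)); // [2, 1, 2]: position 0 is lost
    System.out.println("notified:  " + result.get(1)); // [0, 1, 2]
  }
}
```

In this model the in-window heuristic reports positions 2, 1, 2, so position 0 is never highlighted, which corresponds to the "a <b>b</b> <b>a</b>" symptom in the report; the snapshots taken on the match signal report 0, 1, 2. The point of the design is that only the parent knows when its children currently form a match, so the parent, not the caching child, has to decide when positions are cached.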