[ https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15823096#comment-15823096 ]
Alan Woodward commented on LUCENE-7628:
---------------------------------------

I've reverted the change.

To keep the API the same size, I can try merging the functionality of getChildren() and getMatchingChildren() (my second suggestion here: https://issues.apache.org/jira/browse/LUCENE-7628?focusedCommentId=15818614&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15818614).

On the issue of bulk scoring, maybe we should add a visitsSubScorers() method to Collector, analogous to needsScores(). Then we can enable bulk-scoring or not depending on the needs of the Collector implementation. This would be another way to deal with LUCENE-7365.

> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>
>                 Key: LUCENE-7628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7628
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>             Fix For: 6.4
>
>         Attachments: LUCENE-7628.patch
>
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents,
> and then running them with a special Collector. At each matching doc, the
> highlighter gathers all the Spans objects positioned on the current doc and
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans. For those queries that generate
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on
> the Scorer and see if any of them are SpanScorers, and for those that aren't
> we can call .getChildren() again and recurse down. For each child scorer, we
> check that it's positioned on the current document, so non-matching
> subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where
> one of the children is a two-phase iterator that has matched its
> approximation, but not its refinement query. A SpanScorer in this situation
> will be correctly positioned on the current document, but its Spans will be
> in an undefined state, meaning the highlighter will either collect incorrect
> hits, or it will throw an Exception and prevent hits being collected from
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and
> adding a bunch of slow position checks to it that are used only by the
> highlighting code), but it turns out that the simplest fix is to add a new
> method to DisjunctionScorer that only returns the currently matching child
> Scorers. It's a bit of a hack, and it won't be used anywhere else, but it's
> a fairly small and contained hack.
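
For anyone skimming the thread, here is a rough, self-contained sketch of the problem described in the issue. It deliberately does not use the Lucene API: ToyScorer, its "confirmed" flag, and both collect methods are hypothetical stand-ins invented for illustration, and the getMatchingChildren() shown here is only my reading of the proposal, not the code in the attached patch. The point is just that a child can be positioned on the current doc (its approximation matched) without having actually matched, so a position check alone is not enough.

```java
import java.util.ArrayList;
import java.util.List;

public class MatchingChildrenSketch {

    /** Toy stand-in for a scorer. Hypothetical; this is NOT Lucene's Scorer. */
    static class ToyScorer {
        final String name;
        final int docID;          // doc the (approximate) iterator is positioned on
        final boolean confirmed;  // did the second-phase (refinement) check pass on docID?
        final List<ToyScorer> children = new ArrayList<>();

        ToyScorer(String name, int docID, boolean confirmed) {
            this.name = name;
            this.docID = docID;
            this.confirmed = confirmed;
        }

        ToyScorer add(ToyScorer child) {
            children.add(child);
            return this;
        }

        /** Every child, whether or not it matches the current doc. */
        List<ToyScorer> getChildren() {
            return children;
        }

        /** Only children on this scorer's doc whose refinement also matched. */
        List<ToyScorer> getMatchingChildren() {
            List<ToyScorer> matching = new ArrayList<>();
            for (ToyScorer c : children) {
                if (c.docID == docID && c.confirmed) {
                    matching.add(c);
                }
            }
            return matching;
        }
    }

    /** Recursive walk that trusts position only (the approach that breaks). */
    static List<String> collectByPosition(ToyScorer root) {
        List<String> leaves = new ArrayList<>();
        walk(root, false, root.docID, leaves);
        return leaves;
    }

    /** Recursive walk that asks each parent for its matching children only. */
    static List<String> collectMatching(ToyScorer root) {
        List<String> leaves = new ArrayList<>();
        walk(root, true, root.docID, leaves);
        return leaves;
    }

    private static void walk(ToyScorer s, boolean matchingOnly, int doc, List<String> leaves) {
        if (s.getChildren().isEmpty()) {
            leaves.add(s.name);  // a leaf: this is where positions would be collected
            return;
        }
        List<ToyScorer> kids = matchingOnly ? s.getMatchingChildren() : s.getChildren();
        for (ToyScorer c : kids) {
            if (c.docID == doc) {  // skip subscorers not positioned on the current doc
                walk(c, matchingOnly, doc, leaves);
            }
        }
    }

    /** A disjunction on doc 5 with one real match, one approximation-only match, one miss. */
    static ToyScorer demoTree() {
        return new ToyScorer("disjunction", 5, true)
            .add(new ToyScorer("spanA", 5, true))    // fully matched
            .add(new ToyScorer("spanB", 5, false))   // approximation matched, refinement did not
            .add(new ToyScorer("spanC", 9, true));   // positioned on a later doc
    }

    public static void main(String[] args) {
        ToyScorer root = demoTree();
        System.out.println(collectByPosition(root)); // prints [spanA, spanB]
        System.out.println(collectMatching(root));   // prints [spanA]
    }
}
```

In the position-only walk, spanB is visited even though its refinement never matched, which corresponds to the "undefined state" Spans in the description. Filtering at the parent avoids touching that child at all.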