[ https://issues.apache.org/jira/browse/LUCENE-7628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818614#comment-15818614 ]
Alan Woodward commented on LUCENE-7628:
---------------------------------------

Turns out it's not as simple as just adding the method, because
DisjunctionScorer is package-private and so it can't be accessed from the
highlighter code. There are a couple of options I see:

* add getMatchingChildren() to Scorer itself - a fairly minimal change
  (default implementation just forwards to getChildren()), but increases the
  Scorer API surface area
* make DisjunctionScorer.getChildren() only return matching children - this
  is a bigger change, altering current behaviour and adding IOException to
  the getChildren() signature, although it's still pretty small in terms of
  the number of changed lines in the codebase.

Any opinions?

> Add a getMatchingChildren() method to DisjunctionScorer
> -------------------------------------------------------
>
>                 Key: LUCENE-7628
>                 URL: https://issues.apache.org/jira/browse/LUCENE-7628
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Minor
>
> This one is a bit convoluted, so bear with me...
> The luwak highlighter works by rewriting queries into their Span-equivalents,
> and then running them with a special Collector. At each matching doc, the
> highlighter gathers all the Spans objects positioned on the current doc and
> collects their positions using the SpanCollection API.
> Some queries can't be translated into Spans. For those queries that generate
> Scorers with ChildScorers, like BooleanQuery, we can call .getChildren() on
> the Scorer and see if any of them are SpanScorers, and for those that aren't
> we can call .getChildren() again and recurse down. For each child scorer, we
> check that it's positioned on the current document, so non-matching
> subscorers can be skipped.
> This all works correctly *except* in the case of a DisjunctionScorer where
> one of the children is a two-phase iterator that has matched its
> approximation, but not its refinement query.
> A SpanScorer in this situation
> will be correctly positioned on the current document, but its Spans will be
> in an undefined state, meaning the highlighter will either collect incorrect
> hits, or it will throw an Exception and prevent hits being collected from
> other subspans.
> We've tried various ways around this (including forking SpanNearQuery and
> adding a bunch of slow position checks to it that are used only by the
> highlighting code), but it turns out that the simplest fix is to add a new
> method to DisjunctionScorer that only returns the currently matching child
> Scorers. It's a bit of a hack, and it won't be used anywhere else, but it's
> a fairly small and contained hack.
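
For illustration only, the first option in the comment above (a getMatchingChildren() on the base scorer whose default forwards to getChildren(), overridden by the disjunction) might look roughly like the sketch below. This is not Lucene code: SketchScorer, SketchLeafScorer and SketchDisjunctionScorer are hypothetical, simplified stand-ins, the real getChildren() has a different return type, and the "confirmed" flag is an assumption used here to model a two-phase child whose approximation matched but whose refinement did not.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for Lucene's Scorer; the real class has more methods
// and a different getChildren() return type.
abstract class SketchScorer {
    abstract int docID();

    List<SketchScorer> getChildren() {
        return Collections.emptyList();
    }

    // Option 1 from the comment: the default simply forwards to getChildren().
    List<SketchScorer> getMatchingChildren() {
        return getChildren();
    }
}

// Hypothetical leaf scorer. "confirmed" is false when only the approximation
// matched the current document and the refinement did not.
class SketchLeafScorer extends SketchScorer {
    private final int doc;
    private final boolean confirmed;

    SketchLeafScorer(int doc, boolean confirmed) {
        this.doc = doc;
        this.confirmed = confirmed;
    }

    @Override
    int docID() {
        return doc;
    }

    boolean matches() {
        return confirmed;
    }
}

// Hypothetical disjunction: overrides getMatchingChildren() so callers only
// see children that are on the current doc and have fully matched.
class SketchDisjunctionScorer extends SketchScorer {
    private final int doc;
    private final List<SketchLeafScorer> subs;

    SketchDisjunctionScorer(int doc, SketchLeafScorer... subs) {
        this.doc = doc;
        this.subs = Arrays.asList(subs);
    }

    @Override
    int docID() {
        return doc;
    }

    @Override
    List<SketchScorer> getChildren() {
        return new ArrayList<>(subs);
    }

    @Override
    List<SketchScorer> getMatchingChildren() {
        List<SketchScorer> matching = new ArrayList<>();
        for (SketchLeafScorer sub : subs) {
            if (sub.docID() == doc && sub.matches()) {
                matching.add(sub);
            }
        }
        return matching;
    }
}
```

With this shape, highlighter-style code outside the package would only need the base type: it would call getMatchingChildren() on whatever scorer it holds, and scorers that do not override it would behave exactly as getChildren() does today.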