[
https://issues.apache.org/jira/browse/LUCENE-1593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705020#action_12705020
]
Michael McCandless commented on LUCENE-1593:
--------------------------------------------
{quote}
> Yonik does Solr have any Scorers that iterate on docs out of order? Or is
> BooleanScorer the only one we all know about?
Nope. BooleanScorer is the only one I know about. And it's sort of special
too... it's not like BooleanScorer can accept out-of-order scorers as
sub-scorers itself - the ids need to be delivered in the range of the current
bucket. IMO custom out-of-order scorers aren't supported in Lucene.
{quote}
Actually, as far as I can tell, BS can accept out-of-order sub-scorers: they
just have to implement Scorer.score(Collector, int maxDoc). So, yes, they have
to stay within the requested bracket, but inside it they can deliver docs out
of order. The collector is an instance of BolleanScorerCollector (hmm,
misspelled; I'll fix), which happily accepts docs that are out of order as
long as they fall within the bracket.
But it's good to know that out-of-order scorers are not generally supported
even if Lucene uses one internally for better BooleanQuery (OR) performance.
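
To make the bracket idea concrete, here is a rough, self-contained sketch. It
is not Lucene's BooleanScorer: the class and method names (BucketWindowSketch,
WindowCollector, ReverseWithinWindowScorer, run) are made up for illustration,
and only the score(collector, maxDoc) shape is borrowed from the discussion
above. It assumes the sub-scorer's doc ids are non-negative and sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "out of order, but within the bracket".
// None of these types are Lucene classes.
class BucketWindowSketch {

    /** Accepts docs in any order, provided they fall inside the current window. */
    static final class WindowCollector {
        private final List<Integer> collected = new ArrayList<>();
        private int windowStart;
        private int windowEnd; // exclusive

        void setWindow(int start, int end) {
            windowStart = start;
            windowEnd = end;
        }

        void collect(int doc) {
            if (doc < windowStart || doc >= windowEnd) {
                throw new IllegalArgumentException(
                    "doc " + doc + " outside window [" + windowStart + ", " + windowEnd + ")");
            }
            collected.add(doc);
        }

        List<Integer> collected() {
            return collected;
        }
    }

    /** Sub-scorer that delivers the docs of each window in reverse order. */
    static final class ReverseWithinWindowScorer {
        private final int[] sortedDocs;
        private int pos = 0;

        ReverseWithinWindowScorer(int[] sortedDocs) {
            this.sortedDocs = sortedDocs.clone();
        }

        /** Collects all remaining docs below maxDoc; returns true if docs remain. */
        boolean score(WindowCollector collector, int maxDoc) {
            int end = pos;
            while (end < sortedDocs.length && sortedDocs[end] < maxDoc) {
                end++;
            }
            for (int i = end - 1; i >= pos; i--) {
                collector.collect(sortedDocs[i]); // out of order inside the window
            }
            pos = end;
            return pos < sortedDocs.length;
        }
    }

    /** Drives the scorer one window at a time and returns the collection order. */
    static List<Integer> run(int[] sortedDocs, int windowSize) {
        WindowCollector collector = new WindowCollector();
        ReverseWithinWindowScorer scorer = new ReverseWithinWindowScorer(sortedDocs);
        int start = 0;
        boolean more = sortedDocs.length > 0;
        while (more) {
            int end = start + windowSize;
            collector.setWindow(start, end);
            more = scorer.score(collector, end);
            start = end;
        }
        return collector.collected();
    }

    public static void main(String[] args) {
        // Docs 1, 3, 5 fall in window [0, 8); docs 9, 12 fall in window [8, 16).
        System.out.println(run(new int[] {1, 3, 5, 9, 12}, 8)); // [5, 3, 1, 12, 9]
    }
}
```

Within each window the docs arrive in descending order, yet every doc of an
earlier window is still collected before any doc of a later one, which is the
only ordering guarantee the collector relies on in this sketch.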
> Optimizations to TopScoreDocCollector and TopFieldCollector
> -----------------------------------------------------------
>
> Key: LUCENE-1593
> URL: https://issues.apache.org/jira/browse/LUCENE-1593
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Search
> Reporter: Shai Erera
> Fix For: 2.9
>
> Attachments: LUCENE-1593.patch, LUCENE-1593.patch, PerfTest.java
>
>
> This is a spin-off of LUCENE-1575 and proposes to optimize TSDC and TFC code
> to remove unnecessary checks. The plan is:
> # Ensure that IndexSearcher returns segments in increasing doc Id order,
> instead of ordering them by numDocs().
> # Change TSDC and TFC's code to not use the doc id as a tie breaker. New docs
> will always have larger ids and therefore cannot compete.
> # Pre-populate HitQueue with sentinel values in TSDC (score =
> Float.NEGATIVE_INFINITY) and remove the check if reusableSD == null.
> # Also switch to changing the top element in place and then calling
> adjustTop() whenever we update the queue.
> # Some methods in Sort explicitly add SortField.FIELD_DOC as a "tie breaker"
> for the last SortField. But, doing so should not be necessary (since we
> already break ties by docID), and is in fact less efficient (once the above
> optimization is in).
> # Investigate PQ - can we deprecate insert() and have only
> insertWithOverflow()? Add an addDummyObjects method which will populate the
> queue without "arranging" it, just store the objects in the array (this can
> be used to pre-populate sentinel values)?
> I will post a patch as well as some perf measurements as soon as I have them.
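
For readers following the plan quoted above, the sentinel and "change the top,
then adjustTop()" steps can be sketched roughly as follows. This is not the
patch and not Lucene's HitQueue or PriorityQueue: SentinelTopNSketch, Hit,
collect and topDocs are hypothetical names, and the sketch assumes docs are
collected in increasing id order and that no real hit scores negative
infinity or NaN.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a top-N queue pre-filled with sentinels.
// Not the Lucene classes and not the attached patch.
class SentinelTopNSketch {

    static final class Hit {
        int doc;
        float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    private final Hit[] heap; // 1-based binary min-heap; heap[1] is the weakest entry

    SentinelTopNSketch(int n) {
        if (n < 1) {
            throw new IllegalArgumentException("n must be at least 1");
        }
        heap = new Hit[n + 1];
        // All sentinels compare equal, so the array is already a valid heap:
        // no "arranging" is needed while pre-populating.
        for (int i = 1; i <= n; i++) {
            heap[i] = new Hit(Integer.MAX_VALUE, Float.NEGATIVE_INFINITY);
        }
    }

    // a is weaker than b: lower score, or the same score with the larger doc id.
    private static boolean lessThan(Hit a, Hit b) {
        if (a.score != b.score) {
            return a.score < b.score;
        }
        return a.doc > b.doc;
    }

    /** Assumes docs arrive in increasing id order. */
    void collect(int doc, float score) {
        Hit top = heap[1];
        if (score <= top.score) {
            return; // a tie loses because later docs always have larger ids
        }
        // The queue is always full, so there is no null check and no allocation:
        // overwrite the weakest entry in place, then restore heap order.
        top.doc = doc;
        top.score = score;
        adjustTop();
    }

    private void adjustTop() {
        int size = heap.length - 1;
        Hit node = heap[1];
        int i = 1;
        while (2 * i <= size) {
            int child = 2 * i;
            if (child + 1 <= size && lessThan(heap[child + 1], heap[child])) {
                child++;
            }
            if (!lessThan(heap[child], node)) {
                break;
            }
            heap[i] = heap[child];
            i = child;
        }
        heap[i] = node;
    }

    /** Doc ids best-first (score descending, then doc ascending), sentinels skipped. */
    List<Integer> topDocs() {
        List<Hit> hits = new ArrayList<>();
        for (int i = 1; i < heap.length; i++) {
            if (heap[i].score != Float.NEGATIVE_INFINITY) {
                hits.add(heap[i]);
            }
        }
        hits.sort((a, b) -> a.score != b.score
            ? Float.compare(b.score, a.score)
            : Integer.compare(a.doc, b.doc));
        List<Integer> docs = new ArrayList<>();
        for (Hit h : hits) {
            docs.add(h.doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        SentinelTopNSketch q = new SentinelTopNSketch(3);
        float[] scores = {1.0f, 3.0f, 1.0f, 2.0f, 2.0f, 2.0f};
        for (int doc = 0; doc < scores.length; doc++) {
            q.collect(doc, scores[doc]);
        }
        System.out.println(q.topDocs()); // [1, 3, 4]
    }
}
```

In the example, doc 5 ties with the weakest queued score (2.0) and is
rejected with a single comparison, which is the behaviour the "do not use the
doc id as a tie breaker" step depends on.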