[
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526678#comment-13526678
]
Michael McCandless commented on LUCENE-4600:
--------------------------------------------
bq. sampling benefits from this two pass, because that way we can guarantee a
minimum sample set size.
Ahh true ...
We have talked about adding a Scorer.getEstimatedHitCount (somewhere Robert has
a patch...), so that e.g. BooleanQuery can do a better job ordering its
sub-scorers, but I think we could use it for facets too (i.e. to decide whether
to use the sampling collector).
But if the estimate is off (which it's allowed to be), it could get tricky for
facets: e.g. you may have to re-run the query with the non-sampling collector
(or with a higher sampling percentage) ...
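
To make the idea concrete, here is a rough sketch of the decision and the fallback check. It is plain Java with no Lucene dependency, and every name in it (SamplingChoice, choose, needsRerun, the threshold and ratio parameters) is made up for illustration. The estimatedHits argument stands in for whatever the proposed Scorer.getEstimatedHitCount would return; that method does not exist today.

```java
// Hypothetical sketch only: none of these names are Lucene APIs.
final class SamplingChoice {
    enum Mode { FULL, SAMPLED }

    /**
     * Picks a collection mode from an estimated hit count. The estimate is a
     * stand-in for the proposed (not yet existing) Scorer.getEstimatedHitCount.
     */
    static Mode choose(long estimatedHits, long samplingThreshold) {
        return estimatedHits >= samplingThreshold ? Mode.SAMPLED : Mode.FULL;
    }

    /**
     * Checked after collection, once the real hit count is known. Returns true
     * when sampling was chosen but the estimate was so far off that the sample
     * fell below the minimum size, so the query would have to be re-run
     * without sampling (or with a higher sampling ratio).
     */
    static boolean needsRerun(Mode mode, long actualHits, double sampleRatio, long minSampleSize) {
        if (mode == Mode.FULL) {
            return false; // nothing was sampled, so nothing can be undersized
        }
        long sampledHits = (long) (actualHits * sampleRatio);
        return sampledHits < minSampleSize;
    }

    public static void main(String[] args) {
        Mode mode = choose(1_000_000L, 100_000L);
        System.out.println("mode = " + mode);
        // The estimate said 1M hits, but only 5000 documents actually matched.
        System.out.println("rerun = " + needsRerun(mode, 5_000L, 0.01, 100L));
    }
}
```

The second method is where the cost shows up: a bad estimate is only detected after the first pass over the hits has already been paid for.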
> Facets should aggregate during collection, not at the end
> ---------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
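
The difference between the two approaches in the description can be sketched as follows. This is a simplified illustration, not the facet module's code: the class and method names are invented, each document is assumed to carry exactly one facet ordinal (docToOrdinal is a hypothetical lookup array), and scores are ignored.

```java
import java.util.BitSet;

// Hypothetical sketch only: not the facet module's actual classes.
final class FacetCountSketch {

    /** Two passes: remember every hit in a bitset, then aggregate at the end. */
    static int[] twoPass(int[] hits, int[] docToOrdinal, int numOrdinals) {
        BitSet matched = new BitSet(docToOrdinal.length);
        for (int doc : hits) {
            matched.set(doc); // pass 1: only record the hit
        }
        int[] counts = new int[numOrdinals];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            counts[docToOrdinal[doc]]++; // pass 2: the actual aggregation
        }
        return counts;
    }

    /** One pass: aggregate as each hit is collected, with no per-hit storage. */
    static int[] onePass(int[] hits, int[] docToOrdinal, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) {
            counts[docToOrdinal[doc]]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] docToOrdinal = {0, 1, 1, 2, 0}; // doc id -> facet ordinal
        int[] hits = {0, 2, 4};               // doc ids that matched the query
        System.out.println(java.util.Arrays.toString(twoPass(hits, docToOrdinal, 3)));
        System.out.println(java.util.Arrays.toString(onePass(hits, docToOrdinal, 3)));
    }
}
```

Both methods produce the same counts. The one-pass version never allocates the bitset, and in a variant that aggregates scores it would not need the float[] either.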