[
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526716#comment-13526716
]
Shai Erera commented on LUCENE-4600:
------------------------------------
I'd rather we rename this issue to something like "implement an
in-collection FacetsAccumulator/Collector". I don't think that "facets should"
aggregate only one way. There are many faceting examples, and some will have
different flavors than others.
However, if this new Collector will perform better on a 'common' case, then I'm
+1 for making it the default.
Note that I put 'common' in quotes. The benchmark that you're running, which
indexes Wikipedia w/ a single Date facet dimension, is not common. I think that we
should define the common case, maybe following how Solr users use facets. I.e.,
is it the eCommerce case, where each document is associated with <10
dimensions, and each dimension is not very deep (say, depth <= 3)? If so, let's
say that the facet defaults are tuned for that case, and then we benchmark it.
Once we have such a benchmark, we can compare the two aggregating collectors and
decide which should be the default.
We should also define other scenarios: few dimensions, flat taxonomies,
but with hundreds of thousands or millions of categories -- what
FacetsAccumulator/Collector (including maybe an entirely different indexing
chain) suits that case?
We can then document some recipes on the Wiki, and recommend the best
configuration for each case.
> Facets should aggregate during collection, not at the end
> ---------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Michael McCandless
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
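
To make the two strategies in the description concrete, here is a minimal
sketch in plain Java. It does not use the Lucene facet API: the class name,
the method names, and the docOrdinals array (a stand-in for the category
ordinals stored per document) are all hypothetical, and it assumes each hit
is reported exactly once.

```java
import java.util.Arrays;
import java.util.BitSet;

public class FacetAggregationSketch {

    // Two-pass approach: remember every hit in a bitset during collection,
    // then walk the bitset afterwards and count category ordinals.
    static int[] countAfterCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        BitSet collected = new BitSet(docOrdinals.length);
        for (int doc : hits) {
            collected.set(doc); // pass 1: only record the hit
        }
        int[] counts = new int[numOrdinals];
        for (int doc = collected.nextSetBit(0); doc >= 0; doc = collected.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++; // pass 2: the actual aggregation
            }
        }
        return counts;
    }

    // In-collection approach: count ordinals as each hit arrives, so no
    // per-hit bitset (or score array) has to be kept around.
    static int[] countDuringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical data: four documents, three category ordinals.
        int[][] docOrdinals = { {0, 2}, {1}, {0, 1}, {2} };
        int[] hits = {0, 2, 3};
        System.out.println(Arrays.toString(countAfterCollection(hits, docOrdinals, 3)));
        System.out.println(Arrays.toString(countDuringCollection(hits, docOrdinals, 3)));
    }
}
```

Both methods return the same counts for the same hits; the difference is only
in the transient memory held between collection and aggregation. Which one is
faster for a given index is exactly what the benchmarks discussed above would
need to show.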