[ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13526716#comment-13526716
 ] 

Shai Erera commented on LUCENE-4600:
------------------------------------

I'd rather we rename this issue to something like "implement an 
in-collection FacetsAccumulator/Collector". I don't think that "facets should" 
aggregate only one way. There are many faceting examples, and some will have 
different flavors than others.

However, if this new Collector will perform better on a 'common' case, then I'm 
+1 for making it the default.

Note that I put 'common' in quotes. The benchmark that you're running, which 
indexes Wikipedia w/ a single Date facet dimension, is not a common case. I think that we 
should define the common case, maybe following how Solr users use facets. I.e., 
is it the eCommerce case, where each document is associated with <10 
dimensions, and each dimension is not very deep (say, depth <= 3)? If so, let's 
say that the facets defaults are tuned for that case, and then we benchmark it.

After we have such benchmark, we can compare the two aggregating collectors and 
decide which should be default.

We should also define other scenarios: few dimensions, flat taxonomies, 
but with hundreds of thousands or millions of categories -- what 
FacetsAccumulator/Collector (including maybe an entirely different indexing 
chain) suits that case?

We then document some recipes on the Wiki, and recommend the best configuration 
for each case.
                
> Facets should aggregate during collection, not at the end
> ---------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).
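
For readers following the thread, the two strategies in the issue description can be contrasted with a minimal Java sketch. It is only an illustration and does not use the Lucene facet module's API: the class name FacetAggregationSketch, both method names, and the docOrdinals array (a hypothetical per-document list of category ordinals) are assumptions, and each document is assumed to be collected at most once.

```java
import java.util.BitSet;

/** Illustrative sketch only; none of these names come from Lucene. */
public class FacetAggregationSketch {

    /**
     * Two-pass approach: remember every hit in a bitset during collection,
     * then walk the bitset afterwards and count category ordinals.
     */
    public static int[] aggregateAtEnd(int[] hits, int[][] docOrdinals, int numOrdinals) {
        // Transient state that is held until aggregation finishes.
        BitSet matched = new BitSet(docOrdinals.length);
        for (int doc : hits) {
            matched.set(doc);
        }
        int[] counts = new int[numOrdinals];
        for (int doc = matched.nextSetBit(0); doc >= 0; doc = matched.nextSetBit(doc + 1)) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }

    /**
     * In-collection approach: update the counts as each hit arrives,
     * so no per-hit bitset (or score array) has to be retained.
     */
    public static int[] aggregateDuringCollection(int[] hits, int[][] docOrdinals, int numOrdinals) {
        int[] counts = new int[numOrdinals];
        for (int doc : hits) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++;
            }
        }
        return counts;
    }
}
```

Both methods return the same counts for the same hits; the difference is only in what has to stay in memory between collection and aggregation, which is the trade-off that a benchmark on a representative case would need to measure.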

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
