[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Shai Erera (JIRA) Sun, 20 Jan 2013 05:56:16 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558252#comment-13558252
 ]


Shai Erera commented on LUCENE-4600:
------------------------------------

OK, so the Decoder abstraction hurts ... that's a bummer. While the dgap+vint 
specialization is simple, specializing e.g. packed-ints (or whatever other 
block encoding algorithm we come up with on LUCENE-4609) will make the code 
uglier :).
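
To make "dgap+vint specialization" concrete, here is a rough sketch of what 
decoding and counting in one loop, with no decoder object in between, could 
look like. It is not the code from the patch: the class and method names are 
made up, and the byte layout (low-order 7 bits first, high bit as a 
continuation flag) is an assumption that may not match what the facet module 
actually writes.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical sketch, not code from the LUCENE-4600 patches. */
public class DGapVIntCountingSketch {

    /**
     * Encodes sorted ordinals as d-gaps, each gap written as a VInt.
     * Assumed layout: 7 payload bits per byte, low-order bits first,
     * high bit set when more bytes follow.
     */
    public static byte[] encode(int[] sortedOrdinals) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrdinals) {
            int gap = ord - prev; // store the difference from the previous ordinal
            prev = ord;
            while ((gap & ~0x7F) != 0) {
                out.write((gap & 0x7F) | 0x80);
                gap >>>= 7;
            }
            out.write(gap);
        }
        return out.toByteArray();
    }

    /** Decodes one document's ordinals and increments their counts in the same loop. */
    public static void countOrdinals(byte[] buf, int offset, int length, int[] counts) {
        int end = offset + length;
        int ord = 0;   // running sum of the d-gaps
        int value = 0; // VInt currently being assembled
        int shift = 0;
        for (int i = offset; i < end; i++) {
            byte b = buf[i];
            value |= (b & 0x7F) << shift;
            if (b >= 0) { // high bit clear: last byte of this VInt
                ord += value;
                counts[ord]++;
                value = 0;
                shift = 0;
            } else {
                shift += 7;
            }
        }
    }
}
```

The loop is short because both the d-gap sum and the VInt state live in 
locals; a block encoding such as packed-ints would need more state inlined 
here, which is where the ugliness would come from.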

It looks like PostCollection doesn't hurt much? Can you compare it to Counting 
directly? I'm confused by the results ... they seem to improve on the Decoder 
collector, but I'm not sure how they compare to Counting. If the differences 
are minuscule (in either direction), then it could mean good news for sampling, 
because then we will be able to fold sampling into this specialized Collector. 
It would also mean that we can fold in complements (TotalFacetCounts).

So it looks like using any abstraction will hurt us. I didn't even try 
Aggregator, because it needs to either use the decoder or go through a bulk 
API (i.e. the Collector will decode into an IntsRef, not using IntDecoder, and 
then delegate to Aggregator) -- seems useless to me, as counting + default 
decoding are the common scenario that we want to target.

Based on the Counting vs PostCollection results, we should decide whether to 
always do post-collection in Counting, or not. Folding in Sampling and 
Complements should be done separately, because they are not so easy to bring in 
w/ the current state of the API.
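
For anyone following along, the two strategies being compared can be 
sketched roughly as below. This is a simplified illustration, not the 
collectors from the patch: OrdinalSource is a hypothetical stand-in for 
however a document's category ordinals are read, and neither class extends 
Lucene's Collector.

```java
import java.util.BitSet;

/** Hypothetical sketch of the two counting strategies; not the patch code. */
public class CountingStrategiesSketch {

    /** Stand-in for reading a document's category ordinals (not a Lucene API). */
    public interface OrdinalSource {
        int[] ordinals(int docId);
    }

    /** Aggregates as each hit is collected; keeps no per-hit state. */
    public static final class CountWhileCollecting {
        private final OrdinalSource source;
        private final int[] counts;

        public CountWhileCollecting(OrdinalSource source, int numOrdinals) {
            this.source = source;
            this.counts = new int[numOrdinals];
        }

        public void collect(int docId) {
            for (int ord : source.ordinals(docId)) {
                counts[ord]++;
            }
        }

        public int[] counts() {
            return counts;
        }
    }

    /** Only records hits during collection; aggregates in a second pass. */
    public static final class CountPostCollection {
        private final OrdinalSource source;
        private final int numOrdinals;
        private final BitSet hits = new BitSet();

        public CountPostCollection(OrdinalSource source, int numOrdinals) {
            this.source = source;
            this.numOrdinals = numOrdinals;
        }

        public void collect(int docId) {
            hits.set(docId);
        }

        public int[] counts() {
            int[] counts = new int[numOrdinals];
            // Second pass over the recorded hits, in docId order.
            for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
                for (int ord : source.ordinals(doc)) {
                    counts[ord]++;
                }
            }
            return counts;
        }
    }
}
```

Both produce the same counts; the difference is only when the ordinals are 
read and whether the hit set is kept around until the end.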
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, 
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).

