[
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558252#comment-13558252
]
Shai Erera commented on LUCENE-4600:
------------------------------------
ok so the Decoder abstraction hurts ... that's a bummer. While dgap+vint
specialization is simple, specializing for e.g. packed ints (or whatever other
block encoding algorithm we come up with on LUCENE-4609) will make the code
uglier :).
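To make the trade-off concrete, here is a minimal sketch of the two code paths, assuming ordinals are stored sorted as d-gaps with each gap written as a VInt (7 payload bits per byte, low-order group first, high bit meaning "more bytes follow"). That byte layout is an assumption and may not match the facet module's actual encoder. `OrdinalDecoder`, `VIntDGapDecoder`, `countInline` and `countViaDecoder` are hypothetical names, not the facet module's `IntDecoder` API. The inlined loop is what "dgap+vint specialization" means here; the interface path is the pluggable abstraction that every new block encoding would otherwise slot into.

```java
import java.io.ByteArrayOutputStream;

final class DGapVIntCounting {

  // Encodes sorted ordinals as d-gaps, each gap written as a VInt
  // (assumed layout: 7 payload bits per byte, low-order group first,
  // high bit set means another byte follows).
  static byte[] encode(int[] sortedOrdinals) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sortedOrdinals) {
      int gap = ord - prev;
      prev = ord;
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  // Specialized path: d-gap + VInt decoding is inlined and each ordinal
  // increments its counter directly, with no per-value interface call.
  static void countInline(byte[] buf, int offset, int length, int[] counts) {
    int upto = offset;
    int end = offset + length;
    int prev = 0;
    while (upto < end) {
      int b = buf[upto++];
      int gap = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[upto++];
        gap |= (b & 0x7F) << shift;
      }
      prev += gap;
      counts[prev]++;
    }
  }

  // Hypothetical pluggable decoder abstraction.
  interface OrdinalDecoder {
    void reset(byte[] buf, int offset, int length);
    boolean hasNext();
    int next(); // returns the next absolute ordinal
  }

  static final class VIntDGapDecoder implements OrdinalDecoder {
    private byte[] buf;
    private int upto, end, prev;

    @Override public void reset(byte[] buf, int offset, int length) {
      this.buf = buf;
      upto = offset;
      end = offset + length;
      prev = 0;
    }

    @Override public boolean hasNext() {
      return upto < end;
    }

    @Override public int next() {
      int b = buf[upto++];
      int gap = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[upto++];
        gap |= (b & 0x7F) << shift;
      }
      prev += gap;
      return prev;
    }
  }

  // Generic path: same counts, but every ordinal goes through the interface.
  static void countViaDecoder(OrdinalDecoder dec, byte[] buf, int offset, int length, int[] counts) {
    dec.reset(buf, offset, length);
    while (dec.hasNext()) {
      counts[dec.next()]++;
    }
  }
}
```

Whether the interface path is measurably slower depends on what the JIT manages to devirtualize and inline; the benchmark numbers in this thread, not this sketch, are the evidence. The sketch only shows why each additional encoding (e.g. packed ints) would need its own hand-inlined loop once the abstraction is dropped.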
It looks like PostCollection doesn't hurt much? Can you compare it to Counting
directly? I'm confused by the results ... they seem to improve on the Decoder
collector, but I'm not sure how they compare to Counting. If the differences are
minuscule (in either direction), then that could be good news for sampling,
because we would then be able to fold sampling into this specialized Collector.
It would also mean that we can fold in complements (TotalFacetCounts).
So it looks like using any abstraction will hurt us. I didn't even try
Aggregator, because it would need to either use the decoder or use a bulk API
(i.e. the Collector decodes into an IntsRef, without using IntDecoder, and then
delegates to Aggregator) -- that seems useless to me, as counting + default
decoding is the common scenario that we want to target.
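For reference, the bulk-API alternative would look roughly like the sketch below: the collector first decodes a document's ordinals into a reusable buffer, then hands the whole buffer to a pluggable aggregator. `OrdinalBuffer` is a hypothetical stand-in loosely shaped like Lucene's `IntsRef`, and `BulkAggregator` is a hypothetical contract, not the facet module's `Aggregator` signature. The d-gap + VInt byte layout is the same assumption as before (low-order 7-bit group first, high bit as continuation flag).

```java
import java.util.Arrays;

final class BulkDecodeThenAggregate {

  // Hypothetical reusable ordinal buffer, loosely shaped like IntsRef.
  static final class OrdinalBuffer {
    int[] ints = new int[4];
    int length;

    void append(int v) {
      if (length == ints.length) {
        ints = Arrays.copyOf(ints, length * 2);
      }
      ints[length++] = v;
    }
  }

  // Hypothetical bulk contract: receives all of a document's ordinals at once.
  interface BulkAggregator {
    void aggregate(int doc, float score, OrdinalBuffer ordinals);
  }

  static final class CountingAggregator implements BulkAggregator {
    final int[] counts;

    CountingAggregator(int numOrds) { counts = new int[numOrds]; }

    @Override public void aggregate(int doc, float score, OrdinalBuffer ords) {
      for (int i = 0; i < ords.length; i++) counts[ords.ints[i]]++;
    }
  }

  static final class ScoreSumAggregator implements BulkAggregator {
    final float[] sums;

    ScoreSumAggregator(int numOrds) { sums = new float[numOrds]; }

    @Override public void aggregate(int doc, float score, OrdinalBuffer ords) {
      for (int i = 0; i < ords.length; i++) sums[ords.ints[i]] += score;
    }
  }

  // Pass 1: decode d-gap + VInt bytes into the reusable buffer.
  static void decode(byte[] buf, int offset, int length, OrdinalBuffer out) {
    out.length = 0;
    int upto = offset;
    int end = offset + length;
    int prev = 0;
    while (upto < end) {
      int b = buf[upto++];
      int gap = b & 0x7F;
      for (int shift = 7; (b & 0x80) != 0; shift += 7) {
        b = buf[upto++];
        gap |= (b & 0x7F) << shift;
      }
      prev += gap;
      out.append(prev);
    }
  }

  // Pass 2: hand the decoded ordinals to whichever aggregator is plugged in.
  static void collect(int doc, float score, byte[] buf, int offset, int length,
                      OrdinalBuffer scratch, BulkAggregator agg) {
    decode(buf, offset, length, scratch);
    agg.aggregate(doc, score, scratch);
  }
}
```

The flexibility comes at the price of writing every ordinal into the buffer and reading it back, plus a call through the interface per document. For the plain counting case, that is exactly the overhead the specialized collector avoids.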
Based on the Counting vs PostCollection results, we should decide whether to
always do post-collection in Counting, or not. Folding in Sampling and
Complements should be done separately, because they are not so easy to bring in
w/ the current state of the API.
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
> Key: LUCENE-4600
> URL: https://issues.apache.org/jira/browse/LUCENE-4600
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Reporter: Michael McCandless
> Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch,
> LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with
> a float[] to hold scores as well, if you will aggregate them) during
> collection, and then at the end when you call getFacetsResults(), it makes a
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't
> have to tie up transient RAM (fairly small for the bit set but possibly big
> for the float[]).
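A toy model of the two strategies described in the issue may help when reading the Counting vs PostCollection numbers. This is not Lucene's `Collector` API: `OrdinalSource`, `PostCollectionAggregator` and `InlineAggregator` are hypothetical names, and ordinals are assumed to be already decoded per document. The post-collection variant keeps a `BitSet` of hits plus a per-document `float[]` of scores and aggregates in a second pass. The inline variant updates per-ordinal counts and score sums directly in `collect`, so it needs no transient per-hit storage.

```java
import java.util.BitSet;

final class CollectionStrategies {

  // Hypothetical source of already-decoded category ordinals per document.
  interface OrdinalSource {
    int[] ordinals(int doc);
  }

  // Post-collection: remember hits (and scores) now, aggregate in a 2nd pass.
  static final class PostCollectionAggregator {
    private final BitSet hits = new BitSet();
    private final float[] scores; // one slot per doc: the possibly big transient RAM

    PostCollectionAggregator(int maxDoc) {
      scores = new float[maxDoc];
    }

    void collect(int doc, float score) {
      hits.set(doc);
      scores[doc] = score;
    }

    // Second pass over the recorded hits does the actual aggregation.
    void aggregate(OrdinalSource src, int[] counts, float[] scoreSums) {
      for (int doc = hits.nextSetBit(0); doc >= 0; doc = hits.nextSetBit(doc + 1)) {
        for (int ord : src.ordinals(doc)) {
          counts[ord]++;
          scoreSums[ord] += scores[doc];
        }
      }
    }
  }

  // Aggregation during collection: no bitset, no per-doc float[].
  static final class InlineAggregator {
    private final OrdinalSource src;
    final int[] counts;
    final float[] scoreSums;

    InlineAggregator(OrdinalSource src, int numOrds) {
      this.src = src;
      counts = new int[numOrds];
      scoreSums = new float[numOrds];
    }

    void collect(int doc, float score) {
      for (int ord : src.ordinals(doc)) {
        counts[ord]++;
        scoreSums[ord] += score;
      }
    }
  }
}
```

Both produce the same per-ordinal results when hits arrive in increasing doc order. The difference is only where the work happens and how much transient memory is held between collection and `getFacetsResults()`.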