[ https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558252#comment-13558252 ]
Shai Erera commented on LUCENE-4600: ------------------------------------ ok so the Decoder abstraction hurts ... that's a bummer. While dgap+vint specialization is simple, specializing e.g. a packed-ints (or whatever other block encoding algorithm we'll come up with on LUCENE-4609) will make the code uglier :). It looks like PostCollection doesn't hurt much? Can you compare it to Counting directly? I'm confused by the results ... they seem to improve the Decoder collector, but not sure how it will match to Counting. If the differences are miniscule (to any direction), then it could mean good news to sampling, because then we will be able to fold in sampling to this specialized Collector. But it would also mean that we can fold in complements (TotalFacetCounts). So it looks like using any abstraction will hurt us. I didn't even try Aggregator, because it needs to either use the decoder, or do bulk-API (i.e. the Collector will decode into an IntsRef, not using IntDecoder, and then delegate to Aggregator) -- seems useless to me, as counting + default decoding are the common scenario that we want to target. Based on the Counting vs PostCollection results, we should decide whether to always do post-collection in Counting, or not. Folding in Sampling and Complements should be done separately, because they are not so easy to bring in w/ the current state of the API. > Explore facets aggregation during documents collection > ------------------------------------------------------ > > Key: LUCENE-4600 > URL: https://issues.apache.org/jira/browse/LUCENE-4600 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/facet > Reporter: Michael McCandless > Attachments: LUCENE-4600-cli.patch, LUCENE-4600.patch, > LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch, LUCENE-4600.patch > > > Today the facet module simply gathers all hits (as a bitset, optionally with > a float[] to hold scores as well, if you will aggregate them) during > collection, and then at the end when you call getFacetsResults(), it makes a > 2nd pass over all those hits doing the actual aggregation. > We should investigate just aggregating as we collect instead, so we don't > have to tie up transient RAM (fairly small for the bit set but possibly big > for the float[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org