[ https://issues.apache.org/jira/browse/LUCENE-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530307#comment-13530307 ]
Gilad Barkai commented on LUCENE-4619:
--------------------------------------

Throwing in a crazy idea... can the FacetIndexingParams be part of IndexWriterConfig?

> Create a specialized path for facets counting
> ---------------------------------------------
>
>                 Key: LUCENE-4619
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4619
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>         Attachments: LUCENE-4619.patch
>
>
> Mike and I have been discussing that on several issues (LUCENE-4600,
> LUCENE-4602) and on GTalk ... it looks like the current API abstractions may
> be responsible for some of the performance loss that we see, compared to
> specialized code.
> During our discussion, we've decided to target a specific use case - facets
> counting - and work on it top-to-bottom, reusing as much code as possible.
> Specifically, we'd like to implement a FacetsCollector/Accumulator which can
> do only counting (i.e. respects only CountFacetRequest): no sampling,
> partitions or complements. The API allows us to do so very cleanly, and in
> the context of that issue, we'd like to do the following:
> * Implement a FacetsField which takes a TaxonomyWriter, FacetIndexingParams
> and CategoryPath (List, Iterable, whatever) and adds the needed information
> to both the taxonomy index and the search index.
> ** That API is similar in nature to CategoryDocumentBuilder, only easier to
> consume -- it's just another field that you add to the Document.
> ** We'll have two extensions for it: PayloadFacetsField and
> DocValuesFacetsField, so that we can benchmark the two approaches.
> Eventually, we believe, one of them will be eliminated, and we'll remain w/
> just one (hopefully the DV one).
> * Implement either a FacetsAccumulator or a Collector which takes a bunch of
> CountFacetRequests and returns the top counts.
> ** Aggregations are done in-collection, rather than post. Note that we have
> LUCENE-4600 open for exploring that. Either we finish this exploration here,
> or do it there. Just FYI that the issue exists.
> ** Reuses the CategoryListIterator, IntDecoder and Aggregator code. I'll open
> a separate issue to explore improving that API to be bulk, and then we can
> decide if this specialized Collector should use those abstractions, or be
> really optimized for the facet counting case.
> * At the moment, this path will assume that a document holds multiple
> dimensions, but only one value from each (i.e. no Author/Shai, Author/Mike
> for a document), and therefore use OrdPolicy.NO_PARENTS.
> ** Later, we'd like to explore how to have this specialized path handle the
> ALL_PARENTS case too, as it shouldn't be so hard to do.
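
To make the "aggregations are done in-collection" and NO_PARENTS points from the issue description concrete, here is a minimal plain-Java sketch. It is not the attached patch and it does not use any Lucene API: the class FacetCountingSketch, its methods, and the int[][] layout of per-document ordinals are all hypothetical stand-ins, assumed only for illustration. Each document carries one leaf ordinal per dimension (no parent ordinals), counts are incremented as each matching document is visited, and the top counts are selected afterwards from a dimension's children.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical illustration only; none of these names exist in the Lucene facet module. */
final class FacetCountingSketch {

    /**
     * Counts category ordinals while "collecting" matching documents.
     * docOrdinals[doc] holds the leaf ordinals stored for that document
     * (one per dimension, parents not included, as with NO_PARENTS).
     */
    static int[] countInCollection(int[][] docOrdinals, int[] matchingDocs, int taxonomySize) {
        int[] counts = new int[taxonomySize];
        for (int doc : matchingDocs) {
            for (int ord : docOrdinals[doc]) {
                counts[ord]++; // aggregate immediately, no post-processing pass over doc IDs
            }
        }
        return counts;
    }

    /**
     * Returns up to k of the given candidate ordinals (e.g. the children of one
     * dimension) with the highest non-zero counts, highest count first.
     * Ties are broken in favor of the smaller ordinal.
     */
    static List<Integer> topK(int[] counts, int[] candidateOrdinals, int k) {
        // Min-heap: the weakest candidate sits at the head and is evicted first.
        Comparator<Integer> weakestFirst = (a, b) -> {
            if (counts[a] != counts[b]) {
                return Integer.compare(counts[a], counts[b]);
            }
            return Integer.compare(b, a); // on equal counts, the larger ordinal is weaker
        };
        PriorityQueue<Integer> heap = new PriorityQueue<>(weakestFirst);
        for (int ord : candidateOrdinals) {
            if (counts[ord] == 0) {
                continue; // never report categories that matched no documents
            }
            heap.add(ord);
            if (heap.size() > k) {
                heap.poll();
            }
        }
        List<Integer> result = new ArrayList<>();
        while (!heap.isEmpty()) {
            result.add(heap.poll());
        }
        Collections.reverse(result); // strongest first
        return result;
    }
}
```

Because only leaf ordinals are stored per document in this sketch, a dimension's own ordinal (say, Author) is never incremented; handling the ALL_PARENTS case would need either the parent ordinals stored per document or a roll-up over the taxonomy after counting.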