[ 
https://issues.apache.org/jira/browse/LUCENE-4619?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13530205#comment-13530205
 ] 

Michael McCandless commented on LUCENE-4619:
--------------------------------------------

Maybe if we rename CDB to FacetsDocumentBuilder, move it to oal.document, make 
it a single method call for the user (FDB.addFields), that's good enough 
progress for the common case for now?

I still don't like this field/dimension duality: it feels like the facet module 
is "hiding" what should be separate fields, within a single Lucene field.  If I 
need to store these fields (because I want to present them in the UI), I'm 
already adding them as separate fields.

I think doc.add(new FacetField(...)) is more intuitive than fdb.addFields(doc, 
....) for the common/basic use case... but at least improving CDB here would 
be progress.
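
The toy Java sketch below makes the two API shapes in the comment concrete. Everything in it is hypothetical: `Document`, `FacetField`, `FacetsDocumentBuilder` and the `$facet.`/`$facets` field names are simplified stand-ins invented for illustration. They are not Lucene's classes, and this is not Lucene's actual index encoding. The sketch only contrasts adding each facet as its own field with routing every dimension through a single builder call into one shared field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model: none of these classes are Lucene's. They only
// contrast the two API shapes discussed in the comment.
class FacetApiSketch {

    /** Toy document: records each indexed field as "name=value". */
    static final class Document {
        final List<String> indexed = new ArrayList<>();

        /** Shape 1: a facet is just another field, one field per dimension. */
        void add(FacetField f) {
            indexed.add("$facet." + f.dim + "=" + String.join("/", f.path));
        }

        void addRaw(String fieldName, String value) {
            indexed.add(fieldName + "=" + value);
        }
    }

    /** Toy facet field: a dimension plus the path below it. */
    static final class FacetField {
        final String dim;
        final String[] path;

        FacetField(String dim, String... path) {
            this.dim = dim;
            this.path = path;
        }
    }

    /** Shape 2: one builder call; every dimension lands in one shared field. */
    static final class FacetsDocumentBuilder {
        void addFields(Document doc, List<FacetField> facets) {
            for (FacetField f : facets) {
                doc.addRaw("$facets", f.dim + "/" + String.join("/", f.path));
            }
        }
    }

    public static void main(String[] args) {
        // Shape 1: doc.add(new FacetField(...))
        Document d1 = new Document();
        d1.add(new FacetField("Author", "Shai"));
        d1.add(new FacetField("Year", "2012"));

        // Shape 2: fdb.addFields(doc, ...)
        Document d2 = new Document();
        new FacetsDocumentBuilder().addFields(d2,
                List.of(new FacetField("Author", "Shai"), new FacetField("Year", "2012")));

        System.out.println(d1.indexed);
        System.out.println(d2.indexed);
    }
}
```

In the first shape each dimension remains a separate field, which is the model the comment argues for. In the second, all dimensions are multiplexed into one field, which is the field/dimension duality it objects to.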



                
> Create a specialized path for facets counting
> ---------------------------------------------
>
>                 Key: LUCENE-4619
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4619
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>            Reporter: Shai Erera
>         Attachments: LUCENE-4619.patch
>
>
> Mike and I have been discussing that on several issues (LUCENE-4600, 
> LUCENE-4602) and on GTalk ... it looks like the current API abstractions may 
> be responsible for some of the performance loss that we see, compared to 
> specialized code.
> During our discussion, we've decided to target a specific use case, facets 
> counting, and work on it top-to-bottom, reusing as much code as possible. 
> Specifically, we'd like to implement a FacetsCollector/Accumulator which only 
> counts (i.e. respects only CountFacetRequest), with no sampling, partitions or 
> complements. The API allows us to do so very cleanly, and in the context of 
> this issue, we'd like to do the following:
> * Implement a FacetsField which takes a TaxonomyWriter, FacetIndexingParams 
> and CategoryPath (List, Iterable, whatever) and adds the needed information 
> to both the taxonomy index as well as the search index.
> ** That API is similar in nature to CategoryDocumentBuilder, only easier to 
> consume -- it's just another field that you add to the Document.
> ** We'll have two extensions for it: PayloadFacetsField and 
> DocValuesFacetsField, so that we can benchmark the two approaches. 
> Eventually, we believe, one of them will be eliminated and we'll remain w/ 
> just one (hopefully the DV one).
> * Implement a FacetsAccumulator or FacetsCollector which takes a bunch of 
> CountFacetRequests and returns the top counts.
> ** Aggregations are done in-collection, rather than post-collection. Note that 
> LUCENE-4600 is open for exploring that: either we finish that exploration 
> here, or do it there. Just FYI that the issue exists.
> ** Reuses the CategoryListIterator, IntDecoder and Aggregator code. I'll open 
> a separate issue to explore improving that API to be bulk, and then we can 
> decide if this specialized Collector should use those abstractions, or be 
> really optimized for the facet counting case.
> * At the moment, this path will assume that a document holds multiple 
> dimensions, but only one value from each (i.e. no Author/Shai, Author/Mike 
> for a document), and therefore use OrdPolicy.NO_PARENTS.
> ** Later, we'd like to explore how to have this specialized path handle the 
> ALL_PARENTS case too, as it shouldn't be so hard to do.
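
The counting-only path described above can be sketched roughly as follows. This is a hypothetical, Lucene-free illustration, not the FacetsCollector/FacetsAccumulator API or the attached patch. The `parents` array, the per-document ordinal arrays and all method names are assumptions made for the example. The sketch counts ordinals during collection (in-collection aggregation) and stores only leaf ordinals per document, as with OrdPolicy.NO_PARENTS. It then rolls the counts up to parents and returns the top-K children of a node using a bounded min-heap. It also assumes every parent's ordinal is smaller than its children's. The rollup is only correct under the issue's one-value-per-dimension assumption.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Hypothetical counting-only facets sketch (not Lucene's API).
 * Assumptions:
 *  - parents[ord] is the parent ordinal of each category; ordinal 0 is the
 *    root (parents[0] == -1) and every parent's ordinal is smaller than
 *    its children's;
 *  - each document carries only its leaf ordinals (the NO_PARENTS policy);
 *  - each document has at most one value per dimension.
 */
class CountingFacetsSketch {
    private final int[] parents;
    private final int[] counts;

    CountingFacetsSketch(int[] parents) {
        this.parents = parents;
        this.counts = new int[parents.length];
    }

    /** In-collection aggregation: bump each ordinal as the document is collected. */
    void collect(int[] docOrdinals) {
        for (int ord : docOrdinals) {
            counts[ord]++;
        }
    }

    /**
     * Propagates counts to ancestors; call once, after collection. Walking from
     * the highest ordinal down visits every child before its parent. A document
     * with both Author/Shai and Author/Mike would be counted twice under Author,
     * which is why the one-value-per-dimension assumption matters.
     */
    void rollupParents() {
        for (int ord = counts.length - 1; ord > 0; ord--) {
            counts[parents[ord]] += counts[ord];
        }
    }

    int count(int ord) {
        return counts[ord];
    }

    /** Up to k children of parent with non-zero counts, highest count first. */
    List<Integer> topChildren(int parent, int k) {
        // Orders "worst" first: lower count, then (on ties) higher ordinal.
        Comparator<Integer> worstFirst = (a, b) -> {
            int c = Integer.compare(counts[a], counts[b]);
            return c != 0 ? c : Integer.compare(b, a);
        };
        PriorityQueue<Integer> heap = new PriorityQueue<>(worstFirst);
        for (int ord = 1; ord < counts.length; ord++) {
            if (parents[ord] == parent && counts[ord] > 0) {
                heap.add(ord);
                if (heap.size() > k) {
                    heap.poll(); // evict the current worst candidate
                }
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed());
        return result;
    }

    public static void main(String[] args) {
        // 0=root, 1=Author, 2=Author/Shai, 3=Author/Mike, 4=Year, 5=Year/2012, 6=Year/2011
        int[] parents = {-1, 0, 1, 1, 0, 4, 4};
        CountingFacetsSketch facets = new CountingFacetsSketch(parents);
        facets.collect(new int[] {2, 5});
        facets.collect(new int[] {3, 5});
        facets.collect(new int[] {3, 6});
        facets.collect(new int[] {3});
        facets.rollupParents();
        System.out.println("Author=" + facets.count(1) + " top=" + facets.topChildren(1, 2));
        System.out.println("Year=" + facets.count(4) + " top=" + facets.topChildren(4, 1));
    }
}
```

With the toy data in `main`, Author rolls up to 4 and its top child is Author/Mike (ordinal 3). With an ALL_PARENTS-style encoding, where each document's ordinals already include its ancestors, the rollup step would presumably be unnecessary.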
