[jira] [Commented] (LUCENE-4600) Explore facets aggregation during documents collection

Shai Erera (JIRA) Sun, 09 Dec 2012 04:31:32 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13527425#comment-13527425
 ]


Shai Erera commented on LUCENE-4600:
------------------------------------

I would like to see a single ordinals store; I don't think we should support 
both payloads and DV. If DV lets us write a byte[] and read it either off disk 
or from RAM, we should make the cut to DV.

But note that DV means upgrading existing indexes. How do you move from a 
payload to DV? Is it something that can be done in addIndexes? If facets could 
determine where the data is written, per-segment, the indexes would be migrated 
on-the-fly, as segments are merged.

But if there's a clean way to do a one-time index upgrade to DV, then let's 
just write it once, and then DVs are migratable, so that's another +1 for DV.

If you want to simulate DVs, you'll need to implement a few classes. First, 
instead of using CategoryDocBuilder, you can construct your own Document and 
add DVFields to it. Just make sure that when you resolve a CP to its ord, you 
also resolve all its parents and add all of them to the DV, so that you compare 
today(payload) to today(DV) (today == writing all parents).
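
To make the "resolve all parents" step concrete, here is a minimal sketch. 
It does not use any Lucene classes: ParentOrdinalsSketch, ordinalOf, 
resolveWithParents and encode are hypothetical names, the taxonomy is a plain 
in-memory map, the '/' path delimiter is an assumption, and the 
variable-length byte encoding is chosen only for illustration. The resulting 
byte[] is the kind of value you would then put in the per-document DV field.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a taxonomy; NOT a Lucene class.
class ParentOrdinalsSketch {
    // Maps a category path such as "Author/Erera" to its ordinal.
    private final Map<String, Integer> ordinals = new HashMap<>();

    // Returns the ordinal of a path, assigning the next free one if it is new.
    int ordinalOf(String path) {
        Integer ord = ordinals.get(path);
        if (ord == null) {
            ord = ordinals.size();
            ordinals.put(path, ord);
        }
        return ord;
    }

    // Resolves the path itself and then every parent, leaf first.
    // Assumes '/' as the path delimiter (an illustrative choice).
    List<Integer> resolveWithParents(String path) {
        List<Integer> result = new ArrayList<>();
        String current = path;
        while (true) {
            result.add(ordinalOf(current));
            int slash = current.lastIndexOf('/');
            if (slash < 0) {
                break; // reached the top-level category
            }
            current = current.substring(0, slash);
        }
        return result;
    }

    // Packs ordinals into one byte[]: 7 bits per byte, high bit set when
    // more bytes of the same value follow.
    static byte[] encode(List<Integer> ords) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int ord : ords) {
            int v = ord;
            while ((v & ~0x7F) != 0) {
                out.write((v & 0x7F) | 0x80);
                v >>>= 7;
            }
            out.write(v);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        ParentOrdinalsSketch taxo = new ParentOrdinalsSketch();
        List<Integer> ords = taxo.resolveWithParents("Author/Erera/Shai");
        byte[] blob = encode(ords);
        System.out.println(ords + " -> " + blob.length + " bytes");
    }
}
```

Writing the leaf and all of its ancestors keeps the comparison fair, since 
the payload-based path writes all parents as well.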

Then, I think that you should also write your own CategoryListIterator, to 
iterate over the DV.
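
As a rough illustration of what such an iterator does, the sketch below walks 
one document's byte[] and counts each ordinal it decodes. It assumes the 
bytes were written with a 7-bits-per-byte variable-length encoding (high bit 
set means more bytes follow); that encoding is an assumption, not something 
settled in this issue. OrdinalBytesIterator is a hypothetical class and does 
not implement the real CategoryListIterator interface.

```java
// Hypothetical iterator over one document's encoded ordinals; NOT a Lucene class.
class OrdinalBytesIterator {
    private final byte[] bytes;
    private int pos = 0;

    OrdinalBytesIterator(byte[] bytes) {
        this.bytes = bytes;
    }

    // True while there are undecoded bytes left.
    boolean hasNext() {
        return pos < bytes.length;
    }

    // Decodes the next ordinal: 7 payload bits per byte, low bits first,
    // high bit set on every byte except the last one of a value.
    int next() {
        int value = 0;
        int shift = 0;
        while (true) {
            byte b = bytes[pos++];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
            shift += 7;
        }
    }

    public static void main(String[] args) {
        // Encoded ordinals 0, 1 and 300 (300 needs two bytes: 0xAC 0x02).
        byte[] docValue = {0x00, 0x01, (byte) 0xAC, 0x02};
        int[] counts = new int[512]; // one counter per ordinal
        OrdinalBytesIterator it = new OrdinalBytesIterator(docValue);
        while (it.hasNext()) {
            counts[it.next()]++; // aggregate as the ordinals are decoded
        }
        System.out.println(counts[0] + " " + counts[1] + " " + counts[300]);
    }
}
```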

Those are the base classes for sure; you may need a few others to get the 
CLI into the chain.

I hope that I addressed all the comments, but I might have missed a question :).
                
> Explore facets aggregation during documents collection
> ------------------------------------------------------
>
>                 Key: LUCENE-4600
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4600
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: LUCENE-4600.patch, LUCENE-4600.patch
>
>
> Today the facet module simply gathers all hits (as a bitset, optionally with 
> a float[] to hold scores as well, if you will aggregate them) during 
> collection, and then at the end when you call getFacetsResults(), it makes a 
> 2nd pass over all those hits doing the actual aggregation.
> We should investigate just aggregating as we collect instead, so we don't 
> have to tie up transient RAM (fairly small for the bit set but possibly big 
> for the float[]).

