[
https://issues.apache.org/jira/browse/LUCENE-4769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13575802#comment-13575802
]
Michael McCandless commented on LUCENE-4769:
--------------------------------------------
Full (6.6M docs) wikibig index, 7 facet dims:
{noformat}
                Task    QPS base      StdDev    QPS comp      StdDev                Pct diff
             Respell       46.60      (3.4%)       45.82      (4.1%)   -1.7% (  -8% -    6%)
        HighSpanNear        3.49      (1.7%)        3.51      (2.2%)    0.8% (  -3% -    4%)
          HighPhrase       17.13     (10.5%)       17.42     (11.0%)    1.7% ( -17% -   26%)
              Fuzzy2       53.25      (2.8%)       54.19      (3.1%)    1.8% (  -4% -    7%)
          AndHighLow      587.43      (2.3%)      597.84      (2.6%)    1.8% (  -3% -    6%)
     LowSloppyPhrase       20.30      (2.3%)       20.68      (2.3%)    1.9% (  -2% -    6%)
         LowSpanNear        8.24      (2.3%)        8.42      (2.9%)    2.1% (  -3% -    7%)
         AndHighHigh       23.36      (1.3%)       23.95      (0.9%)    2.5% (   0% -    4%)
    HighSloppyPhrase        0.92      (5.1%)        0.94      (6.1%)    2.8% (  -7% -   14%)
           LowPhrase       21.02      (6.2%)       21.63      (6.7%)    2.9% (  -9% -   16%)
         MedSpanNear       28.31      (1.3%)       29.20      (1.5%)    3.1% (   0% -    6%)
     MedSloppyPhrase       25.98      (1.7%)       26.79      (1.7%)    3.1% (   0% -    6%)
             MedTerm       47.54      (1.9%)       49.49      (3.4%)    4.1% (  -1% -    9%)
              Fuzzy1       47.28      (2.2%)       49.27      (2.6%)    4.2% (   0% -    9%)
          AndHighMed      105.55      (0.9%)      112.03      (1.2%)    6.1% (   3% -    8%)
            Wildcard       27.63      (1.2%)       30.03      (1.6%)    8.7% (   5% -   11%)
           MedPhrase      109.43      (5.6%)      122.45      (7.4%)   11.9% (   0% -   26%)
             LowTerm      110.94      (1.9%)      128.73      (1.8%)   16.0% (  12% -   20%)
           OrHighLow       17.11      (2.2%)       22.44      (3.7%)   31.1% (  24% -   37%)
           OrHighMed       16.63      (2.1%)       21.89      (3.8%)   31.6% (  25% -   38%)
            HighTerm       19.17      (1.9%)       26.30      (3.5%)   37.2% (  31% -   43%)
          OrHighHigh        8.77      (2.4%)       12.45      (4.7%)   42.1% (  34% -   50%)
             Prefix3       13.06      (1.8%)       18.66      (2.2%)   42.9% (  38% -   47%)
              IntNRQ        3.59      (1.6%)        6.45      (3.3%)   79.8% (  73% -   86%)
{noformat}
Trunk DVs take 61.4 MB while the int[] cache takes 202.9 MB (3.3X
more). Also, if users enable the int[] cache they must remember to use
a disk-backed DV (and maybe we should check / warn about this
somehow), else it's silly since you'd be double-caching in RAM.
Curiously, these gains are not that much better (except for IntNRQ)
than LUCENE-4764, which was only ~31% larger... which is odd, because
we had previously tested [something close to] LUCENE-4764 against the
int[] cache and the cache was faster.
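To make the RAM tradeoff above concrete, here is a minimal, self-contained sketch (not Lucene's actual CachedInts or DocValues code; class and method names are made up for illustration): facet ordinals stored as vInt-packed bytes are compact on disk/in the DV, while decoding them once into a plain int[] cache makes per-hit counting cheaper at roughly 4 bytes per ordinal, which is where the ~3.3X memory blowup comes from.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical sketch of the vInt-packed vs int[]-cache tradeoff.
public class OrdinalCacheSketch {

  // Pack ordinals as vInts (7 data bits per byte, high bit = continuation),
  // the same variable-length scheme Lucene uses for compact int storage.
  static byte[] encodeVInts(int[] ords) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    for (int v : ords) {
      while ((v & ~0x7F) != 0) {
        out.write((v & 0x7F) | 0x80);
        v >>>= 7;
      }
      out.write(v);
    }
    return out.toByteArray();
  }

  // Decode the packed bytes once into an int[] cache. Scanning an int[]
  // per hit avoids re-decoding vInts, but costs a fixed 4 bytes per
  // ordinal instead of 1-2 bytes for small vInt values.
  static int[] decodeToCache(byte[] packed, int count) {
    int[] cache = new int[count];
    int pos = 0;
    for (int i = 0; i < count; i++) {
      int b = packed[pos++];
      int value = b & 0x7F;
      int shift = 7;
      while ((b & 0x80) != 0) {  // continuation bit set: more bytes follow
        b = packed[pos++];
        value |= (b & 0x7F) << shift;
        shift += 7;
      }
      cache[i] = value;
    }
    return cache;
  }

  public static void main(String[] args) {
    int[] ords = {3, 17, 200, 70000};
    byte[] packed = encodeVInts(ords);
    int[] cached = decodeToCache(packed, ords.length);
    System.out.println(Arrays.equals(ords, cached));  // true: round-trips
    // Packed bytes vs int[] bytes: the cache is larger but faster to scan.
    System.out.println(packed.length + " vs " + cached.length * 4);
  }
}
```

In this toy example the four ordinals pack into 7 bytes but occupy 16 bytes as an int[]; with many small ordinals per document the ratio ends up in the ballpark of the 3.3X measured above.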
> Add a CountingFacetsAggregator which reads ordinals from a cache
> ----------------------------------------------------------------
>
> Key: LUCENE-4769
> URL: https://issues.apache.org/jira/browse/LUCENE-4769
> Project: Lucene - Core
> Issue Type: New Feature
> Components: modules/facet
> Reporter: Shai Erera
> Assignee: Shai Erera
> Attachments: LUCENE-4769.patch
>
>
> Mike wrote a prototype of a FacetsCollector which reads ordinals from a
> CachedInts structure on LUCENE-4609. I ported it to the new facets API, as a
> FacetsAggregator. I think we should offer users the means to use such a
> cache, even if it consumes more RAM. Mike's tests show that this cache consumed
> 2X more RAM than if the DocValues were loaded into memory in their raw form.
> Also, a PackedInts version of such a cache took almost the same amount of RAM
> as a straight int[], but the gains were minor.
> I will post the patch shortly.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]