[ 
https://issues.apache.org/jira/browse/LUCENE-5425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890040#comment-13890040
 ] 

Lei Wang commented on LUCENE-5425:
----------------------------------

I tried luceneutil but ran into a problem: I cannot get the same numbers for 
two identical copies of trunk. Even when both sides are trunk, I get different 
numbers:
Report after iter 19:
              Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff
         OrHighMed         74.15   (7.1%)                    71.24   (8.3%)   -3.9% ( -18% -   12%)
           LowTerm        515.68  (15.1%)                   496.20  (12.3%)   -3.8% ( -27% -   27%)
      OrNotHighLow         72.22   (8.2%)                    70.36   (7.6%)   -2.6% ( -17% -   14%)
      OrNotHighMed         79.01   (7.3%)                    77.43   (8.4%)   -2.0% ( -16% -   14%)
     OrHighNotHigh         38.66   (4.5%)                    37.90   (6.4%)   -2.0% ( -12% -    9%)
           Respell         51.21   (7.1%)                    50.23   (6.5%)   -1.9% ( -14% -   12%)
         MedPhrase         69.67   (7.5%)                    68.35   (7.4%)   -1.9% ( -15% -   14%)
         OrHighLow         67.24   (7.8%)                    66.00   (9.0%)   -1.8% ( -17% -   16%)
            Fuzzy1         27.37   (5.7%)                    26.96   (5.5%)   -1.5% ( -11% -   10%)
            Fuzzy2         37.21   (3.8%)                    36.71   (5.6%)   -1.3% ( -10% -    8%)
   MedSloppyPhrase          9.94   (5.4%)                     9.83   (3.9%)   -1.1% (  -9% -    8%)
       LowSpanNear          8.60   (3.9%)                     8.54   (3.8%)   -0.7% (  -8% -    7%)
       AndHighHigh         40.23   (3.1%)                    40.03   (2.5%)   -0.5% (  -5% -    5%)
          HighTerm         76.07   (9.0%)                    75.96   (9.1%)   -0.2% ( -16% -   19%)
        OrHighHigh         11.62   (3.0%)                    11.62   (4.8%)   -0.1% (  -7% -    7%)
            IntNRQ          9.51   (3.9%)                     9.51   (8.3%)    0.0% ( -11% -   12%)
        HighPhrase         25.61   (7.0%)                    25.63   (7.7%)    0.1% ( -13% -   15%)
   LowSloppyPhrase         30.21   (5.2%)                    30.24   (4.3%)    0.1% (  -8% -   10%)
          PKLookup        212.03   (9.0%)                   212.25  (11.5%)    0.1% ( -18% -   22%)
     OrNotHighHigh         27.75   (3.5%)                    27.80   (6.5%)    0.2% (  -9% -   10%)
      OrHighNotMed         58.14   (5.9%)                    58.27   (8.3%)    0.2% ( -13% -   15%)
       MedSpanNear         22.73   (3.7%)                    22.80   (5.1%)    0.3% (  -8% -    9%)
          Wildcard         42.84   (5.0%)                    42.97   (5.4%)    0.3% (  -9% -   11%)
  HighSloppyPhrase         23.99   (7.4%)                    24.08   (6.3%)    0.4% ( -12% -   15%)
        AndHighLow        625.62   (6.6%)                   629.52  (10.5%)    0.6% ( -15% -   18%)
           Prefix3         77.68   (7.2%)                    78.17   (6.2%)    0.6% ( -11% -   15%)
         LowPhrase         14.58   (4.7%)                    14.77   (5.0%)    1.3% (  -8% -   11%)
      HighSpanNear         11.84   (4.3%)                    11.99   (5.2%)    1.3% (  -7% -   11%)
      OrHighNotLow         66.04   (8.4%)                    67.28   (9.2%)    1.9% ( -14% -   21%)
        AndHighMed         66.55   (4.3%)                    67.91   (6.2%)    2.1% (  -8% -   13%)
           MedTerm        139.78   (9.5%)                   145.63  (10.3%)    4.2% ( -14% -   26%)

With the patch the numbers also differ, but the differences are no bigger than 
in the trunk-vs-trunk run:
Report after iter 19:
              Task  QPS baseline   StdDev  QPS my_modified_version   StdDev                Pct diff
        AndHighLow        730.30  (11.5%)                   700.95  (10.6%)   -4.0% ( -23% -   20%)
           LowTerm        520.94  (10.6%)                   504.25  (11.4%)   -3.2% ( -22% -   21%)
            Fuzzy1         57.55   (5.1%)                    56.26   (4.8%)   -2.2% ( -11% -    8%)
           Respell         35.85   (4.7%)                    35.18   (4.1%)   -1.9% ( -10% -    7%)
     OrHighNotHigh         37.77   (7.3%)                    37.19   (5.9%)   -1.5% ( -13% -   12%)
  HighSloppyPhrase         12.30   (7.5%)                    12.17   (7.7%)   -1.1% ( -15% -   15%)
        HighPhrase         29.38   (5.2%)                    29.06   (4.3%)   -1.1% ( -10% -    8%)
      OrNotHighMed         25.93   (6.2%)                    25.68   (5.5%)   -1.0% ( -11% -   11%)
     OrNotHighHigh         19.72   (5.9%)                    19.53   (4.9%)   -0.9% ( -11% -   10%)
            Fuzzy2         11.30   (3.6%)                    11.24   (5.1%)   -0.6% (  -8% -    8%)
          PKLookup        218.16   (8.6%)                   217.53   (9.3%)   -0.3% ( -16% -   19%)
   LowSloppyPhrase         43.09   (5.6%)                    43.00   (3.5%)   -0.2% (  -8% -    9%)
       MedSpanNear         30.65   (4.4%)                    30.60   (3.2%)   -0.1% (  -7% -    7%)
   MedSloppyPhrase         21.71   (5.7%)                    21.70   (3.8%)   -0.0% (  -8% -    9%)
          Wildcard         14.67   (3.3%)                    14.67   (2.6%)   -0.0% (  -5% -    6%)
      HighSpanNear          0.64   (4.6%)                     0.64   (5.0%)    0.1% (  -9% -   10%)
         LowPhrase         21.05   (5.6%)                    21.09   (7.6%)    0.2% ( -12% -   14%)
        AndHighMed        175.53   (7.2%)                   176.00   (8.2%)    0.3% ( -14% -   16%)
           Prefix3         31.24   (3.3%)                    31.37   (2.7%)    0.4% (  -5% -    6%)
      OrNotHighLow         76.32   (6.3%)                    76.80   (7.7%)    0.6% ( -12% -   15%)
        OrHighHigh         33.43   (6.4%)                    33.65   (7.6%)    0.7% ( -12% -   15%)
       AndHighHigh         35.51   (3.1%)                    35.76   (3.1%)    0.7% (  -5% -    7%)
            IntNRQ          9.36   (4.4%)                     9.43   (3.7%)    0.7% (  -7% -    9%)
          HighTerm         90.42   (7.0%)                    91.40   (5.3%)    1.1% ( -10% -   14%)
         OrHighLow         71.32   (8.6%)                    72.13   (8.2%)    1.1% ( -14% -   19%)
       LowSpanNear        107.82   (6.8%)                   109.19   (5.9%)    1.3% ( -10% -   14%)
         OrHighMed         45.43   (8.8%)                    46.09   (8.8%)    1.5% ( -14% -   20%)
           MedTerm        139.24   (7.0%)                   141.28   (8.4%)    1.5% ( -13% -   18%)
         MedPhrase         96.51   (5.0%)                    98.10   (6.8%)    1.6% (  -9% -   14%)
      OrHighNotMed         50.88   (6.9%)                    52.13   (8.3%)    2.5% ( -11% -   18%)
      OrHighNotLow         65.31   (8.9%)                    67.16   (8.8%)    2.8% ( -13% -   22%)
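
A rough way to read the two reports is to check whether each reported 
difference is smaller than the run-to-run noise. The sketch below is only an 
illustration and is not part of luceneutil: the helper names pct_diff and 
within_noise are made up, and using the larger of the two StdDev percentages 
as the noise threshold is an assumed rule of thumb, not a proper significance 
test. The sample rows are copied from the reports.

```python
def pct_diff(base_qps, cand_qps):
    """Percentage change of the candidate QPS relative to the baseline QPS."""
    return 100.0 * (cand_qps - base_qps) / base_qps


def within_noise(base_qps, base_sd_pct, cand_qps, cand_sd_pct):
    """Assumed rule of thumb: treat the change as noise if it is no larger
    than the bigger of the two reported standard deviations (in percent)."""
    return abs(pct_diff(base_qps, cand_qps)) <= max(base_sd_pct, cand_sd_pct)


# (baseline QPS, baseline StdDev %, candidate QPS, candidate StdDev %)
ROWS = {
    "trunk vs trunk / MedTerm": (139.78, 9.5, 145.63, 10.3),
    "trunk vs patch / AndHighLow": (730.30, 11.5, 700.95, 10.6),
    "trunk vs patch / OrHighNotLow": (65.31, 8.9, 67.16, 8.8),
}

for name, row in ROWS.items():
    base_qps, _, cand_qps, _ = row
    verdict = "within noise" if within_noise(*row) else "outside noise"
    print(f"{name}: {pct_diff(base_qps, cand_qps):+.1f}% ({verdict})")
```

Under this rule all three sample rows fall within noise, which is consistent 
with the patch run differing no more than trunk does against itself.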

By the way, I copied the facet config from the nightly Python script, and the 
index definition looks like this:
index = comp.newIndex('trunk', WIKI_MEDIUM_10M, facets=(('Date',),),
                      facetDVFormat='Direct')



> Make creation of FixedBitSet in FacetsCollector overridable
> -----------------------------------------------------------
>
>                 Key: LUCENE-5425
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5425
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 4.6
>            Reporter: John Wang
>         Attachments: facetscollector.patch, facetscollector.patch, 
> fixbitset.patch
>
>
> In FacetsCollector, the bits in MatchingDocs are allocated per query. 
> For large indexes where maxDoc is large, creating a bitset of maxDoc bits 
> is expensive and creates a lot of garbage.
> The attached patch makes this allocation customizable while maintaining 
> the current behavior.
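
The idea in the description can be sketched as an overridable factory method. 
This is a language-neutral illustration in Python, not Lucene's Java API and 
not the attached patch: MatchingDocsCollector, ReusingCollector, create_bits 
and collect_segment are hypothetical names, and a bytearray stands in for 
FixedBitSet.

```python
class MatchingDocsCollector:
    """Hypothetical collector that records matching doc IDs in a bitset."""

    def create_bits(self, max_doc):
        # Default behavior: allocate a fresh bitset of max_doc bits each time.
        return bytearray((max_doc + 7) // 8)

    def collect_segment(self, max_doc, doc_ids):
        bits = self.create_bits(max_doc)
        for doc in doc_ids:
            bits[doc >> 3] |= 1 << (doc & 7)  # set the bit for this doc ID
        return bits


class ReusingCollector(MatchingDocsCollector):
    """Override that reuses one buffer instead of allocating per call.

    Only safe if the caller is done with the previous result, because the
    same buffer is cleared and handed out again.
    """

    def __init__(self):
        self._buf = bytearray()

    def create_bits(self, max_doc):
        needed = (max_doc + 7) // 8
        if len(self._buf) < needed:
            self._buf = bytearray(needed)      # grow once, then keep
        else:
            self._buf[:needed] = bytes(needed)  # clear the part being reused
        return self._buf


default = MatchingDocsCollector()
reusing = ReusingCollector()
first = reusing.collect_segment(16, [1, 9])
second = reusing.collect_segment(16, [0])
print(second is first)                      # same buffer object reused
print(default.collect_segment(16, [0]) is not default.collect_segment(16, [0]))
```

The base class keeps the current allocate-per-call behavior, so only callers 
that opt in to the subclass see the reuse.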



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

