[jira] [Commented] (LUCENE-4599) Compressed term vectors

Adrien Grand (JIRA) Sun, 20 Jan 2013 13:56:14 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-4599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13558401#comment-13558401
 ]


Adrien Grand commented on LUCENE-4599:
--------------------------------------

OK, I think I understand what happened. First, I had forgotten to turn debug 
off. Second, although documents in this collection are rather big, queries tend 
to favor small docs, whose chunks contain more documents (up to 30). I ran the 
benchmark again with a very small chunk size (128) so that chunks would likely 
contain a single doc, and the results got better:
{noformat}
                  Fuzzy2       94.39      (7.8%)       88.33      (7.5%)   -6.4% ( -20% -    9%)
                 MedTerm      292.09      (2.7%)      279.01      (2.6%)   -4.5% (  -9% -    0%)
              OrHighHigh       76.84      (7.4%)       73.58      (5.8%)   -4.2% ( -16% -    9%)
                  Fuzzy1       93.07      (4.8%)       89.59      (4.4%)   -3.7% ( -12% -    5%)
               OrHighMed       69.23      (6.4%)       67.17      (4.9%)   -3.0% ( -13% -    8%)
              HighPhrase        8.54      (9.4%)        8.36     (11.6%)   -2.1% ( -21% -   20%)
               LowPhrase      125.02      (2.5%)      122.91      (3.4%)   -1.7% (  -7% -    4%)
               MedPhrase       39.97      (5.3%)       39.58      (7.6%)   -1.0% ( -13% -   12%)
                HighTerm      177.70      (2.4%)      176.21      (2.2%)   -0.8% (  -5% -    3%)
                 LowTerm      370.26      (3.7%)      367.36      (2.8%)   -0.8% (  -7% -    5%)
               OrHighLow      106.08      (5.2%)      105.41      (4.7%)   -0.6% ( -10% -    9%)
         LowSloppyPhrase       71.29      (5.2%)       70.95      (5.3%)   -0.5% ( -10% -   10%)
        HighSloppyPhrase       30.52      (5.6%)       30.39      (5.2%)   -0.4% ( -10% -   10%)
                PKLookup      339.12      (3.0%)      338.09      (3.1%)   -0.3% (  -6% -    5%)
         MedSloppyPhrase       71.13      (4.2%)       70.95      (4.4%)   -0.3% (  -8% -    8%)
              AndHighLow      259.19      (3.8%)      258.54      (5.1%)   -0.2% (  -8% -    8%)
                 Respell       69.04      (3.7%)       68.92      (3.2%)   -0.2% (  -6% -    6%)
             AndHighHigh       74.49      (1.5%)       74.47      (1.8%)   -0.0% (  -3% -    3%)
                Wildcard      157.16      (2.0%)      157.21      (1.9%)    0.0% (  -3% -    3%)
              AndHighMed       79.81      (2.1%)       80.16      (1.6%)    0.4% (  -3% -    4%)
             MedSpanNear       14.09      (3.6%)       14.16      (4.4%)    0.5% (  -7% -    8%)
                 Prefix3      281.17      (2.7%)      282.85      (2.5%)    0.6% (  -4% -    5%)
            HighSpanNear        7.73      (3.9%)        7.79      (2.8%)    0.8% (  -5% -    7%)
                  IntNRQ      143.14      (3.0%)      144.45      (3.2%)    0.9% (  -5% -    7%)
             LowSpanNear       23.85      (6.6%)       24.36      (6.0%)    2.2% (  -9% -   15%)
{noformat}
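
For anyone checking the numbers by hand: the percentage after the two QPS 
columns looks like the relative change of the second QPS value against the 
first. Here is a minimal sketch under that assumption (the name pct_diff is 
made up for illustration and is not part of the benchmark tool). Since the 
table shows rounded QPS values, recomputed figures can differ from the printed 
ones in the last digit.

```python
def pct_diff(qps_first: float, qps_second: float) -> float:
    """Relative change of the second QPS value against the first, in percent.

    Assumption: the first QPS column is the reference run and the second is
    the run being compared; the table itself does not label the columns.
    """
    return 100.0 * (qps_second - qps_first) / qps_first


if __name__ == "__main__":
    # (first QPS, second QPS) pairs copied from the table above.
    rows = {
        "Fuzzy2": (94.39, 88.33),
        "MedTerm": (292.09, 279.01),
        "OrHighHigh": (76.84, 73.58),
    }
    for task, (first, second) in rows.items():
        print(f"{task:>12} {pct_diff(first, second):+.1f}%")
```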

(Decreasing the chunk size from 16KB to 128 made the compression ratio increase 
from 66% to 68%.)
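
To illustrate why the chunk size changes how many documents share a chunk, 
here is a rough sketch. It assumes the writer buffers documents and closes a 
chunk once the buffered size reaches the chunk size, and that sizes are in 
bytes; both are assumptions for illustration, as are the function name 
docs_per_chunk and the 600-byte document size. It is not the actual codec code.

```python
from typing import List


def docs_per_chunk(doc_sizes: List[int], chunk_size: int) -> List[int]:
    """Return the number of documents that end up in each chunk.

    Assumed policy: append documents to the current chunk and close it as
    soon as the buffered size reaches chunk_size. The last chunk may be
    closed early when the input runs out.
    """
    chunks = []
    docs_in_current = 0
    buffered = 0
    for size in doc_sizes:
        docs_in_current += 1
        buffered += size
        if buffered >= chunk_size:
            chunks.append(docs_in_current)
            docs_in_current = 0
            buffered = 0
    if docs_in_current:
        chunks.append(docs_in_current)
    return chunks


if __name__ == "__main__":
    small_docs = [600] * 56  # hypothetical small documents
    print(docs_per_chunk(small_docs, 16 * 1024))  # many docs share a chunk
    print(docs_per_chunk(small_docs, 128))        # one doc per chunk
```

Under this policy, reading a single small document from a 16KB chunk means 
touching a chunk shared with many other documents, while with a chunk size of 
128 each document sits in its own chunk.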
                
> Compressed term vectors
> -----------------------
>
>                 Key: LUCENE-4599
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4599
>             Project: Lucene - Core
>          Issue Type: Task
>          Components: core/codecs, core/termvectors
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: 4.2
>
>         Attachments: 4599-dataimport-fail.log, 4599-zookeer-fail.log, 
> CompressingTVF_ingest_rate.png, highlightNoStop.tasks, 
> Lucene40TVF_ingest_rate.png, LUCENE-4599.patch, LUCENE-4599.patch, 
> LUCENE-4599.patch, solr.patch
>
>
> We should have codec-compressed term vectors similarly to what we have with 
> stored fields.

