[ 
https://issues.apache.org/jira/browse/LUCENE-3069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Han Jiang updated LUCENE-3069:
------------------------------

    Attachment: df-ttf-estimate.txt

Uploaded detailed data for wikimediumall.

Oh, sorry, there was an error when I 
calculated the index size for the df==0 trick: 
it should be 105MB instead of 70MB.

But the real test result is still beyond the 
estimate (weird...). The df==0 trick
gains compression similar to the steal-bit approach.

Index sizes are below:
{noformat}
v0:                   13195304
v1 = v0 + flag byte:  12847172
v2 = v1 + steal bit:  12770700
v3 = v1 + zero df:    12780884
{noformat}
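
To put the numbers above in relative terms, here is a small Python sketch that computes each variant's saving against v0. It is not part of the patch or of luceneutil. The names `sizes` and `saving_vs_baseline` are made up for illustration, and the sizes are used in whatever unit the table reports them.

```python
# Index sizes copied from the table above (same unit as reported there).
sizes = {
    "v0": 13195304,  # baseline
    "v1": 12847172,  # v0 + flag byte
    "v2": 12770700,  # v1 + steal bit
    "v3": 12780884,  # v1 + zero df
}


def saving_vs_baseline(size, baseline):
    """Return the relative size reduction against the baseline, in percent."""
    return 100.0 * (baseline - size) / baseline


# Hypothetical helper structure: saving of every variant relative to v0.
savings = {name: saving_vs_baseline(size, sizes["v0"]) for name, size in sizes.items()}

for name, pct in savings.items():
    print(f"{name}: {sizes[name]:>10}  saving vs v0: {pct:.2f}%")
```

Read this way, the flag byte accounts for most of the reduction, and the steal-bit and zero-df variants differ from each other by well under a tenth of a percent of the baseline.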

Another thing that surprised me: with the same code and configuration, 
luceneutil creates indexes of different sizes. I tested 
the df==0 trick several times on wikimedium1m, and the 
index size varied from 514M to 522M... Does multi-threading affect
this much?

                
> Lucene should have an entirely memory resident term dictionary
> --------------------------------------------------------------
>
>                 Key: LUCENE-3069
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3069
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/index, core/search
>    Affects Versions: 4.0-ALPHA
>            Reporter: Simon Willnauer
>            Assignee: Han Jiang
>              Labels: gsoc2013
>             Fix For: 4.4
>
>         Attachments: df-ttf-estimate.txt, example.png, LUCENE-3069.patch
>
>
> FST based TermDictionary has been a great improvement yet it still uses a 
> delta codec file for scanning to terms. Some environments have enough memory 
> available to keep the entire FST based term dict in memory. We should add a 
> TermDictionary implementation that encodes all needed information for each 
> term into the FST (custom fst.Output) and builds a FST from the entire term 
> not just the delta.
