[ 
https://issues.apache.org/jira/browse/LUCENE-3932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13243102#comment-13243102
 ] 

Michael McCandless commented on LUCENE-3932:
--------------------------------------------

bq. Is the space savings of delta encoding worth the processing time? You could 
write the .tii file to disk such that on open you could read it straight into a 
byte[].

This is actually what we do in 4.0's default codec (the index is an FST).

It is tempting to do that in 3.x (if we were to do another 3.x release after 
3.6), but we'd need to alter other things as well, e.g. the term bytes are also 
delta-coded in the file but not in RAM.

I'm curious how much larger it'd be if we stopped delta coding. For your 
case, how large is the byte[] in RAM vs the .tii on disk? (Just call 
dataPagedBytes.getPointer() right before we freeze it, and print the result.)
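
To make the size question concrete, here is a rough sketch of how delta 
coding shrinks a run of increasing file pointers when each value is written 
as a VInt-style variable-length integer. It is not Lucene's code: the class 
DeltaSizeSketch and its methods are hypothetical names, and it covers only the 
pointers, not the delta-coded term bytes mentioned above.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical helper for illustration only; not part of Lucene.
class DeltaSizeSketch {

    // Writes v with 7 payload bits per byte; the high bit marks "more bytes follow".
    static void writeVLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    // Bytes needed when each pointer is stored as the difference from the previous one.
    static int deltaCodedSize(long[] sortedPointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long prev = 0;
        for (long p : sortedPointers) {
            writeVLong(out, p - prev);
            prev = p;
        }
        return out.size();
    }

    // Bytes needed when each pointer is stored as its full (absolute) value.
    static int absoluteSize(long[] sortedPointers) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (long p : sortedPointers) {
            writeVLong(out, p);
        }
        return out.size();
    }

    public static void main(String[] args) {
        long[] pointers = {1000, 1100, 1200};
        System.out.println("delta-coded bytes: " + deltaCodedSize(pointers));
        System.out.println("absolute bytes:    " + absoluteSize(pointers));
    }
}
```

The gap between the two numbers grows with the absolute size of the pointers, 
so the real answer for a given index still has to come from measuring it.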
                
> Improve load time of .tii files
> -------------------------------
>
>                 Key: LUCENE-3932
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3932
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 3.5
>         Environment: Linux
>            Reporter: Sean Bridges
>         Attachments: LUCENE-3932.trunk.patch, perf.csv
>
>
> We have a large 50 GB index which is optimized as one segment, with a 66 MB 
> .tii file.  This index has no norms, and no field cache.
> It takes about 5 seconds to load this index; profiling reveals that 60% of 
> the time is spent in GrowableWriter.set(index, value), and most of the time in 
> set(...) is spent resizing PackedInts.Mutable current.
> In the constructor for TermInfosReaderIndex, you initialize the writer with 
> the line,
> {quote}GrowableWriter indexToTerms = new GrowableWriter(4, indexSize, 
> false);{quote}
> For our index, using four as the bit estimate results in 27 resizes.
> The last value in indexToTerms is going to be ~ tiiFileLength, so if you 
> instead use,
> {quote}int bitEstimate = (int) Math.ceil(Math.log10(tiiFileLength) / 
> Math.log10(2));
> GrowableWriter indexToTerms = new GrowableWriter(bitEstimate, indexSize, 
> false);{quote}
> Load time improves to ~ 2 seconds.
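
As a sketch of the bit estimate described in the issue, the same number can 
be computed with integer arithmetic instead of floating-point logarithms. The 
class BitEstimateSketch and the method names are hypothetical, and treating 
the 66 MB file as 66 * 1024 * 1024 bytes is an assumption made only for the 
example. Since GrowableWriter can still grow when a value does not fit, the 
estimate only needs to be close enough to avoid most of the resizes.

```java
// Hypothetical helper for illustration only; not part of Lucene.
class BitEstimateSketch {

    // Number of bits needed to store any value in [0, maxValue], at least 1.
    static int bitsRequired(long maxValue) {
        if (maxValue < 0) {
            throw new IllegalArgumentException("maxValue must be non-negative");
        }
        return Math.max(1, 64 - Long.numberOfLeadingZeros(maxValue));
    }

    // The estimate as written in the issue description: ceil(log2(tiiFileLength)).
    static int log10Estimate(long tiiFileLength) {
        return (int) Math.ceil(Math.log10(tiiFileLength) / Math.log10(2));
    }

    public static void main(String[] args) {
        long tiiFileLength = 66L * 1024 * 1024; // assumed size of a 66 MB file
        System.out.println("integer estimate: " + bitsRequired(tiiFileLength));
        System.out.println("log10 estimate:   " + log10Estimate(tiiFileLength));
    }
}
```

The two methods can differ by one bit when the length is an exact power of 
two, because ceil(log2(n)) equals log2(n) there while storing n itself takes 
one more bit.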

