[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

Doug Cutting (JIRA) Wed, 14 Sep 2011 16:08:34 -0700

    [ 
https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13104982#comment-13104982
 ]


Doug Cutting commented on LUCENE-2205:
--------------------------------------

Also have you tried specifying termInfosIndexDivisor?  I added that feature 
many years ago to address the memory footprint of the terms index.

http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/index/IndexReader.html#open(org.apache.lucene.store.Directory,
 org.apache.lucene.index.IndexDeletionPolicy, boolean, int)

If this is 2 then the memory use is halved, but the compute cost of looking up 
each search term is doubled.  It would be interesting to compare the 
performance of the two approaches, since the approach of this patch probably 
increases lookup cost somewhat too.

> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and 
> the index pointer long[] and create a more memory efficient data structure.
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2205
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2205
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>         Environment: Java5
>            Reporter: Aaron McCurry
>         Attachments: RandomAccessTest.java, TermInfosReader.java, 
> TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java, 
> TermInfosReaderIndexSmall.java, patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as 
> an index offset.  
> The performance benefits are stagering on my test index (of size 6.2 GB, with 
> ~1,000,000 documents and ~175,000,000 terms), the memory needed to load the 
> terminfos into memory were reduced to 17% of there original size.  From 291.5 
> MB to 49.7 MB.  The random access speed has been made better by 1-2%, load 
> time of the segments are ~40% faster as well, and full GC's on my JVM were 
> made 7 times faster.
> I have already performed the work and am offering this code as a patch.  
> Currently all test in the trunk pass with this new code enabled.  I did write 
> a system property switch to allow for the original implementation to be used 
> as well.
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog about this patch here is the link.
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[jira] [Commented] (LUCENE-2205) Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and the index pointer long[] and create a more memory efficient data structure.

Reply via email to