[ https://issues.apache.org/jira/browse/LUCENE-2205?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115741#comment-13115741 ]

Michael McCandless commented on LUCENE-2205:
--------------------------------------------

bq. I have done the PagedBytes back port already. It was a simple copy of the class (assuming that's what you want me to do).

Excellent, I'm glad it was straightforward. We also need a DataInput impl that reads from the PagedBytes... I can help on that if you want.

bq. As for the oal.util.packed package for the packed ints, I think they should be modified to work against the DataInput and DataOutput instead of the IndexInput and IndexOutput.

I agree -- I committed this to trunk.

> Rework of the TermInfosReader class to remove the Terms[], TermInfos[], and
> the index pointer long[] and create a more memory efficient data structure.
> -------------------------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-2205
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2205
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: core/index
>         Environment: Java5
>            Reporter: Aaron McCurry
>            Assignee: Michael McCandless
>             Fix For: 3.5
>
>         Attachments: RandomAccessTest.java, TermInfosReader.java,
> TermInfosReaderIndex.java, TermInfosReaderIndexDefault.java,
> TermInfosReaderIndexSmall.java, lowmemory_w_utf8_encoding.patch,
> patch-final.txt, rawoutput.txt
>
>
> Basically packing those three arrays into a byte array with an int array as
> an index offset.
> The performance benefits are staggering on my test index (of size 6.2 GB, with
> ~1,000,000 documents and ~175,000,000 terms): the memory needed to load the
> terminfos into memory was reduced to 17% of its original size, from 291.5
> MB to 49.7 MB. The random access speed has been made better by 1-2%, load
> time of the segments is ~40% faster as well, and full GCs on my JVM were
> made 7 times faster.
> I have already performed the work and am offering this code as a patch.
> Currently all tests in the trunk pass with this new code enabled. I did write
> a system property switch to allow for the original implementation to be used
> as well.
> -Dorg.apache.lucene.index.TermInfosReader=default or small
> I have also written a blog post about this patch; here is the link:
> http://www.nearinfinity.com/blogs/aaron_mccurry/my_first_lucene_patch.html
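
For readers of this thread, the packing idea in the issue description (one byte array plus an int array of offsets, in place of Terms[], TermInfos[], and the index pointer long[]) can be sketched roughly as follows. This is only a minimal illustration and not the code from the attached patch: the class name PackedTermIndex, its methods, and the simplified entry layout (UTF-8 term, docFreq, index pointer) are all assumptions made for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical sketch: stores every entry back to back in one byte[] and
 * keeps only an int[] of start offsets, instead of one object per term.
 * Entry layout (an assumption for this example, not the patch's format):
 *   int termLength | UTF-8 term bytes | int docFreq | long indexPointer
 */
final class PackedTermIndex {
    private final ByteBuffer data; // wraps the single packed byte[]
    private final int[] offsets;   // offsets[i] = start of entry i in data

    private PackedTermIndex(byte[] packed, int[] offsets) {
        this.data = ByteBuffer.wrap(packed);
        this.offsets = offsets;
    }

    /** terms must already be sorted by String.compareTo. */
    static PackedTermIndex build(String[] terms, int[] docFreqs, long[] indexPointers) {
        int n = terms.length;
        byte[][] utf8 = new byte[n][];
        int total = 0;
        for (int i = 0; i < n; i++) {
            utf8[i] = terms[i].getBytes(StandardCharsets.UTF_8);
            total += 4 + utf8[i].length + 4 + 8; // length + term + docFreq + pointer
        }
        ByteBuffer buf = ByteBuffer.allocate(total);
        int[] offsets = new int[n];
        for (int i = 0; i < n; i++) {
            offsets[i] = buf.position();
            buf.putInt(utf8[i].length);
            buf.put(utf8[i]);
            buf.putInt(docFreqs[i]);
            buf.putLong(indexPointers[i]);
        }
        return new PackedTermIndex(buf.array(), offsets);
    }

    int size() {
        return offsets.length;
    }

    /** Decodes the term of entry i from the packed bytes. */
    String term(int i) {
        int pos = offsets[i];
        int len = data.getInt(pos);
        return new String(data.array(), pos + 4, len, StandardCharsets.UTF_8);
    }

    int docFreq(int i) {
        int pos = offsets[i];
        return data.getInt(pos + 4 + data.getInt(pos));
    }

    long indexPointer(int i) {
        int pos = offsets[i];
        return data.getLong(pos + 4 + data.getInt(pos) + 4);
    }

    /** Binary search over the entries; returns -(insertionPoint + 1) if absent. */
    int indexOf(String target) {
        int lo = 0, hi = offsets.length - 1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = term(mid).compareTo(target);
            if (cmp < 0) lo = mid + 1;
            else if (cmp > 0) hi = mid - 1;
            else return mid;
        }
        return -(lo + 1);
    }
}
```

The point of the layout is that the heap holds two arrays rather than millions of small objects, which is consistent with the memory and GC improvements reported in the description. The sketch decodes a String on every comparison for simplicity; a real implementation would more likely compare bytes in place.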