[ https://issues.apache.org/jira/browse/LUCENE-2329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12846807#action_12846807 ]
Michael McCandless commented on LUCENE-2329:
--------------------------------------------

This would be great!

But, note that term vectors today do not store the term char[] again -- they piggyback on the term char[] already stored for the postings. Though, I believe they store "int textStart" (increments by term length per unique term), which is less compact than the termID would be (increments +1 per unique term), so if, e.g., we someday use packed ints, we'd be more RAM efficient by storing termIDs...

> Use parallel arrays instead of PostingList objects
> --------------------------------------------------
>
>                 Key: LUCENE-2329
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2329
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 3.1
>
>
> This is Mike's idea that was discussed in LUCENE-2293 and LUCENE-2324.
> In order to avoid having very many long-living PostingList objects in
> TermsHashPerField we want to switch to parallel arrays. The termsHash will
> simply be an int[] which maps each term to dense termIDs.
> All data that the PostingList classes currently hold will then be placed in
> parallel arrays, where the termID is the index into the arrays. This will
> avoid the need for object pooling and will remove the overhead of object
> initialization and garbage collection. Especially garbage collection should
> benefit significantly when the JVM runs out of memory, because in such a
> situation the gc mark times can get very long if there is a big number of
> long-living objects in memory.
> Another benefit could be to build more efficient TermVectors. We could avoid
> the need of having to store the term string per document in the TermVector.
> Instead we could just store the segment-wide termIDs.
> This would reduce the size and also make it easier to implement efficient
> algorithms that use TermVectors, because no term mapping across documents
> in a segment would be necessary. This improvement can be made in a separate
> JIRA issue, though.
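
To make the parallel-array layout from the issue description concrete, here is a minimal sketch in Java. It is not Lucene's actual TermsHashPerField code: the class name, the field names other than termsHash and textStart, the String[] used in place of the shared term char[] pool, and the bitsRequired helper are all assumptions made for illustration. It shows an int[] hash table that maps each term to a dense termID, with the per-term data held in arrays indexed by that termID instead of in one PostingList object per term.

```java
import java.util.Arrays;

// Hypothetical sketch only; names and layout are illustrative assumptions,
// not Lucene's real implementation.
class ParallelPostingsSketch {
    int[] termsHash = new int[16];       // hash slot -> termID, -1 means empty
    String[] termText = new String[8];   // stand-in for the shared term char[] pool
    int[] textStart = new int[8];        // offset of each term in that notional pool
    int[] docFreq = new int[8];          // number of documents containing the term
    int[] lastDocID = new int[8];        // last document in which the term was seen
    int numTerms = 0;
    int nextTextStart = 0;

    ParallelPostingsSketch() {
        Arrays.fill(termsHash, -1);
    }

    /** Records one occurrence of term in docID and returns the term's dense termID. */
    int add(String term, int docID) {
        // Keep the hash table at most half full so linear probing always terminates.
        if ((numTerms + 1) * 2 > termsHash.length) {
            rehash();
        }
        int slot = findSlot(term);
        int termID = termsHash[slot];
        if (termID == -1) {
            termID = newTerm(term);
            termsHash[slot] = termID;
        }
        if (lastDocID[termID] != docID) {
            docFreq[termID]++;
            lastDocID[termID] = docID;
        }
        return termID;
    }

    /** Assigns the next dense termID and grows all parallel arrays together if needed. */
    private int newTerm(String term) {
        if (numTerms == termText.length) {
            int newSize = numTerms * 2;
            termText = Arrays.copyOf(termText, newSize);
            textStart = Arrays.copyOf(textStart, newSize);
            docFreq = Arrays.copyOf(docFreq, newSize);
            lastDocID = Arrays.copyOf(lastDocID, newSize);
        }
        int termID = numTerms++;
        termText[termID] = term;
        textStart[termID] = nextTextStart;  // grows by term length per unique term
        nextTextStart += term.length();     // simplification: no length prefix or terminator
        lastDocID[termID] = -1;
        return termID;
    }

    /** Linear probing: returns the slot holding term, or the first empty slot. */
    private int findSlot(String term) {
        int mask = termsHash.length - 1;
        int slot = term.hashCode() & mask;
        while (termsHash[slot] != -1 && !termText[termsHash[slot]].equals(term)) {
            slot = (slot + 1) & mask;
        }
        return slot;
    }

    /** Doubles the hash table; only ints are re-inserted, no per-term objects exist. */
    private void rehash() {
        termsHash = new int[termsHash.length * 2];
        Arrays.fill(termsHash, -1);
        for (int termID = 0; termID < numTerms; termID++) {
            termsHash[findSlot(termText[termID])] = termID;
        }
    }

    /** Bits a packed-int encoding would need to hold values up to maxValue. */
    static int bitsRequired(int maxValue) {
        return Math.max(1, 32 - Integer.numberOfLeadingZeros(maxValue));
    }
}
```

In this sketch, adding "lucene" and then "index" yields termIDs 0 and 1 but textStart values 0 and 6, which is the compactness difference the comment points out: the largest termID grows by one per unique term, so a packed encoding sized with something like bitsRequired would need fewer bits per entry for termIDs than for textStart offsets.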