[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916000#action_12916000 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

OK, I think there's a way to avoid copying the actual byte[]; we'd need to
alter the behavior of BBPs. It would require always reserving 3 empty bytes at
the end of a slice for the forwarding address. Today we write the postings up
to the end of the slice and, when allocating a new slice, copy the last 3 bytes
forward to the new slice location.

We would also need to pass a unique parallel posting upto array to each reader.
This is required so that the reader never ventures beyond the end of a slice as
it was written when the reader was instantiated.

This would yield significant savings because we would not be generating garbage
from the byte[]s, which are 32 KB each. They add up if, for example, the
indexing is touching many different byte[]s. With this solution, incremental
indexing would generate essentially no garbage; garbage would only appear after
a DWPT's segment is flushed (and all of its readers have been GCed).

The only downside is we'd be leaving those 3 bytes per term unused at all
times, which is not a very high price. Perhaps more significant is the posting
upto array per reader, which would be 4 bytes per term, the same cost as the
term freq array. It's a pick-your-poison problem.

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch,
>                      LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document.
> Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
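
To make the reserved-tail proposal in the comment above concrete, here is a minimal single-threaded sketch. It is not Lucene's ByteBlockPool: the class and method names (ReservedTailPool, Stream, snapshotUpto, read) are hypothetical, the pool is one flat byte[] rather than a set of 32 KB blocks, and slices have a fixed, tiny size instead of growing levels. It only illustrates two ideas from the comment: the last 3 bytes of every slice are never used for postings, so linking a new slice never moves already-written bytes, and a reader carries its own "upto" snapshot so it stops where the data ended when the reader was created. Cross-thread visibility is deliberately left out.

```java
// Hypothetical sketch of "reserve 3 tail bytes per slice"; not Lucene code.
final class ReservedTailPool {
    static final int SLICE_SIZE = 8;  // tiny on purpose, for illustration only
    static final int ADDR_BYTES = 3;  // tail reserved for the forwarding address

    final byte[] buffer;              // assumption: one flat array instead of many blocks
    private int nextFree = 0;

    ReservedTailPool(int capacity) {
        buffer = new byte[capacity];
    }

    // Hands out the start offset of a fresh slice.
    int newSlice() {
        if (nextFree + SLICE_SIZE > buffer.length) {
            throw new IllegalStateException("pool full");
        }
        int start = nextFree;
        nextFree += SLICE_SIZE;
        return start;
    }

    Stream newStream() {
        return new Stream();
    }

    // Write side for one term's postings.
    final class Stream {
        final int start;
        private int upto;      // next position to write
        private int sliceEnd;  // first reserved byte of the current slice

        Stream() {
            start = newSlice();
            upto = start;
            sliceEnd = start + SLICE_SIZE - ADDR_BYTES;
        }

        void write(byte b) {
            if (upto == sliceEnd) {
                // The slice is full: link a new one by filling the reserved
                // tail. No previously written posting byte is moved.
                int next = newSlice();
                buffer[sliceEnd] = (byte) (next >>> 16);
                buffer[sliceEnd + 1] = (byte) (next >>> 8);
                buffer[sliceEnd + 2] = (byte) next;
                upto = next;
                sliceEnd = next + SLICE_SIZE - ADDR_BYTES;
            }
            buffer[upto++] = b;
        }

        // The value a reader would copy into its own posting upto array.
        int snapshotUpto() {
            return upto;
        }
    }

    // Reads the stream from start up to (excluding) the snapshot position,
    // following forwarding addresses only when data lies beyond them.
    byte[] read(int start, int end) {
        java.io.ByteArrayOutputStream out = new java.io.ByteArrayOutputStream();
        int pos = start;
        int sliceEnd = start + SLICE_SIZE - ADDR_BYTES;
        while (pos != end) {
            if (pos == sliceEnd) {
                pos = ((buffer[pos] & 0xFF) << 16)
                    | ((buffer[pos + 1] & 0xFF) << 8)
                    | (buffer[pos + 2] & 0xFF);
                sliceEnd = pos + SLICE_SIZE - ADDR_BYTES;
                continue;
            }
            out.write(buffer[pos++]);
        }
        return out.toByteArray();
    }
}
```

A reader that captured snapshotUpto() when a slice was exactly full stops before the reserved tail, so it is unaffected when the writer later fills in the forwarding address and continues in a new slice. The per-reader cost is the one the comment names: one int (4 bytes) per term for the snapshot.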