[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845943#action_12845943 ]
Jason Rutherglen commented on LUCENE-2312:
------------------------------------------

I thought we were moving away from byte block pooling and were going to try relying on garbage collection? Does a volatile Object[] publish changes to all threads? Probably not; again, it'd just be the pointer. In the case of posting/termdocs iteration, I'm more concerned that lastDocID be volatile than with the byte array containing extra data. Extra docs in the byte array are OK because we'll simply stop iterating once we've reached the last doc.

Though with our system, we shouldn't even run into this. Meaning: a byte array is copied and published; perhaps the master byte array is still being written to, and the same byte array (by id or something) is published again? Then we'd have multiple versions of byte arrays. That could be bad.

Because there is one DW per thread, only one document is being indexed at a time, so there's no writer concurrency. That leaves reader concurrency. However, after each doc we *could* simply flush all bytes related to the doc, and any new docs would simply start writing to new byte arrays. The problem with this is that unless the byte arrays are really small, we'll have a lot of extra data around, unless the byte arrays are trimmed before publication.

Or we can simply RW lock (or some other analogous thing) individual byte arrays, not publish them after each doc, and only publish them when getReader is called. To clarify, the RW lock (or flag) would only be per byte array; in fact, all writing to the byte array could cease on flush, with new byte arrays allocated. The published byte array could point to the next byte array.

I think we simply need a way to publish byte arrays to all threads. Michael B., can you post something of what you have so we can get an idea of how your system will work (i.e., mainly what the assumptions are)?
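For context on the publication question above: under the Java Memory Model, a volatile field publishes the reference it holds, and any writes made before the volatile store happen-before a reader's volatile load, but later in-place writes to the array's elements carry no such guarantee. A minimal sketch of the trim-then-publish idea, with a volatile lastDocID as the iteration bound (class and method names here are hypothetical, not Lucene's actual internals):

```java
import java.util.Arrays;

/**
 * Sketch of copy-and-publish for per-thread postings bytes.
 * Hypothetical names; not Lucene's actual internal API.
 */
class PostingsBuffer {
    // A volatile field publishes the reference it holds; element writes made
    // BEFORE the volatile store are visible to readers (happens-before),
    // but later in-place writes to the same array are not guaranteed visible.
    private volatile byte[] published = new byte[0];

    // Readers stop at this doc even if the published array carries extra bytes.
    private volatile int lastDocID = -1;

    // Writer thread: trim the scratch buffer to the bytes actually used,
    // then publish the snapshot via the volatile stores. The master scratch
    // array itself is never handed to readers.
    void publish(byte[] scratch, int bytesUsed, int docID) {
        published = Arrays.copyOf(scratch, bytesUsed); // trimmed copy, then volatile store
        lastDocID = docID;
    }

    // Reader thread: volatile loads give a consistent snapshot.
    byte[] snapshot() { return published; }
    int lastDocID()   { return lastDocID; }
}
```

A reader iterating postings would use lastDocID() as its bound, so any stale bytes past the last published doc are simply never read, which is the "extra docs are OK" observation above.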
We do need to strive for correctness of data, and perhaps performance will be slightly impacted (though compared with our current NRT we'll have an overall win).

> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
> In order to offer users near-realtime search, without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
>
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
>
> Michael Busch has good suggestions regarding how to handle deletes using max
> doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
>
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915

--
This message is automatically generated by JIRA.