[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921784#action_12921784 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

The issue with the model as given is that the posting-upto is handed to the byte slice reader as the end index; however, per the JMM, newly written bytes may not actually be visible to a reader thread, so a reader may land on partially written bytes. There doesn't seem to be a way to tell the reader it has reached the end of the written bytes, so we'd probably need to add two paged-ints arrays, for the freq and prox uptos respectively. That would be unfortunate, because the paged ints would need to be updated either during the getReader call or during indexing. Either could hurt performance, though the net would still be faster than the current NRT solution.

The alternative is to simply copy-on-write the byte blocks, though that'd need to include the int blocks as well. If we stay with paged ints, I think we'd want to update them during indexing; otherwise we should discount that solution, because updating in the getReader call would require full array iterations to compare and update. The advantage of copy-on-write of the blocks (sketched after the quoted description below) is that neither indexing speed nor read speed would be affected; the main potential performance drag is the garbage generated by the byte and int arrays thrown away on reader close, which would depend on how many blocks were updated between getReader calls. We probably need to implement both solutions, try them out, and measure the performance difference.

There's also Michael B.'s design of multiple slice levels linked together by atomic int arrays, illustrated here: http://www.box.net/shared/hivdg1hge9

After reading that, the main idea I think we can use is, instead of paged ints, to simply maintain two upto arrays: one that's being written to, and a second that's guaranteed to be in sync with the byte blocks (also sketched below). This would save on garbage and on lookups into paged ints; the cost would be the array copy in the getReader lock. Given that the array already exists, the copy should be fast. Perhaps this is the go-ahead solution?

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
> The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (i.e., postings etc.). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable, random-access-file-like API. One would seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
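To make the copy-on-write alternative concrete, here is a minimal sketch; it is not the attached patch, and every name in it (CopyOnWriteBytePool, writeByte, snapshotForReader, the BitSet tracking) is invented for illustration. Only byte blocks are shown; the int blocks would need the same treatment, as noted above. The writer clones a block on its first write after a reader snapshot, so the snapshot a reader holds never mutates:

{code:java}
import java.util.BitSet;

// Hypothetical sketch of the copy-on-write idea; not Lucene code.
final class CopyOnWriteBytePool {
  private final byte[][] blocks;            // the live block table
  private final BitSet sharedWithReader = new BitSet();

  CopyOnWriteBytePool(int numBlocks, int blockSize) {
    blocks = new byte[numBlocks][blockSize];
  }

  // Indexing thread: the first write into a block that a reader snapshot
  // still references clones it first, so the reader's view never changes.
  void writeByte(int block, int offset, byte b) {
    if (sharedWithReader.get(block)) {
      blocks[block] = blocks[block].clone();
      sharedWithReader.clear(block);
    }
    blocks[block][offset] = b;
  }

  // getReader() path: assumed to run at a sync point with the indexing
  // thread (e.g. under a shared lock), which also supplies the JMM
  // visibility guarantee. The reader keeps these arrays until it closes,
  // at which point the superseded blocks become garbage.
  byte[][] snapshotForReader() {
    sharedWithReader.set(0, blocks.length);
    return blocks.clone();                  // shallow copy of the table
  }
}
{code}

The garbage profile matches the concern above: what a closed reader releases is proportional to how many blocks were touched between getReader calls.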
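And a sketch of the two-upto-array idea, again with invented names (advance, publish, snapshot). The indexing thread advances its private array freely and copies it into the reader-visible array at a document boundary; the paired lock release/acquire supplies the happens-before edge the JMM requires, so a reader never sees an upto pointing past fully written bytes:

{code:java}
// Hypothetical sketch of the two-upto-array idea; not Lucene code.
final class UptoPublisher {
  private final int[] writeUptos;  // advanced freely by the indexing thread
  private final int[] readUptos;   // touched only under the lock
  private final Object lock = new Object();

  UptoPublisher(int numTerms) {
    writeUptos = new int[numTerms];
    readUptos = new int[numTerms];
  }

  // Indexing thread only: the bytes for termID are written into the
  // block pool *before* its upto is advanced.
  void advance(int termID, int newUpto) {
    writeUptos[termID] = newUpto;
  }

  // Indexing thread, at a document boundary: publish a consistent
  // snapshot. Releasing the lock here pairs with the acquire in
  // snapshot() to establish happens-before for the byte writes too.
  void publish() {
    synchronized (lock) {
      System.arraycopy(writeUptos, 0, readUptos, 0, writeUptos.length);
    }
  }

  // getReader() path: every upto visible here points at fully written
  // bytes, because the writer published it under the same lock.
  int[] snapshot() {
    synchronized (lock) {
      return readUptos.clone();
    }
  }
}
{code}

This sketch does the copy on the indexing side; doing it in the getReader lock instead, as suggested above, works the same way provided both threads synchronize on the same monitor, since it's the lock, not the copy itself, that makes the byte-block writes visible.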