[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12916199#action_12916199 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

bq. We'd need to increase the level 0 slice size...

Yes.

{quote}but the reader needs to read 'beyond' the end of a given slice, still? Ie say global maxDoc is 42, and a given posting just read doc 27 (which in fact is its last doc). It would then try to read the next doc?{quote}

The posting-upto should stop the reader before it ever reaches a byte element whose value is 0, ie, that case should never happen. The main 'issue', which really isn't one, is that each reader cannot keep its own copy of the byte[][] spine, because the spine keeps growing: new buffers are added and the master posting-upto keeps advancing, so 'older' readers could read past their original point-in-time byte[][]. This is solved by adding synchronized around the obtainment of the byte[] buffer from the BBP, which prevents out-of-bounds exceptions (see the sketch after the issue summary below).

{quote}We don't store tf now do we? Adding 4 bytes per unique term isn't innocuous!{quote}

What I meant is: if we're merely maintaining the term-freq array during normal, non-RT indexing, we're not constantly creating new arrays, so the cost is innocuous. However, the array has no use in that case, eg, it shouldn't be created unless RT has been flipped on, modally.

{quote}Hmm the full copy of the tf parallel array is going to put a highish cost on reopen? So some sort of transactional (incremental copy-on-write) data structure is needed (eg PagedInts)...{quote}

Right, this to me is the remaining 'problem', or rather something that needs a reasonable go-ahead solution. For now we can assume PagedInts is the answer.

In addition, to summarize the skip list: it needs to store the doc, the address into the BBP, and the length from that address to the end of the slice. This allows us to point to a document anywhere in the postings BBP and still continue with slice iteration. In the test code I've written, the slice level is stored as well; I'm not sure why/if that's required. I think it's a hint to the BBP reader as to the level of the next slice. (A sketch of such a skip entry also follows below.)

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent. We really need something that has a locking flush method, where flush is called at the end of adding a document. Once flushed, the newly written data would be available to all other reading threads (ie, postings etc). I'm not sure I understand the slices concept; it seems like it'd be easier to implement a seekable random access file like API. One'd seek to a given position, then read or write from there. The underlying management of byte arrays could then be hidden?
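
A minimal sketch of the synchronized buffer obtainment described in the comment, not the actual patch or Lucene's ByteBlockPool API; the names ConcurrentBytePool, newBuffer and getBuffer are hypothetical. The point it illustrates is that readers never cache the byte[][] spine themselves, they ask the pool for the current buffer under the same lock the writer uses to grow the spine:

{code:java}
import java.util.Arrays;

// Hypothetical sketch: readers fetch buffers through a synchronized
// accessor instead of holding a point-in-time copy of the byte[][] spine,
// so a concurrent growth of the spine cannot cause out-of-bounds reads.
class ConcurrentBytePool {
  private byte[][] buffers = new byte[8][]; // the growing spine
  private int bufferUpto = -1;

  // Writer thread: append a new buffer, growing the spine if needed.
  synchronized void newBuffer(int blockSize) {
    if (++bufferUpto == buffers.length) {
      buffers = Arrays.copyOf(buffers, buffers.length * 2);
    }
    buffers[bufferUpto] = new byte[blockSize];
  }

  // Reader thread: resolve a global address to the buffer that holds it.
  // Synchronizing here is what keeps 'older' readers safe while the
  // writer is replacing/extending the spine array.
  synchronized byte[] getBuffer(int globalAddress, int blockSize) {
    return buffers[globalAddress / blockSize];
  }
}
{code}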
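And a sketch of one skip entry as summarized above, again with hypothetical names rather than code from the patch. Each entry carries the doc it lands on, the absolute address of that doc's postings in the BBP, and the number of bytes remaining in the current slice from that address, so the reader can jump there and resume normal slice iteration; the slice level is kept as an optional hint:

{code:java}
// Hypothetical skip-list entry pointing into the postings byte-block pool.
class SkipEntry {
  final int doc;            // document this entry points at
  final int poolAddress;    // absolute address into the byte[][] pool
  final int sliceRemaining; // bytes left in the slice from poolAddress
  final int sliceLevel;     // possible hint: level of the next slice

  SkipEntry(int doc, int poolAddress, int sliceRemaining, int sliceLevel) {
    this.doc = doc;
    this.poolAddress = poolAddress;
    this.sliceRemaining = sliceRemaining;
    this.sliceLevel = sliceLevel;
  }
}
{code}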