[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914684#action_12914684 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Maybe we can not skip until we've hit the max slice? This way skipping would always know it's on the max slice. This works out to 429 bytes into the stream... likely this is fine.{quote}

Me like-y. I'll implement the skip list to point to the largest-level slices.

{quote}Can we just have IW allocate a new byte[][] after flush? So then any open readers can keep using the one they have?{quote}

This means the prior byte[]s will still be recycled once all readers from previous flushes are closed? If there are multiple readers from the previous flush, we'd probably still need reference counting (ala bitvector and norms)? Unfortunately a reference-count parallel array won't quite work because we're copy-on-writing the byte[]s, eg, there's nothing consistent for the array index to point to. A hash map of byte[]s would likely be too heavyweight? We may need to implement a ByteArray object composed of a byte[] and a refcount (a rough sketch follows at the end of this comment). This is somewhat counter to our parallel-array memory-savings strategy, though it is directly analogous to the way norms are implemented in SegmentReader.

{quote}it's possible single level skipping, with a larger skip interval, is fine for even large RAM buffers.{quote}

True, I'll implement a default of one level and a default large-ish skip interval (also sketched at the end of this comment).

{quote}Maybe we can get an initial version of this working, without the skipping? Ie skipping is implemented as scanning.{quote}

How many scorers use skipping, and how often? It's mostly for disjunction queries? If we limit the skip list to one level and don't implement the BBP level byte at the beginning of the slice, the MLSL will be a lot easier (ie, faster) to implement and test.

I'd like to see BytesHash get out of THPF (eg, LUCENE-2662), get deletes working in the RT branch, and merge the flush-by-DWPT work to trunk. Concurrently I'll work on search on the RAM buffer, which is most of the way complete. I'd prefer to test a more complete version of LUCENE-2312 with skip lists (which can easily be turned off), so that when we do take it through the laundromat of testing, we won't need to retrofit anything back in, re-test, and possibly re-design.

On a side note related to testing: one naive way I've tested is to do the copy-on-write of the BBP when the segment needs to be flushed to disk, and write the segment from the read-only copy of the BBP. If the segment is correct, then at least we know the copy worked properly and nothing's missing.

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
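As a concrete illustration of the ByteArray idea above, here's a rough sketch (the class name and shape are my assumption, not anything in the attached patches): a byte[] paired with a refcount, handled the same way SegmentReader refcounts shared norms.

{code:java}
// Hypothetical sketch, not from the patch: a recyclable byte[] plus a refcount.
// A reader opened at flush time incRef()s every buffer it can see; when the last
// reference is released the buffer can go back to the allocator for reuse.
final class ByteArray {
  final byte[] bytes;
  private int refCount = 1; // the writer that allocated the buffer holds the first ref

  ByteArray(byte[] bytes) {
    this.bytes = bytes;
  }

  synchronized void incRef() {
    assert refCount > 0;
    refCount++;
  }

  /** Returns true when the last reference is dropped and the buffer may be recycled. */
  synchronized boolean decRef() {
    assert refCount > 0;
    return --refCount == 0;
  }
}
{code}

Each reader from a given flush would incRef() the buffers it references when opened and decRef() them on close, so a buffer is only recycled once every reader that could still see it is gone.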
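And a minimal sketch of single-level skipping with a large-ish interval, just to show the shape of it (plain arrays stand in for the postings here; the real thing would store slice offsets into the BBP rather than array indexes):

{code:java}
// Illustrative only: one skip entry per `interval` docs over a sorted docID list.
// advance(target) jumps over whole blocks whose last doc is still < target, then
// finishes with a linear scan inside the final block.
final class SingleLevelSkip {
  private final int[] docs;        // sorted docIDs (stand-in for a postings list)
  private final int[] skipDocs;    // last docID of each block
  private final int[] skipIndexes; // index into docs[] of that last docID

  SingleLevelSkip(int[] docs, int interval) {
    this.docs = docs;
    int n = docs.length / interval;
    skipDocs = new int[n];
    skipIndexes = new int[n];
    for (int i = 0; i < n; i++) {
      skipIndexes[i] = (i + 1) * interval - 1;
      skipDocs[i] = docs[skipIndexes[i]];
    }
  }

  /** Index of the first doc >= target, or docs.length if there is none. */
  int advance(int target) {
    int pos = 0;
    for (int i = 0; i < skipDocs.length && skipDocs[i] < target; i++) {
      pos = skipIndexes[i] + 1; // the whole block is < target, skip past it
    }
    while (pos < docs.length && docs[pos] < target) {
      pos++; // scan the remainder
    }
    return pos;
  }
}
{code}

With an interval of a few thousand docs the skip structure stays tiny, which is the point of using a single level over an in-RAM buffer.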