[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914838#action_12914838 ]
Jason Rutherglen commented on LUCENE-2575:
------------------------------------------

{quote}Can you explain what's the "copy on write ByteBlockPool"? Exactly when do we make a copy....?{quote}

A copy of the byte[][] refs is made when getReader is called. Each DWPT is locked (i.e., writes stop) and a copy of the byte[][] is made (just the refs) for that reader. I think the issue at the moment is that I'm using a boolean[] to signify whether a byte[] needs to be copied before being written to. As with BV and norms cloning, read-only references are carried forward, which would imply making copies of the boolean[] as well. In other words, as with BV and norms, I think we need ref counts on the individual byte[]s so that read-only references to byte[]s are carried forward properly. However, this implies creating a BytesRefCount object, because a parallel array cannot point back to the same underlying byte[] if the byte[] in the byte[][] can be replaced when a copy is made.

{quote}Do we have a design thought out for this? The challenge is because every doc state now has its own private docID stream{quote}

It sounded easy when I first heard it; however, I needed to write it down to fully understand and work through what's going on. That process is documented in LUCENE-2558.

{quote}Well, I was thinking only implement the single-level skip case (since it ought to be a lot simpler than the MLSLW/R)....{quote}

I started on this, i.e., implementing a single-level skip list that reads from and writes to the BBP. It's a good lesson in how to use the BBP.

{quote}Actually, conjunction (AND) queries, and also PhraseQuery{quote}

Both are very common types of queries, so we probably need some type of skipping, which we will have; it'll just be single-level.

{quote}Probably we should stop reusing the byte[] with this change? So when all readers using a given byte[] are finally GCd, is when that byte[] is reclaimed.{quote}

I have a suspicion we'll change our minds about pooling byte[]s.
We may end up implementing ref counting anyway (as described above), and the sudden garbage generated *could* be a massive change for users? Of course, ref counting was difficult to implement the first time around in LUCENE-1314, though perhaps it'll be easier the second time.

As a side note, there is still an issue in my mind around the term frequencies parallel array (introduced in these patches), in that we'd need to make a copy of it for each reader (because if it changes, the scoring model becomes inaccurate?). However, we could in fact use a two-dimensional PagedBytes (in this case, PagedInts) for this purpose. Or is the garbage of an int[] the size of the number of docs OK per reader? There is also the lookup cost to consider.

> Concurrent byte and int block implementations
> ---------------------------------------------
>
>                 Key: LUCENE-2575
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2575
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>    Affects Versions: Realtime Branch
>            Reporter: Jason Rutherglen
>             Fix For: Realtime Branch
>
>         Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?
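
For readers following the thread, here is a rough Java sketch of the copy-on-write idea from the comment: getReader copies only the block refs, and a per-block ref count (standing in for the BytesRefCount object mentioned above) tells the writer whether a block must be cloned before it is written to. This is not the LUCENE-2575 patch and not Lucene's ByteBlockPool API; the names CopyOnWriteBlocks, RefCountedBlock, Snapshot, and BLOCK_SIZE are made up for illustration, and a single lock on the pool stands in for locking each DWPT.

```java
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only: none of these names come from the LUCENE-2575 patch.
class CopyOnWriteBlocks {
    static final int BLOCK_SIZE = 8; // tiny block size, for illustration only

    /** A byte[] plus a count of the reader snapshots that still reference it. */
    static final class RefCountedBlock {
        final byte[] bytes;
        final AtomicInteger readerRefs = new AtomicInteger();
        RefCountedBlock(byte[] bytes) { this.bytes = bytes; }
    }

    /** Read-only view: owns a copy of the block refs, not of the bytes. */
    static final class Snapshot implements AutoCloseable {
        private final RefCountedBlock[] blocks;
        private boolean closed;
        Snapshot(RefCountedBlock[] blocks) { this.blocks = blocks; }

        byte read(int pos) {
            return blocks[pos / BLOCK_SIZE].bytes[pos % BLOCK_SIZE];
        }

        /** Releases this snapshot's references so the writer can write in place again. */
        @Override
        public synchronized void close() {
            if (closed) return;
            closed = true;
            for (RefCountedBlock b : blocks) b.readerRefs.decrementAndGet();
        }
    }

    private RefCountedBlock[] blocks = new RefCountedBlock[0];

    /** Writer side: clone a block first if any open snapshot still references it. */
    synchronized void write(int pos, byte value) {
        int idx = pos / BLOCK_SIZE;
        if (idx >= blocks.length) {
            int oldLen = blocks.length;
            blocks = Arrays.copyOf(blocks, idx + 1);
            for (int i = oldLen; i < blocks.length; i++) {
                blocks[i] = new RefCountedBlock(new byte[BLOCK_SIZE]);
            }
        }
        RefCountedBlock block = blocks[idx];
        if (block.readerRefs.get() > 0) {
            // Shared with a reader: replace it with a private copy (fresh ref count of 0).
            block = new RefCountedBlock(block.bytes.clone());
            blocks[idx] = block;
        }
        block.bytes[pos % BLOCK_SIZE] = value;
    }

    /** Writes are blocked by the lock while only the refs are copied for the reader. */
    synchronized Snapshot getReader() {
        RefCountedBlock[] refs = blocks.clone();
        for (RefCountedBlock b : refs) b.readerRefs.incrementAndGet();
        return new Snapshot(refs);
    }

    public static void main(String[] args) {
        CopyOnWriteBlocks pool = new CopyOnWriteBlocks();
        pool.write(0, (byte) 1);
        try (Snapshot reader = pool.getReader()) {
            pool.write(0, (byte) 2); // block is shared, so the writer clones it first
            System.out.println("reader sees " + reader.read(0)); // value from before the write
        }
    }
}
```

Because the count lives on the block object rather than in a parallel boolean[], replacing an entry in the writer's array does not disturb the counts that older snapshots rely on. The discarded original block becomes garbage once its last snapshot is closed, which is the garbage-versus-pooling trade-off raised above.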