[ https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12914802#action_12914802 ]
Michael McCandless commented on LUCENE-2575:
--------------------------------------------

{quote}
bq. Can we just have IW allocate a new byte[][] after flush? So then any open readers can keep using the one they have?

This means the prior byte[]s will still be recycled after all active previous flush readers are closed?
{quote}

Probably we should stop reusing the byte[] with this change? Then a given byte[] is reclaimed only once all readers using it have finally been GCd.

{quote}
bq. it's possible single level skipping, with a larger skip interval, is fine for even large RAM buffers.

True, I'll implement a default of one level, and a default large-ish skip interval.
{quote}

Well, I was thinking we would only implement the single-level skip case (since it ought to be a lot simpler than the MLSLW/R)....

{quote}
How many scorers, or how often is skipping used? It's mostly for disjunction queries?
{quote}

Actually, conjunction (AND) queries, and also PhraseQuery (which is really an AND query followed by positions checking).

One thing to remember is that skipping is *costly* (especially the first time you use it). I think we over-use it today, ie, in many cases we should do a spin loop (.next()) instead, if your target "is not that far away". PhraseQuery (the exact case) has a heuristic to do this, but really this ought to be implemented in the codec.

bq. get deletes working in the RT branch,

Do we have a design thought out for this? The challenge is that every doc state now has its own private docID stream, so we need a global sequence ID to track "when" a deletion arrived, to know whether or not that deletion applies to each docID, right? (And each added doc must also record the sequence ID when it was added.)
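To make the sequence ID question concrete, here is a minimal sketch of one way the bookkeeping might look. It is not taken from the RT branch or any patch on this issue: the class SequenceIdDeletes and its methods are hypothetical names, and it assumes a single global counter shared by all doc states and delete-by-term only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical sketch: a global sequence ID orders adds and deletes
 * across doc states that each have a private docID stream.
 */
public class SequenceIdDeletes {
    // Assumption: one counter shared by every indexing thread.
    private final AtomicLong nextSeqId = new AtomicLong();

    // Delete term -> sequence ID of the most recent delete for that term.
    private final Map<String, Long> deleteSeqIds = new HashMap<>();

    /** Called when a doc is added; the caller records the result next to its private docID. */
    public long recordAdd() {
        return nextSeqId.incrementAndGet();
    }

    /** Called when a delete-by-term arrives; remembers "when" it arrived. */
    public synchronized long recordDelete(String term) {
        long seqId = nextSeqId.incrementAndGet();
        deleteSeqIds.put(term, seqId);
        return seqId;
    }

    /**
     * For a doc already known to match the term: the delete applies only
     * if the doc was added before the delete arrived.
     */
    public synchronized boolean isDeleted(String term, long docSeqId) {
        Long deleteSeqId = deleteSeqIds.get(term);
        return deleteSeqId != null && docSeqId < deleteSeqId;
    }
}
```

With this shape, a doc added after a delete of the same term gets a larger sequence ID than the delete and so stays live, regardless of which doc state assigned its docID.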
> Concurrent byte and int block implementations
> ---------------------------------------------
>
> Key: LUCENE-2575
> URL: https://issues.apache.org/jira/browse/LUCENE-2575
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Affects Versions: Realtime Branch
> Reporter: Jason Rutherglen
> Fix For: Realtime Branch
>
> Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch
>
>
> The current *BlockPool implementations aren't quite concurrent.
> We really need something that has a locking flush method, where
> flush is called at the end of adding a document. Once flushed,
> the newly written data would be available to all other reading
> threads (ie, postings etc). I'm not sure I understand the slices
> concept, it seems like it'd be easier to implement a seekable
> random access file like API. One'd seek to a given position,
> then read or write from there. The underlying management of byte
> arrays could then be hidden?