[ https://issues.apache.org/jira/browse/LUCENE-2312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12845061#action_12845061 ]
Michael McCandless commented on LUCENE-2312:
--------------------------------------------

bq. IW commitMerge calls docWriter's remapDeletes, a synchronized method to prevent concurrent updates. I'm not sure how we should efficiently block calls to the different DW's.

Yeah, this is because when we buffer a delete Term/Query, the docID we store against it is absolute. It *seems* like it could/should be relative (ie, within the RAM segment); then remapping wouldn't be needed when a merge commits. I think?

bq. _mergeInit calls docWriter getDocStoreSegment - unsure what to change

It wouldn't anymore once we have private RAM segments: we would no longer share doc stores across segments, meaning merging will always merge doc stores, and there's no need to call that method nor to have all the logic in SegmentMerger to determine whether doc store merging is required.

This will necessarily be a perf hit when building up a large index from scratch in a single IW session. Today that index creates one large set of doc stores and never has to merge it while building. This is the biggest perf downside to this change, I think.

But maybe the perf loss will not be so bad, because of bulk merging, in the case when all docs always add the same fields in the same order. Or... if we could fix Lucene to always bind the same field name to the same field number (LUCENE-1737), then we'd always bulk-merge regardless of which fields the app adds to docs and in which order.

bq. Some of the config settings (such as maxBufferedDocs) can simply be removed from DW, and instead accessed via WriterConfig

Ahh, you mean push IWC down to DW? That sounds great.

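To make the relative-docID idea above concrete, here is a minimal sketch. It is not Lucene code: BufferedDeletesSketch, RamSegment, bufferDelete, deleteApplies and toAbsolute are all hypothetical names, and the sketch assumes a buffered delete applies only to docs added to the segment before the delete arrived.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not Lucene code: deletes buffered against segment-relative docIDs. */
public class BufferedDeletesSketch {

    /** One private RAM segment; its docIDs start at 0 regardless of other segments. */
    public static final class RamSegment {
        private int numDocs;
        // delete term -> relative docID limit (delete applies to docs below the limit)
        private final Map<String, Integer> deleteLimits = new HashMap<>();

        /** Adds a document and returns its segment-relative docID. */
        public int addDocument() {
            return numDocs++;
        }

        /** Buffers a delete that covers every doc added to this segment so far. */
        public void bufferDelete(String term) {
            deleteLimits.put(term, numDocs);
        }

        /** True if the buffered delete for this term covers the given relative docID. */
        public boolean deleteApplies(String term, int relativeDocId) {
            Integer limit = deleteLimits.get(term);
            return limit != null && relativeDocId < limit;
        }
    }

    /**
     * An absolute docID is only derived when needed, from the segment's current
     * docBase. If a merge shifts docBase, the stored relative limits stay valid,
     * so nothing has to be remapped.
     */
    public static int toAbsolute(int docBase, int relativeDocId) {
        return docBase + relativeDocId;
    }
}
```

In this sketch a merge commit that changes a segment's docBase leaves deleteLimits untouched, which is the property that would make a remapDeletes-style step unnecessary.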
> Search on IndexWriter's RAM Buffer
> ----------------------------------
>
>                 Key: LUCENE-2312
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2312
>             Project: Lucene - Java
>          Issue Type: New Feature
>          Components: Search
>    Affects Versions: 3.0.1
>            Reporter: Jason Rutherglen
>            Assignee: Michael Busch
>             Fix For: 3.1
>
>
> In order to offer users near-realtime search without incurring
> an indexing performance penalty, we can implement search on
> IndexWriter's RAM buffer. This is the buffer that is filled in
> RAM as documents are indexed. Currently the RAM buffer is
> flushed to the underlying directory (usually disk) before being
> made searchable.
> Today's Lucene-based NRT systems must incur the cost of merging
> segments, which can slow indexing.
> Michael Busch has good suggestions regarding how to handle deletes using max
> doc ids.
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841923&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841923
> The area that isn't fully fleshed out is the terms dictionary,
> which needs to be sorted prior to queries executing. Currently
> IW implements a specialized hash table. Michael B has a
> suggestion here:
> https://issues.apache.org/jira/browse/LUCENE-2293?focusedCommentId=12841915&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12841915