[ https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778452#action_12778452 ]
Jason Rutherglen commented on LUCENE-2047:
------------------------------------------

I want to replay how DW handles the updateDoc call to see if my understanding is correct.

1: Analyzing hits an exception for a doc. Its doc id has already been allocated, so we mark it for deletion later (on flush?) in BufferedDeletes.

2: RAM buffer writing hits an exception. We've had updates which marked deletes in current segments. However, those deletes haven't been applied yet because they're stored as docIDs in BufferedDeletes. They're applied on a successful flush.

Are these two scenarios correct, or am I completely off target? If they're correct, isn't updateDoc already deleting in the foreground?

bq. prefer not to add further BG threads

Maybe we can use 1.5's ReentrantReadWriteLock to let multiple del/update doc calls acquire the read lock concurrently and perform the deletes in the foreground. The write lock could be acquired during commitDeletes, commit(), and after a segment is flushed. I'm not sure whether it would also be necessary to acquire this write lock any time segment infos is changed.

I think it's important to remove unnecessary global locks on unitary operations (like deletes). We've had great results removing these locks for isDeleted and (NIO)FSDirectory, where we didn't think there'd be an improvement, and there was. I think this patch (or a follow-on one that implements the shared lock solution) could measurably increase throughput for deleting and updating.

{quote}Lucene shouldn't aim to be able to reopen 100s of times per second{quote}

Reopening after every doc could be a valid case that I suspect will come up again in the future. I don't think it's too hard to support.

{quote}
It's true that net latency of reopen will be reduced by being incremental, but Lucene shouldn't aim to be able to reopen 100s of times per second:
{quote}

Perhaps update/del throughput will increase because of the shared lock, which would make the patch(es) worth implementing.
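A minimal sketch of the shared-lock idea, assuming java.util.concurrent's ReentrantReadWriteLock: delete calls take the read lock so they can run concurrently, and a commit-like step takes the write lock for exclusive access. `SharedLockDeleter`, `deleteDoc` and `commit` are hypothetical names, and the per-segment `BitSet`s stand in for the pooled SegmentReaders' deleted docs; this is an illustration, not IndexWriter's code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch, not IndexWriter's actual code: deletes share a read
// lock and run concurrently; commit-like operations take the write lock.
class SharedLockDeleter {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // One deleted-docs bit set per segment (stand-in for pooled SegmentReaders).
    private final List<BitSet> deletedDocs = new ArrayList<>();

    SharedLockDeleter(int segmentCount) {
        for (int i = 0; i < segmentCount; i++) {
            deletedDocs.add(new BitSet());
        }
    }

    // Many threads may hold the read lock at once, so deletes don't
    // contend on a single writer-wide monitor.
    void deleteDoc(int segment, int docID) {
        lock.readLock().lock();
        try {
            BitSet bits = deletedDocs.get(segment);
            // BitSet is not thread-safe, so serialize per segment only.
            synchronized (bits) {
                bits.set(docID);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    // Stand-in for commitDeletes()/commit()/post-flush work: acquiring the
    // write lock waits for in-flight deletes and blocks new ones until released.
    int commit() {
        lock.writeLock().lock();
        try {
            int total = 0;
            for (BitSet bits : deletedDocs) {
                total += bits.cardinality();
            }
            return total;
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Deletes against different segments proceed in parallel; deletes to the same segment still serialize briefly on that segment's bit set, which is far narrower than one global writer lock.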
{quote}
but I bet in practice that concurrency isn't necessary (ie the performance of a single thread resolving all buffered deletes is plenty fast).
{quote}

We thought the same thing about the sync in FSDirectory, and it turned out that in practice NIOFSDir is an order of magnitude faster on *nix machines. For NRT, every little bit of concurrency will probably increase throughput. (I.e., most users will have their indexes in the IO cache and/or a RAM dir, which means we wouldn't be penalizing concurrency the way we do today with IW's global lock for del/update docs.)

I'm going to go ahead and wrap up this patch, which will shift the deletion cost to the del/update methods (still synchronously). Then I'll create a separate patch that implements the shared lock solution.

Exposing SRs for updates by the user can be done today; I'll open a patch for this.

> IndexWriter should immediately resolve deleted docs to docID in
> near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs. This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path. And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
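The difference the issue describes (resolve a deleted term to docIDs right away when readers are pooled, instead of buffering the term until flush) can be sketched with a toy model. `ToySegment`, `ToyWriter`, `nrtMode`, `applyBufferedDeletes` and `pendingTerms` are hypothetical names; String terms and a HashMap inverted index stand in for Lucene's Term and SegmentReader. The toy also ignores the bookkeeping Lucene needs so that a buffered delete doesn't hit documents added after the delete call.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy segment: term -> docIDs postings plus a deleted-docs bit set.
// Hypothetical stand-in for an open (pooled) SegmentReader.
class ToySegment {
    final Map<String, List<Integer>> postings = new HashMap<>();
    final BitSet deleted = new BitSet();

    void addDoc(int docID, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docID);
        }
    }
}

// Toy writer contrasting buffered vs. immediate delete-by-term.
class ToyWriter {
    private final List<ToySegment> segments;
    private final boolean nrtMode; // true when segment readers are pooled/open
    private final List<String> bufferedTerms = new ArrayList<>();

    ToyWriter(List<ToySegment> segments, boolean nrtMode) {
        this.segments = segments;
        this.nrtMode = nrtMode;
    }

    void deleteDocuments(String term) {
        if (nrtMode) {
            resolve(term);           // foreground: readers are open, mark docIDs now
        } else {
            bufferedTerms.add(term); // defer: resolve later, at flush time
        }
    }

    // Flush/reopen path: with buffering, all term lookups happen here.
    void applyBufferedDeletes() {
        for (String term : bufferedTerms) {
            resolve(term);
        }
        bufferedTerms.clear();
    }

    int pendingTerms() {
        return bufferedTerms.size();
    }

    // Look the term up in every segment and mark its docs deleted.
    private void resolve(String term) {
        for (ToySegment seg : segments) {
            List<Integer> docs = seg.postings.get(term);
            if (docs != null) {
                for (int docID : docs) {
                    seg.deleted.set(docID);
                }
            }
        }
    }
}
```

The toy is single-threaded; it only shows where the resolution cost lands. With `nrtMode` set, the lookup is paid by the caller of `deleteDocuments` rather than by the next reopen.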