[
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778335#action_12778335
]
Michael McCandless commented on LUCENE-2047:
--------------------------------------------
Thinking more on this... I'm actually no longer convinced that this
change is worthwhile.
Net/net this will not improve deletes/sec or queries/sec (dps/qps)
throughput on given fixed hardware, because this is a zero-sum game:
the deletes must be resolved one way or another.
Whether we do it in batch (as today) or incrementally/concurrently,
one at a time as they arrive, the same work must be done. In fact,
batching should be less costly in practice, since it has clear temporal
locality when resolving terms -> postings; so on a machine whose IO
cache can't hold the entire index, bulk flushing should be a win.
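To make the "same work either way" point concrete, here is a minimal Java sketch, not Lucene code: `ToySegment` and `DeleteResolution` are hypothetical classes that model a segment's sorted term dictionary as a `TreeMap`. Resolving deletes one at a time as they arrive and resolving a sorted batch later mark exactly the same docIDs with the same number of term lookups; the only difference is that the batch visits terms in dictionary order, which is where the locality argument comes from.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy segment: a sorted term dictionary mapping each term to
// its postings (docIDs). Not Lucene's API; it only models the lookup work.
class ToySegment {
    private final TreeMap<String, int[]> postings = new TreeMap<>();
    final BitSet deleted = new BitSet();
    int lookups = 0; // number of term -> postings resolutions performed

    void add(String term, int... docIDs) {
        postings.put(term, docIDs);
    }

    // Resolve one term to its docIDs and mark them deleted.
    void resolve(String term) {
        lookups++;
        int[] docs = postings.get(term);
        if (docs != null) {
            for (int doc : docs) {
                deleted.set(doc);
            }
        }
    }
}

class DeleteResolution {
    // Incremental: resolve each delete term as it arrives.
    static void resolveIncrementally(ToySegment seg, List<String> arrivals) {
        for (String term : arrivals) {
            seg.resolve(term);
        }
    }

    // Batch: buffer the terms, sort them, then resolve in term order so
    // consecutive lookups touch neighboring regions of the term dictionary.
    static void resolveBatch(ToySegment seg, List<String> buffered) {
        List<String> sorted = new ArrayList<>(buffered);
        Collections.sort(sorted);
        for (String term : sorted) {
            seg.resolve(term);
        }
    }
}
```

In this in-memory toy the sort buys nothing measurable; the benefit argued above only shows up when term lookups hit disk and the IO cache is too small for the whole index.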
It's true that the net reopen latency will be reduced by resolving
incrementally, but Lucene shouldn't aim to be able to reopen 100s of
times per second: I think that's a mis-feature (most apps don't need
it), and apps that really do need it can and should use an approach
like Zoie.
Finally, one can always set the max buffered delete terms/docs to
something low to achieve the same tradeoff. It's true that this won't
get you concurrent resolving of deleted Terms -> docIDs, but I bet in
practice that concurrency isn't necessary (i.e. a single thread
resolving all buffered deletes is plenty fast).
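As a rough model of that knob (in Lucene 2.9/3.0 it is set via `IndexWriter.setMaxBufferedDeleteTerms(int)`), the hypothetical `DeleteBuffer` below accumulates delete terms and hands the whole buffer to a resolver once `maxBufferedDeleteTerms` is reached. It models only the term-count trigger, not Lucene's actual flushing logic. A threshold of 1 resolves every delete as it arrives, which approximates incremental resolution on a single thread; a large threshold defers the work to one batch at flush/reopen time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical model of a writer's delete buffer: terms accumulate until
// maxBufferedDeleteTerms is reached, then the whole buffer is resolved.
class DeleteBuffer {
    private final int maxBufferedDeleteTerms;
    private final Consumer<List<String>> resolver; // resolves a batch of terms
    private final List<String> pending = new ArrayList<>();
    int flushes = 0;

    DeleteBuffer(int maxBufferedDeleteTerms, Consumer<List<String>> resolver) {
        if (maxBufferedDeleteTerms < 1) {
            throw new IllegalArgumentException("threshold must be >= 1");
        }
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
        this.resolver = resolver;
    }

    // Buffer one delete term; resolve the buffer once the threshold is hit.
    void bufferDelete(String term) {
        pending.add(term);
        if (pending.size() >= maxBufferedDeleteTerms) {
            flush();
        }
    }

    // Resolve everything still buffered (e.g. when a reader is reopened).
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        resolver.accept(new ArrayList<>(pending));
        pending.clear();
        flushes++;
    }

    int pendingCount() {
        return pending.size();
    }
}
```

Either way every term is resolved exactly once; the threshold only decides whether that happens in many small batches or one large one.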
If the reopen time today is plenty fast, especially if you configure
your writer to flush often, then I don't think we need incremental
resolving of the deletions?
> IndexWriter should immediately resolve deleted docs to docID in
> near-real-time mode
> -----------------------------------------------------------------------------------
>
> Key: LUCENE-2047
> URL: https://issues.apache.org/jira/browse/LUCENE-2047
> Project: Lucene - Java
> Issue Type: Improvement
> Components: Index
> Reporter: Michael McCandless
> Assignee: Michael McCandless
> Priority: Minor
> Fix For: 3.1
>
> Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs. This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path. And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
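A minimal sketch of the foreground resolution the issue proposes, assuming the segment's reader is already open and pooled (as in NRT mode). `PooledSegment` and `ConcurrentDeletes` are hypothetical names, not Lucene classes: each delete resolves its term to docIDs in the calling thread and records it in an `AtomicIntegerArray`, so several threads can apply deletes concurrently and nothing is left to resolve at reopen time.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicIntegerArray;

// Hypothetical stand-in for a pooled, already-open segment reader in NRT
// mode: the term -> docIDs mapping is available, so a delete can be resolved
// in the caller's thread instead of being buffered until reopen.
class PooledSegment {
    private final Map<String, int[]> postings = new ConcurrentHashMap<>();
    private final AtomicIntegerArray deleted; // 1 = deleted; safe for concurrent writers

    PooledSegment(int maxDoc) {
        deleted = new AtomicIntegerArray(maxDoc);
    }

    void add(String term, int... docIDs) {
        postings.put(term, docIDs);
    }

    // Foreground delete: resolve the term to docIDs right away.
    void deleteNow(String term) {
        int[] docs = postings.get(term);
        if (docs == null) {
            return;
        }
        for (int doc : docs) {
            deleted.set(doc, 1);
        }
    }

    boolean isDeleted(int doc) {
        return deleted.get(doc) == 1;
    }
}

class ConcurrentDeletes {
    // Spread the deletes across a thread pool; each task resolves one term.
    static void deleteAll(PooledSegment seg, List<String> terms, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            for (String term : terms) {
                pool.execute(() -> seg.deleteNow(term));
            }
        } finally {
            pool.shutdown();
        }
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("deletes did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while deleting", e);
        }
    }
}
```

The sketch only shows where the work moves (from reopen into the deleting threads), not that the total work gets smaller; whether that concurrency matters in practice is what the comment above questions.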
--
This message is automatically generated by JIRA.