[jira] Commented: (LUCENE-2047) IndexWriter should immediately resolve deleted docs to docID in near-real-time mode

Michael McCandless (JIRA) Mon, 16 Nov 2009 14:54:04 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-2047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778620#action_12778620
 ]


Michael McCandless commented on LUCENE-2047:
--------------------------------------------

bq. Reopening after every doc could be a valid case that I suspect will come up 
again in the future.

I suspect the vast majority of apps would be fine with 10 reopens per
second.

Those that really must reopen 100s of times per second can use Zoie,
or an approach like it.

bq. I don't think it's too hard to support.

Whoa!  Merely thinking about and discussing even how to run proper
tests for NRT, let alone the possible improvements to Lucene on the
table, is sucking up all my time ;)

{quote}
I think it's important to remove unnecessary global locks on
unitary operations (like deletes).
{quote}

Yeah, I agree we should in general always improve our concurrency.  In
this case, resolving deletes syncs on the entire IW + DW, so it blocks
indexing new docs, launching/committing merges, flushing, etc., which
we should fix.  I just don't think NRT is really a driver for this...

{quote}
1: Analyzing hits an exception for a doc; its doc id has
already been allocated, so we mark it for deletion later (on
flush?) in BufferedDeletes.
{quote}

Analyzing or any other "non-aborting" exception, right.
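To make case #1 concrete, here is a hypothetical, heavily simplified sketch (not Lucene's actual code). `DocIdAllocationSketch`, its `Analyzer` interface and `bufferedDeleteDocIDs` are invented for illustration. It only shows the ordering that matters: the docID is handed out before analysis runs, so a non-aborting exception leaves a consumed ID that must be buffered for deletion.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model (not Lucene's real code): the docID is
// allocated before analysis runs, so when analysis throws a non-aborting
// exception the ID is already consumed and the partial doc is buffered for
// deletion rather than rolled back.
class DocIdAllocationSketch {
    /** Stand-in for analysis; may throw for input it cannot handle. */
    interface Analyzer {
        void analyze(String text);
    }

    private int nextDocID = 0;

    /** docIDs of docs that hit a non-aborting exception; deleted later. */
    final List<Integer> bufferedDeleteDocIDs = new ArrayList<>();

    int addDocument(String text, Analyzer analyzer) {
        int docID = nextDocID++; // allocated up front, before analysis
        try {
            analyzer.analyze(text);
            return docID;
        } catch (RuntimeException e) {
            // The ID cannot be handed out again, so mark this doc deleted
            bufferedDeleteDocIDs.add(docID);
            throw e; // non-aborting: the writer itself stays usable
        }
    }
}
```

The next successful document simply gets the following ID; the failed one lingers only as a buffered delete.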

{quote}
2: RAM buffer writing hits an exception. We've had updates which
marked deletes in current segments; however, they haven't been
applied yet because they're stored as docids in BufferedDeletes.
They're applied on successful flush.
{quote}

No -- the deletes are buffered as Term, Query or docids, in the
BufferedDeletes.  The only case that buffers docids now is your #1
above.  On successful flush, these buffered things are moved to the
deletesFlush (but not resolved).  They are only resolved when we
decide it's time to apply them (just before a new merge starts, or,
when we've triggered the time-to-flush-deletes limits).
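A hypothetical sketch of the lifecycle just described, assuming a toy model where a Term is a `String`, a Query is an `IntPredicate` over docIDs, and a segment is a term-to-docIDs map plus a `BitSet` of deletions. The class and field names only loosely echo the discussion; they are not Lucene's real classes. The sketch also ignores per-segment docID bases and the rule that a delete must only hit docs added before it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical, simplified model of buffered deletes (not Lucene's real classes).
class BufferedDeletesSketch {
    static class Deletes {
        final List<String> terms = new ArrayList<>();         // deleteDocuments(Term)
        final List<IntPredicate> queries = new ArrayList<>(); // deleteDocuments(Query)
        final List<Integer> docIDs = new ArrayList<>();       // non-aborting exceptions only

        void moveTo(Deletes dest) {
            dest.terms.addAll(terms);
            dest.queries.addAll(queries);
            dest.docIDs.addAll(docIDs);
            terms.clear();
            queries.clear();
            docIDs.clear();
        }

        boolean isEmpty() {
            return terms.isEmpty() && queries.isEmpty() && docIDs.isEmpty();
        }
    }

    final Deletes deletesInRAM = new Deletes();   // buffered since the last flush
    final Deletes deletesFlushed = new Deletes(); // flushed, but still unresolved

    /** A successful flush only moves the buffered deletes; nothing is resolved. */
    void flush() {
        deletesInRAM.moveTo(deletesFlushed);
    }

    /** Resolution happens only here (e.g. just before a merge starts, or when a
     *  time-to-flush-deletes limit triggers). Returns how many docs became deleted. */
    int applyDeletes(Map<String, int[]> postings, int maxDoc, BitSet deleted) {
        int before = deleted.cardinality();
        for (String term : deletesFlushed.terms) {
            int[] docs = postings.get(term);
            if (docs != null) {
                for (int doc : docs) deleted.set(doc);
            }
        }
        for (IntPredicate query : deletesFlushed.queries) {
            for (int doc = 0; doc < maxDoc; doc++) {
                if (query.test(doc)) deleted.set(doc);
            }
        }
        for (int doc : deletesFlushed.docIDs) deleted.set(doc);
        deletesFlushed.terms.clear();
        deletesFlushed.queries.clear();
        deletesFlushed.docIDs.clear();
        return deleted.cardinality() - before;
    }
}
```

The point of the split is that `flush()` stays cheap, while the term lookups and query evaluation are deferred to `applyDeletes`.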

{quote}
Maybe we can use 1.5's ReentrantReadWriteLock to effectively
allow multiple del/update doc calls to concurrently acquire the
read lock, and perform the deletes in the foreground.
{quote}

I think that should work well?
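A rough sketch of the ReentrantReadWriteLock idea, under these assumptions: the pooled readers are modeled as an immutable term-to-docIDs map, and `ConcurrentDeleterSketch`, `deleteDocuments(String)` and `snapshotDeletes()` are hypothetical names. Delete calls share the read lock and resolve in the foreground; anything needing a stable view (a flush or NRT reopen) takes the exclusive write lock. Because several read-lock holders mutate the deleted set at once, that set must itself be thread-safe.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: concurrent deletes under the shared read lock,
// exclusive write lock only for taking a consistent snapshot.
class ConcurrentDeleterSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Stand-in for the pooled readers: term -> docIDs, never mutated after construction
    private final Map<String, int[]> postings;
    // Thread-safe on its own, since read-lock holders add to it concurrently
    private final Set<Integer> deletedDocs = ConcurrentHashMap.newKeySet();

    ConcurrentDeleterSketch(Map<String, int[]> postings) {
        this.postings = postings;
    }

    /** Resolves the term to docIDs immediately, concurrently with other deletes. */
    void deleteDocuments(String term) {
        lock.readLock().lock();
        try {
            int[] docs = postings.get(term);
            if (docs != null) {
                for (int doc : docs) deletedDocs.add(doc);
            }
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Briefly excludes all deletes to copy a consistent view, e.g. for a reopen. */
    Set<Integer> snapshotDeletes() {
        lock.writeLock().lock();
        try {
            return new TreeSet<>(deletedDocs);
        } finally {
            lock.writeLock().unlock();
        }
    }
}
```

Only the snapshot serializes against deletes; deletes no longer serialize against each other, which is the concurrency win being discussed.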


> IndexWriter should immediately resolve deleted docs to docID in 
> near-real-time mode
> -----------------------------------------------------------------------------------
>
>                 Key: LUCENE-2047
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2047
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael McCandless
>            Assignee: Michael McCandless
>            Priority: Minor
>             Fix For: 3.1
>
>         Attachments: LUCENE-2047.patch, LUCENE-2047.patch
>
>
> Spinoff from LUCENE-1526.
> When deleteDocuments(Term) is called, we currently always buffer the
> Term and only later, when it's time to flush deletes, resolve to
> docIDs.  This is necessary because we don't in general hold
> SegmentReaders open.
> But, when IndexWriter is in NRT mode, we pool the readers, and so
> deleting in the foreground is possible.
> It's also beneficial, in that it can reduce the turnaround time when
> reopening a new NRT reader by taking this resolution off the reopen
> path.  And if multiple threads are used to do the deletion, then we
> gain concurrency, vs reopen which is not concurrent when flushing the
> deletes.
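A hypothetical sketch of the trade-off in the description, assuming a toy segment model (a term-to-docIDs map plus a `BitSet`). `NrtDeleteSketch` and its `readersPooled` flag are invented names, not IndexWriter's API. When readers are pooled (NRT mode) the Term is resolved to docIDs in the foreground; otherwise it is buffered, and that resolution work lands on the reopen path.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Map;

// Hypothetical sketch contrasting buffered vs. immediate (NRT) delete resolution.
class NrtDeleteSketch {
    private final Map<String, int[]> postings; // stand-in for segment term -> docIDs
    private final boolean readersPooled;       // true when the writer is in NRT mode
    private final List<String> bufferedTerms = new ArrayList<>();
    private final BitSet deleted = new BitSet();

    NrtDeleteSketch(Map<String, int[]> postings, boolean readersPooled) {
        this.postings = postings;
        this.readersPooled = readersPooled;
    }

    void deleteDocuments(String term) {
        if (readersPooled) {
            resolve(term);           // readers are open: resolve to docIDs right now
        } else {
            bufferedTerms.add(term); // no open readers: buffer the Term for later
        }
    }

    /** Work left on the reopen path; returns how many buffered Terms it had to resolve. */
    int reopen() {
        int resolved = bufferedTerms.size();
        for (String term : bufferedTerms) resolve(term);
        bufferedTerms.clear();
        return resolved;
    }

    boolean isDeleted(int docID) {
        return deleted.get(docID);
    }

    private void resolve(String term) {
        int[] docs = postings.get(term);
        if (docs != null) {
            for (int doc : docs) deleted.set(doc);
        }
    }
}
```

Both modes end with the same deleted docs; the difference is only where the lookup cost is paid. In NRT mode it is paid by the (possibly many) deleting threads, so `reopen()` has nothing left to resolve.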
