[jira] Commented: (LUCENE-1516) Integrate IndexReader with IndexWriter

Michael McCandless (JIRA) Fri, 20 Feb 2009 04:19:31 -0800

    [ https://issues.apache.org/jira/browse/LUCENE-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675345#action_12675345 ]


Michael McCandless commented on LUCENE-1516:
--------------------------------------------

{quote}
The path forward seems to be exposing a cloned readonly reader
from IW.getReader.
{quote}

+1

{quote}
> can't we move away from allowing any changes via IR? (Ie
> deprecate deleteDocuments/setNorms/etc.)

This would simplify things however as a thought experiment how would
the setNorms work if it were a part of IndexWriter?
{quote}

I think it'd look like this?
{code}
IndexWriter.setNorm(Term term, String field, byte norm)
{code}

Ie the Term identifies the doc(s) you want to set the norm for.
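
A toy sketch of that idea, assuming an in-memory stand-in rather than Lucene itself: ToyNormWriter, addPosting and getNorm are hypothetical names, and the Term is modeled as a plain "field:value" string. It only shows the Term being resolved to docIDs inside the writer, so the caller never handles docIDs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory stand-in for an index; hypothetical, not Lucene's IndexWriter. */
class ToyNormWriter {
    // term text ("field:value") -> docIDs containing that term
    private final Map<String, List<Integer>> postings = new HashMap<>();
    // field name -> one norm byte per document
    private final Map<String, byte[]> norms = new HashMap<>();
    private final int maxDoc;

    ToyNormWriter(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPosting(String term, int docID) {
        postings.computeIfAbsent(term, k -> new ArrayList<>()).add(docID);
    }

    /** Sets the norm of `field` for every doc matching `term`; returns the number of matching docs. */
    int setNorm(String term, String field, byte norm) {
        byte[] fieldNorms = norms.computeIfAbsent(field, k -> new byte[maxDoc]);
        List<Integer> docs = postings.getOrDefault(term, Collections.emptyList());
        for (int docID : docs) {
            fieldNorms[docID] = norm;
        }
        return docs.size();
    }

    /** Returns the stored norm, or 0 if no norm was ever set for the field. */
    byte getNorm(String field, int docID) {
        byte[] fieldNorms = norms.get(field);
        return fieldNorms == null ? 0 : fieldNorms[docID];
    }
}
```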

{quote}

> And, clone should not be reopening segments...? 

DirectoryIndexReader.clone(boolean openReadonly) calls
doReopen(SegmentInfos infos, boolean doClone, boolean openReadOnly)
which is an abstract method that in SegmentReader and
MultiSegmentReader reopens the segments? The segment infos for a
ReaderIW is obtained from IW, which is how it knows about the new
segments. Perhaps not desired behavior?
{quote}

OK, I think it does not reopen *existing* segments.  Meaning, if a
segment is common to old and new, it truly clones it (does not
reopen norms or deletes).  But if there is a new segment that did not
exist in old, it opens a whole new segment reader?  I'll commit an
assert that this doesn't happen -- if the caller passes in "doClone=true"
then the caller should not have passed in a segmentInfos with changes?
Else the reader is on thin ice (a mismatch between what's in RAM and
what SegmentInfo says).

{quote}
> do we need delete by docID once we have realtime search? I
> think the last compelling reason to keep IR's delete by docID was
> immediacy, but realtime search can give us that, from IW, even when
> deleting by Term or Query? 

Good point! I think we may want to support it but for now it
shouldn't be necessary. I'm thinking of the case where someone is
using the field cache (or some variant), performs some sort of query
on it and then needs to delete based on doc id. What do they do?
Would we expose a callback mechanism where a deleteFrom(IndexReader
ir) method is exposed and deletes occur at the time of the IW's
choosing?
{quote}

Wouldn't delete-by-Query cover this?  Ie one could always make a
Filter implementing the "look @ field cache, do some logic, provide
docIDs to delete", wrap as Query, then delete-by-Query?
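
A rough sketch of that wrapping, assuming toy stand-ins rather than the real Filter/Query classes: ToyFilter, ToyFilterQuery, ToyDeletingWriter and fromFieldCache are hypothetical names, and the "field cache" is just an int[] indexed by docID. The point is that the docIDs are computed inside the writer's delete call, so the caller never holds them.

```java
import java.util.BitSet;
import java.util.function.IntPredicate;

/** Hypothetical stand-in for a Filter: returns the docIDs it accepts. */
interface ToyFilter {
    BitSet bits(int maxDoc);
}

/** Hypothetical stand-in for a Query that matches exactly what a filter accepts. */
class ToyFilterQuery {
    final ToyFilter filter;

    ToyFilterQuery(ToyFilter filter) {
        this.filter = filter;
    }
}

/** Toy writer that tracks deletions in a BitSet; not Lucene's IndexWriter. */
class ToyDeletingWriter {
    private final BitSet deleted = new BitSet();
    private final int maxDoc;

    ToyDeletingWriter(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    /** Delete-by-query: the writer resolves docIDs itself and returns how many docs were newly deleted. */
    int deleteDocuments(ToyFilterQuery query) {
        BitSet matches = query.filter.bits(maxDoc);
        matches.andNot(deleted); // ignore docs that were already deleted
        deleted.or(matches);
        return matches.cardinality();
    }

    boolean isDeleted(int docID) {
        return deleted.get(docID);
    }
}

class FieldCacheDeleteDemo {
    /** Builds a filter from a per-document value array that plays the role of a field cache. */
    static ToyFilter fromFieldCache(int[] cachedValues, IntPredicate shouldDelete) {
        return maxDoc -> {
            BitSet bits = new BitSet(maxDoc);
            for (int docID = 0; docID < maxDoc && docID < cachedValues.length; docID++) {
                if (shouldDelete.test(cachedValues[docID])) {
                    bits.set(docID);
                }
            }
            return bits;
        };
    }
}
```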

{quote}
> It seems like calling reader.reopen() (on reader obtained
> from writer) should basically do the same thing as calling
> writer.getReader(). Ie they are nearly synonyms? (Except for small
> difference in ref counting - I think writer.getReader() should always
> incRef, but reopen only incRefs if it returns a new reader). 

Perhaps ReaderIW.reopen will call IW.getReader underneath instead of
using IR's usual mechanism.
{quote}

Right, that's what I'm thinking.  Once you've obtained a reader coupled
to a writer, you can then simply reopen it whenever you want to see
(materialize) changes done by the writer.
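
To make the ref counting difference concrete, here is a single-threaded toy model, assuming hypothetical ToyWriter/ToyReader classes and a simple version counter in place of real segment state. In this sketch getReader() always adds a reference for the caller, while reopen() returns the same reader untouched when nothing changed and otherwise delegates to getReader().

```java
/** Toy reader with a reference count; hypothetical, not Lucene's IndexReader. */
class ToyReader {
    final ToyWriter writer;
    final long version;
    private int refCount = 1; // the creator's reference

    ToyReader(ToyWriter writer, long version) {
        this.writer = writer;
        this.version = version;
    }

    void incRef() {
        refCount++;
    }

    void decRef() {
        if (refCount <= 0) {
            throw new IllegalStateException("reader already closed");
        }
        refCount--;
    }

    int refCount() {
        return refCount;
    }

    /** Returns this reader (no incRef) if the writer has no changes, else a new reader from the writer. */
    ToyReader reopen() {
        if (writer.version() == version) {
            return this;
        }
        return writer.getReader();
    }
}

/** Toy writer; the version counter stands in for "the index changed". Not thread-safe. */
class ToyWriter {
    private long version = 0;
    private ToyReader cached; // the writer keeps one reference of its own

    void addDocument() {
        version++;
    }

    long version() {
        return version;
    }

    /** Always hands the caller a reference that the caller must release with decRef(). */
    ToyReader getReader() {
        if (cached == null || cached.version != version) {
            if (cached != null) {
                cached.decRef(); // drop the writer's reference to the stale reader
            }
            cached = new ToyReader(this, version); // refCount == 1, held by the writer
        }
        cached.incRef(); // one extra reference for the caller, every time
        return cached;
    }
}
```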

We still need a solution for the "warm the just merged segment"
problem... else we will not be realtime, especially when a big merge
finishes.  It seems like after merge finishes, it should immediately
1) open a SegmentReader on the new segment, 2) invoke the method you
passed in (or you subclassed -- not sure which), 3) carry over deletes
that materialized during the merge, 4) commit the merge (replace old
segments w/ new one).
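
The four steps could be sketched roughly like this, assuming toy in-memory segments: ToySegment, ToyMerger and finishMerge are hypothetical names, the warming hook is modeled as a Consumer passed in by the caller (the subclassing variant is not shown), and deletes are tracked by document key rather than docID.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

/** Toy segment: a list of document keys plus a set of deleted keys. */
class ToySegment {
    final String name;
    final List<String> docKeys;
    final Set<String> deleted = new HashSet<>();

    ToySegment(String name, List<String> docKeys) {
        this.name = name;
        this.docKeys = docKeys;
    }

    /** Returns the keys of documents that are not deleted, in order. */
    List<String> liveDocs() {
        List<String> live = new ArrayList<>(docKeys);
        live.removeAll(deleted);
        return live;
    }
}

/** Toy model of the end of a merge; hypothetical, not Lucene's merge code. */
class ToyMerger {
    final List<ToySegment> segments = new ArrayList<>();

    /**
     * Finishes a merge. `merged` was built from a snapshot of `sources`;
     * `deletesDuringMerge` holds the keys deleted after that snapshot was taken.
     */
    void finishMerge(List<ToySegment> sources, ToySegment merged,
                     Set<String> deletesDuringMerge, Consumer<ToySegment> warmer) {
        // 1) "Open a reader" on the new segment; the segment object stands in for it here.
        ToySegment reader = merged;
        // 2) Invoke the caller-supplied warming hook before the segment becomes visible.
        warmer.accept(reader);
        // 3) Carry over deletes that arrived while the merge was running.
        for (String key : deletesDuringMerge) {
            if (merged.docKeys.contains(key)) {
                merged.deleted.add(key);
            }
        }
        // 4) Commit the merge: replace the source segments with the merged one.
        segments.removeAll(sources);
        segments.add(merged);
    }
}
```

Warming before step 4 means searches never see a cold merged segment; the cost is that the merge commit is delayed by however long the hook takes.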


> Integrate IndexReader with IndexWriter 
> ---------------------------------------
>
>                 Key: LUCENE-1516
>                 URL: https://issues.apache.org/jira/browse/LUCENE-1516
>             Project: Lucene - Java
>          Issue Type: Improvement
>    Affects Versions: 2.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 2.9
>
>         Attachments: LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch, 
> LUCENE-1516.patch, LUCENE-1516.patch, LUCENE-1516.patch
>
>   Original Estimate: 672h
>  Remaining Estimate: 672h
>
> The current problem is an IndexReader and IndexWriter cannot be open
> at the same time and perform updates as they both require a write
> lock to the index. While methods such as IW.deleteDocuments enable
> deleting from IW, methods such as IR.deleteDocument(int doc) and
> norms updating are not available from IW. This limits the
> capabilities of performing updates to the index dynamically or in
> realtime without closing the IW and opening an IR, deleting or
> updating norms, flushing, then opening the IW again, a process which
> can be detrimental to realtime updates. 
> This patch will expose an IndexWriter.getReader method that returns
> the currently flushed state of the index as a class that implements
> IndexReader. The new IR implementation will differ from existing IR
> implementations such as MultiSegmentReader in that flushing will
> synchronize updates with IW in part by sharing the write lock. All
> methods of IR will be usable including reopen and clone. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org
