Thanks, Mike, for the great explanation of flush IOExceptions.

I was thinking from the perspective of an HDFSDirectory. In addition to
all the causes of IOException during flush that you listed, an HDFSDirectory
also has to deal with network issues, which are not Lucene's problem at all.

But ideally I would like to handle momentary network blips, since those are
fully recoverable errors.


Will NRTCachingDirectory help in the case of an HDFSDirectory? If all goes
well, flushes should always go to RAM, and the sync to HDFS happens only
during commits. In that case, I could put retry logic inside the sync()
method to handle momentary IOExceptions.
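Something like the following is what I have in mind for sync() (a minimal,
stdlib-only sketch of the retry idea; the real Directory.sync() signature
takes a Collection<String> of file names, and all names and numbers here
are hypothetical, not real Lucene API):

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Illustrative retry helper of the kind one might call from an overridden
// Directory.sync(): retry a transient IOException a few times with a fixed
// back-off before giving up. Hypothetical sketch, not real Lucene code.
public class RetrySupport {

    // Runs 'op', retrying up to maxAttempts times on IOException.
    public static <T> T withRetries(Callable<T> op, int maxAttempts,
                                    long backoffMillis) throws Exception {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (IOException e) {
                last = e;                         // remember the most recent failure
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis);  // simple fixed back-off
                }
            }
        }
        throw last;                               // all attempts exhausted
    }

    public static void main(String[] args) throws Exception {
        // Simulate a sync that fails twice (a "network blip") then succeeds.
        int[] calls = {0};
        String result = withRetries(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new IOException("momentary blip " + calls[0]);
            }
            return "synced";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

An exponential back-off with a cap would probably be better than the fixed
sleep for real HDFS hiccups, but the shape of the loop is the same.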


--
Ravi


On Tue, Dec 17, 2013 at 9:14 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Mon, Dec 16, 2013 at 7:33 AM, Ravikumar Govindarajan
> <ravikumar.govindara...@gmail.com> wrote:
> > I am trying to model a transaction-log for lucene, which creates a
> > transaction-log per-commit
> >
> > Things work fine during normal operations, but I cannot work out what
> > happens in these cases:
> >
> > a. IOException during Index-Commit
> >
> > Will the index be restored to previous commit-point? Can I blindly re-try
> > operations from the current transaction log, after some time interval?
>
> Yes: if an IOException is thrown from IndexWriter.commit then the
> commit failed and the index still "shows" the previous successful
> commit.
>
> > b. IOException during Background-Flush
> >
> > Will all the RAM buffers including deletes for that DWPT be cleaned up?
> > flush() being per-thread and async obviously has problems with my
> > transaction-log-per-commit approach, right?
> >
> > Most of the time, the IOExceptions are temporary and recoverable [Ex:
> > Solr's HDFSDirectory etc...]. So, I must definitely retry these
> operations
> > after some time-interval.
>
> IOExceptions during flush are trickier.  Often they mean that all
> documents assigned to that segment are lost, but not necessarily (e.g.
> if the IOE happened while creating a compound file).
>
> IOExceptions during add/updateDocument are also possible (e.g. we
> write stored fields and term vectors per document), which can likewise
> result in losing all documents in that one segment (an aborting
> exception), but an IOE thrown by, say, the analyzer will just result
> in that one document being lost (a non-aborting exception).
>
> Since you cannot know which case it was, it's probably safest to
> define a primary key field, and always use IW.updateDocument.  This
> way if the document was in fact not lost, and you re-index it, you
> just replace it, instead of creating a duplicate.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
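Mike's suggestion to key on a primary-key field and always use
IW.updateDocument makes replaying the transaction log idempotent. A minimal
stdlib sketch of that replay semantics (a plain Map stands in for the index;
with real Lucene this step would be
IndexWriter.updateDocument(new Term("id", id), doc), and the "doc1"/"v1"
values are made up):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models idempotent replay: keying every document on a primary-key field
// and always "updating" rather than "adding" means that re-applying a
// transaction log after a failure replaces documents instead of
// duplicating them. The Map stands in for the index.
public class IdempotentReplay {

    // Replays log entries (id, value) into the index, update-by-key.
    static void replay(Map<String, String> index, List<String[]> log) {
        for (String[] entry : log) {
            index.put(entry[0], entry[1]);  // replace by key, never add blindly
        }
    }

    public static void main(String[] args) {
        Map<String, String> index = new LinkedHashMap<>();
        List<String[]> log = List.of(
            new String[] {"doc1", "v1"},
            new String[] {"doc2", "v1"});

        replay(index, log);   // first attempt (suppose the flush partly failed)
        replay(index, log);   // retry: replaces, does not duplicate

        System.out.println(index.size());  // still 2 documents, no duplicates
    }
}
```

This is exactly why blind re-try from the current transaction log becomes
safe once every operation is an update keyed on the primary field.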
