"Dino Korah" <[EMAIL PROTECTED]> wrote:
> I understand lucene has a requirement of double the size of index
> available
> free on the disk on which the index is being optimised. But if in case
> the
> disk gets filled up during optimisation, what will happen to the index,
> theoretically? Is there an effective way of avoiding this?

Beware, if you have readers using the index while optimize is running,
the disk space required is 3X the index size.  Worse, if those readers
are refreshing while optimize is running (by checking isCurrent() for
example), it will be even more than 3X.  It's best not to refresh
readers during an optimize.  LUCENE-710 (patch pending) aims to make
this easier by adding a "commit only on close" mode to the IndexWriter
such that readers don't see any changes until the writer is closed.

Also note that addDocument() periodically merges segments.  Every so
often (maxBufferedDocs * powers of mergeFactor) that merging will
completely cascade (produce 1 segment), making it the equivalent of an
optimize() call.  So the same disk usage caveat applies for those
addDocument() calls.

On disk full during addDocument() or during optimize(), the index will
not become corrupt.  However: prior to Lucene 2.1, you will find extra
unused partially written large files in your index (often they end
with .tmp).  As of Lucene 2.1, Lucene tries to remove such files on
hitting an exception (disk full or otherwise).

Note that if you hit disk full during addIndexes(), prior to Lucene
2.1 this could corrupt your index.  As of Lucene 2.1, it won't.  (See
LUCENE-702 for details.)

The only way to avoid disk full is to get more disk space, I think.
Unfortunately, simply "never calling optimize" won't do it because of
the case above where addDocument effectively results in an optimize.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

Reply via email to