Christiaan Fluit wrote:
It seems that it gets as far as the commit, but the "IW:
commitMerge done" message is never reached.
Furthermore, no exceptions are printed to the output, so
handleMergeException does not seem to have been invoked.
Should I add more debug statements elsewhere?
I may be on to something already.
I just looked at the commitMerge code and was surprised to see that the
commitMerge message near the beginning of the method wasn't printed. Then
I saw the "if (hitOOM) return false;" check that comes before it.
I think this can only mean that an OOME was encountered at some
earlier point.
Now, the fact is that in my indexing code I do a catch(Throwable) in
several places. I do this particularly because JET handles OOMEs in a
very, very nasty way. Often you will just get an error dialog and then
it quits the entire application. Therefore, my client code catches, logs
and swallows the OOME before the JET runtime can intercept it.
*Usually*, the application can then recover gracefully and continue
processing the rest of the information.
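For illustration, the catch, log and swallow pattern described above might look roughly like the following. This is only a sketch: SwallowingIndexer, DocumentTask and processAll are hypothetical names standing in for my client code, and the simulated OutOfMemoryError replaces a real allocation failure. Logging the top stack frame explicitly is one possible way to keep some location information even if the runtime drops the trace.

```java
import java.util.ArrayList;
import java.util.List;

public class SwallowingIndexer {

    /** Hypothetical stand-in for the per-document work (text extraction plus indexing). */
    public interface DocumentTask {
        void run() throws Exception;
    }

    /**
     * Runs each task and catches Throwable so that one failure does not end the run.
     * Returns the number of tasks that completed; a note per failure is appended to log.
     */
    public static int processAll(List<DocumentTask> tasks, List<String> log) {
        int succeeded = 0;
        for (int i = 0; i < tasks.size(); i++) {
            try {
                tasks.get(i).run();
                succeeded++;
            } catch (OutOfMemoryError oom) {
                // Record the top frame by hand in case the logger loses the trace.
                StackTraceElement[] trace = oom.getStackTrace();
                String top = trace.length > 0 ? trace[0].toString() : "<no stack trace>";
                log.add("task " + i + ": OOME at " + top);
            } catch (Throwable t) {
                log.add("task " + i + ": " + t);
            }
        }
        return succeeded;
    }

    public static void main(String[] args) {
        List<DocumentTask> tasks = new ArrayList<>();
        tasks.add(() -> { });
        tasks.add(() -> { throw new OutOfMemoryError("simulated"); }); // simulated failure
        tasks.add(() -> { });

        List<String> log = new ArrayList<>();
        int ok = processAll(tasks, log);
        System.out.println(ok + " succeeded, " + log.size() + " failed");
        for (String line : log) {
            System.out.println(line);
        }
    }
}
```

Note that this only shows the client side. Whether the object that threw the error is still in a usable state afterwards is a separate matter, which is what my questions below are about.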
Catching an OOME that results from the operation of a text extraction
library is one thing (and a fact of life really), but perhaps there are
also OOMEs that occur during Lucene processing.
I remember seeing those in the past with the original Java code, when
very large Strings were being tokenized and I got an OOME with a deep
Lucene stack trace. I have copied one such saved stack trace at the
end of this mail.
I see some caught and swallowed OOMEs in my log file but unfortunately
they are without a stack trace - probably again a JET issue. I can run
the normal Java build though to see if such OOMEs occur on this dataset.
Now, I wonder:
- when the IW is in auto-commit mode, can the failed processing of a
Document due to an OOME have an impact on the processing of subsequent
Documents or the merge/optimize operations? Can the index(writer) become
corrupt and result in problems such as these?
- even though the commitMerge returns false, it should probably not get
into an infinite loop. Is this an internal Lucene problem or is there
something I can/should do about it myself?
- more generally, what is the recommended behavior when I get an OOME
during Lucene processing, particularly IW.addDocument? Should the IW be
able to recover by itself or is there some sort of rollback I need to
perform? Again, note that my index is in auto-commit mode (though I had
hoped to let go of that too, it's only for historic reasons).
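To make the scenario I suspect concrete, here is a toy model of it. This is not Lucene code: ToyWriter, mergeWithLimit and everything else in it are made up for illustration. The only part taken from the real source is the "if (hitOOM) return false;" guard in commitMerge; that the flag gets set when addDocument hits an OOME is my assumption. The sketch shows how a swallowed OOME could leave such a flag set so that later merges never commit, and how a caller could bound its retries rather than wait forever.

```java
public class OomLatchDemo {

    /** Toy writer, NOT Lucene. Only the hitOOM guard in commitMerge mirrors the real code. */
    public static class ToyWriter {
        private boolean hitOOM = false;
        private int committedMerges = 0;

        public void addDocument(Runnable work) {
            try {
                work.run();
            } catch (OutOfMemoryError oom) {
                // Assumption: the writer remembers the error, since its buffers may be inconsistent.
                hitOOM = true;
                throw oom;
            }
        }

        public boolean commitMerge() {
            if (hitOOM) return false; // same guard as quoted from commitMerge
            committedMerges++;
            return true;
        }

        public boolean hasHitOOM() {
            return hitOOM;
        }

        public int committedMerges() {
            return committedMerges;
        }
    }

    /** Tries to commit a merge at most maxAttempts times instead of looping forever. */
    public static boolean mergeWithLimit(ToyWriter writer, int maxAttempts) {
        for (int i = 0; i < maxAttempts; i++) {
            if (writer.commitMerge()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        ToyWriter writer = new ToyWriter();
        try {
            writer.addDocument(() -> { throw new OutOfMemoryError("simulated"); });
        } catch (Throwable t) {
            // The client swallows the error, as my indexing code does.
        }
        writer.addDocument(() -> { }); // later documents still appear to work

        System.out.println("merge committed: " + mergeWithLimit(writer, 3));
        System.out.println("writer hit OOME: " + writer.hasHitOOM());
    }
}
```

If the real IW behaves anything like this, then swallowing the OOME in my code hides the fact that the writer has gone into a state it cannot leave, and the client would have to close and reopen it. Whether that is actually the case is exactly what I am asking.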
Regards,
Chris
--
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.DocumentsWriter.getPostings(DocumentsWriter.java:3069)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.addPosition(DocumentsWriter.java:1696)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.invertField(DocumentsWriter.java:1525)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.processField(DocumentsWriter.java:1412)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.processDocument(DocumentsWriter.java:1121)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2442)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2424)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1464)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1442)
    at info.aduna...........