I have had a similar experience, but it was always a problem on the Java side. What helped was to dump memory:
-Xms512m -Xmx4500m -XX:+HeapDumpOnCtrlBreak -XX:+HeapDumpOnOutOfMemoryError

Documentation says that upon catching the OOM, you should stop the JVM immediately. But actually it was possible to handle these problems. I started the processing inside a separate thread, cleaning properly -- if the thread raises OOM, it is possible to continue - I have done tests on thousands of docs and it always worked. But the main benefit of that solution is that I can see the errors inside Python and gracefully stop execution (without being shut out into the space).

Marcus, I would recommend wrapping your processing inside a thread that starts another worker thread and make sure no references are kept.

Roman

On Fri, Apr 15, 2011 at 4:33 PM, Bill Janssen <jans...@parc.com> wrote:
> Marcus <qwe...@gmail.com> wrote:
>
>> we're currently using 4GB max heap.
>> We recently moved from 2GB to 4GB when we discovered it prevented a crash
>> with a certain set of docs.
>> Marcus
>
> I've tried the same workaround with the heap in the past, and I found it
> caused NoMemory crashes in the Python side of the house, because the
> Python VM couldn't get enough memory to operate. So, be careful.
>
>> On Thu, Apr 14, 2011 at 5:01 PM, Andi Vajda <va...@apache.org> wrote:
>>
>> > On Thu, 14 Apr 2011, Marcus wrote:
>> >
>> >> thanks.
>> >>
>> >> I have documents that will consistently cause this upon writing them to
>> >> the index. let me see if I can reduce them down to the crux of the crash.
>> >> granted, these are docs are very large, unruly "bad" data, that should
>> >> have never gotten this stage in our pipeline, but I was hoping for a java
>> >> or lucene exception.
>> >>
>> >> I also get "Java GC overhead" exceptions passed into my code from time to
>> >> time, but those manageable, and not crashes.
>> >>
>> >> Are there known memory constraint scenarios that force a c++ exception,
>> >> whereas in a normal Java environment, you would get a memory error?
>> >
>> > Not sure.
>> >
>> >> and just confirming, do "java.lang.OutOfMemoryError" errors pass into
>> >> python, or force a crash?
>> >
>> > Not sure, I've never seen these as I make sure I've got enough memory.
>> > initVM() is the place where you can configure the memory for your JVM.
>> >
>> > Andi..
>> >
>> >> thanks again
>> >> Marcus
>> >>
>> >> On Thu, Apr 14, 2011 at 2:07 PM, Andi Vajda <va...@apache.org> wrote:
>> >>
>> >>> On Thu, 14 Apr 2011, Marcus wrote:
>> >>>
>> >>>> in certain cases when a java/pylucene exception occurs, it gets passed
>> >>>> up in my code, and I'm able to analyze the situation.
>> >>>> sometimes though, the python process just crashes, and if I happen to
>> >>>> be in top (linux top that is), I see a JCC exception flash up in the
>> >>>> top console.
>> >>>> where can I go to look for this exception, or is it just lost?
>> >>>> I looked in the locations where a java crash would be located, but
>> >>>> didn't find anything.
>> >>>
>> >>> If you're hitting a crash because of an unhandled C++ exception, running
>> >>> a debug build with symbols under gdb will help greatly in tracking it down.
>> >>>
>> >>> An unhandled C++ exception would be a PyLucene/JCC bug. If you have a
>> >>> simple way to reproduce this failure, send it to this list.
>> >>>
>> >>> Andi..
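
P.S. For illustration only, here is a minimal sketch of the worker-thread idea recommended at the top of this message. It is not the code actually used. The names run_isolated and index_doc are hypothetical, and the failure is simulated with Python's MemoryError so the sketch runs without PyLucene. In a real PyLucene program the worker thread would presumably need to attach itself to the JVM first (e.g. with lucene.getVMEnv().attachCurrentThread()), and a Java-side OutOfMemoryError is assumed to surface in Python as an ordinary exception.

```python
import threading


def run_isolated(task, *args):
    """Run task(*args) in a fresh worker thread and return (result, error).

    Exactly one of the two returned values is None. The worker's locals go
    away when the thread ends, so nothing from a failed run is kept alive
    except the exception object handed back to the caller.
    """
    outcome = {}

    def worker():
        # Assumption: with PyLucene, the thread would attach to the JVM here,
        # e.g. lucene.getVMEnv().attachCurrentThread(). Omitted in this sketch.
        try:
            outcome["result"] = task(*args)
        except Exception as exc:
            # Record the failure instead of letting it kill the process.
            outcome["error"] = exc

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()  # wait for the worker before looking at the outcome
    return outcome.get("result"), outcome.get("error")


def index_doc(doc):
    """Hypothetical stand-in for the real indexing call."""
    if len(doc) > 10:
        raise MemoryError("simulated out-of-memory on a huge document")
    return len(doc)


# One oversized document fails; processing continues with the next one.
results = [run_isolated(index_doc, d) for d in ["short", "x" * 50, "ok"]]
for result, error in results:
    print("failed: %r" % error if error else "indexed %d chars" % result)
```

The caller sees each failure as a normal Python value and can decide whether to skip the document or stop gracefully.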