I have had a similar experience, but it was always a problem on the Java side.
What helped was to have the JVM dump the heap:

-Xms512m -Xmx4500m -XX:+HeapDumpOnCtrlBreak -XX:+HeapDumpOnOutOfMemoryError
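
For PyLucene, these options would normally go through initVM() rather than a java command line. A minimal sketch, assuming the JCC keyword arguments `initialheap`, `maxheap` and `vmargs`; the helper names `jvm_options` and `start_jvm` are made up for illustration and are not part of PyLucene:

```python
# Flags copied from the command line above; the helper names are hypothetical.
HEAP_DUMP_FLAGS = [
    "-XX:+HeapDumpOnCtrlBreak",
    "-XX:+HeapDumpOnOutOfMemoryError",
]


def jvm_options(initial_heap="512m", max_heap="4500m", extra_flags=HEAP_DUMP_FLAGS):
    """Build keyword arguments intended for lucene.initVM()."""
    return {
        "initialheap": initial_heap,  # corresponds to -Xms
        "maxheap": max_heap,          # corresponds to -Xmx
        "vmargs": list(extra_flags),  # additional raw JVM flags
    }


def start_jvm():
    """Start the JVM with the options above (requires PyLucene)."""
    import lucene  # imported lazily so the sketch loads without PyLucene
    return lucene.initVM(classpath=lucene.CLASSPATH, **jvm_options())
```

Note that a JVM may refuse to start if it does not recognize one of the -XX flags, so it is worth checking each flag against the JVM version in use.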

The documentation says that after catching an OOM you should stop the JVM
immediately, but in practice it was possible to handle these problems. I
run the processing inside a separate thread and clean up properly. If the
thread raises an OOM, it is possible to continue; I have tested this on
thousands of docs and it always worked. The main benefit of this approach
is that I can see the errors inside Python and stop execution gracefully
(instead of the process simply vanishing without a trace).
Marcus, I would recommend wrapping your processing inside a thread
that starts another worker thread, and making sure no references are
kept.
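
The thread-wrapping idea could look roughly like the sketch below. It is an illustration under assumptions, not the code used in the tests described above: `run_isolated` and `index_doc` are hypothetical names, a Python MemoryError stands in for the Java OOM, and only one worker level is shown for brevity. With PyLucene, the worker would also have to attach itself to the JVM before touching any Java objects.

```python
import threading


def run_isolated(task, *args):
    """Run task(*args) in a worker thread and return (result, error)."""
    outcome = {}

    def worker():
        # With PyLucene, the thread would first attach to the JVM, e.g.
        # lucene.getVMEnv().attachCurrentThread(). Omitted here so the
        # sketch runs without PyLucene.
        try:
            outcome["result"] = task(*args)
        except Exception as exc:
            # Store only a string so that no traceback or frame
            # references from the failed call are kept alive.
            outcome["error"] = "%s: %s" % (type(exc).__name__, exc)

    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return outcome.get("result"), outcome.get("error")


def index_doc(doc):
    """Hypothetical stand-in for the real indexing call."""
    if len(doc) > 10:
        raise MemoryError("document too large")
    return len(doc)


if __name__ == "__main__":
    for doc in ["small", "x" * 100, "ok"]:
        result, error = run_isolated(index_doc, doc)
        print(result, error)
```

The caller sees the failure as an ordinary return value, so it can log the offending document, skip it, or stop the run cleanly.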

Roman

On Fri, Apr 15, 2011 at 4:33 PM, Bill Janssen <jans...@parc.com> wrote:
> Marcus <qwe...@gmail.com> wrote:
>
>>
>> we're currently using 4GB max heap.
>> We recently moved from 2GB to 4GB when we discovered it prevented a crash
>> with a certain set of docs.
>> Marcus
>
> I've tried the same workaround with the heap in the past, and I found it
> caused NoMemory crashes in the Python side of the house, because the
> Python VM couldn't get enough memory to operate.  So, be careful.
>
>> On Thu, Apr 14, 2011 at 5:01 PM, Andi Vajda <va...@apache.org> wrote:
>>
>> >
>> > On Thu, 14 Apr 2011, Marcus wrote:
>> >
>> >  thanks.
>> >>
>> >> I have documents that will consistently cause this upon writing them to
>> >> the index. Let me see if I can reduce them down to the crux of the crash.
>> >> Granted, these docs are very large, unruly "bad" data that should never
>> >> have gotten to this stage in our pipeline, but I was hoping for a Java or
>> >> Lucene exception.
>> >>
>> >> I also get "Java GC overhead" exceptions passed into my code from time to
>> >> time, but those are manageable, and not crashes.
>> >>
>> >> Are there known memory constraint scenarios that force a c++ exception,
>> >> whereas in a normal Java environment,  you would get a memory error?
>> >>
>> >
>> > Not sure.
>> >
>> >
>> >  and just confirming, do "java.lang.OutOfMemoryError" errors pass into
>> >> python, or force a crash?
>> >>
>> >
>> > Not sure, I've never seen these as I make sure I've got enough memory.
>> > initVM() is the place where you can configure the memory for your JVM.
>> >
>> > Andi..
>> >
>> >
>> >
>> >> thanks again
>> >> Marcus
>> >>
>> >> On Thu, Apr 14, 2011 at 2:07 PM, Andi Vajda <va...@apache.org> wrote:
>> >>
>> >>
>> >>> On Thu, 14 Apr 2011, Marcus wrote:
>> >>>
>> >>>> in certain cases when a java/pylucene exception occurs, it gets passed up
>> >>>> in my code, and I'm able to analyze the situation.
>> >>>> sometimes though, the python process just crashes, and if I happen to be
>> >>>> in top (linux top that is), I see a JCC exception flash up in the top
>> >>>> console.
>> >>>> where can I go to look for this exception, or is it just lost?
>> >>>> I looked in the locations where a java crash would be located, but didn't
>> >>>> find anything.
>> >>>>
>> >>>>
>> >>> If you're hitting a crash because of an unhandled C++ exception, running
>> >>> a debug build with symbols under gdb will help greatly in tracking it
>> >>> down.
>> >>>
>> >>> An unhandled C++ exception would be a PyLucene/JCC bug. If you have a
>> >>> simple way to reproduce this failure, send it to this list.
>> >>>
>> >>> Andi..
>> >>>
>> >>>
>> >>
>>
>
