[follow up to the new mailing list]

Andi Vajda wrote:
> Looking at your stacktrace, it would seem that JNIEnv is NULL (this=0x0).
> I recently fixed a bug in JCC with a NULL JNIEnv caused by a line of code 
> being emitted too late in an extension method. You would hit this bug if you 
> wrote Python extensions of some Lucene Java classes as is possible with JCC.
> 
> If you could send me a more complete stacktrace, up to the method in your 
> code or PyLucene, and its corresponding source code I could confirm this.
> 
> You could also try out the fix I did (which fixed the bug I had) by getting 
> the latest JCC sources from PyLucene's new home at Apache:
>      http://svn.apache.org/repos/asf/lucene/pylucene/trunk/
> and rebuilding your libraries. This was a bug in the C++ code generator.
> 
> Please, let me know if this fixes your problem as well.
> Thanks !

Thanks Andi!

Updating to the latest svn revision of JCC and Lucene didn't help. The
server kept crashing at regular intervals. However, the additional core
dumps gave me more data, and eventually I was able to debug and fix the
culprit.

JCC was causing a crash in a piece of code, and in a thread, that wasn't
using any Java objects at all. At least that is what I thought at first.
We are using a CherryPy plugin called 'Dowser' [1] to keep track of
reference counts and possible memory leaks. The segfault always occurred
inside Dowser code, in the Dowser thread. I couldn't make sense of it.
Dowser doesn't touch Lucene or JCC at all. Or does it?

Once I started paying more attention to the exact line -- the pystack
macro from Python's gdbinit is a life saver -- I got a clue. Dowser uses
gc.get_objects() to iterate every 5 seconds over all objects tracked by
Python's cyclic gc. A race condition led to a situation in which the
list returned by gc.get_objects() was holding the last reference to a
JCC object.
The Dowser thread didn't have a JNI environment attached to it because I
never thought it would matter. At the end of the "for obj in
gc.get_objects():" loop, the ref count of the JCC object dropped to zero
inside the Dowser thread, so the object was deallocated there
... no JCC ENV ... SEGFAULT.
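
The mechanism can be reproduced without JCC. The following is only a
minimal sketch, assuming CPython's reference counting; the Tracked class
and the function name are hypothetical stand-ins, not Dowser or JCC code.
It shows that an object created in one thread is finalized in whichever
thread drops the last reference, here the thread that called
gc.get_objects().

```python
import gc
import threading


class Tracked:
    """Hypothetical stand-in for a JCC wrapper object."""

    finalized_in = None  # name of the thread that ran __del__

    def __del__(self):
        Tracked.finalized_in = threading.current_thread().name


def finalizing_thread_name():
    """Return the name of the thread in which the object is finalized."""
    Tracked.finalized_in = None
    holder = [Tracked()]  # the main thread owns the only reference
    snapshot_taken = threading.Event()
    owner_released = threading.Event()

    def scanner():
        # Like Dowser's scan: the returned list references every
        # gc-tracked object, including the Tracked instance.
        objs = gc.get_objects()
        snapshot_taken.set()
        owner_released.wait()
        # The list now holds the last reference; freeing the list
        # finalizes the object in this thread.
        del objs

    thread = threading.Thread(target=scanner, name="scanner")
    thread.start()
    snapshot_taken.wait()
    holder.clear()  # the owner drops its reference
    owner_released.set()
    thread.join()
    return Tracked.finalized_in
```

With a real JCC object, the deallocation in the scanning thread needs a
JNI environment, which an unattached thread does not have.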

Conclusion:
Never combine JCC and gc.get_objects() unless you attach *all* Python
threads to the JVM.
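
One way to follow this rule for threads you start yourself is to wrap
the thread target so that it attaches before doing anything else. This
is a sketch under assumptions: attach_first and start_attached are
hypothetical helper names, and the attach callable is passed in so the
example runs without PyLucene installed.

```python
import threading


def attach_first(target, attach):
    """Return a thread target that calls attach() before target()."""

    def runner(*args, **kwargs):
        # With PyLucene, attach would typically be something like
        # lucene.getVMEnv().attachCurrentThread (an assumption here).
        attach()
        return target(*args, **kwargs)

    return runner


def start_attached(target, attach, name=None):
    """Start a daemon thread that attaches itself before running target."""
    thread = threading.Thread(target=attach_first(target, attach), name=name)
    thread.daemon = True
    thread.start()
    return thread
```

With PyLucene, after lucene.initVM() has been called, the attach argument
could be "lambda: lucene.getVMEnv().attachCurrentThread()". The wrapper
only covers threads started through it; threads created by third-party
code still have to be attached some other way.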

Christian

[1] http://www.aminus.net/wiki/Dowser
