Hi,
after spending too many hours trying to build a small test case, I have to rip
PyLucene out of my app because at one point I hit a reproducible SIGSEGV. This
looks to me like a "super bug" (a bug that is nearly impossible to fix because
it only occurs in very complex situations). So I'll try to describe my symptoms
and what I have done so far. Maybe someone, sometime, will have an idea how to
fix it.
I can reproduce the bug in my TurboGears app (> 10,000 LoC) with only two
HTTP requests, but I was unable to strip it down even to a simple TG app.
Sometimes even removing an unused import statement masks the problem :-(
It definitely has something to do with threading. If I enable only one thread,
everything is rock-solid. And judging from the backtrace I get, the garbage
collector seems to be involved, too:
# An unexpected error has been detected by Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0144f25b, pid=11801, tid=165989264
#
# Java VM: OpenJDK Client VM (1.6.0_0-b11 mixed mode linux-x86)
# Problematic frame:
# C [_lucene.so+0x46525b] _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x9
#
This is the stack I get:
Stack: [0x0944c000,0x09e4d000], sp=0x09e469b0, free space=10218k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C [_lucene.so+0x46525b] _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x9
C [_lucene.so+0x4648d9] _ZN6JCCEnv15deleteGlobalRefEP8_jobjecti+0xed
C [_lucene.so+0x14e338] _ZN7JObjectaSERKS_+0xb6
C [_lucene.so+0x2797d9]
C [libpython2.4.so.1.0+0x488e2]
C [libpython2.4.so.1.0+0x63eac]
C [libpython2.4.so.1.0+0x488e2]
C [collectionsmodule.so+0x18c7]
C [collectionsmodule.so+0x29b0]
C [libpython2.4.so.1.0+0x4a2cf]
C [libpython2.4.so.1.0+0x49efc] PyDict_Clear+0x13c
C [libpython2.4.so.1.0+0x49f8d]
C [libpython2.4.so.1.0+0xac2f4]
C [libpython2.4.so.1.0+0xac67b] _PyObject_GC_Malloc+0xdb
C [libpython2.4.so.1.0+0xac705] _PyObject_GC_New+0x25
C [libpython2.4.so.1.0+0x49b97] PyDict_New+0x127
C [libpython2.4.so.1.0+0x29fe7] PyInstance_NewRaw+0x127
C [libpython2.4.so.1.0+0x2a0e7] PyInstance_New+0x27
C [libpython2.4.so.1.0+0x1fd87] PyObject_Call+0x37
I'm using Python 2.4 on CentOS 5.2 (i386 and x86_64) with OpenJDK
(java-1.6.0-openjdk-1.6.0.0-0.20.b11.el5). I can also reproduce the bug
with Python 2.5 on Fedora 9 (x86_64) and with Python 2.4 on Windows (i386).
I'm pretty sure that I call attachCurrentThread for every thread that actually
instantiates or works with Lucene (Java) objects. Some other threads may only
do an "import lucene", but every call into Lucene is guarded by
attachCurrentThread... Is calling attachCurrentThread twice considered harmful?
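To rule out both missing and repeated attaches, one option is to route every Lucene-touching call through a small per-thread guard. This is only a sketch under assumptions: ThreadAttacher, ensure_attached and guard are hypothetical names of my own, not PyLucene API, and the attach call is passed in as a parameter. With PyLucene it would presumably be something like lucene.getVMEnv().attachCurrentThread(), but treat that as an assumption. The guard uses threading.local so each thread calls the attach function exactly once. It does not settle whether a second attach is actually harmful; it just removes that variable.

```python
import threading


class ThreadAttacher(object):
    """Hypothetical helper: call attach_fn at most once per thread.

    attach_fn is injected so the helper can be exercised without a JVM.
    With PyLucene it would presumably be something like
    lambda: lucene.getVMEnv().attachCurrentThread()   (assumption)
    """

    def __init__(self, attach_fn):
        self._attach_fn = attach_fn
        # threading.local: every thread sees its own attributes here.
        self._local = threading.local()

    def ensure_attached(self):
        # Attach only if this thread has not been attached through us yet.
        # If attach_fn raises, the flag stays unset and the next call retries.
        if not getattr(self._local, "attached", False):
            self._attach_fn()
            self._local.attached = True

    def guard(self, func):
        """Decorator: attach the calling thread (once), then call func."""
        def wrapper(*args, **kwargs):
            self.ensure_attached()
            return func(*args, **kwargs)
        wrapper.__name__ = func.__name__
        wrapper.__doc__ = func.__doc__
        return wrapper
```

Every function that creates or touches Java objects (say, a hypothetical search_index(query)) would then be decorated with @attacher.guard. Threads that only do "import lucene" never pass through the guard and are never attached.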
For some months I did not really care, because I thought this bug was
i386-only: I had not been able to reproduce it even once on our x86_64
machines. Now I have discovered an access pattern that crashes the 64-bit
machines, too...
If someone has advice on this topic, I would be *very* thankful, although I'm
quite pessimistic given the complex interactions needed to reproduce the bug [1].
fs
[1] Heck, even doing an "import thread; print thread.get_ident()" right before
the line where it crashes masks the crash!
_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev