Hi,

After spending too many hours trying to build a small test case, I have to rip PyLucene out of my app because I hit a reproducible SIGSEGV at one point. This looks to me like a "super bug" (a bug that is nearly impossible to fix because it only occurs in very complex situations). So I'll try to describe my symptoms and what I have done so far. Maybe someone will eventually have an idea how to fix it.

I can reproduce the bug in my TurboGears app (more than 10,000 LoC) with only two HTTP requests, but I was unable to strip it down even to a simple TG app. Sometimes even removing an unused import statement masks the problem :-(

It definitely has something to do with threading. If I enable only one thread, everything is rock-solid. And judging from the backtrace, the garbage collector seems to be involved, too:

# An unexpected error has been detected by Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x0144f25b, pid=11801, tid=165989264
#
# Java VM: OpenJDK Client VM (1.6.0_0-b11 mixed mode linux-x86)
# Problematic frame:
# C  [_lucene.so+0x46525b]  _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x9
#

This is the stack I get:
Stack: [0x0944c000,0x09e4d000],  sp=0x09e469b0,  free space=10218k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [_lucene.so+0x46525b]  _ZN7JNIEnv_15DeleteGlobalRefEP8_jobject+0x9
C  [_lucene.so+0x4648d9]  _ZN6JCCEnv15deleteGlobalRefEP8_jobjecti+0xed
C  [_lucene.so+0x14e338]  _ZN7JObjectaSERKS_+0xb6
C  [_lucene.so+0x2797d9]
C  [libpython2.4.so.1.0+0x488e2]
C  [libpython2.4.so.1.0+0x63eac]
C  [libpython2.4.so.1.0+0x488e2]
C  [collectionsmodule.so+0x18c7]
C  [collectionsmodule.so+0x29b0]
C  [libpython2.4.so.1.0+0x4a2cf]
C  [libpython2.4.so.1.0+0x49efc]  PyDict_Clear+0x13c
C  [libpython2.4.so.1.0+0x49f8d]
C  [libpython2.4.so.1.0+0xac2f4]
C  [libpython2.4.so.1.0+0xac67b]  _PyObject_GC_Malloc+0xdb
C  [libpython2.4.so.1.0+0xac705]  _PyObject_GC_New+0x25
C  [libpython2.4.so.1.0+0x49b97]  PyDict_New+0x127
C  [libpython2.4.so.1.0+0x29fe7]  PyInstance_NewRaw+0x127
C  [libpython2.4.so.1.0+0x2a0e7]  PyInstance_New+0x27
C  [libpython2.4.so.1.0+0x1fd87]  PyObject_Call+0x37
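
For illustration only, and not as a confirmed diagnosis: the frames above show an allocation (_PyObject_GC_Malloc) triggering a collection that ends in DeleteGlobalRef. The sketch below shows the general Python behaviour that makes this relevant to threading: cyclic garbage is cleaned up in whichever thread happens to trigger the collector, not in the thread that created the objects. The Wrapper class and demo function are hypothetical stand-ins and do not use PyLucene.

```python
import gc
import threading
import weakref


class Wrapper(object):
    """Hypothetical stand-in for an object that wraps a native resource."""


def demo():
    """Create a reference cycle in a worker thread, collect it in the caller.

    Returns (worker_ident, cleanup_ident): the thread that built the cycle and
    the thread in which its cleanup callback actually ran.
    """
    result = {}
    refs = []  # keeps the weak references (and their callbacks) alive

    def on_collect(_ref):
        result["cleanup"] = threading.current_thread().ident

    def worker():
        result["worker"] = threading.current_thread().ident
        a, b = Wrapper(), Wrapper()
        a.other, b.other = b, a  # cycle: reference counting alone cannot free it
        refs.append(weakref.ref(a, on_collect))

    was_enabled = gc.isenabled()
    gc.disable()  # keep the demo deterministic: no automatic collection
    try:
        t = threading.Thread(target=worker)
        t.start()
        t.join()
        gc.collect()  # the collector runs here, in the calling thread
    finally:
        if was_enabled:
            gc.enable()
    return result["worker"], result["cleanup"]
```

Calling demo() from the main thread returns two different thread identifiers: the cycle was built in the worker, but its cleanup ran in the main thread. If a wrapper's cleanup needs a JVM-attached thread, a collection triggered from an unattached thread would be one possible way to end up in a crash like the one above.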

I'm using Python 2.4 on CentOS 5.2 (i386 and x86_64) with OpenJDK (java-1.6.0-openjdk-1.6.0.0-0.20.b11.el5). However, I can also reproduce the bug with Python 2.5 on Fedora 9 (x86_64) and Python 2.4 on Windows (i386).

I'm pretty sure that I call attachCurrentThread in every thread that actually instantiates or works with Lucene (Java) objects. Some threads may only have an "import lucene", but every call into Lucene is guarded by attachCurrentThread. Is calling attachCurrentThread twice considered harmful?
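
A minimal sketch of one way to sidestep the "called twice" question: remember per thread whether the attach call has already been made and skip it afterwards. The helper name ensure_attached and the demo are hypothetical; the attach function is passed in as a parameter, so the sketch runs without PyLucene, and it does not settle whether a second attach is actually harmful.

```python
import threading

_tls = threading.local()  # per-thread flag: has this thread attached yet?


def ensure_attached(attach):
    """Run attach() the first time each thread calls this; return True if it ran."""
    if getattr(_tls, "attached", False):
        return False
    attach()
    _tls.attached = True
    return True


def demo(num_threads=3, calls_per_thread=4):
    """Return how many times a stand-in attach function ran per thread."""
    counts = {}
    lock = threading.Lock()

    def fake_attach():
        # Stand-in for the real attach call; it only counts invocations.
        with lock:
            name = threading.current_thread().name
            counts[name] = counts.get(name, 0) + 1

    def worker():
        for _ in range(calls_per_thread):
            ensure_attached(fake_attach)

    threads = [threading.Thread(target=worker, name="worker-%d" % i)
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counts
```

In the real application the argument would presumably be something like lucene.getVMEnv().attachCurrentThread (an assumption based on the JCC-based PyLucene API, not checked against the version used here), and ensure_attached would be called at the top of every code path that touches Lucene objects.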

For some months I did not really care because I thought this bug was i386-only: I was never able to reproduce it on our x86_64 machines. Now I have discovered an access pattern that crashes even the 64-bit machines...

If someone has advice on this topic, I would be *very* thankful, although I'm quite pessimistic, given the complex interactions needed to reproduce the bug [1].

fs

[1] Heck, even doing an "import thread; print thread.get_ident()" before the line where it crashes camouflages the crash!

_______________________________________________
pylucene-dev mailing list
pylucene-dev@osafoundation.org
http://lists.osafoundation.org/mailman/listinfo/pylucene-dev