Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 19:41, Andi Vajda wrote:
> If these functions eventually instantiate a Thread class, even indirectly,
> the monkey-patching may still work.

Some of the code doesn't use the threading module at all, just thread or the internal C API. I'd have to patch the modules and the C code.

> That may cover this case but what about all the others ?
> There is a reason the call has to be manual.
>
> I've not been able to automate it before.
> Over time, I've added checks where I could but I've not found it possible to
> cover all cases where attachCurrentThread() wasn't called.
>
> Anyhow, try it and see if it fixes the problem you're seeing.
> If any of the objects being freed invoke user code that eventually calls into
> the JVM, the problem is going to appear again elsewhere.

I understand your reluctance to automate the attaching of Python threads to the JVM. Explicit is better than implicit. However, this is a special case. CPython neither lets you control which threads run the cyclic garbage collector nor provides a hook that is called for newly created threads. It's hard to debug a segfault when even code like "a = []" can trigger the bug.

The attached patch no longer triggers the bug in my artificial test code. I'm going to run our test suite several times. That's going to take a while.
Christian

Index: jcc/sources/jcc.cpp
===
--- jcc/sources/jcc.cpp (Revision 1088091)
+++ jcc/sources/jcc.cpp (Arbeitskopie)
@@ -33,6 +33,25 @@
 
 /* JCCEnv */
 
+int jccenv_attachCurrentThread(char *name, int asDaemon)
+{
+    int result;
+    JNIEnv *jenv = NULL;
+
+    JavaVMAttachArgs attach = {
+        JNI_VERSION_1_4, name, NULL
+    };
+
+    if (asDaemon)
+        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
+    else
+        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
+
+    env->set_vm_env(jenv);
+
+    return result;
+}
+
 class t_jccenv {
 public:
     PyObject_HEAD
@@ -154,21 +173,11 @@
 {
     char *name = NULL;
     int asDaemon = 0, result;
-    JNIEnv *jenv = NULL;
 
     if (!PyArg_ParseTuple(args, "|si", &name, &asDaemon))
        return NULL;
 
-    JavaVMAttachArgs attach = {
-        JNI_VERSION_1_4, name, NULL
-    };
-
-    if (asDaemon)
-        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
-    else
-        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
-
-    env->set_vm_env(jenv);
+    result = jccenv_attachCurrentThread(name, asDaemon);
 
     return PyInt_FromLong(result);
 }
Index: jcc/sources/JCCEnv.cpp
===
--- jcc/sources/JCCEnv.cpp (Revision 1088091)
+++ jcc/sources/JCCEnv.cpp (Arbeitskopie)
@@ -318,6 +318,16 @@
 {
     if (iter->second.count == 1)
     {
+        JNIEnv *vm_env = get_vm_env();
+        if (!vm_env)
+        {
+            /* Python's cyclic garbage collector may remove
+             * an object inside a thread that is not attached
+             * to the JVM. This makes sure JCC doesn't segfault.
+             */
+            jccenv_attachCurrentThread(NULL, 0);
+            vm_env = get_vm_env();
+        }
         get_vm_env()->DeleteGlobalRef(iter->second.global);
         refs.erase(iter);
     }
Index: jcc/sources/JCCEnv.h
===
--- jcc/sources/JCCEnv.h (Revision 1088091)
+++ jcc/sources/JCCEnv.h (Arbeitskopie)
@@ -72,6 +72,8 @@
 
 typedef jclass (*getclassfn)(void);
 
+int jccenv_attachCurrentThread(char *name, int asDaemon);
+
 class countedRef {
 public:
     jobject global;
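[Editor's note: for readers more comfortable in Python, the idea behind the patch above can be sketched with a thread-local environment and stand-in attach functions. JVMEnv, jvm_attach, and get_vm_env here are hypothetical stand-ins for the JNI machinery, not JCC API — this is a sketch of the attach-on-demand pattern, not JCC's implementation.]

    import threading

    _tls = threading.local()

    class JVMEnv(object):
        """Hypothetical stand-in for a per-thread JNIEnv pointer."""
        def delete_global_ref(self, ref):
            return "deleted %r" % (ref,)

    def jvm_attach():
        # Stand-in for AttachCurrentThread(): give this thread an env.
        _tls.env = JVMEnv()

    def get_vm_env():
        return getattr(_tls, "env", None)

    def delete_global_ref(ref):
        # Mirrors the patched JCCEnv::deleteGlobalRef(): attach on demand
        # instead of dereferencing a NULL env and crashing.
        env = get_vm_env()
        if env is None:
            jvm_attach()
            env = get_vm_env()
        return env.delete_global_ref(ref)

    # A fresh, never-attached thread can now free a ref safely.
    result = []
    t = threading.Thread(target=lambda: result.append(delete_global_ref("gref")))
    t.start(); t.join()
    print(result[0])

The key point is that the env lives in thread-local storage, so a thread that never called attach sees None and attaches lazily, exactly once.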
Re: segfault in JCCEnv::deleteGlobalRef
On Wed, 11 May 2011, Christian Heimes wrote:

> On 11.05.2011 19:03, Andi Vajda wrote:
>> If these libraries use Python's Thread class you have some control.
>> Create a subclass of Thread that runs your hook and insert it into the
>> threading module (threading.Thread = YourThreadSubclass) before anyone else
>> gets a chance to create threads.
>
> One library is using thread.start_new_thread() and another uses Python's C
> API to create an internal monitor thread. This makes it even harder to fix
> the issue.

If these functions eventually instantiate a Thread class, even indirectly, the monkey-patching may still work.

> How would you feel about another approach?
>
> * factor out the attach routine of t_jccenv_attachCurrentThread() as a C
>   function
>
>     int jccenv_attachCurrentThread(char *name, int asDaemon)
>     {
>         int result;
>         JNIEnv *jenv = NULL;
>
>         JavaVMAttachArgs attach = {
>             JNI_VERSION_1_4, name, NULL
>         };
>
>         if (asDaemon)
>             result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
>         else
>             result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
>
>         env->set_vm_env(jenv);
>
>         return result;
>     }
>
> * modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL
>
>     if (iter->second.count == 1)
>     {
>         JNIEnv *vm_env = get_vm_env();
>         if (!vm_env)
>         {
>             jccenv_attachCurrentThread(NULL, 0);
>             vm_env = get_vm_env();
>         }
>         vm_env->DeleteGlobalRef(iter->second.global);
>         refs.erase(iter);
>     }

That may cover this case but what about all the others ? There is a reason the call has to be manual.

I've not been able to automate it before. Over time, I've added checks where I could but I've not found it possible to cover all cases where attachCurrentThread() wasn't called.

Anyhow, try it and see if it fixes the problem you're seeing. If any of the objects being freed invoke user code that eventually calls into the JVM, the problem is going to appear again elsewhere.

Andi..
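[Editor's note: the Thread monkey-patching idea discussed above can be sketched like this. attach_current_thread is a hypothetical stand-in; with PyLucene the real call would be something along the lines of lucene.getVMEnv().attachCurrentThread(). As noted in the thread, this pattern does not cover thread.start_new_thread() or threads created from C.]

    import threading

    _original_thread = threading.Thread
    attached = []  # records which threads ran the hook (demonstration only)

    def attach_current_thread():
        # Hypothetical stand-in for the real JVM attach call.
        attached.append(threading.current_thread().name)

    class AttachingThread(_original_thread):
        def run(self):
            attach_current_thread()      # attach before any user code runs
            _original_thread.run(self)

    # Install the subclass before any other module creates threads.
    threading.Thread = AttachingThread

    t = threading.Thread(target=lambda: None, name="worker")
    t.start(); t.join()
    print(attached)

Because run() wraps the original, every thread created through threading.Thread attaches itself first, without the target code having to know about the JVM.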
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 19:03, Andi Vajda wrote:
> If these libraries use Python's Thread class you have some control.
>
> Create a subclass of Thread that runs your hook and insert it into the
> threading module (threading.Thread = YourThreadSubclass) before anyone else
> gets a chance to create threads.

One library is using thread.start_new_thread() and another uses Python's C API to create an internal monitor thread. This makes it even harder to fix the issue.

How would you feel about another approach?

* factor out the attach routine of t_jccenv_attachCurrentThread() as a C function

    int jccenv_attachCurrentThread(char *name, int asDaemon)
    {
        int result;
        JNIEnv *jenv = NULL;

        JavaVMAttachArgs attach = {
            JNI_VERSION_1_4, name, NULL
        };

        if (asDaemon)
            result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
        else
            result = env->vm->AttachCurrentThread((void **) &jenv, &attach);

        env->set_vm_env(jenv);

        return result;
    }

* modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL

    if (iter->second.count == 1)
    {
        JNIEnv *vm_env = get_vm_env();
        if (!vm_env)
        {
            jccenv_attachCurrentThread(NULL, 0);
            vm_env = get_vm_env();
        }
        vm_env->DeleteGlobalRef(iter->second.global);
        refs.erase(iter);
    }

Christian
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 18:26, Andi Vajda wrote:
> There shouldn't be any random threads. Threads don't just appear out of thin
> air. You create them. If there is a chance that they call into the JVM, then
> attachCurrentThread().

I've already made sure that all our code and threads call a hook which attaches the thread to the JVM. But I don't have control over all threads. Some threads are created in third-party libraries. I would have to check and patch every third-party tool we are using.

>> I wonder why it wasn't noticed earlier.
>
> Did anything else change in your application besides the Python version ?
> 32-bit to 64-bit ? (more memory used, more frequent GCs)
> Something in the code ?

I did the testing with the same code base on a single machine. The Python 2.7 branch of our application has just a few changes like python2.6 -> python2.7. Nothing else is different. JCC and Lucene are compiled from the very same tar ball with the same version of GCC.

We had very few segfaults in our test suite over the past months (more than five test runs every day, less than one crash per week). With Python 2.7 I'm seeing crashes in three of five test runs. The example code crashes both Python 2.6.6 + JCC 2.7 + PyLucene 3.0.3 and Python 2.7.1 + JCC 2.8 + PyLucene 3.1.0 on my laptop (Ubuntu 10.10 x86_64).

Christian
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 18:27, Andi Vajda wrote:
> Does it crash as easily with Python 2.6 ?
> If not, then that could be an answer as to why this wasn't noticed before.

With 20 test samples, Python 2.6 survives roughly 40% longer on average (1.717 s vs. 1.239 s) than Python 2.7 before crashing.

python2.6
 0, 1.089
 1, 2.688
 2, 1.066
 3, 6.416
 4, 0.921
 5, 1.859
 6, 0.896
 7, 0.910
 8, 1.851
 9, 1.042
10, 1.110
11, 1.040
12, 1.072
13, 1.825
14, 3.720
15, 1.822
16, 0.983
17, 1.931
18, 0.998
19, 1.105
cnt: 20, min: 0.896, max: 6.416, avg: 1.717

python2.7
 0, 1.795
 1, 0.953
 2, 1.802
 3, 1.022
 4, 0.906
 5, 1.841
 6, 1.080
 7, 0.958
 8, 1.110
 9, 0.924
10, 0.894
11, 1.958
12, 0.898
13, 1.846
14, 0.936
15, 1.859
16, 1.036
17, 1.092
18, 0.920
19, 0.949
cnt: 20, min: 0.894, max: 1.958, avg: 1.239

The driver script:

    import subprocess
    from time import time

    log = open("log.txt", "w")
    cnt = 20

    for py in ("python2.6", "python2.7"):
        log.write(py + "\n")
        dur = []
        for i in range(cnt):
            start = time()
            subprocess.call([py, "cyclic.py"])
            run = time() - start
            dur.append(run)
            log.write("%i, %0.3f\n" % (i, run))
            print i
        log.write("cnt: %i, min: %0.3f, max: %0.3f, avg: %0.3f\n\n"
                  % (cnt, min(dur), max(dur), sum(dur) / cnt))

cyclic.py:

    import lucene
    import threading
    import time
    import gc

    lucene.initVM()

    def alloc():
        while 1:
            a = {}, {}, {}, {}, {}, {}
            time.sleep(0.011)

    t = threading.Thread(target=alloc)
    t.daemon = True
    t.start()

    while 1:
        obj = {}
        # create cycle
        obj["obj"] = obj
        obj["jcc"] = lucene.JArray('object')(1, lucene.File)
        time.sleep(0.001)
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 18:27, Andi Vajda wrote:
> Does it crash as easily with Python 2.6 ?
> If not, then that could be an answer as to why this wasn't noticed before.

It's crashing with both Python 2.6.6 and Python 2.7.1. Sometimes it takes less than a second, sometimes half a minute or more. I would have to run a series of tests to compare both versions.

Christian
Re: segfault in JCCEnv::deleteGlobalRef
On Wed, 11 May 2011, Christian Heimes wrote:

> On 11.05.2011 18:14, Christian Heimes wrote:
>> ---
>> import lucene
>> import threading
>> import time
>> import gc
>>
>> lucene.initVM()
>>
>> def alloc():
>>     while 1:
>>         gc.collect()
>>         time.sleep(0.011)
>>
>> t = threading.Thread(target=alloc)
>> t.daemon = True
>> t.start()
>>
>> while 1:
>>     obj = {}
>>     # create cycle
>>     obj["obj"] = obj
>>     obj["jcc"] = lucene.JArray('object')(1, lucene.File)
>>     time.sleep(0.001)
>> ---
>
> The example also crashes with functions like the following, but it takes a
> bit longer:
>
>     def alloc():
>         while 1:
>             a = {}, {}, {}, {}, {}, {}
>             time.sleep(0.011)
>
>     def alloc():
>         while 1:
>             # create 500 bound methods to exceed PyMethod_MAXFREELIST (256)
>             methods = []
>             for i in xrange(500):
>                 methods.append(str("abc").strip)
>             time.sleep(0.011)

Does it crash as easily with Python 2.6 ? If not, then that could be an answer as to why this wasn't noticed before.

Andi..
Re: segfault in JCCEnv::deleteGlobalRef
On Wed, 11 May 2011, Christian Heimes wrote:

> On 11.05.2011 17:36, Andi Vajda wrote:
>>> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
>>> to my initial assumption, the thread doesn't have a JCC thread local
>>> object. Since any thread may trigger a GC collect run, and not just
>>> threads that use JCC, this looks like a bug in JCC to me.
>>
>> Any thread that is going to call into the JVM must call
>> attachCurrentThread() first. This includes a thread doing GC of objects
>> wrapping Java refs which it is going to delete.
>
> I'm well aware of the requirement to call attachCurrentThread() in every
> thread that uses wrapped objects. This segfault is not caused by passing
> JVM objects between threads explicitly. It's Python's cyclic GC that
> breaks reference cycles containing JVM objects and collects them in
> random threads.

There shouldn't be any random threads. Threads don't just appear out of thin air. You create them. If there is a chance that they call into the JVM, then attachCurrentThread().

> Something in Python 2.7's gc must have changed to increase the chance
> that a cyclic GC collect run is started inside a thread that isn't
> attached to the JVM. As far as I know the implementation of Python's
> cyclic GC detection, it's not possible to restrict the cyclic GC to
> certain threads. So any unattached thread that creates objects allocated
> with _PyObject_GC_New() has a chance to trigger the segfault. Almost all
> Python objects are allocated with _PyObject_GC_New(). Only very simple
> types like str and int, which can't reference other objects, are not
> tracked. Everything else (including bound methods of simple types) is
> tracked. In a few words: any unattached thread has a chance to crash the
> interpreter unless its code is very, very limited.
>
> This can be easily reproduced with a small script:
>
> ---
> import lucene
> import threading
> import time
> import gc
>
> lucene.initVM()
>
> def alloc():
>     while 1:
>         gc.collect()
>         time.sleep(0.011)
>
> t = threading.Thread(target=alloc)
> t.daemon = True
> t.start()
>
> while 1:
>     obj = {}
>     # create cycle
>     obj["obj"] = obj
>     obj["jcc"] = lucene.JArray('object')(1, lucene.File)
>     time.sleep(0.001)
> ---
>
> I wonder why it wasn't noticed earlier.

Did anything else change in your application besides the Python version ? 32-bit to 64-bit ? (more memory used, more frequent GCs) Something in the code ?

Andi..
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 18:14, Christian Heimes wrote:
> ---
> import lucene
> import threading
> import time
> import gc
>
> lucene.initVM()
>
> def alloc():
>     while 1:
>         gc.collect()
>         time.sleep(0.011)
>
> t = threading.Thread(target=alloc)
> t.daemon = True
>
> t.start()
>
> while 1:
>     obj = {}
>     # create cycle
>     obj["obj"] = obj
>     obj["jcc"] = lucene.JArray('object')(1, lucene.File)
>     time.sleep(0.001)
>
> ---

The example also crashes with functions like the following, but it takes a bit longer:

    def alloc():
        while 1:
            a = {}, {}, {}, {}, {}, {}
            time.sleep(0.011)

    def alloc():
        while 1:
            # create 500 bound methods to exceed PyMethod_MAXFREELIST (256)
            methods = []
            for i in xrange(500):
                methods.append(str("abc").strip)
            time.sleep(0.011)

Christian
Re: segfault in JCCEnv::deleteGlobalRef
On 11.05.2011 17:36, Andi Vajda wrote:
>> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
>> to my initial assumption, the thread doesn't have a JCC thread local
>> object. Since any thread may trigger a GC collect run, and not just
>> threads that use JCC, this looks like a bug in JCC to me.
>
> Any thread that is going to call into the JVM must call attachCurrentThread()
> first. This includes a thread doing GC of objects wrapping Java refs which it
> is going to delete.

I'm well aware of the requirement to call attachCurrentThread() in every thread that uses wrapped objects. This segfault is not caused by passing JVM objects between threads explicitly. It's Python's cyclic GC that breaks reference cycles containing JVM objects and collects them in random threads.

Something in Python 2.7's gc must have changed to increase the chance that a cyclic GC collect run is started inside a thread that isn't attached to the JVM. As far as I know the implementation of Python's cyclic GC detection, it's not possible to restrict the cyclic GC to certain threads. So any unattached thread that creates objects allocated with _PyObject_GC_New() has a chance to trigger the segfault. Almost all Python objects are allocated with _PyObject_GC_New(). Only very simple types like str and int, which can't reference other objects, are not tracked. Everything else (including bound methods of simple types) is tracked. In a few words: any unattached thread has a chance to crash the interpreter unless its code is very, very limited.

This can be easily reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True
t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)
---

I wonder why it wasn't noticed earlier.

Christian
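[Editor's note: the claim about which objects the cyclic collector tracks can be checked directly with gc.is_tracked(). The exact results for containers can vary by CPython version (e.g. dicts of atoms may be lazily untracked), but the atom-vs-container distinction holds.]

    import gc

    class C(object):
        def m(self):
            pass

    samples = [
        ("int", 42),
        ("str", "example"),
        ("list", []),
        ("bound method", C().m),
    ]

    for name, obj in samples:
        # Objects that can participate in reference cycles are tracked by
        # the cyclic collector; atoms like int and str are not.
        print("%-12s tracked=%s" % (name, gc.is_tracked(obj)))

This is why any thread allocating ordinary objects can trigger a collection: nearly everything beyond plain ints and strings ends up on the collector's lists.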
Re: segfault in JCCEnv::deleteGlobalRef
On May 11, 2011, at 7:59, Christian Heimes wrote:

> Here is another backtrace with a debug build of JCC.
>
> #10 <signal handler called>
> #11 0x2b05b2cf4c74 in JNIEnv_::DeleteGlobalRef (this=0x0, gref=0x2b05d02e0250) at /opt/vlspy27/lib/jdk1.6/include/jni.h:830
> #12 0x2b05b2cf0ce5 in JCCEnv::deleteGlobalRef (this=0x360f650, obj=0x2b05d02e0250, id=1351575229) at jcc/sources/JCCEnv.cpp:321
> #13 0x2b05b2522649 in t_JObject_dealloc(t_JObject*) ()
>    from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
> #14 0x2b05a303b1eb in dict_dealloc (mp=0xa2d1e20) at Objects/dictobject.c:985
> #15 0x2b05a303cedb in PyDict_Clear (op=<optimized out>) at Objects/dictobject.c:891
> #16 0x2b05a303cf49 in dict_tp_clear (op=0x0) at Objects/dictobject.c:2088
> #17 0x2b05a30ddb7e in delete_garbage (generation=<optimized out>) at Modules/gcmodule.c:769
> #18 collect (generation=<optimized out>) at Modules/gcmodule.c:930
> #19 0x2b05a30de3ae in collect_generations (basicsize=<optimized out>) at Modules/gcmodule.c:996
> #20 _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1457
> #21 0x2b05a30de44d in _PyObject_GC_New (tp=0x2b05a334dfa0) at Modules/gcmodule.c:1467
> #22 0x2b05a303abbc in PyDict_New () at Objects/dictobject.c:277
> #23 0x2b05a303c846 in _PyDict_NewPresized (minused=0) at Objects/dictobject.c:677
> #24 0x2b05a30a3088 in PyEval_EvalFrameEx (f=0x5302b10, throwflag=<optimized out>) at Python/ceval.c:2220
> #25 0x2b05a30a88b8 in PyEval_EvalCodeEx (co=0x23c2ab0, globals=<optimized out>, locals=<optimized out>, args=0x66c74c8, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
>
> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
> to my initial assumption, the thread doesn't have a JCC thread local
> object. Since any thread may trigger a GC collect run, and not just
> threads that use JCC, this looks like a bug in JCC to me.

Any thread that is going to call into the JVM must call attachCurrentThread() first. This includes a thread doing GC of objects wrapping Java refs which it is going to delete.

Andi..
Re: segfault in JCCEnv::deleteGlobalRef
Here is another backtrace with a debug build of JCC.

#10 <signal handler called>
#11 0x2b05b2cf4c74 in JNIEnv_::DeleteGlobalRef (this=0x0, gref=0x2b05d02e0250) at /opt/vlspy27/lib/jdk1.6/include/jni.h:830
#12 0x2b05b2cf0ce5 in JCCEnv::deleteGlobalRef (this=0x360f650, obj=0x2b05d02e0250, id=1351575229) at jcc/sources/JCCEnv.cpp:321
#13 0x2b05b2522649 in t_JObject_dealloc(t_JObject*) ()
   from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#14 0x2b05a303b1eb in dict_dealloc (mp=0xa2d1e20) at Objects/dictobject.c:985
#15 0x2b05a303cedb in PyDict_Clear (op=<optimized out>) at Objects/dictobject.c:891
#16 0x2b05a303cf49 in dict_tp_clear (op=0x0) at Objects/dictobject.c:2088
#17 0x2b05a30ddb7e in delete_garbage (generation=<optimized out>) at Modules/gcmodule.c:769
#18 collect (generation=<optimized out>) at Modules/gcmodule.c:930
#19 0x2b05a30de3ae in collect_generations (basicsize=<optimized out>) at Modules/gcmodule.c:996
#20 _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1457
#21 0x2b05a30de44d in _PyObject_GC_New (tp=0x2b05a334dfa0) at Modules/gcmodule.c:1467
#22 0x2b05a303abbc in PyDict_New () at Objects/dictobject.c:277
#23 0x2b05a303c846 in _PyDict_NewPresized (minused=0) at Objects/dictobject.c:677
#24 0x2b05a30a3088 in PyEval_EvalFrameEx (f=0x5302b10, throwflag=<optimized out>) at Python/ceval.c:2220
#25 0x2b05a30a88b8 in PyEval_EvalCodeEx (co=0x23c2ab0, globals=<optimized out>, locals=<optimized out>, args=0x66c74c8, argcount=<optimized out>, kws=<optimized out>, kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252

As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary to my initial assumption, the thread doesn't have a JCC thread local object. Since any thread may trigger a GC collect run, and not just threads that use JCC, this looks like a bug in JCC to me.

Christian
segfault in JCCEnv::deleteGlobalRef
Hello,

I've updated our software stack from Python 2.6.6 to Python 2.7.1. Since the update I'm seeing random segfaults, all related to JCCEnv::deleteGlobalRef() and Python's GC. At first I thought the bug was an incompatibility between Python 2.7 and JCC 2.7. However, an update to JCC 2.8 and Lucene 3.1.0 didn't resolve the issue.

So far all segfaults have the same pattern. The creation or removal of a Python object triggers a cyclic GC run which runs into t_JObject_dealloc() and crashes inside JCCEnv::deleteGlobalRef(). At least some of the crashing code paths run inside threads that are attached to the JVM.

(gdb) bt
#10 <signal handler called>
#11 0x2ba7deb380c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from /opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba7de36c649 in t_JObject_dealloc(t_JObject*) ()
   from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba7cee851eb in dict_dealloc (mp=0x9975720) at Objects/dictobject.c:985
#14 0x2ba7cee86edb in PyDict_Clear (op=<optimized out>) at Objects/dictobject.c:891
#15 0x2ba7cee86f49 in dict_tp_clear (op=0x3) at Objects/dictobject.c:2088
#16 0x2ba7cef27b7e in delete_garbage (generation=<optimized out>) at Modules/gcmodule.c:769
#17 collect (generation=<optimized out>) at Modules/gcmodule.c:930
#18 0x2ba7cef283ae in collect_generations (basicsize=<optimized out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1457
#20 0x2ba7cef2844d in _PyObject_GC_New (tp=0x2ba7cf197fa0) at Modules/gcmodule.c:1467
#21 0x2ba7cee84bbc in PyDict_New () at Objects/dictobject.c:277
#22 0x2ba7cee8b188 in _PyObject_GenericSetAttrWithDict (obj=<optimized out>, name=0x12d5ae8, value=0x7c636b0, dict=0x0) at Objects/object.c:1510
#23 0x2ba7cee8b537 in PyObject_SetAttr (v=0x77704d0, name=0x12d5ae8, value=0x7c636b0) at Objects/object.c:1245
#24 0x2ba7c4b4 in PyEval_EvalFrameEx (f=0x50d7520, throwflag=<optimized out>) at Python/ceval.c:2003
#25 0x2ba7ceef28b8 in PyEval_EvalCodeEx (co=0x2199ab0, globals=<optimized out>, locals=<optimized out>, args=0x8bd7b58,
(gdb) select-frame 24
(gdb) pyframe
/opt/vlspy27/lib/python2.7/site-packages/kinterbasdb-3.3.0-py2.7-linux-x86_64.egg/kinterbasdb/__init__.py (1499): __init__

    class _RowMapping(object):
        def __init__(self, description, row):
            self._description = description
            fields = self._fields = {}  # <-- 1499
            pos = 0

(gdb) bt
#11 0x2ba90298b0c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from /opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba9021bf649 in t_JObject_dealloc(t_JObject*) ()
   from /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba8f2cd81eb in dict_dealloc (mp=0x105df800) at Objects/dictobject.c:985
#14 0x2ba8f2cd9edb in PyDict_Clear (op=<optimized out>) at Objects/dictobject.c:891
#15 0x2ba8f2cd9f49 in dict_tp_clear (op=0x3) at Objects/dictobject.c:2088
#16 0x2ba8f2d7ab7e in delete_garbage (generation=<optimized out>) at Modules/gcmodule.c:769
#17 collect (generation=<optimized out>) at Modules/gcmodule.c:930
#18 0x2ba8f2d7b3ae in collect_generations (basicsize=<optimized out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<optimized out>) at Modules/gcmodule.c:1457
#20 0x2ba8f2d7b44d in _PyObject_GC_New (tp=0x2ba8f2fddfc0) at Modules/gcmodule.c:1467
#21 0x2ba8f2cb0aa8 in PyWrapper_New (d=0x1e5e140, self=0x2ba9242509e0) at Objects/descrobject.c:1051
#22 0x2ba8f2cb0be3 in wrapperdescr_call (descr=0x1e5e140, args=0x28f87520, kwds=0x0) at Objects/descrobject.c:296
#23 0x2ba8f2c93533 in PyObject_Call (func=0x1e5e140, arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#24 0x2ba8f89b8d6c in __pyx_pf_4lxml_5etree_9_ErrorLog___init__ (__pyx_v_self=0x229db820, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at src/lxml/lxml.etree.c:28498
#25 0x2ba8f2cf6068 in type_call (type=<optimized out>, args=0x2ba8f3c64050, kwds=0x0) at Objects/typeobject.c:728
#26 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cbb1e0, arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#27 0x2ba8f89b91c0 in __pyx_pf_4lxml_5etree_19_XPathEvaluatorBase___cinit__ (__pyx_v_self=0x6c5cdb8, __pyx_args=<optimized out>, __pyx_kwds=<optimized out>) at src/lxml/lxml.etree.c:111873
#28 0x2ba8f89bcb7c in __pyx_tp_new_4lxml_5etree__XPathEvaluatorBase (t=<optimized out>, a=<optimized out>, k=<optimized out>) at src/lxml/lxml.etree.c:149259
#29 __pyx_tp_new_4lxml_5etree_XPath (t=<optimized out>, a=<optimized out>, k=<optimized out>) at src/lxml/lxml.etree.c:18769
#30 0x2ba8f2cf6023 in type_call (type=0x3, args=0x20515510, kwds=0x2ba9243a0bb0) at Objects/typeobject.c:712
#31 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cc26e0, arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#32 0x2ba8f2d4303e in do_call (f=0x23b8e80, throwflag=<optimized out>) at Python/ceval.c:4230
#33 call_function (f=0x23b8e80, throwflag=<optimized out>) at Python/ceval.c:4035
#34 PyEval_EvalFrameEx (f=0x23b8e80, throwflag=<optimized out>) at Python/ceval.c:2665
#35 0x2ba8f2d458b8 in PyEval_EvalCodeEx (co=0x1bf87b0, globals=<optimized out>, locals=<optimized out>, args=0x3,
(gdb) sele
Re: [pylucene-dev] Segfault in JCCEnv::deleteGlobalRef
[follow up to the new mailing list]

Andi Vajda wrote:
> Looking at your stacktrace, it would seem that JNIEnv is NULL (this=0x0).
> I recently fixed a bug in JCC with a NULL JNIEnv caused by a line of code
> being emitted too late in an extension method. You would hit this bug if you
> wrote Python extensions of some Lucene Java classes as is possible with JCC.
>
> If you could send me a more complete stacktrace, up to the method in your
> code or PyLucene, and its corresponding source code, I could confirm this.
>
> You could also try out the fix I did (which fixed the bug I had) by getting
> the latest JCC sources from PyLucene's new home at Apache:
> http://svn.apache.org/repos/asf/lucene/pylucene/trunk/
> and rebuilding your libraries. This was a bug in the C++ code generator.
>
> Please, let me know if this fixes your problem as well.
> Thanks !

Thanks Andi! An update to the latest svn revision of JCC and Lucene didn't help the cause. The server kept crashing at regular intervals. However, the additional core dumps gave me more data, and eventually I was able to debug and fix the culprit.

JCC was causing a crash in a piece of code and a thread that wasn't using any Java objects at all. At least, that's what I thought at first. We are using a CherryPy plugin called 'Dowser' [1] to keep track of reference counts and possible memory leaks. The segfault had always occurred inside Dowser code and the Dowser thread. I couldn't make sense of it. Dowser doesn't touch Lucene and JCC at all. Or does it?

Once I started paying more attention to the exact line -- the pystack macro from Python's gdbinit is a life saver -- I got a clue. Dowser uses gc.get_objects() to iterate every 5 seconds over all objects tracked by Python's cyclic gc. A race condition induced a situation where the list returned by gc.get_objects() was holding the last reference to a JCC object. The Dowser thread didn't have a JNI env attached to it because I never thought it would matter.
At the end of the "for obj in gc.get_objects():" loop, the ref count of the JCC object dropped to zero ... no JCC ENV ... SEGFAULT. Conclusion: Never combine JCC and gc.get_objects() unless you attach *all* Python threads. Christian [1] http://www.aminus.net/wiki/Dowser
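[Editor's note: the race described above is easy to reproduce in pure Python, with a plain object standing in for the JCC wrapper. The list returned by gc.get_objects() keeps every tracked object alive, so dropping that list can run the final deallocation in the inspecting thread — the thread names and the Wrapper class below are illustrative, not Dowser's code.]

    import gc
    import threading

    class Wrapper(object):
        """Stand-in for a JCC-wrapped Java object."""
        freed_in = []
        def __del__(self):
            # In JCC this is where DeleteGlobalRef would run -- in whatever
            # thread happens to drop the last reference.
            Wrapper.freed_in.append(threading.current_thread().name)

    obj = Wrapper()
    snapshot_taken = threading.Event()
    main_dropped = threading.Event()

    def inspect():
        # gc.get_objects() holds a reference to every tracked object.
        snapshot = [o for o in gc.get_objects() if isinstance(o, Wrapper)]
        snapshot_taken.set()
        main_dropped.wait()   # main thread drops its reference meanwhile
        del snapshot          # the *last* reference goes away here

    t = threading.Thread(target=inspect, name="dowser")
    t.start()
    snapshot_taken.wait()
    del obj                   # not the last reference; the snapshot holds one
    main_dropped.set()
    t.join()
    print(Wrapper.freed_in)   # deallocation ran in the inspecting thread

The events only make the interleaving deterministic for the demo; in the real server it was ordinary timing luck, which is why the crash looked random.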