Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 19:41, Andi Vajda wrote:
> If these functions eventually instantiate a Thread class, even indirectly, 
> the monkey-patching may still work.

Some of the code doesn't use the threading module at all, just thread or
the internal C API. I'd have to patch the modules and C code.

> That may cover this case but what about all the others ?
> There is a reason the call has to be manual.
> 
> I've not been able to automate it before.
> Over time, I've added checks where I could but I've not found it possible to 
> cover all cases where attachCurrentThread() wasn't called.
> 
> Anyhow, try it and see if it fixes the problem you're seeing.
> If any of the objects being freed invoke user code that eventually calls into 
> the JVM, the problem is going to appear again elsewhere.

I understand your reluctance to automate the attaching of Python threads
to the JVM. Explicit is better than implicit. However, this is a special
case. CPython doesn't allow controlling which thread the cyclic garbage
collector runs in, nor does it provide a hook that is called for newly
created threads. It's hard to debug a segfault when even code like
"a = []" can trigger the bug.

The attached patch doesn't trigger the bug in my artificial test code.
I'm going to run our test suite several times. That's going to take a
while.

Christian
Index: jcc/sources/jcc.cpp
===
--- jcc/sources/jcc.cpp	(revision 1088091)
+++ jcc/sources/jcc.cpp	(working copy)
@@ -33,6 +33,25 @@
 
 /* JCCEnv */
 
+int jccenv_attachCurrentThread(char *name, int asDaemon)
+{
+    int result;
+    JNIEnv *jenv = NULL;
+
+    JavaVMAttachArgs attach = {
+        JNI_VERSION_1_4, name, NULL
+    };
+
+    if (asDaemon)
+        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
+    else
+        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
+
+    env->set_vm_env(jenv);
+
+    return result;
+}
+
 class t_jccenv {
 public:
 PyObject_HEAD
@@ -154,21 +173,11 @@
 {
     char *name = NULL;
     int asDaemon = 0, result;
-    JNIEnv *jenv = NULL;
 
     if (!PyArg_ParseTuple(args, "|si", &name, &asDaemon))
         return NULL;
 
-    JavaVMAttachArgs attach = {
-        JNI_VERSION_1_4, name, NULL
-    };
-
-    if (asDaemon)
-        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
-    else
-        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);
-
-    env->set_vm_env(jenv);
+    result = jccenv_attachCurrentThread(name, asDaemon);
 
     return PyInt_FromLong(result);
 }
Index: jcc/sources/JCCEnv.cpp
===
--- jcc/sources/JCCEnv.cpp	(revision 1088091)
+++ jcc/sources/JCCEnv.cpp	(working copy)
@@ -318,6 +318,16 @@
 {
     if (iter->second.count == 1)
     {
+        JNIEnv *vm_env = get_vm_env();
+        if (!vm_env)
+        {
+            /* Python's cyclic garbage collector may remove
+             * an object inside a thread that is not attached
+             * to the JVM. This makes sure JCC doesn't segfault.
+             */
+            jccenv_attachCurrentThread(NULL, 0);
+            vm_env = get_vm_env();
+        }
         get_vm_env()->DeleteGlobalRef(iter->second.global);
         refs.erase(iter);
     }
Index: jcc/sources/JCCEnv.h
===
--- jcc/sources/JCCEnv.h	(revision 1088091)
+++ jcc/sources/JCCEnv.h	(working copy)
@@ -72,6 +72,8 @@
 
 typedef jclass (*getclassfn)(void);
 
+int jccenv_attachCurrentThread(char *name, int asDaemon);
+
 class countedRef {
 public:
 jobject global;


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda


On Wed, 11 May 2011, Christian Heimes wrote:


On 2011-05-11 19:03, Andi Vajda wrote:

If these libraries use Python's Thread class you have some control.

Create a subclass of Thread that runs your hook and insert it into the
threading module (threading.Thread = YourThreadSubclass) before anyone else
gets a chance to create threads.


One library is using thread.start_new_thread() and another uses Python's
C API to create an internal monitor thread. This makes it even harder to
fix the issue.


If these functions eventually instantiate a Thread class, even indirectly, 
the monkey-patching may still work.




How would you feel about another approach?

* factor out the attach routine of t_jccenv_attachCurrentThread() as a C
function

int jccenv_attachCurrentThread(char *name, int asDaemon) {
    int result;
    JNIEnv *jenv = NULL;

    JavaVMAttachArgs attach = {
        JNI_VERSION_1_4, name, NULL
    };

    if (asDaemon)
        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
    else
        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);

    env->set_vm_env(jenv);

    return result;
}


* modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL

if (iter->second.count == 1)
{
    JNIEnv *vm_env = get_vm_env();
    if (!vm_env) {
        jccenv_attachCurrentThread(NULL, 0);
        vm_env = get_vm_env();
    }
    vm_env->DeleteGlobalRef(iter->second.global);
    refs.erase(iter);
}


That may cover this case but what about all the others ?
There is a reason the call has to be manual.

I've not been able to automate it before.
Over time, I've added checks where I could but I've not found it possible to 
cover all cases where attachCurrentThread() wasn't called.


Anyhow, try it and see if it fixes the problem you're seeing.
If any of the objects being freed invoke user code that eventually calls into 
the JVM, the problem is going to appear again elsewhere.


Andi..


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 19:03, Andi Vajda wrote:
> If these libraries use Python's Thread class you have some control.
> 
> Create a subclass of Thread that runs your hook and insert it into the 
> threading module (threading.Thread = YourThreadSubclass) before anyone else 
> gets a chance to create threads.

One library is using thread.start_new_thread() and another uses Python's
C API to create an internal monitor thread. This makes it even harder to
fix the issue.
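For libraries that do go through threading.Thread, the monkey-patch Andi describes could look roughly like this. A sketch only: the attach hook below is a recording stand-in for the real call (lucene.getVMEnv().attachCurrentThread() in PyLucene), so it runs without a JVM:

```python
import threading

attached = []  # records which threads ran the attach hook

def attach_current_thread():
    # Stand-in for lucene.getVMEnv().attachCurrentThread(), so the
    # sketch is self-contained and runnable without a JVM.
    attached.append(threading.current_thread().name)

class AttachingThread(threading.Thread):
    """Runs the JVM attach hook before executing the thread's target."""
    def run(self):
        attach_current_thread()
        super(AttachingThread, self).run()

# Install the subclass before any third-party code gets a chance
# to create threads.
threading.Thread = AttachingThread

t = threading.Thread(target=lambda: None, name="worker")
t.start()
t.join()
print(attached)  # ['worker']
```

As noted, this only covers code that instantiates threading.Thread (even indirectly); threads made via thread.start_new_thread() or the C API bypass it.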

How would you feel about another approach?

* factor out the attach routine of t_jccenv_attachCurrentThread() as a C
function

int jccenv_attachCurrentThread(char *name, int asDaemon) {
    int result;
    JNIEnv *jenv = NULL;

    JavaVMAttachArgs attach = {
        JNI_VERSION_1_4, name, NULL
    };

    if (asDaemon)
        result = env->vm->AttachCurrentThreadAsDaemon((void **) &jenv, &attach);
    else
        result = env->vm->AttachCurrentThread((void **) &jenv, &attach);

    env->set_vm_env(jenv);

    return result;
}


* modify JCCEnv::deleteGlobalRef() to check get_vm_env() for NULL

if (iter->second.count == 1)
{
    JNIEnv *vm_env = get_vm_env();
    if (!vm_env) {
        jccenv_attachCurrentThread(NULL, 0);
        vm_env = get_vm_env();
    }
    vm_env->DeleteGlobalRef(iter->second.global);
    refs.erase(iter);
}

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 18:26, Andi Vajda wrote:
> There shouldn't be any random threads. Threads don't just appear out of thin 
> air. You create them. If there is a chance that they call into the JVM, then 
> attachCurrentThread().

I've already made sure that all our code and threads call a hook that
attaches the thread to the JVM. But I don't have control over all
threads. Some threads are created in third-party libraries. I would have
to check and patch every third-party tool we are using.

>> I wonder why it wasn't noticed earlier.
> 
> Did anything else change in your application besides the Python version ?
> 32-bit to 64-bit ? (more memory used, more frequent GCs)
> Something in the code ?

I did the testing with the same code base on a single machine. The
Python 2.7 branch of our application just has a few changes like
python2.6 -> python2.7. Nothing else is different. JCC and Lucene are
compiled from the very same tarball with the same version of GCC. We had
very few segfaults in our test suite over the past months (more than
five test runs every day, less than one crash per week). With Python 2.7
I'm seeing crashes in three out of five test runs.

The example code crashes both Python 2.6.6 + JCC 2.7 + PyLucene 3.0.3
and Python 2.7.1 + JCC 2.8 + PyLucene 3.1.0 on my laptop (Ubuntu 10.10
x86_64).

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 18:27, Andi Vajda wrote:
> Does it crash as easily with Python 2.6 ?
> If not, then that could be an answer as to why this wasn't noticed before.

With 20 test samples, it seems like Python 2.6 survives about 40% longer
on average than Python 2.7 (avg 1.717s vs. 1.239s per run).

python2.6
0, 1.089
1, 2.688
2, 1.066
3, 6.416
4, 0.921
5, 1.859
6, 0.896
7, 0.910
8, 1.851
9, 1.042
10, 1.110
11, 1.040
12, 1.072
13, 1.825
14, 3.720
15, 1.822
16, 0.983
17, 1.931
18, 0.998
19, 1.105
cnt: 20, min: 0.896, max: 6.416, avg: 1.717

python2.7
0, 1.795
1, 0.953
2, 1.802
3, 1.022
4, 0.906
5, 1.841
6, 1.080
7, 0.958
8, 1.110
9, 0.924
10, 0.894
11, 1.958
12, 0.898
13, 1.846
14, 0.936
15, 1.859
16, 1.036
17, 1.092
18, 0.920
19, 0.949
cnt: 20, min: 0.894, max: 1.958, avg: 1.239
import subprocess
from time import time

log = open("log.txt", "w")
cnt = 100

for py in ("python2.6", "python2.7"):
    log.write(py + "\n")
    dur = []
    for i in range(cnt):
        start = time()
        subprocess.call([py, "cyclic.py"])
        run = time() - start
        dur.append(run)
        log.write("%i, %0.3f\n" % (i, run))
        print i
    log.write("cnt: %i, min: %0.3f, max: %0.3f, avg: %0.3f\n\n" %
              (cnt, min(dur), max(dur), sum(dur) / cnt))



import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 18:27, Andi Vajda wrote:
> Does it crash as easily with Python 2.6 ?
> If not, then that could be an answer as to why this wasn't noticed before.

It's crashing with Python 2.6.6 and Python 2.7.1. Sometimes it takes
less than a second, sometimes half a minute or more. I would have to run
a series of tests to compare both versions.

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
> There shouldn't be any random threads. Threads don't just appear out of thin 
> air. You create them. If there is a chance that they call into the JVM, then 
> attachCurrentThread().

I've already made sure that all our code and threads call a hook that
attaches the thread to the JVM. But I don't have control over all
threads. Some threads are created in third-party libraries. I would have
to check and patch every third-party tool we are using.

>> I wonder why it wasn't noticed earlier.
> 
> Did anything else change in your application besides the Python version ?
> 32-bit to 64-bit ? (more memory used, more frequent GCs)
> Something in the code ?

I did the testing with the same code base on a single machine. The
Python 2.7 branch of our application just has a few changes like
python2.6 -> python2.7. Nothing else is different. JCC and Lucene are
compiled from the very same tarball with the same version of GCC. We had
very few segfaults in our test suite over the past months (more than
five test runs every day, less than one crash per week). With Python 2.7
I'm seeing crashes in three out of five test runs.

The example code crashes both Python 2.6.6 + JCC 2.7 + PyLucene 3.0.3
and Python 2.7.1 + JCC 2.8 + PyLucene 3.1.0 on my laptop (Ubuntu 10.10
x86_64).

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda


On Wed, 11 May 2011, Christian Heimes wrote:


On 2011-05-11 18:14, Christian Heimes wrote:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)

---


The example also crashes with alloc() functions like these, but it takes
a bit longer:

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

def alloc():
    while 1:
        # create 500 bound methods to exceed PyMethod_MAXFREELIST 256
        methods = []
        for i in xrange(500):
            methods.append(str("abc").strip)
        time.sleep(0.011)


Does it crash as easily with Python 2.6 ?
If not, then that could be an answer as to why this wasn't noticed before.

Andi..


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda



On Wed, 11 May 2011, Christian Heimes wrote:


On 2011-05-11 17:36, Andi Vajda wrote:

As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
to my initial assumption, the thread doesn't have a JCC thread-local
object. Since any thread may trigger a GC collect run, not just threads
that use JCC, this looks like a bug in JCC to me.


Any thread that is going to call into the JVM must call attachCurrentThread() 
first. This includes a thread doing GC of objects wrapping Java refs which it is 
going to delete.


I'm well aware of the requirement to call attachCurrentThread() in every
thread that uses wrapped objects. This segfault is not caused by passing
JVM objects between threads explicitly. It's Python's cyclic GC that
breaks up and collects reference cycles involving JVM objects in
arbitrary threads.


There shouldn't be any random threads. Threads don't just appear out of thin 
air. You create them. If there is a chance that they call into the JVM, then 
attachCurrentThread().



Something in Python 2.7's gc must have been altered to increase the
chance that a cyclic GC collect run is started inside a thread that
isn't attached to the JVM. As far as I know the implementation of
Python's cyclic GC detection, it's not possible to restrict the cyclic
GC to certain threads. So any unattached thread that creates objects
allocated with _PyObject_GC_New() has a chance to trigger the
segfault. Almost all Python objects use _PyObject_GC_New(). Only
very simple types like str and int, which can't reference other objects,
are not tracked. Everything else (including bound methods of simple
types) is tracked.

In a few words: Any unattached thread has the chance to crash the
interpreter unless the code is very, very limited. This can be easily
reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)

---

I wonder why it wasn't noticed earlier.


Did anything else change in your application besides the Python version ?
32-bit to 64-bit ? (more memory used, more frequent GCs)
Something in the code ?

Andi..


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 18:14, Christian Heimes wrote:
> ---
> import lucene
> import threading
> import time
> import gc
> 
> lucene.initVM()
> 
> def alloc():
> while 1:
> gc.collect()
> time.sleep(0.011)
> 
> t = threading.Thread(target=alloc)
> t.daemon = True
> 
> t.start()
> 
> while 1:
> obj = {}
> # create cycle
> obj["obj"] = obj
> obj["jcc"] = lucene.JArray('object')(1, lucene.File)
> time.sleep(0.001)
> 
> ---

The example also crashes with alloc() functions like these, but it takes
a bit longer:

def alloc():
    while 1:
        a = {}, {}, {}, {}, {}, {}
        time.sleep(0.011)

def alloc():
    while 1:
        # create 500 bound methods to exceed PyMethod_MAXFREELIST 256
        methods = []
        for i in xrange(500):
            methods.append(str("abc").strip)
        time.sleep(0.011)

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
On 2011-05-11 17:36, Andi Vajda wrote:
>> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
>> to my initial assumption, the thread doesn't have a JCC thread-local
>> object. Since any thread may trigger a GC collect run, not just threads
>> that use JCC, this looks like a bug in JCC to me.
> 
> Any thread that is going to call into the JVM must call attachCurrentThread() 
> first. This includes a thread doing GC of objects wrapping Java refs which it 
> is going to delete.

I'm well aware of the requirement to call attachCurrentThread() in every
thread that uses wrapped objects. This segfault is not caused by passing
JVM objects between threads explicitly. It's Python's cyclic GC that
breaks up and collects reference cycles involving JVM objects in
arbitrary threads.

Something in Python 2.7's gc must have been altered to increase the
chance that a cyclic GC collect run is started inside a thread that
isn't attached to the JVM. As far as I know the implementation of
Python's cyclic GC detection, it's not possible to restrict the cyclic
GC to certain threads. So any unattached thread that creates objects
allocated with _PyObject_GC_New() has a chance to trigger the
segfault. Almost all Python objects use _PyObject_GC_New(). Only
very simple types like str and int, which can't reference other objects,
are not tracked. Everything else (including bound methods of simple
types) is tracked.
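The tracking claim is easy to check with gc.is_tracked(), available since Python 2.7 (a quick illustration, not part of the original mail):

```python
import gc

# Simple immutable values can never participate in a reference
# cycle, so CPython leaves them out of the cyclic collector.
print(gc.is_tracked("abc"))  # False
print(gc.is_tracked(42))     # False

# Containers and bound methods of simple types ARE tracked, so
# allocating them in any thread can kick off a collection there.
print(gc.is_tracked([]))           # True
print(gc.is_tracked("abc".strip))  # True
```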

In a few words: Any unattached thread has the chance to crash the
interpreter unless the code is very, very limited. This can be easily
reproduced with a small script:

---
import lucene
import threading
import time
import gc

lucene.initVM()

def alloc():
    while 1:
        gc.collect()
        time.sleep(0.011)

t = threading.Thread(target=alloc)
t.daemon = True

t.start()

while 1:
    obj = {}
    # create cycle
    obj["obj"] = obj
    obj["jcc"] = lucene.JArray('object')(1, lucene.File)
    time.sleep(0.001)

---
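The thread asymmetry described above can also be shown without the JVM: whichever thread happens to run the collection is the one that finalizes the cycle, even if it never created the objects. A stdlib-only sketch (assumes Python 3.4+, where finalizers in cycles are run):

```python
import gc
import threading

gc.disable()  # keep automatic collections out of the picture
freed_in = []

class Node:
    def __del__(self):
        # Record which thread actually finalizes the cycle.
        freed_in.append(threading.current_thread().name)

a = Node()
a.ref = a  # reference cycle: only the cyclic GC can reclaim it
del a      # refcount never reaches zero; the cycle lingers as garbage

def collector():
    gc.collect()  # the collecting thread runs the finalizer

t = threading.Thread(target=collector, name="gc-thread")
t.start()
t.join()
gc.enable()
print(freed_in)  # ['gc-thread']
```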

I wonder why it wasn't noticed earlier.

Christian


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Andi Vajda

On May 11, 2011, at 7:59, Christian Heimes wrote:

> Here is another backtrace with a debug build of JCC.
> 
> #10 <signal handler called>
> #11 0x2b05b2cf4c74 in JNIEnv_::DeleteGlobalRef (this=0x0,
> gref=0x2b05d02e0250) at /opt/vlspy27/lib/jdk1.6/include/jni.h:830
> #12 0x2b05b2cf0ce5 in JCCEnv::deleteGlobalRef (this=0x360f650,
> obj=0x2b05d02e0250, id=1351575229) at jcc/sources/JCCEnv.cpp:321
> #13 0x2b05b2522649 in t_JObject_dealloc(t_JObject*) ()
>    from
> /opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
> #14 0x2b05a303b1eb in dict_dealloc (mp=0xa2d1e20) at
> Objects/dictobject.c:985
> #15 0x2b05a303cedb in PyDict_Clear (op=<value optimized out>) at
> Objects/dictobject.c:891
> #16 0x2b05a303cf49 in dict_tp_clear (op=0x0) at
> Objects/dictobject.c:2088
> #17 0x2b05a30ddb7e in delete_garbage (generation=<value optimized
> out>) at Modules/gcmodule.c:769
> #18 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
> #19 0x2b05a30de3ae in collect_generations (basicsize=<value
> optimized out>) at Modules/gcmodule.c:996
> #20 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
> Modules/gcmodule.c:1457
> #21 0x2b05a30de44d in _PyObject_GC_New (tp=0x2b05a334dfa0) at
> Modules/gcmodule.c:1467
> #22 0x2b05a303abbc in PyDict_New () at Objects/dictobject.c:277
> #23 0x2b05a303c846 in _PyDict_NewPresized (minused=0) at
> Objects/dictobject.c:677
> #24 0x2b05a30a3088 in PyEval_EvalFrameEx (f=0x5302b10,
> throwflag=<value optimized out>) at Python/ceval.c:2220
> #25 0x2b05a30a88b8 in PyEval_EvalCodeEx (co=0x23c2ab0,
> globals=<value optimized out>, locals=<value optimized out>, args=0x66c74c8,
> argcount=<value optimized out>, kws=<value optimized out>,
> kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
> 
> As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
> to my initial assumption, the thread doesn't have a JCC thread local
> object. Since any thread may trigger a GC collect run, and not just
> threads, that use JCC, this looks like a bug in JCC to me.

Any thread that is going to call into the JVM must call attachCurrentThread() 
first. This includes a thread doing GC of object wrapping java refs which it is 
going to delete.

Andi..

> 
> Christian
> 
> 


Re: segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
Here is another backtrace with a debug build of JCC.

#10 <signal handler called>
#11 0x2b05b2cf4c74 in JNIEnv_::DeleteGlobalRef (this=0x0,
gref=0x2b05d02e0250) at /opt/vlspy27/lib/jdk1.6/include/jni.h:830
#12 0x2b05b2cf0ce5 in JCCEnv::deleteGlobalRef (this=0x360f650,
obj=0x2b05d02e0250, id=1351575229) at jcc/sources/JCCEnv.cpp:321
#13 0x2b05b2522649 in t_JObject_dealloc(t_JObject*) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#14 0x2b05a303b1eb in dict_dealloc (mp=0xa2d1e20) at
Objects/dictobject.c:985
#15 0x2b05a303cedb in PyDict_Clear (op=<value optimized out>) at
Objects/dictobject.c:891
#16 0x2b05a303cf49 in dict_tp_clear (op=0x0) at
Objects/dictobject.c:2088
#17 0x2b05a30ddb7e in delete_garbage (generation=<value optimized
out>) at Modules/gcmodule.c:769
#18 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#19 0x2b05a30de3ae in collect_generations (basicsize=<value
optimized out>) at Modules/gcmodule.c:996
#20 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
Modules/gcmodule.c:1457
#21 0x2b05a30de44d in _PyObject_GC_New (tp=0x2b05a334dfa0) at
Modules/gcmodule.c:1467
#22 0x2b05a303abbc in PyDict_New () at Objects/dictobject.c:277
#23 0x2b05a303c846 in _PyDict_NewPresized (minused=0) at
Objects/dictobject.c:677
#24 0x2b05a30a3088 in PyEval_EvalFrameEx (f=0x5302b10,
throwflag=<value optimized out>) at Python/ceval.c:2220
#25 0x2b05a30a88b8 in PyEval_EvalCodeEx (co=0x23c2ab0,
globals=<value optimized out>, locals=<value optimized out>, args=0x66c74c8,
argcount=<value optimized out>, kws=<value optimized out>,
kwcount=0, defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252

As you can clearly see, the JNIEnv_ instance is a NULL pointer. Contrary
to my initial assumption, the thread doesn't have a JCC thread local
object. Since any thread may trigger a GC collect run, and not just
threads, that use JCC, this looks like a bug in JCC to me.

Christian





segfault in JCCEnv::deleteGlobalRef

2011-05-11 Thread Christian Heimes
Hello,

I've updated our software stack from Python 2.6.6 to Python 2.7.1. Since
the update I'm seeing random segfaults, all related to
JCCEnv::deleteGlobalRef() and Python's GC. At first I thought the bug
was an incompatibility between Python 2.7 and JCC 2.7. However, an
update to JCC 2.8 and Lucene 3.1.0 didn't resolve the issue.

So far all segfaults have the same pattern. The creation or removal of a
Python object triggers a cyclic GC run which runs into
t_JObject_dealloc() and crashes inside JCCEnv::deleteGlobalRef(). At
least some of the crashing code paths run inside threads that have
attached to the JVM.

(gdb) bt
#10 <signal handler called>
#11 0x2ba7deb380c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba7de36c649 in t_JObject_dealloc(t_JObject*) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba7cee851eb in dict_dealloc (mp=0x9975720) at
Objects/dictobject.c:985
#14 0x2ba7cee86edb in PyDict_Clear (op=<value optimized out>) at
Objects/dictobject.c:891
#15 0x2ba7cee86f49 in dict_tp_clear (op=0x3) at
Objects/dictobject.c:2088
#16 0x2ba7cef27b7e in delete_garbage (generation=<value optimized
out>) at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba7cef283ae in collect_generations (basicsize=<value optimized
out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
Modules/gcmodule.c:1457
#20 0x2ba7cef2844d in _PyObject_GC_New (tp=0x2ba7cf197fa0) at
Modules/gcmodule.c:1467
#21 0x2ba7cee84bbc in PyDict_New () at Objects/dictobject.c:277
#22 0x2ba7cee8b188 in _PyObject_GenericSetAttrWithDict (obj=<value
optimized out>, name=0x12d5ae8, value=0x7c636b0, dict=0x0)
at Objects/object.c:1510
#23 0x2ba7cee8b537 in PyObject_SetAttr (v=0x77704d0, name=0x12d5ae8,
value=0x7c636b0) at Objects/object.c:1245
#24 0x2ba7c4b4 in PyEval_EvalFrameEx (f=0x50d7520,
throwflag=<value optimized out>) at Python/ceval.c:2003
#25 0x2ba7ceef28b8 in PyEval_EvalCodeEx (co=0x2199ab0,
globals=<value optimized out>, locals=<value optimized out>,
args=0x8bd7b58,
(gdb) select-frame 24
(gdb) pyframe
/opt/vlspy27/lib/python2.7/site-packages/kinterbasdb-3.3.0-py2.7-linux-x86_64.egg/kinterbasdb/__init__.py
(1499): __init__


class _RowMapping(object):
    def __init__(self, description, row):
        self._description = description
        fields = self._fields = {} # <-- 1499
        pos = 0

(gdb) bt
#11 0x2ba90298b0c9 in JCCEnv::deleteGlobalRef(_jobject*, int) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/JCC-2.8-py2.7-linux-x86_64.egg/libjcc.so
#12 0x2ba9021bf649 in t_JObject_dealloc(t_JObject*) ()
   from
/opt/vlspy27/lib/python2.7/site-packages/lucene-3.1.0-py2.7-linux-x86_64.egg/lucene/_lucene.so
#13 0x2ba8f2cd81eb in dict_dealloc (mp=0x105df800) at
Objects/dictobject.c:985
#14 0x2ba8f2cd9edb in PyDict_Clear (op=<value optimized out>) at
Objects/dictobject.c:891
#15 0x2ba8f2cd9f49 in dict_tp_clear (op=0x3) at
Objects/dictobject.c:2088
#16 0x2ba8f2d7ab7e in delete_garbage (generation=<value optimized
out>) at Modules/gcmodule.c:769
#17 collect (generation=<value optimized out>) at Modules/gcmodule.c:930
#18 0x2ba8f2d7b3ae in collect_generations (basicsize=<value optimized
out>) at Modules/gcmodule.c:996
#19 _PyObject_GC_Malloc (basicsize=<value optimized out>) at
Modules/gcmodule.c:1457
#20 0x2ba8f2d7b44d in _PyObject_GC_New (tp=0x2ba8f2fddfc0) at
Modules/gcmodule.c:1467
#21 0x2ba8f2cb0aa8 in PyWrapper_New (d=0x1e5e140,
self=0x2ba9242509e0) at Objects/descrobject.c:1051
#22 0x2ba8f2cb0be3 in wrapperdescr_call (descr=0x1e5e140,
args=0x28f87520, kwds=0x0) at Objects/descrobject.c:296
#23 0x2ba8f2c93533 in PyObject_Call (func=0x1e5e140,
arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#24 0x2ba8f89b8d6c in __pyx_pf_4lxml_5etree_9_ErrorLog___init__
(__pyx_v_self=0x229db820, __pyx_args=<value optimized out>,
__pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:28498
#25 0x2ba8f2cf6068 in type_call (type=<value optimized out>,
args=0x2ba8f3c64050, kwds=0x0) at Objects/typeobject.c:728
#26 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cbb1e0,
arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#27 0x2ba8f89b91c0 in
__pyx_pf_4lxml_5etree_19_XPathEvaluatorBase___cinit__
(__pyx_v_self=0x6c5cdb8, __pyx_args=<value optimized out>,
__pyx_kwds=<value optimized out>) at src/lxml/lxml.etree.c:111873
#28 0x2ba8f89bcb7c in __pyx_tp_new_4lxml_5etree__XPathEvaluatorBase
(t=<value optimized out>, a=<value optimized out>,
k=<value optimized out>) at src/lxml/lxml.etree.c:149259
#29 __pyx_tp_new_4lxml_5etree_XPath (t=<value optimized out>,
a=<value optimized out>, k=<value optimized out>)
at src/lxml/lxml.etree.c:18769
#30 0x2ba8f2cf6023 in type_call (type=0x3, args=0x20515510,
kwds=0x2ba9243a0bb0) at Objects/typeobject.c:712
#31 0x2ba8f2c93533 in PyObject_Call (func=0x2ba8f8cc26e0,
arg=0x2ba9242509e0, kw=0x2ba928815a40) at Objects/abstract.c:2529
#32 0x2ba8f2d4303e in do_call (f=0x23b8e80, throwflag=<value
optimized out>) at Python/ceval.c:4230
#33 call_function (f=0x23b8e80, throwflag=<value optimized out>) at
Python/ceval.c:4035
#34 PyEval_EvalFrameEx (f=0x23b8e80, throwflag=<value optimized out>) at
Python/ceval.c:2665
#35 0x2ba8f2d458b8 in PyEval_EvalCodeEx (co=0x1bf87b0,
globals=<value optimized out>, locals=<value optimized out>, args=0x3,
(gdb) sele

Re: [pylucene-dev] Segfault in JCCEnv::deleteGlobalRef

2009-01-28 Thread Christian Heimes
[follow up to the new mailing list]

Andi Vajda wrote:
> Looking at your stacktrace, it would seem that JNIEnv is NULL (this=0x0).
> I recently fixed a bug in JCC with a NULL JNIEnv caused by a line of code 
> being emitted too late in an extension method. You would hit this bug if you 
> wrote Python extensions of some Lucene Java classes as is possible with JCC.
> 
> If you could send me a more complete stacktrace, up to the method in your 
> code or PyLucene, and its corresponding source code I could confirm this.
> 
> You could also try out the fix I did (which fixed the bug I had) by getting 
> the latest JCC sources from PyLucene's new home at Apache:
>  http://svn.apache.org/repos/asf/lucene/pylucene/trunk/
> and rebuilding your libraries. This was a bug in the C++ code generator.
> 
> Please, let me know if this fixes your problem as well.
> Thanks !

Thanks Andi!

An update to the latest svn revision of JCC and Lucene didn't fix the
problem. The server kept crashing at regular intervals. However, the
additional core dumps gave me more data, and eventually I was able to
debug and fix the culprit.

JCC was causing a crash in a piece of code, and in a thread, that wasn't
using any Java objects at all. At least that's what I thought at first.
We are using a CherryPy plugin called 'Dowser' [1] to keep track of
reference counts and possible memory leaks. The segfault always occurred
inside Dowser code and the Dowser thread. I couldn't make sense of it.
Dowser doesn't touch Lucene and JCC at all. Or does it?

Once I started paying more attention to the exact line -- the pystack
macro from Python's gdbinit is a life saver -- I got a clue. Dowser uses
gc.get_objects() to iterate every 5 seconds over all objects tracked by
Python's cyclic gc. A race condition induced a situation where the list
returned by gc.get_objects() was holding the last reference to a JCC
object. The Dowser thread didn't have a JNI environment attached because
I never thought it would matter. At the end of the "for obj in
gc.get_objects():" loop, the ref count of the JCC object dropped to zero
... no JCC ENV ... SEGFAULT.
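The race is reproducible in miniature with the stdlib alone. A sketch (no JVM involved: a Python __del__ stands in for the JCC dealloc that needs the JNI env):

```python
import gc
import threading

freed_in = []

class Wrapper:
    """Stand-in for a JCC-wrapped Java object."""
    def __del__(self):
        # In JCC this is where DeleteGlobalRef would run and
        # segfault without an attached JNI env.
        freed_in.append(threading.current_thread().name)

obj = Wrapper()

# What a monitor like Dowser does periodically: snapshot every
# object the cyclic GC tracks.
snapshot = [o for o in gc.get_objects() if isinstance(o, Wrapper)]

del obj  # the snapshot now holds the LAST reference

def monitor():
    snapshot.clear()  # dropping it deallocates the wrapper in THIS thread

t = threading.Thread(target=monitor, name="dowser-thread")
t.start()
t.join()
print(freed_in)  # ['dowser-thread']
```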

Conclusion:
Never combine JCC and gc.get_objects() unless you attach *all* Python
threads.

Christian

[1] http://www.aminus.net/wiki/Dowser