Hello,

Recently I ran into a problem with a Python application running on python2.5 | debian/lenny amd64: it crashes occasionally in our production environment. The crashes started right after we upgraded the application from python2.4 | debian/etch amd64.
After configuring the system to enable core dumps and debugging them by following the guidelines at http://wiki.python.org/moin/DebuggingWithGdb, I became even more confused.

The first crash happened inside a call into the python-xml module, which is claimed to be a pure Python module and so is not supposed to crash the Python interpreter. Because the application is relatively big, I cannot show you the exact source code related to the crash, only the Python modules involved. GDB shows it crashed in a string join operation:

    #0  string_join (self=0x7f7075baf030, orig=<value optimized out>) at ../Objects/stringobject.c:1795
    1795    ../Objects/stringobject.c: No such file or directory.
            in ../Objects/stringobject.c

and the pystack macro shows the details:

    (gdb) pystack
    /usr/lib/python2.5/StringIO.py (271): getvalue
    /usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py (62): toprettyxml
    /usr/lib/python2.5/site-packages/_xmlplus/dom/minidom.py (47): toxml

At that time we had also found that python-xml had performance issues for our application, so we decided to replace python-xml with python-lxml. After that replacement the crash was gone. That's a bit weird to me, but anyway, it's gone.

Unfortunately, two other 'kinds' of crashes started happening after that, and the core dumps show they are not related to the replacement.

One crashes with "Program terminated with signal 11", and the pystack macro shows it crashed while calling the built-in id() function:

    #0  visit_decref (op=0x20200a3e22726574, data=0x0) at ../Modules/gcmodule.c:270
    270     ../Modules/gcmodule.c: No such file or directory.
            in ../Modules/gcmodule.c

The other crashes with "Program terminated with signal 7", and the pystack macro shows it crashed at exactly the same operation (string join) as the first one (python-xml), but in a different library, python-simplejson:

    #0  string_join (self=0x7f5149877030, orig=<value optimized out>) at ../Objects/stringobject.c:1795
    1795    ../Objects/stringobject.c: No such file or directory.
            in ../Objects/stringobject.c
    (gdb) pystack
    /var/lib/python-support/python2.5/simplejson/encoder.py (367): encode
    /var/lib/python-support/python2.5/simplejson/__init__.py (243): dumps

I'm not good with gdb or C programming, so I tried some other ways to dig further:

* I got the source code of python2.5, but could not figure out the reason for the crash :(
* Since Debian provides the python-dbg package, I tried running the application with the python2.5-dbg interpreter instead of python2.5, so that I could get more debug information in the core dump file. Unfortunately, my application uses a bunch of C modules, and not all of them provide a -dbg package in Debian/Lenny, so I haven't made any progress this way yet.

I would really appreciate it if somebody could help me with how to debug these Python crashes.

Thanks in advance!

BR
Jacky Wang
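
P.S. In case it helps anyone trying to reproduce this: below is a minimal sketch of one way to make sure core dumps are enabled from inside the Python process itself, using the standard resource module. It is only an illustration under assumptions, not necessarily how our system was configured (we followed the wiki page above), and the helper name enable_core_dumps is hypothetical.

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Hypothetical helper: it only affects the current process and
    the children it starts afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)
```

Calling enable_core_dumps() early at application start-up should have roughly the same effect as running "ulimit -c" in the launching shell. If the hard limit itself is 0, no core file will be written and the limit has to be raised at the system level instead.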