Fernando Perez wrote:
> On 10/23/06, Travis Oliphant <[EMAIL PROTECTED]> wrote:
>
>> Fernando Perez wrote:
>>
>>> Hi all,
>>>
>>> two colleagues have been seeing occasional crashes from very
>>> long-running code which uses numpy. We've now gotten a backtrace from
>>> one such crash, unfortunately it uses a build from a few days ago:
>>>
>>>
>> This looks like a reference-count problem on the data-type objects
>> (probably one of the builtin ones is trying to be released). The
>> reference count problem is probably hard to track down.
>>
>> A quick fix is to not allow the built-ins to be "freed" (the attempt
>> should never be made, but if it is, then we should just incref the
>> reference count and continue rather than die).
>>
>> Ideally, the reference count problem should be found, but other-wise
>> I'll just insert some print statements if the attempt is made, but not
>> actually do it as a safety measure.
>>
>
> If you point me to the right place in the sources, I'll be happy to
> add something to my local copy, rebuild numpy and rerun with these
> print statements in place.
>

I've placed them in SVN (r3384). arraydescr_dealloc needs to do
something like:

    if (self->fields == Py_None) {
        /* built-in data-type: report the attempt, but never free it */
        fprintf(stderr, "attempt to dealloc a built-in data-type\n");
        Py_INCREF(self);
        return;
    }

Most likely there is a missing Py_INCREF() before some call that uses
the data-type object (and consumes its reference). Do you have any
Pyrex code? (It's harder to get it right with Pyrex.)

> I realize this is probably a very difficult problem to track down, but
> it really sucks to run a code for 4 days only to have it explode at
> the end. Right now this is starting to be a serious problem for us as
> we move our codes into large production runs, so I'm willing to put in
> the necessary effort to track it down, though I'll need some guidance
> from our gurus.
>

Tracking the reference count of the built-in data-type objects should
not be too difficult. First, figure out which one is causing problems
(if you still have the gdb traceback, go up to the arraydescr_dealloc
frame and look at self->type_num and self->type). Then, put print
statements throughout your code for the reference count of this
data-type object. Something like

    sys.getrefcount(numpy.dtype('float'))

would be enough at a looping point in your code.

Good luck,

-Travis
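
A minimal sketch of the reference-count monitoring suggested above. The
helper name watch_refcount and its parameters (step, n_iter,
report_every, label) are hypothetical and not part of numpy; the only
real call it relies on is sys.getrefcount, and the helper works on any
Python object.

```python
import sys


def watch_refcount(obj, step, n_iter, report_every=1000, label="object"):
    """Run step(i) n_iter times and sample sys.getrefcount(obj) as we go.

    Returns a list of (iteration, refcount) pairs so the trend can be
    inspected after the run as well as read from the printed log.
    """
    history = []
    for i in range(n_iter):
        step(i)  # one pass of the (hypothetical) long-running loop body
        if i % report_every == 0:
            count = sys.getrefcount(obj)
            history.append((i, count))
            print("iter %d: refcount(%s) = %d" % (i, label, count))
    return history
```

With numpy, the object to watch would be the suspect built-in
descriptor, for example watch_refcount(numpy.dtype('float'), my_step,
n_iter), where my_step stands for one pass of your own loop. A count
that drifts steadily downward between samples would suggest that the
code executed in that interval is where a Py_INCREF() is missing.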