> Looking at the uses of PyArray_FromAny I can see the motivation for this > design: core/include/numpy/ndarrayobject.h has a lot of calls which take a > value returned by PyArray_DescrFromType as argument. This has prompted me > to take a trawl through the code to see what else is going on, and I note > a couple more issues with patches below. > The core issue is that NumPy grew out of Numeric. In Numeric PyArray_Descr was just a C-structure, but in NumPy it is now a real Python object with reference counting. Trying to have a compatible C-API to the old one and making the transition with out huge changes to the C-API is what led to the "stealing" strategy.
I did not just out the blue decide to do it that way. Yes, it is a bit of a pain, and yes, it isn't the way it is always done, but as you point out there are precedents, and so that's the direction I took. It is *way* too late to change that in any meaningful way > In the patch below the problem being fixed is that the first call to > PyArray_FromAny can result in the erasure of dtype *before* Py_INCREF is > called. Perhaps you can argue that this only occurs when NULL is > returned... > Indeed I would argue that because the array object holds a reference to the typecode (data-type object). Only if the call returns NULL does the data-type object lose it's reference count, and in fact that works out rather nicely and avoids a host of extra Py_DECREFs. > The next patch deals with an interestingly subtle memory leak in > _string_richcompare where if casting to a common type fails then a > reference count will leaked. Actually this one has nothing to do with > PyArray_FromAny, but I spotted it in passing. > This is a good catch. Thanks! > I really don't think that this design of reference count handling in > PyArray_FromAny (and consequently PyArray_CheckFromAny) is a good idea. > Your point is well noted, but again given the provenance of the code, I still think it was the best choice. And, yes, it is too late to change it. > Not only is this not a good idea, it's not documented in the API > documentation (I'm referring to the "Guide to NumPy" book) Hmmm... Are you sure? I remember writing a bit about in the paragraphs that describe the relevant API calls. But, you could be right. > I've been trying to find some documentation on stealing references. The > Python C API reference (http://docs.python.org/api/refcountDetails.html) > says > > Few functions steal references; the two notable exceptions are > PyList_SetItem() and PyTuple_SetItem() > > An interesting essay on reference counting is at > http://lists.blender.org/pipermail/bf-python/2005-September/003092.html > Believe me, I understand reference counting pretty well. Still, it can be tricky to do correctly and it is easy to forget corner cases and error-returns. I very much appreciate your careful analysis, but I did an analysis of my own when I wrote the code, and so I will be resistant to change things if I can't see the error. I read something from Guido once that said something to the effect that nothing beats studying the code to get reference counting right. I think this is true. > In conclusion, I can't find much about the role of stealing in reference > count management, but it's such a source of surprise (and frankly doesn't > actually work out all that well in numpy) I strongly beg to differ. This sounds very naive to me. IMHO it has worked out extremely well in converting the PyArray_Descr C-structure into the data-type objects that adds so much power to NumPy. Yes, there are a few corner cases that you have done an excellent job in digging up, but they are "corner" cases that don't cause problems for 99.9% of the use-cases. -Travis _______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion