[Numpy-discussion] How to debug reference counting errors
Hi,

There is a segfault reported here:

http://projects.scipy.org/numpy/ticket/1588

I've managed to isolate the problem and even provide a simple patch that fixes it here:

https://github.com/numpy/numpy/issues/398

However, the patch simply doesn't decrement the reference count in question, so it might leak.

I've used bisection (took the whole evening unfortunately...) but the good news is that I've isolated the commits that actually broke it. See the github issue #398 for details, diffs etc.

Unfortunately, it's 12 commits from Mark, and the individual commits raise an exception on the segfaulting code, so I can't pinpoint the problem further.

In general, how can I debug this sort of problem? I tried to use valgrind, with a debugging build of numpy, but it provides tons of false (?) positives: https://gist.github.com/3549063

Mark, by looking at the changes that broke it, as well as at my fix, do you see where the problem could be?

I suspect it is something with the changes in PyArray_FromAny() or PyArray_FromArray() in ctors.c. But I don't see anything so far that could cause it.

Thanks for any help. This is one of the issues blocking the 1.7.0 release.

Ondrej
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] How to debug reference counting errors
Hi,

re: valgrind - to get better results you might try the suggestions from:

http://svn.python.org/projects/python/trunk/Misc/README.valgrind

Richard

On 31 August 2012 09:03, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> [...]
> I tried to use valgrind, with a debugging build of numpy, but it provides tons of false (?) positives: https://gist.github.com/3549063
> [...]
[Numpy-discussion] how is y += x computed when y.strides = (0, 8) and x.strides=(16, 8) ?
Hi,

I'm using numpy 1.6.1 on Ubuntu 12.04.1 LTS. Code that used to work with an older version of numpy now fails with an error.

Were there any changes in the way in-place operations like +=, *=, etc. work on arrays with non-standard strides?

For the script:

--- start of code ---
import numpy
x = numpy.arange(6).reshape((3,2))
y = numpy.arange(2)

print 'x=\n', x
print 'y=\n', y

u,v = numpy.broadcast_arrays(x, y)

print 'u=\n', u
print 'v=\n', v
print 'v.strides=\n', v.strides

v += u

print 'v=\n', v  # expectation: v = [[6,10], [6,10], [6,10]]
print 'u=\n', u
print 'y=\n', y  # expectation: y = [6,10]
--- end of code ---

I get the output

--- start of output ---
x=
[[0 1]
 [2 3]
 [4 5]]
y=
[0 1]
u=
[[0 1]
 [2 3]
 [4 5]]
v=
[[0 1]
 [0 1]
 [0 1]]
v.strides=
(0, 8)
v=
[[4 6]
 [4 6]
 [4 6]]
u=
[[0 1]
 [2 3]
 [4 5]]
y=
[4 6]
--- end of output ---

I would have expected that v += u performs an element-by-element +=

v[0,0] += u[0,0]  # increments y[0]
v[0,1] += u[0,1]  # increments y[1]
v[1,0] += u[1,0]  # increments y[0]
v[1,1] += u[1,1]  # increments y[1]
v[2,0] += u[2,0]  # increments y[0]
v[2,1] += u[2,1]  # increments y[1]

yielding the result

y = [6,10]

but instead one obtains

y = [4, 6]

which could be the result of

v[2,0] += u[2,0]  # increments y[0]
v[2,1] += u[2,1]  # increments y[1]

Is this the intended behavior?

regards,
Sebastian
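The two outcomes discussed in this message can be reproduced with a small pure-Python model. This is only a sketch of the two accumulation orders, not a description of NumPy internals: the function names `add_rows_sequentially` and `add_rows_last_write_wins` are hypothetical, and the "last write wins" model is an assumption chosen because it matches the reported output. With the data from the script, true element-by-element accumulation gives y[0] = 0+0+2+4 = 6 and y[1] = 1+1+3+5 = 10, while the reported result [4, 6] is what you get if every sum is computed from the original y and later rows overwrite earlier ones.

```python
def add_rows_sequentially(y, x):
    """Accumulate every row of x into y, one element at a time."""
    result = list(y)
    for row in x:
        for j, value in enumerate(row):
            result[j] += value  # each update sees the previous update
    return result


def add_rows_last_write_wins(y, x):
    """Model (an assumption) of the reported behaviour: each sum is formed
    from the ORIGINAL y, and all rows are stored into the same two slots,
    so the last row's result is the one that survives."""
    result = list(y)
    for row in x:
        sums = [y[j] + value for j, value in enumerate(row)]  # reads original y
        for j, total in enumerate(sums):
            result[j] = total  # overwrites whatever an earlier row stored
    return result


x = [[0, 1], [2, 3], [4, 5]]
y = [0, 1]

sequential = add_rows_sequentially(y, x)
last_write = add_rows_last_write_wins(y, x)
print(sequential, last_write)
```

If the accumulated result is what is wanted, reducing first and adding once (for example `y += x.sum(axis=0)`) avoids writing through overlapping memory altogether; later NumPy releases also provide `numpy.add.at` for unbuffered in-place accumulation.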
Re: [Numpy-discussion] How to debug reference counting errors
On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
> [...]
> In general, how can I debug this sort of problem? I tried to use valgrind, with a debugging build of numpy, but it provides tons of false (?) positives: https://gist.github.com/3549063
> [...]

IIRC you can recompile Python with some support for detecting memory leaks. One of the issues with using Valgrind, even after suppressing the false positives, is that Python uses its own memory allocator, which sits between the bug and what Valgrind detects. So at least recompile Python to not do that.

As for hardening the NumPy source in general, you should at least be aware of these two options:

1) David Malcolm (dmalc...@redhat.com) was writing a static code analysis plugin for gcc that would check, for every routine, that the reference count semantics are correct. (I don't know how far he's got with that.)

2) In Cython we have a reference count nanny. This requires changes to all the code though, so it's not an option just for finding this bug; I just thought I'd mention it. In addition to the INCREF/DECREF you need to insert new GIVEREF and GOTREF calls (which are no-ops in a normal compile) to declare where you get and give away a reference. When Cython-generated sources are compiled with -DCYTHON_REFNANNY, INCREF/DECREF/GIVEREF/GOTREF are tracked within each function and a failure is raised if the function violates any contract.

Dag
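As a lightweight complement to the tools mentioned above, a reference leak in an extension function can sometimes be spotted from Python itself by watching `sys.getrefcount()` on a probe object across many calls. The following is a minimal sketch on CPython, not the approach used to fix the bug in this thread: `refcount_delta`, `leaky` and `clean` are hypothetical names, and `leaky` only simulates a missing DECREF by stashing a reference in a list.

```python
import sys


def refcount_delta(func, obj, repeat=1000):
    """Return the change in sys.getrefcount(obj) after calling func(obj)
    `repeat` times. A value that grows with `repeat` suggests that func
    keeps (leaks) a reference on every call."""
    before = sys.getrefcount(obj)
    for _ in range(repeat):
        func(obj)
    after = sys.getrefcount(obj)
    return after - before


_stash = []


def leaky(obj):
    # Simulates a C function that forgets a DECREF: one extra reference
    # to obj survives every call.
    _stash.append(obj)


def clean(obj):
    # Takes a temporary reference and releases it before returning.
    tmp = [obj]
    return len(tmp)


probe = object()
clean_delta = refcount_delta(clean, probe)
leaky_delta = refcount_delta(leaky, probe)
print(clean_delta, leaky_delta)
```

The opposite error (one DECREF too many, as in the segfault discussed here) usually crashes before such a check can report anything, which is where a debug build of Python (which additionally exposes `sys.gettotalrefcount()`) and Valgrind become necessary.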
Re: [Numpy-discussion] view of recarray issue
Ondrej,

Sorry for the delay in getting back to this. I have some free time today to get this resolved if you haven't already fixed it.

-Jay

On Wed, Aug 29, 2012 at 7:19 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> Jay,
>
> On Mon, Aug 20, 2012 at 12:40 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
>> On Wed, Jul 25, 2012 at 10:29 AM, Jay Bourque jay.bour...@continuum.io wrote:
>>> I'm actively looking at this issue since it was my pull request that broke this (https://github.com/numpy/numpy/pull/350). We definitely don't want to break this functionality for 1.7. The problem is that even though indexing with a subset of fields still returns a copy (for now), it now returns a copy of a view of the original array. When you call copy() on a view, it copies the entire original structured array with the view dtype. A short term fix would be to manually create a proper copy to return similar to what _index_fields() did before my change, but since the idea is to eventually return the view instead of a copy, long term we need a way to do a proper copy of a structured array view that doesn't copy the unwanted fields.
>>
>> This should be fixed for 1.7.0. However, I am going to release beta now, and then see what we can do about this.
>
> What would be the best short term fix, so that we can release 1.7.0?
>
> I am still trying to understand what exactly the problem with dtype is in _index_fields(). Would you suggest to keep using the view, or somehow revert to the old behavior while still trying to pass all the new tests in your PR 350? If you have any hints, it would save me some time.
>
> Thanks,
> Ondrej
Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release
Hello,

On Tue, Aug 21, 2012 at 6:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> Hi,
>
> I'm pleased to announce the availability of the first beta release of NumPy 1.7.0b1.

I've just uploaded it to Debian experimental, so we can give it a run while in freeze. Some of the buildds are already building[1] the package, so we should get results asap (either failures or successes).

[1] https://buildd.debian.org/status/package.php?p=python-numpy&suite=experimental

If tests fail, it won't stop the build, and indeed I got at least 2 errors (actually 1 error and 1 crash), when running tests for python 2.7 and 3.2 with debug enabled:

2.7 dbg

======================================================================
ERROR: test_power_zero (test_umath.TestPower)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/buildd/python-numpy-1.7.0~b1/debian/tmp/usr/lib/python2.7/dist-packages/numpy/core/tests/test_umath.py", line 139, in test_power_zero
    assert_complex_equal(np.power(zero, 0+1j), cnan)
RuntimeWarning: invalid value encountered in power

----------------------------------------------------------------------

3.2 dbg

python3.2-dbg: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper: Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)' failed.
Aborted

I'm reporting them here since you asked so, dunno if you want an issue on github to track them. I'll look at the buildds logs and report additional failures if they come up.

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
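One possible reading of the 2.7 log above, offered as an assumption rather than a diagnosis: a RuntimeWarning appears as a test ERROR when the active warning filter turns warnings into exceptions, because a test runner reports any unexpected exception as an error. The sketch below shows that mechanism with the standard `warnings` module only; `power_like` is a hypothetical stand-in for a numerical routine and is not NumPy code.

```python
import warnings


def power_like():
    # Hypothetical stand-in: signals an invalid result with a warning
    # and returns a complex NaN.
    warnings.warn("invalid value encountered in power", RuntimeWarning)
    return complex(float("nan"), float("nan"))


def run_with_filter(action):
    """Call power_like() under the given warning filter action and
    describe what happened."""
    with warnings.catch_warnings():
        warnings.simplefilter(action, RuntimeWarning)
        try:
            power_like()
        except RuntimeWarning as exc:
            return "raised RuntimeWarning: %s" % exc
        return "returned normally"


ignored = run_with_filter("ignore")
escalated = run_with_filter("error")
print(ignored)
print(escalated)
```

Whether the Debian debug build actually ran with such a filter is not stated in the message; the build logs would be needed to confirm it.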
Re: [Numpy-discussion] Issues for 1.7.0
On Thu, Aug 30, 2012 at 10:47 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> Hi,
>
> I am keeping track of all issues that need to be done for the 1.7.0 release here:
>
> https://github.com/numpy/numpy/issues/396
>
> If you have trac and github push access, here is how you can help (by closing/merging):
>
> Issues that need clarification:
>
> http://projects.scipy.org/numpy/ticket/2150
> http://projects.scipy.org/numpy/ticket/2101
>
> Issues fixed (should be closed):
>
> http://projects.scipy.org/numpy/ticket/2185
> http://projects.scipy.org/numpy/ticket/2066
> http://projects.scipy.org/numpy/ticket/2189
>
> PRs that need merging:
>
> https://github.com/numpy/numpy/pull/395
> https://github.com/numpy/numpy/pull/397
>
> There are still a few more (see my github issue above), that I am working on right now.

Ondrej,

It looks like you don't have commit rights. Is that the case? If you are the release manager I think you need both commit rights and the right to close tickets.

Chuck
Re: [Numpy-discussion] Temporary error accessing NumPy tickets
Ondřej Čertík ondrej.certik at gmail.com writes:
> When I access tickets, for example:
>
> http://projects.scipy.org/numpy/ticket/2185
>
> then sometimes I get:
>
> Trac detected an internal error:
> OperationalError: database is locked
>
> For example yesterday. A refresh in about a minute fixed the problem. Today it still lasts at the moment.

The failures are probably partly triggered by the machine running out of memory. It runs services on mod_python, which apparently slowly leaks. Someone (who?) with root access on the machine needs to restart Apache. (Note: apachectl graceful is not enough to correct this, it needs a real restart of the process.)

Longer term solution is to move out of mod_python (mod_wsgi likely, going to CGI will create other performance problems), or to transition the stuff there to a more beefy server.

--
Pauli Virtanen
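For readers unfamiliar with the error text quoted above: "OperationalError: database is locked" is the message Python's `sqlite3` module raises when one connection cannot obtain a write lock because another connection holds it. The sketch below reproduces that message in isolation. It assumes, without confirmation from the thread, that the Trac instance was backed by SQLite; the table name and the helper `second_writer_error` are hypothetical.

```python
import os
import sqlite3
import tempfile


def second_writer_error(timeout_seconds=0):
    """Hold a write transaction open on one connection and try to write
    from a second one; return the resulting error message, or None."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")

    # isolation_level=None puts the connection in autocommit mode so the
    # transaction boundaries below are explicit.
    holder = sqlite3.connect(path, isolation_level=None)
    holder.execute("CREATE TABLE ticket (id INTEGER PRIMARY KEY, summary TEXT)")
    holder.execute("BEGIN IMMEDIATE")  # take the write lock and keep it
    holder.execute("INSERT INTO ticket (summary) VALUES ('held open')")

    # timeout=0: do not wait for the lock, fail immediately instead.
    other = sqlite3.connect(path, timeout=timeout_seconds, isolation_level=None)
    try:
        other.execute("INSERT INTO ticket (summary) VALUES ('blocked')")
        return None
    except sqlite3.OperationalError as exc:
        return str(exc)
    finally:
        other.close()
        holder.execute("ROLLBACK")
        holder.close()


message = second_writer_error()
print(message)
```

A slow or stalled request that keeps its transaction open, for instance on a machine that is short of memory, would have the same effect on every other writer, which is consistent with the explanation given in the message but does not prove it.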
Re: [Numpy-discussion] view of recarray issue
On Fri, Aug 31, 2012 at 6:15 AM, Jay Bourque jay.bour...@continuum.io wrote:
> Ondrej,
>
> Sorry for the delay in getting back to this. I have some free time today to get this resolved if you haven't already fixed it.

I haven't. If you can look at it, that would be absolutely awesome. If you don't manage to fix it but can give me some hints about what's going on, that would also be a huge help.

Many thanks!
Ondrej
Re: [Numpy-discussion] Issues for 1.7.0
On Fri, Aug 31, 2012 at 9:27 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Thu, Aug 30, 2012 at 10:47 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
>> [...]
>
> Ondrej,
>
> It looks like you don't have commit rights. Is that the case? If you are the release manager I think you need both commit rights and the right to close tickets.

Yes, I don't have commit rights nor the rights to close tickets.

Ondrej
Re: [Numpy-discussion] Temporary error accessing NumPy tickets
On Fri, Aug 31, 2012 at 9:35 AM, Pauli Virtanen p...@iki.fi wrote:
> Ondřej Čertík ondrej.certik at gmail.com writes:
>> [...]
>
> The failures are probably partly triggered by the machine running out of memory. It runs services on mod_python, which apparently slowly leaks. Someone (who?) with root access on the machine needs to restart Apache. (Note: apachectl graceful is not enough to correct this, it needs a real restart of the process.)

I see.

> Longer term solution is to move out of mod_python (mod_wsgi likely, going to CGI will create other performance problems), or to transition the stuff there to a more beefy server.

Or move the tickets to github. Yesterday it was very unreliable (I had to wait a long time before a comment was posted, and about 50% of the time it was not posted due to the database error). So I just created a github issue for the same thing and posted my comments there. Then I could work fast.

Ondrej
Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release
Hi Sandro,

On Fri, Aug 31, 2012 at 6:18 AM, Sandro Tosi mo...@debian.org wrote:
> Hello,
>
> On Tue, Aug 21, 2012 at 6:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
>> Hi,
>>
>> I'm pleased to announce the availability of the first beta release of NumPy 1.7.0b1.
>
> I've just uploaded it to Debian experimental, so we can give it a run while in freeze. Some of the buildds are already building[1] the package, so we should get results asap (either failures or successes).

This is awesome, thank you so much for doing this. This should reveal some bugs.

> [1] https://buildd.debian.org/status/package.php?p=python-numpy&suite=experimental
>
> If tests fail, it won't stop the build, and indeed I got at least 2 errors (actually 1 error and 1 crash), when running tests for python 2.7 and 3.2 with debug enabled:
>
> [...]
>
> I'm reporting them here since you asked so, dunno if you want an issue on github to track them. I'll look at the buildds logs and report additional failures if they come up.

If you could create issues at github: https://github.com/numpy/numpy/issues that would be great. If you have time, also with some info about the platform and how to reproduce it. Or at least a link to the build logs.

I'll add it to the release TODO and try to fix it.

Ondrej
Re: [Numpy-discussion] Issues for 1.7.0
On Fri, Aug 31, 2012 at 11:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> On Fri, Aug 31, 2012 at 9:27 AM, Charles R Harris charlesr.har...@gmail.com wrote:
>> [...]
>> Ondrej,
>>
>> It looks like you don't have commit rights. Is that the case? If you are the release manager I think you need both commit rights and the right to close tickets.
>
> Yes, I don't have commit rights nor the rights to close tickets.

OK, I gave commit rights to you. Someone else (Pauli) will need to give you rights to close tickets. I think Thouis also needs rights if he is going to do the issue tracking.

Chuck
Re: [Numpy-discussion] Issues for 1.7.0
On Fri, Aug 31, 2012 at 10:26 AM, Charles R Harris charlesr.har...@gmail.com wrote:
> On Fri, Aug 31, 2012 at 11:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> [...]
>> Yes, I don't have commit rights nor the rights to close tickets.
>
> OK, I gave commit rights to you. Someone else (Pauli) will need to give you rights to close tickets. I think Thouis also needs rights if he is going to do the issue tracking.

Thanks a lot. I just wrote to Pauli privately and CCed you.

Ondrej
Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release
Ondřej Čertík ondrej.cer...@gmail.com wrote:
>> python3.2-dbg: numpy/core/src/multiarray/common.c:161: PyArray_DTypeFromObjectHelper: Assertion `((((((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)'
>
> If you could create issues at github: https://github.com/numpy/numpy/issues that would be great. If you have time, also with some info about the platform and how to reproduce it. Or at least a link to the build logs.

For the second one there's an issue here:

http://projects.scipy.org/numpy/ticket/2193

Stefan Krah
Re: [Numpy-discussion] view of recarray issue
Ondrej,

Just submitted the following pull request for this: https://github.com/numpy/numpy/pull/401

-Jay

On Fri, Aug 31, 2012 at 12:09 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> On Fri, Aug 31, 2012 at 6:15 AM, Jay Bourque jay.bour...@continuum.io wrote:
>> [...]
>
> I haven't. If you can look at it, that would be absolutely awesome. [...]
Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release
On Fri, Aug 31, 2012 at 7:17 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> If you could create issues at github: https://github.com/numpy/numpy/issues that would be great. If you have time, also with some info about the platform and how to reproduce it. Or at least a link to the build logs.

I've reported it here: https://github.com/numpy/numpy/issues/402

Cheers,
--
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi
Re: [Numpy-discussion] Temporary error accessing NumPy tickets
On Fri, Aug 31, 2012 at 11:35 AM, Pauli Virtanen p...@iki.fi wrote:
> Ondřej Čertík ondrej.certik at gmail.com writes:
>> [...]
>
> The failures are probably partly triggered by the machine running out of memory. It runs services on mod_python, which apparently slowly leaks. Someone (who?) with root access on the machine needs to restart Apache. (Note: apachectl graceful is not enough to correct this, it needs a real restart of the process.)

I do that regularly.

> Longer term solution is to move out of mod_python (mod_wsgi likely, going to CGI will create other performance problems), or to transition the stuff there to a more beefy server.

There is also Trac. Between Trac and mod_python the load on the machine goes up to 20+ at times. I spent some time trying to figure out a move of the current machine to a beefier instance on Amazon (and I am not opposed to it, but there is a lot of cruft and strange setup on it, and it is not really clear what is what and why it is running), but this would be a case of solving a problem by throwing more hardware at it. If everyone is OK with that, fine.

I personally think moving away from Trac (which IMHO is bloated and awkward in addition to having a very weird way of being administered) would be a better idea.

My $0.02
Ognen
Re: [Numpy-discussion] Temporary error accessing NumPy tickets
01.09.2012 00:08, Ognen Duzlevski wrote:
[clip]
> I personally think moving away from Trac (which IMHO is bloated and awkward in addition to having a very weird way of being administered) would be a better idea.

Yes, moving away from Trac is planned, both for Numpy and Scipy. Also agreed on the point of clumsy administration.

This however leaves the other services still on the machine, although after dropping Trac, the juice probably is enough for them.

--
Pauli Virtanen
Re: [Numpy-discussion] How to debug reference counting errors
Hi Dag,

On Fri, Aug 31, 2012 at 4:22 AM, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:
> On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
>> [...]
>
> IIRC you can recompile Python with some support for detecting memory leaks. [...] So at least recompile Python to not do that.

Right. Compiling with --without-pymalloc (per README.valgrind as suggested above by Richard) should improve things a lot. Thanks for the tip.

> As for hardening the NumPy source in general, you should at least be aware of these two options:
>
> [...]
>
> 2) In Cython we have a reference count nanny. [...]

I see. That's a nice option. For my own code, I never touch the reference counting by hand and rather just use Cython.

In the meantime, Mark fixed it:

https://github.com/numpy/numpy/pull/400
https://github.com/numpy/numpy/pull/405

Mark, thanks again for this. That saved me a lot of time.

Ondrej
Re: [Numpy-discussion] How to debug reference counting errors
On Fri, Aug 31, 2012 at 5:35 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
> [...]
>
> In the meantime, Mark fixed it:
>
> https://github.com/numpy/numpy/pull/400
> https://github.com/numpy/numpy/pull/405
>
> Mark, thanks again for this. That saved me a lot of time.

No problem. The way I prefer to deal with this kind of error is to use C++ smart pointers. C++11's unique_ptr and boost's intrusive_ptr are both useful for painlessly managing this kind of reference counting headache.

-Mark
Re: [Numpy-discussion] How to debug reference counting errors
On Fri, Aug 31, 2012 at 5:56 PM, Mark Wiebe mwwi...@gmail.com wrote:
> On Fri, Aug 31, 2012 at 5:35 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
>> [...]
>
> No problem. [...] C++11's unique_ptr and boost's intrusive_ptr are both useful for painlessly managing this kind of reference counting headache.

Oh yes. I prefer to use Trilinos' RCP, which is a shared pointer (just like in C++11), but has better debugging info if something goes wrong. It can be compiled in two modes: one is slower and can't segfault; the other is optimized, with most operations at native raw pointer speed, but it can segfault.

Ondrej