[Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Ondřej Čertík
Hi,

There is a segfault reported here:

http://projects.scipy.org/numpy/ticket/1588

I've managed to isolate the problem and even provide a simple patch
that fixes it here:

https://github.com/numpy/numpy/issues/398

However, the patch simply omits a reference-count decrement, so it
might leak. I used bisection (it took the whole evening, unfortunately),
but the good news is that I've isolated the commits that actually broke
it. See GitHub issue #398 for details, diffs, etc.

Unfortunately, it's 12 commits from Mark, and the individual commits
raise an exception on the segfaulting code, so I can't pinpoint the
problem further.

In general, how can I debug this sort of problem? I tried to use
valgrind with a debugging build of numpy, but it reports tons of
false (?) positives: https://gist.github.com/3549063

Mark, by looking at the changes that broke it, as well as at my fix,
do you see where the problem could be?

I suspect it is something with the changes in PyArray_FromAny() or
PyArray_FromArray() in ctors.c.
But I don't see anything so far that could cause it.

Thanks for any help. This is one of the issues blocking the 1.7.0 release.

Ondrej
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Richard Hattersley
Hi,

re: valgrind - to get better results you might try the suggestions from:
http://svn.python.org/projects/python/trunk/Misc/README.valgrind

Richard

On 31 August 2012 09:03, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 [...]



[Numpy-discussion] how is y += x computed when y.strides = (0, 8) and x.strides=(16, 8) ?

2012-08-31 Thread Sebastian Walter
Hi,

I'm using numpy 1.6.1 on Ubuntu 12.04.1 LTS.
Code that used to work with an older version of numpy now fails with an error.

Were there any changes in the way inplace operations like +=, *=, etc.
work on arrays with non-standard strides?

For the script:

--- start of code ---

import numpy

x = numpy.arange(6).reshape((3,2))
y = numpy.arange(2)

print 'x=\n', x
print 'y=\n', y

u,v = numpy.broadcast_arrays(x, y)

print 'u=\n', u
print 'v=\n', v
print 'v.strides=\n', v.strides

v += u

print 'v=\n', v  # expectation: v = [[6,10], [6,10], [6,10]]
print 'u=\n', u
print 'y=\n', y  # expectation: y = [6,10]

--- end of code ---

I get the output

--- start of output ---
x=
[[0 1]
 [2 3]
 [4 5]]
y=
[0 1]
u=
[[0 1]
 [2 3]
 [4 5]]
v=
[[0 1]
 [0 1]
 [0 1]]
v.strides=
(0, 8)
v=
[[4 6]
 [4 6]
 [4 6]]
u=
[[0 1]
 [2 3]
 [4 5]]
y=
[4 6]

--- end of output ---

I would have expected that

v += u

performs an element-by-element +=

v[0,0] += u[0,0]  # increments y[0]
v[0,1] += u[0,1]  # increments y[1]
v[1,0] += u[1,0]  # increments y[0]
v[1,1] += u[1,1]  # increments y[1]
v[2,0] += u[2,0]  # increments y[0]
v[2,1] += u[2,1]  # increments y[1]

yielding the result

y = [6, 10]

but instead one obtains

y = [4, 6]

which could be the result of

v[2,0] += u[2,0]  # increments y[0]
v[2,1] += u[2,1]  # increments y[1]


Is this the intended behavior?
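
The two candidate evaluation orders can be compared without NumPy. The sketch below is a pure-Python model, not NumPy's actual implementation: `y` stands for the two-element buffer that all three rows of `v` alias (stride 0 along the first axis), and both function names are made up for illustration. Whether NumPy 1.6.1 really computes the sums from the original values before writing back is an assumption; the model only shows that such an order would reproduce the observed [4, 6].

```python
# Pure-Python model; y is the buffer shared by every row of the broadcast view v.
u = [[0, 1], [2, 3], [4, 5]]


def inplace_add_sequential(y, u):
    """Read-modify-write one element at a time, so each row sees earlier updates."""
    y = list(y)
    for row in u:
        for j, value in enumerate(row):
            y[j] += value
    return y


def inplace_add_buffered(y, u):
    """Compute all sums from the original values first, then write them back."""
    original = list(y)
    sums = [[original[j] + value for j, value in enumerate(row)] for row in u]
    y = list(y)
    for row in sums:              # every row is written to the same two slots
        for j, value in enumerate(row):
            y[j] = value          # the last row written wins
    return y


print(inplace_add_sequential([0, 1], u))  # [6, 10]
print(inplace_add_buffered([0, 1], u))    # [4, 6]
```

In the second function only the last row survives, because every write lands on the same two memory slots.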

regards,
Sebastian


Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Dag Sverre Seljebotn
On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
 [...]

IIRC you can recompile Python with some support for detecting memory 
leaks. One of the issues with using Valgrind, after suppressing the 
false positives, is that Python uses its own memory allocator so that 
sits between the bug and what Valgrind detects. So at least recompile 
Python to not do that.

As for hardening the NumPy source in general, you should at least be 
aware of these two options:

1) David Malcolm (dmalc...@redhat.com) was writing a static code
analysis plugin for gcc that would check, for every routine, that the
reference count semantics are correct. (I don't know how far he's got
with that.)

2) In Cython we have a reference count nanny. This requires changes to
all the code though, so it is not an option just for finding this bug;
I just thought I'd mention it. In addition to the INCREF/DECREF you need
to insert new GIVEREF and GOTREF calls (which are no-ops in a normal
compile) to declare where you get and give away a reference. When
Cython-generated sources are compiled with -DCYTHON_REFNANNY,
INCREF/DECREF/GIVEREF/GOTREF are tracked within each function and a
failure is raised if the function violates any contract.
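
As a lighter-weight complement to the options above, a leak on one specific object can sometimes be spotted from Python itself with `sys.getrefcount`. This is a minimal sketch, assuming the suspect code can be called repeatedly with the same object. The names `refcount_delta`, `leaky` and `clean` are hypothetical, and `leaky` only mimics a missing DECREF by stashing references in a list.

```python
import sys


def refcount_delta(func, obj, repeats=100):
    """Call func(obj) repeatedly and return the change in obj's reference count."""
    before = sys.getrefcount(obj)
    for _ in range(repeats):
        func(obj)
    after = sys.getrefcount(obj)
    return after - before


_cache = []  # hypothetical global standing in for a forgotten DECREF


def leaky(obj):
    _cache.append(obj)  # keeps one extra reference per call


def clean(obj):
    tmp = [obj]         # temporary reference, released when the call returns
    return len(tmp)


probe = object()
print(refcount_delta(clean, probe))  # 0
print(refcount_delta(leaky, probe))  # 100
```

A delta that grows with the number of calls points at a missing decrement. A negative delta, or a crash, points at one decrement too many. On a debug build of Python (configured with --with-pydebug), `sys.gettotalrefcount()` reports the interpreter-wide total and can be used the same way.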

Dag


Re: [Numpy-discussion] view of recarray issue

2012-08-31 Thread Jay Bourque
Ondrej,

Sorry for the delay in getting back to this. I have some free time today to
get this resolved if you haven't already fixed it.

-Jay

On Wed, Aug 29, 2012 at 7:19 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Jay,

 On Mon, Aug 20, 2012 at 12:40 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
  On Wed, Jul 25, 2012 at 10:29 AM, Jay Bourque jay.bour...@continuum.io wrote:
  I'm actively looking at this issue since it was my pull request that broke
  this (https://github.com/numpy/numpy/pull/350). We definitely don't want to
  break this functionality for 1.7. The problem is that even though indexing
  with a subset of fields still returns a copy (for now), it now returns a
  copy of a view of the original array. When you call copy() on a view, it
  copies the entire original structured array with the view dtype. A short
  term fix would be to manually create a proper copy to return similar to
  what _index_fields() did before my change, but since the idea is to
  eventually return the view instead of a copy, long term we need a way to do
  a proper copy of a structured array view that doesn't copy the unwanted
  fields.

  This should be fixed for 1.7.0. However, I am going to release beta now,
  and then see what we can do about this.

 What would be the best short term fix, so that we can release 1.7.0?

 I am still trying to understand what exactly the problem with dtype is
 in _index_fields().
 Would you suggest to keep using the view, or somehow revert to the old
 behavior while still trying to pass all the new tests in your PR 350?
 If you have any hints, it would save me some time.

 Thanks,
 Ondrej



Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release

2012-08-31 Thread Sandro Tosi
Hello,

On Tue, Aug 21, 2012 at 6:24 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 Hi,

 I'm pleased to announce the availability of the first beta release of
 NumPy 1.7.0b1.

I've just uploaded it to Debian experimental, so we can give it a run
while in freeze. Some of the buildds are already building[1] the
package, so we should get results asap (either failures or successes).

[1] https://buildd.debian.org/status/package.php?p=python-numpy&suite=experimental

If tests fail, it won't stop the build, and indeed I got at least 2
errors (actually 1 error and 1 crash), when running tests for python
2.7 and 3.2 with debug enabled:

2.7 dbg

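======================================================================
ERROR: test_power_zero (test_umath.TestPower)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/buildd/python-numpy-1.7.0~b1/debian/tmp/usr/lib/python2.7/dist-packages/numpy/core/tests/test_umath.py", line 139, in test_power_zero
    assert_complex_equal(np.power(zero, 0+1j), cnan)
RuntimeWarning: invalid value encountered in power

----------------------------------------------------------------------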

3.2 dbg

python3.2-dbg: numpy/core/src/multiarray/common.c:161:
PyArray_DTypeFromObjectHelper: Assertion
`((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)'
failed.
Aborted

I'm reporting them here since you asked; I don't know if you want an issue
on GitHub to track them. I'll look at the buildd logs and report
additional failures if they come up.
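
One plausible reason a RuntimeWarning shows up as a test ERROR rather than as a printed warning is a warning filter set to "error", which turns warnings into exceptions. Whether the NumPy test run in this build was configured that way is an assumption, not something established in this thread. The sketch below uses only the standard library, and `power_like` is a hypothetical stand-in for the operation that warns.

```python
import warnings


def power_like():
    """Hypothetical stand-in for an operation that emits a RuntimeWarning."""
    warnings.warn("invalid value encountered in power", RuntimeWarning)
    return float("nan")


def run_with_filter(action):
    """Run power_like() under the given warning filter and report what happened."""
    with warnings.catch_warnings():
        warnings.simplefilter(action)
        try:
            power_like()
            return "completed"
        except RuntimeWarning as exc:
            return "raised: %s" % exc


print(run_with_filter("ignore"))  # completed
print(run_with_filter("error"))   # raised: invalid value encountered in power
```

Under the "error" filter the call never returns its value, so a test that expects NaN fails with the warning as its exception.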

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Numpy-discussion] Issues for 1.7.0

2012-08-31 Thread Charles R Harris
On Thu, Aug 30, 2012 at 10:47 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 Hi,

 I am keeping track of all issues that need to be done for the 1.7.0
 release here:

 https://github.com/numpy/numpy/issues/396

 If you have trac and github push access, here is how you can help (by
 closing/merging):

 Issues that need clarification:

 http://projects.scipy.org/numpy/ticket/2150
 http://projects.scipy.org/numpy/ticket/2101

 Issues fixed (should be closed):

 http://projects.scipy.org/numpy/ticket/2185
 http://projects.scipy.org/numpy/ticket/2066
 http://projects.scipy.org/numpy/ticket/2189

 PRs that need merging:

 https://github.com/numpy/numpy/pull/395
 https://github.com/numpy/numpy/pull/397


 There are still a few more (see my github issue above), that I am
 working on right now.


Ondrej,

It looks like you don't have commit rights. Is that the case? If you are
the release manager I think you need both commit rights and the right to
close tickets.

Chuck


Re: [Numpy-discussion] Temporary error accessing NumPy tickets

2012-08-31 Thread Pauli Virtanen
Ondřej Čertík ondrej.certik at gmail.com writes:
 When I access tickets, for example:
 
 http://projects.scipy.org/numpy/ticket/2185
 
 then sometimes I get:
 
 Trac detected an internal error:
 OperationalError: database is locked
 
 For example yesterday. A refresh in about a minute fixed the problem.
 Today it still lasts at the moment.

The failures are probably partly triggered by the machine running out of memory.
It runs services on mod_python, which apparently leaks slowly. Someone (who?)
with root access on the machine needs to restart Apache. (Note: apachectl
graceful is not enough to correct this; it needs a real restart of the
process.)

A longer-term solution is to move off mod_python (most likely to mod_wsgi, since
going to CGI would create other performance problems), or to move the services
to a beefier server.
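
The "database is locked" text matches the message SQLite gives when one connection cannot obtain a lock held by another. Assuming the Trac instance uses its SQLite backend (the thread does not say), the error itself can be reproduced with the standard library alone. The sketch below is illustrative only: the table name and the helper `try_locked_write` are made up, and it says nothing about why the lock is held for so long on the server.

```python
import os
import sqlite3
import tempfile


def try_locked_write(timeout=0.0):
    """Return the error text raised when a second connection writes to a locked database."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # isolation_level=None means autocommit; the transaction is opened explicitly.
        holder = sqlite3.connect(path, isolation_level=None)
        holder.execute("CREATE TABLE ticket (id INTEGER)")
        holder.execute("BEGIN EXCLUSIVE")  # take the lock and keep holding it
        other = sqlite3.connect(path, timeout=timeout)
        try:
            other.execute("INSERT INTO ticket VALUES (1)")
            return None
        except sqlite3.OperationalError as exc:
            return str(exc)
        finally:
            other.close()
            holder.execute("ROLLBACK")
            holder.close()


print(try_locked_write())  # database is locked
```

With a larger `timeout` the second connection waits that long for the lock before giving up. That would fit a page refresh a minute later sometimes succeeding.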

-- 
Pauli Virtanen



Re: [Numpy-discussion] view of recarray issue

2012-08-31 Thread Ondřej Čertík
On Fri, Aug 31, 2012 at 6:15 AM, Jay Bourque jay.bour...@continuum.io wrote:
 Ondrej,

 Sorry for the delay in getting back to this. I have some free time today to
 get this resolved if you haven't already fixed it.

I haven't. If you can look at it, that would be absolutely awesome.
If you don't manage to fix it, if you can give me some hints what's
going on, that would also be a huge help.

Many thanks!
Ondrej


Re: [Numpy-discussion] Issues for 1.7.0

2012-08-31 Thread Ondřej Čertík
On Fri, Aug 31, 2012 at 9:27 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 [...]


 Ondrej,

 It looks like you don't have commit rights. Is that the case? If you are the
 release manager I think you need both commit rights and the right to close
 tickets.

Yes, I don't have commit rights nor the rights to close tickets.

Ondrej


Re: [Numpy-discussion] Temporary error accessing NumPy tickets

2012-08-31 Thread Ondřej Čertík
On Fri, Aug 31, 2012 at 9:35 AM, Pauli Virtanen p...@iki.fi wrote:
 Ondřej Čertík ondrej.certik at gmail.com writes:
 When I access tickets, for example:

 http://projects.scipy.org/numpy/ticket/2185

 then sometimes I get:

 Trac detected an internal error:
 OperationalError: database is locked

 For example yesterday. A refresh in about a minute fixed the problem.
 Today it still lasts at the moment.

 The failures are probably partly triggered by the machine running out of 
 memory.
 It runs services on mod_python, which apparently slowly leaks. Someone (who?)
 with root access on the machine needs to restart Apache. (Note: apachectl
 graceful is not enough to correct this, it needs a real restart of the 
 process.)

I see.


 Longer term solution is to move out of mod_python (mod_wsgi likely, going to 
 CGI
 will create other performance problems), or to transition the stuff there to a
 more beefy server.

Or move the tickets to github.

Yesterday it was very unreliable (I had to wait a long time before a
comment was posted, and about 50%
of the time it was not posted due to the database error). So I just
created a github issue for the same thing and posted
my comments there. Then I could work fast.

Ondrej


Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release

2012-08-31 Thread Ondřej Čertík
Hi Sandro,

On Fri, Aug 31, 2012 at 6:18 AM, Sandro Tosi mo...@debian.org wrote:
 Hello,

 On Tue, Aug 21, 2012 at 6:24 PM, Ondřej Čertík ondrej.cer...@gmail.com 
 wrote:
 Hi,

 I'm pleased to announce the availability of the first beta release of
 NumPy 1.7.0b1.

 I've just uploaded it to Debian experimental, so we can give it a run
 while in freeze. Some of the buildds are already building[1] the
 package, so we should get results asap (either failures or successes).

This is awesome, thank you so much for doing this. This should
reveal some bugs.


 [...]

 I'm reporting them here since you asked so, dunno if you want an issue
 on github to track them. I'll look at the buildds logs and report
 additional failures if they come up.

If you could create issues at github: https://github.com/numpy/numpy/issues
that would be great. If you have time, also with some info about the platform
and how to reproduce it. Or at least a link to the build logs.

I'll add it to the release TODO and try to fix it.

Ondrej


Re: [Numpy-discussion] Issues for 1.7.0

2012-08-31 Thread Charles R Harris
On Fri, Aug 31, 2012 at 11:10 AM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 [...]

 Yes, I don't have commit rights nor the rights to close tickets.


OK, I gave commit rights to you. Someone else (Pauli) will need to give you
rights to close tickets.  I think Thouis also needs rights if he is going
to do the issue tracking.

Chuck


Re: [Numpy-discussion] Issues for 1.7.0

2012-08-31 Thread Ondřej Čertík
On Fri, Aug 31, 2012 at 10:26 AM, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Fri, Aug 31, 2012 at 11:10 AM, Ondřej Čertík ondrej.cer...@gmail.com
[...]
 Yes, I don't have commit rights nor the rights to close tickets.


 OK, I gave commit rights to you. Someone else (Pauli) will need to give you
 rights to close tickets.  I think Thouis also needs rights if he is going to
 do the issue tracking.

Thanks a lot. I just wrote to Pauli privately and CCed you.

Ondrej


Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release

2012-08-31 Thread Stefan Krah
Ondřej Čertík ondrej.cer...@gmail.com wrote:
  python3.2-dbg: numpy/core/src/multiarray/common.c:161:
  PyArray_DTypeFromObjectHelper: Assertion
  `((PyObject*)(temp))->ob_type))->tp_flags & ((1L<<27))) != 0)'
 
 If you could create issues at github: https://github.com/numpy/numpy/issues
 that would be great. If you have time, also with some info about the platform
 and how to reproduce it. Or at least a link to the build logs.

For the second one there's an issue here:

http://projects.scipy.org/numpy/ticket/2193


Stefan Krah




Re: [Numpy-discussion] view of recarray issue

2012-08-31 Thread Jay Bourque
Ondrej,

Just submitted the following pull request for this:

https://github.com/numpy/numpy/pull/401

-Jay

On Fri, Aug 31, 2012 at 12:09 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 [...]



Re: [Numpy-discussion] ANN: NumPy 1.7.0b1 release

2012-08-31 Thread Sandro Tosi
On Fri, Aug 31, 2012 at 7:17 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:
 If you could create issues at github: https://github.com/numpy/numpy/issues
 that would be great. If you have time, also with some info about the platform
 and how to reproduce it. Or at least a link to the build logs.

I've reported it here: https://github.com/numpy/numpy/issues/402

Cheers,
-- 
Sandro Tosi (aka morph, morpheus, matrixhasu)
My website: http://matrixhasu.altervista.org/
Me at Debian: http://wiki.debian.org/SandroTosi


Re: [Numpy-discussion] Temporary error accessing NumPy tickets

2012-08-31 Thread Ognen Duzlevski
On Fri, Aug 31, 2012 at 11:35 AM, Pauli Virtanen p...@iki.fi wrote:
 Ondřej Čertík ondrej.certik at gmail.com writes:
 When I access tickets, for example:

 http://projects.scipy.org/numpy/ticket/2185

 then sometimes I get:

 Trac detected an internal error:
 OperationalError: database is locked

 For example yesterday. A refresh in about a minute fixed the problem.
 Today it still lasts at the moment.

 The failures are probably partly triggered by the machine running out of 
 memory.
 It runs services on mod_python, which apparently slowly leaks. Someone (who?)
 with root access on the machine needs to restart Apache. (Note: apachectl
 graceful is not enough to correct this, it needs a real restart of the 
 process.)

I do that regularly.

 Longer term solution is to move out of mod_python (mod_wsgi likely, going to 
 CGI
 will create other performance problems), or to transition the stuff there to a
 more beefy server.

There is also Trac. Between Trac and mod_python, the load on the
machine goes up to 20+ at times. I spent some time trying to figure
out how to move the current machine to a beefier Amazon instance. I am
not opposed to that, but there is a lot of cruft and strange setup on
the machine, and it is not really clear what is running there and why.
It would also be a case of solving a problem by throwing more hardware
at it. If everyone is OK with that, fine. I personally think moving away
from Trac (which IMHO is bloated and awkward, in addition to having a
very weird way of being administered) would be a better idea.

My $0.02
Ognen


Re: [Numpy-discussion] Temporary error accessing NumPy tickets

2012-08-31 Thread Pauli Virtanen
01.09.2012 00:08, Ognen Duzlevski wrote:
[clip]
 I personally think moving away from Trac (which IMHO is bloated and
 awkward in addition to having a very weird way of being administered)
 would be a better idea.

Yes, moving away from Trac is planned, both for Numpy and Scipy. Also
agreed on the point of clumsy administration.

This, however, leaves the other services still on the machine, although
after dropping Trac, the machine probably has enough juice for them.

-- 
Pauli Virtanen



Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Ondřej Čertík
Hi Dag,

On Fri, Aug 31, 2012 at 4:22 AM, Dag Sverre Seljebotn
d.s.seljeb...@astro.uio.no wrote:
 On 08/31/2012 09:03 AM, Ondřej Čertík wrote:
 [...]

 IIRC you can recompile Python with some support for detecting memory
 leaks. One of the issues with using Valgrind, after suppressing the
 false positives, is that Python uses its own memory allocator so that
 sits between the bug and what Valgrind detects. So at least recompile
 Python to not do that.

Right. Compiling with --without-pymalloc (per README.valgrind as suggested
above by Richard) should improve things a lot. Thanks for the tip.


 As for hardening the NumPy source in general, you should at least be
 aware of these two options:

 1) David Malcolm (dmalc...@redhat.com) was writing a static code
 analysis plugin for gcc that would check every routine that the
 reference count semantics was correct. (I don't know how far he's got
 with that.)

 2) In Cython we have a reference count nanny. This requires changes to
 all the code though, so not an option just for finding this bug, just
 thought I'd mention it. In addition to the INCREF/DECREF you need to
 insert new GIVEREF and GOTREF calls (which are noops in a normal
 compile) to declare where you get and give away a reference. When
 Cython-generated sources are enabled with -DCYTHON_REFNANNY,
 INCREF/DECREF/GIVEREF/GOTREF are tracked within each function and a
 failure is raised if the function violates any contract.

I see. That's a nice option. For my own code, I never touch reference
counting by hand; I just use Cython.


In the meantime, Mark fixed it:

https://github.com/numpy/numpy/pull/400
https://github.com/numpy/numpy/pull/405

Mark, thanks again for this. That saved me a lot of time.

Ondrej


Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Mark Wiebe
On Fri, Aug 31, 2012 at 5:35 PM, Ondřej Čertík ondrej.cer...@gmail.com wrote:

 [...]


 In the meantime, Mark fixed it:

 https://github.com/numpy/numpy/pull/400
 https://github.com/numpy/numpy/pull/405

 Mark, thanks again for this. That saved me a lot of time.


No problem. The way I prefer to deal with this kind of error is to use C++
smart pointers. C++11's unique_ptr and Boost's intrusive_ptr are both
useful for painlessly managing this kind of reference counting headache.
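
As a rough illustration of the unique_ptr approach, the sketch below wraps a
reference-counted object in a std::unique_ptr with a custom deleter, so the
reference is dropped on every exit path. The Obj type and the obj_new,
obj_incref and obj_decref functions are hypothetical stand-ins; a real
extension would presumably use PyObject with Py_INCREF and Py_DECREF instead.

```cpp
#include <cassert>
#include <memory>

// Stand-in for a C-style reference-counted object (hypothetical).
struct Obj { int refcnt; };

static int live_objects = 0;  // number of objects not yet destroyed

// Returns a new reference that the caller owns.
Obj* obj_new() { ++live_objects; return new Obj{1}; }
void obj_incref(Obj* o) { ++o->refcnt; }
void obj_decref(Obj* o) {
    if (--o->refcnt == 0) { --live_objects; delete o; }
}

// Deleter that drops one reference instead of calling delete directly.
struct DecrefDeleter {
    void operator()(Obj* o) const { obj_decref(o); }
};
using OwnedRef = std::unique_ptr<Obj, DecrefDeleter>;

// Early returns cannot leak: every OwnedRef releases its reference
// automatically when it goes out of scope.
bool work(bool fail_early) {
    OwnedRef a(obj_new());
    if (fail_early) return false;  // 'a' is released here
    OwnedRef b(obj_new());
    return true;                   // 'a' and 'b' are released here
}

// Runs work() and reports how many objects are still alive afterwards.
int live_after_work(bool fail_early) {
    work(fail_early);
    return live_objects;
}
```

When a reference has to be handed to a caller that takes ownership,
release() on the unique_ptr returns the raw pointer without dropping the
reference.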

-Mark


 Ondrej


Re: [Numpy-discussion] How to debug reference counting errors

2012-08-31 Thread Ondřej Čertík
On Fri, Aug 31, 2012 at 5:56 PM, Mark Wiebe mwwi...@gmail.com wrote:

 No problem. The way I prefer to deal with this kind of error is to use C++
 smart pointers. C++11's unique_ptr and Boost's intrusive_ptr are both useful
 for painlessly managing this kind of reference counting headache.

Oh yes. I prefer to use Trilinos' RCP, which is a shared pointer (just
like the one in C++11) but gives better debugging info if something goes
wrong. It can be compiled in two modes: one is slower but can't segfault;
the other is optimized, so most operations run at native raw pointer
speed, but it can segfault.
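
The two-mode idea can be sketched roughly as follows. This is not Trilinos'
RCP or its API: CheckedPtr is a hypothetical wrapper around std::shared_ptr,
and the kChecked template parameter is an assumption standing in for what
would normally be a build-time configuration option.

```cpp
#include <cassert>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical wrapper, not Trilinos' RCP. kChecked = true models the
// slower, checked build; kChecked = false models the optimized build.
template <typename T, bool kChecked>
class CheckedPtr {
public:
    CheckedPtr() = default;
    explicit CheckedPtr(std::shared_ptr<T> p) : p_(std::move(p)) {}

    T& operator*() const {
        // Checked mode turns a would-be crash into a catchable exception.
        if (kChecked && !p_)
            throw std::runtime_error("dereferenced a null CheckedPtr");
        return *p_;  // unchecked mode: undefined behaviour if p_ is null
    }

private:
    std::shared_ptr<T> p_;
};

// In checked mode a null dereference is reported instead of crashing.
bool null_deref_is_caught() {
    CheckedPtr<int, true> p;
    try { (void)*p; } catch (const std::runtime_error&) { return true; }
    return false;
}

// A valid pointer behaves the same in both modes.
int read_value() {
    CheckedPtr<int, true> checked(std::make_shared<int>(42));
    CheckedPtr<int, false> fast(std::make_shared<int>(42));
    return (*checked == *fast) ? *checked : -1;
}
```

The same source can then be tested in the checked configuration and shipped
in the fast one.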

Ondrej