[Numpy-discussion] Array addition inconsistency
Dear all,

After some surprise, I noticed an inconsistency while adding array slices:

>>> a = np.arange(5)
>>> a[1:] = a[1:] + a[:-1]
>>> a
array([0, 1, 3, 5, 7])

versus in-place:

>>> a = np.arange(5)
>>> a[1:] += a[:-1]
>>> a
array([ 0,  1,  3,  6, 10])

My suspicion is that the second variant does not create intermediate storage, and thus works on the intermediate result, effectively performing a.cumsum(). This behaviour is certainly surprising, and leads to unexpected errors if used without testing.

Best, Martin

--
Dr. Martin Lüthi        lue...@vaw.baug.ethz.ch
VAW Glaciology          http://www.vaw.ethz.ch/gz
ETH Zürich              http://people.ee.ethz.ch/~luethim
8093 Zürich             Tel: +41 44 632 40 93

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Array addition inconsistency
On Thu, Aug 29, 2013 at 12:00 PM, Martin Luethi lue...@vaw.baug.ethz.ch wrote:
> After some surprise, I noticed an inconsistency while adding array slices:
>
> >>> a = np.arange(5)
> >>> a[1:] = a[1:] + a[:-1]
> >>> a
> array([0, 1, 3, 5, 7])
>
> versus in-place:
>
> >>> a = np.arange(5)
> >>> a[1:] += a[:-1]
> >>> a
> array([ 0,  1,  3,  6, 10])
>
> My suspicion is that the second variant does not create intermediate
> storage, and thus works on the intermediate result, effectively
> performing a.cumsum().

Correct. Not creating intermediate storage is the point of using augmented assignment.

--
Robert Kern
Re: [Numpy-discussion] Array addition inconsistency
On Thu, Aug 29, 2013 at 8:04 AM, Robert Kern robert.k...@gmail.com wrote:
> On Thu, Aug 29, 2013 at 12:00 PM, Martin Luethi lue...@vaw.baug.ethz.ch wrote:
> > My suspicion is that the second variant does not create intermediate
> > storage, and thus works on the intermediate result, effectively
> > performing a.cumsum().
>
> Correct. Not creating intermediate storage is the point of using
> augmented assignment.

This can be very sneaky:

>>> a = np.arange(5)
>>> a[:-1] = a[:-1] + a[1:]
>>> a
array([1, 3, 5, 7, 4])

>>> a = np.arange(5)
>>> a[:-1] += a[1:]
>>> a
array([1, 3, 5, 7, 4])

So, if someone is semi-careful and tries out that example, they might (incorrectly) assume that such operations are safe, without realizing that it was safe only because the values of a[1:] were ahead of the values of a[:-1] in memory. I could easily imagine a situation where views of an array are passed around, only to finally end up in an in-place operation like this, and be sometimes right and sometimes wrong. Maybe there is some simple check that could be performed to detect this sort of situation?

Cheers!
Ben Root
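[A check of the kind Ben asks about does exist: np.may_share_memory conservatively reports whether two arrays might overlap in memory. A minimal sketch of a guarded in-place add built on it; the helper name safe_iadd is invented for illustration:]

```python
import numpy as np

def safe_iadd(dst, src):
    # If the operands might overlap in memory, add from an explicit
    # copy so the in-place update never reads partially updated values.
    if np.may_share_memory(dst, src):
        src = src.copy()
    dst += src
    return dst

a = np.arange(5)
safe_iadd(a[1:], a[:-1])
print(a)  # array([0, 1, 3, 5, 7]) -- matches the out-of-place result
```

[Note that may_share_memory errs on the side of caution: it may report True for non-overlapping views, costing only an unnecessary copy, never a wrong answer.]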
[Numpy-discussion] Relative speed
Hi,

I need to know about the relative speed (i.e., which one is faster) of the following:

1. list and numpy array, tuple and numpy array
2. list of tuples and numpy matrix (the first one is rectangular)
3. random.randint() and numpy.random.random_integers()

Thank you.
Re: [Numpy-discussion] Array addition inconsistency
On Thu, Aug 29, 2013 at 9:39 AM, Benjamin Root ben.r...@ou.edu wrote:
> So, if someone is semi-careful and tries out that example, they might
> (incorrectly) assume that such operations are safe, without realizing
> that it was safe only because the values of a[1:] were ahead of the
> values of a[:-1] in memory. [...] Maybe there is some simple check that
> could be performed to detect this sort of situation?

I think the main message is that you don't use in-place operations with mutables unless you know what you are doing, and you really need them.

In-place cumsum in pure Python:

>>> a = range(5)
>>> for i in xrange(1, 5):
...     a[i] += a[i-1]
...
>>> a
[0, 1, 3, 6, 10]

Defensive programming.

Josef
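[For completeness: when a running sum is what is actually wanted, NumPy provides it directly, with no reliance on aliasing side effects. A short sketch:]

```python
import numpy as np

a = np.arange(5)
# cumsum allocates a fresh output array, so no aliasing is involved.
b = np.cumsum(a)
print(b)  # [ 0  1  3  6 10]
```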
Re: [Numpy-discussion] Relative speed
On 08/29/2013 09:33 AM, Anubhab Baksi wrote:
> I need to know about the relative speed (i.e., which one is faster) of
> the following:
> 1. list and numpy array, tuple and numpy array
> 2. list of tuples and numpy matrix (the first one is rectangular)
> 3. random.randint() and numpy.random.random_integers()

African or European?

It really depends on what you're doing with it. The ipython %timeit magic is pretty useful for answering that question. Note that the answer may change dramatically based on the size of the data set.

--
Jonathan Niehof
ISR-3 Space Data Systems
Los Alamos National Laboratory
MS-D466, Los Alamos, NM 87545
Phone: 505-667-9595  email: jnie...@lanl.gov
Correspondence / Technical data or Software Publicly Available
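[Outside of IPython, the stdlib timeit module answers the same question; a rough sketch of such a comparison, where the sizes and repeat counts are arbitrary and the timings machine-dependent:]

```python
import timeit

setup = "import numpy as np; lst = list(range(1000)); arr = np.arange(1000)"

# Time adding 1 to every element: list comprehension vs. vectorized add.
t_list = timeit.timeit("[x + 1 for x in lst]", setup=setup, number=1000)
t_arr = timeit.timeit("arr + 1", setup=setup, number=1000)

print("list: %.4f s  array: %.4f s" % (t_list, t_arr))
```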
Re: [Numpy-discussion] Use numpy.distutils to build fortran with BLAS
In the Clawpack projects (specifically the Riemann solvers) we compile against LAPACK and the BLAS using f2py via the `--link-lapack_opt` flag. This does cause some problems in terms of portability, though; Aron Ahmadia might be able to shed some light on this, as he has looked into it most recently.

On Wed, Aug 28, 2013 at 5:08 AM, Matt Hoffman hoffm...@cs.ubc.ca wrote:
> I have some fortran code that I would like to build and make accessible
> from python, which needs to call a few BLAS routines. Building it in a
> one-off manner is not a problem, but I can't seem to find a good way to
> distribute it in a way that just works. So really I'm wondering if there
> is a correct way to use numpy.distutils in order to link against the same
> BLAS libraries used by numpy.
>
> I'll paste my current approach below, but I'm not sure that this is the
> best/most-portable way to do it. (And it certainly breaks on MacOS due to
> the weird way that macs use rpath.) Anyway, my setup contains something
> like:
>
>     from numpy.distutils.core import setup
>     from numpy.distutils.system_info import get_info, NotFoundError
>     from numpy.distutils.misc_util import Configuration
>
>     sources = ['foo.f']
>     extra_info = {}
>     extra_link_args = []
>
>     try:
>         extra_info = get_info('mkl', 2)
>         extra_link_args = ['-Wl,-rpath,' + path
>                            for path in extra_info['library_dirs']]
>     except NotFoundError:
>         try:
>             extra_info = get_info('blas', 2)
>             extra_link_args = ['-Wl,-rpath,' + path
>                                for path in extra_info['library_dirs']]
>         except NotFoundError:
>             # Couldn't find anything; fall back to building the
>             # routines we need ourselves.
>             sources += ['ddot.f']
>
>     config = Configuration()
>     config.add_extension(
>         'foo',
>         sources=sources,
>         extra_info=extra_info,
>         extra_link_args=extra_link_args,
>     )
>
>     setup(**config.todict())
>
> Thanks!
> -matt
Re: [Numpy-discussion] Relative speed
> > African or European?
>
> Why on earth would you ask that?

It's a Monty Python and the Holy Grail reference.

Eric
Re: [Numpy-discussion] Relative speed
On Thu, Aug 29, 2013 at 6:10 PM, Eric Moore e...@redtetrahedron.org wrote:
> > > African or European?
> >
> > Why on earth would you ask that?
>
> It's a Monty Python and the Holy Grail reference.

Thanks. I had read that quite differently, and I'm sure I'm not the only one. Some context would have helped.

Ralf
Re: [Numpy-discussion] Relative speed
On 8/29/2013 3:48 PM, Ralf Gommers wrote:
> Some context would have helped.

http://www.youtube.com/watch?v=y2R3FvS4xr4

fwiw,
Alan
Re: [Numpy-discussion] Relative speed
On 08/29/2013 01:48 PM, Ralf Gommers wrote:
> Thanks. I had read that quite differently, and I'm sure I'm not the only
> one. Some context would have helped.

My apologies--that was a rather obtuse reference. In my oddly-wired brain it struck me as a fairly similar, suboptimally-posed question: all data structures sit in memory at the same speed; it's a question of the operations. And as you pointed out, most of the time, for non-trivial datasets, the numpy operations will be faster. (I'm daunted by the notion of trying to do linear algebra on lists of tuples, assuming that's the relevant set of operations, given the comparison to the matrix class.)

--
Jonathan Niehof
Re: [Numpy-discussion] Relative speed
> And as you pointed out, most of the time, for non-trivial datasets, the
> numpy operations will be faster. (I'm daunted by the notion of trying to
> do linear algebra on lists of tuples, assuming that's the relevant set of
> operations, given the comparison to the matrix class.)

Note the important and pretty common exception of building up a list one element (or row of elements) at a time. Here, Python lists usually rule, unless the final size is known in advance.
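[The pattern above can be sketched as: append to a plain Python list (amortized O(1) per append), then convert once at the end, rather than repeatedly growing an array with np.append, which copies the whole array on every call. A toy example:]

```python
import numpy as np

rows = []
for i in range(5):
    # Cheap: list append does not copy the existing elements.
    rows.append([i, i * i])

# The expensive conversion happens once, at the end.
arr = np.array(rows)
print(arr.shape)  # (5, 2)
```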
Re: [Numpy-discussion] Stick (line segments) percolation algorithm - graph theory?
Thanks a lot!!

José Luis

From: Brett Olsen brett.ol...@gmail.com
To: Discussion of Numerical Python numpy-discussion@scipy.org
Sent: Monday, 26 August 2013, 14:08
Subject: Re: [Numpy-discussion] Stick (line segments) percolation algorithm - graph theory?

I can see a couple of opportunities for improvement in your algorithm. Running your code on a single experiment, I get about 2.9 seconds to run. I get this down to about 1.0 seconds by (1) exploiting the symmetry of the M matrix and (2) avoiding the costly inner loop over k in favor of array operations:

    def check_segments(j, others, data):
        x1, y1, x2, y2 = data

        x_A1B1 = x2[j] - x1[j]
        y_A1B1 = y2[j] - y1[j]

        x_A1A2 = x1[others] - x1[j]
        y_A1A2 = y1[others] - y1[j]

        x_A2A1 = -1 * x_A1A2
        y_A2A1 = -1 * y_A1A2

        x_A2B2 = x2[others] - x1[others]
        y_A2B2 = y2[others] - y1[others]

        x_A1B2 = x2[others] - x1[j]
        y_A1B2 = y2[others] - y1[j]

        x_A2B1 = x2[j] - x1[others]
        y_A2B1 = y2[j] - y1[others]

        p1 = x_A1B1 * y_A1A2 - y_A1B1 * x_A1A2
        p2 = x_A1B1 * y_A1B2 - y_A1B1 * x_A1B2
        p3 = x_A2B2 * y_A2B1 - y_A2B2 * x_A2B1
        p4 = x_A2B2 * y_A2A1 - y_A2B2 * x_A2A1

        condition_1 = p1 * p2
        condition_2 = p3 * p4

        return (condition_1 <= 0) & (condition_2 <= 0)

    for j in xrange(1, N):
        valid = check_segments(j, range(j), (x1, y1, x2, y2))
        M[j, 0:j] = valid
        M[0:j, j] = valid

I don't see any other particularly simple ways to improve this. You could probably add an interval check to ensure that the x and y intervals for the segments of interest overlap before doing the full check, but how much that would help depends on the implementation.

~Brett

On Fri, Aug 23, 2013 at 5:09 PM, José Luis Mietta joseluismie...@yahoo.com.ar wrote:
> I wrote an algorithm to study stick percolation (i.e., networks between
> line segments that intersect each other). In my algorithm, N sticks (line
> segments) are created inside a rectangular box of sides 'b' and 'h', and
> then, one by one, the algorithm explores the intersections between all
> line segments.
> This is a Monte Carlo simulation, so the 'experiment' is executed many
> times (no less than 100 times). Written like that, it consumes very much
> RAM. Here, the element M[i,j] = 1 if stick i intersects stick j, and
> M[i,j] = 0 if not. How can I optimize my algorithm? Is graph theory
> useful in this case?
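[On the graph-theory question: once M is built, percolation clusters are exactly the connected components of the intersection graph, which scipy can extract directly. A sketch with a hand-made toy adjacency matrix; the 4-stick example data is invented for illustration:]

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

# Toy intersection matrix for 4 sticks: 0 and 1 cross, 2 and 3 cross.
M = np.array([[0, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 1],
              [0, 0, 1, 0]])

# Each connected component is one cluster of mutually reachable sticks.
n_clusters, labels = connected_components(csr_matrix(M), directed=False)
print(n_clusters)  # 2
```

[Storing M as a scipy.sparse matrix instead of a dense N x N array also addresses the RAM complaint, since most stick pairs do not intersect.]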
[Numpy-discussion] _PyADt
Anyone know what _PyADt is? It turns up in ndarraytypes.h:

    #define PyDataType_ISBOOL(obj) PyTypeNum_ISBOOL(_PyADt(obj))

and only there. It's not in the build directory, and google yields nothing. I suspect it is a historical artifact turned bug and should be replaced by ((PyArray_Descr *)(obj))->type_num.

Chuck
[Numpy-discussion] Matrix peculiarities
Ralf,

Could you please elaborate on the matrix weaknesses? Is there any work planned to eliminate the peculiarities?

Regards,
Colin W.

On Thu, Aug 29, 2013 at 11:30 PM, Ralf Gommers wrote:
> Hi Anubhab, if you have a reasonably large amount of data (say O(100)),
> always try to use numpy arrays and not lists or tuples - it'll be faster.
> I'd recommend not to use numpy.matrix; its speed will be similar to numpy
> arrays, but it has some peculiarities that you'd rather not deal with.
> For the random numbers I'm not sure without checking; just timing it in
> ipython with %timeit is indeed the way to go.
>
> Cheers,
> Ralf
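[For concreteness, a couple of the peculiarities Ralf alludes to, sketched in a short session: np.matrix redefines `*` as matrix multiplication and keeps every result 2-D:]

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])
a = np.array([[1, 2], [3, 4]])

# For matrix, * means matrix multiplication ...
print(m * m)  # [[ 7 10] [15 22]]
# ... but for ndarray it is elementwise.
print(a * a)  # [[ 1  4] [ 9 16]]

# matrix results never drop to 1-D, which surprises code written
# for plain arrays:
print(m[0].shape)  # (1, 2)
print(a[0].shape)  # (2,)
```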
Re: [Numpy-discussion] Relative speed
On Aug 29, 2013 4:11 PM, Jonathan T. Niehof jnie...@lanl.gov wrote:
> My apologies--that was a rather obtuse reference.

Just for future reference, the language and the community are full of references like these. IDLE is named for Eric Idle, one of the members of Monty Python, while Guido's title of BDFL is a reference to a sketch. But I am sure you'd never expected that... :-p

Cheers!
Ben Root
Re: [Numpy-discussion] Relative speed
Thanks all. My client actually wants the output in the minimum time.

On Thu, Aug 29, 2013 at 11:30 PM, Ralf Gommers ralf.gomm...@gmail.com wrote:
> if you have a reasonably large amount of data (say O(100)),

I need to deal with nearly 2**19 or 2**20 arrays of length about 250 each.