[Numpy-discussion] Simple problem. Is it possible without a loop?
Hello, I am trying to solve a simple problem that becomes complex if I try to avoid looping. Let's say I have a 1D array, x, where x[i] = x[i+1] Given a certain value delta, I would like to get a subset of x, named y, where (y[i+1] - y[i]) = delta In a non-optimized and trivial way, the operation I would like to do is: y=[x[0]] for value in x: if (y[-1] -value) delta: y.append(value) y=numpy.array(y) Any hint? Best regards, Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Sorry for the time taken! Best regards, Armando V. Armando Solé wrote: Hello, I am trying to solve a simple problem that becomes complex if I try to avoid looping. Let's say I have a 1D array, x, where x[i] = x[i+1] Given a certain value delta, I would like to get a subset of x, named y, where (y[i+1] - y[i]) = delta In a non-optimized and trivial way, the operation I would like to do is: y=[x[0]] for value in x: if (y[-1] -value) delta: y.append(value) y=numpy.array(y) Any hint? Best regards, Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
There might be an easier way to accomplish that y = x[(x[1:]-x[:-1]) = delta] cheers Am Mittwoch, den 09.06.2010, 10:00 +0200 schrieb V. Armando Solé: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Sorry for the time taken! Best regards, Armando V. Armando Solé wrote: Hello, I am trying to solve a simple problem that becomes complex if I try to avoid looping. Let's say I have a 1D array, x, where x[i] = x[i+1] Given a certain value delta, I would like to get a subset of x, named y, where (y[i+1] - y[i]) = delta In a non-optimized and trivial way, the operation I would like to do is: y=[x[0]] for value in x: if (y[-1] -value) delta: y.append(value) y=numpy.array(y) Any hint? Best regards, Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Perhaps this is a bit shorter: y = x[(x[1:] - x[:-1]) = delta] -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Hah beat you to it one minute ;) Am Mittwoch, den 09.06.2010, 10:08 +0200 schrieb Francesc Alted: A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Perhaps this is a bit shorter: y = x[(x[1:] - x[:-1]) = delta] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
That was my first thought, but that only warrants me to skip one point in x but not more than one. x= numpy.arange(10.) delta = 3 print x[(x[1:] - x[:-1]) = delta] [] instead of the requested [0, 4, 8] Armando Francesc Alted wrote: A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Perhaps this is a bit shorter: y = x[(x[1:] - x[:-1]) = delta] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Yeah, damn you! ;-) A Wednesday 09 June 2010 10:11:33 Robert Elsner escrigué: Hah beat you to it one minute ;) Am Mittwoch, den 09.06.2010, 10:08 +0200 schrieb Francesc Alted: A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Perhaps this is a bit shorter: y = x[(x[1:] - x[:-1]) = delta] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Francesc Alted wrote: Yeah, damn you! ;-) I think you still have room for improvement ;-) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
A Wednesday 09 June 2010 10:14:22 V. Armando Solé escrigué: That was my first thought, but that only warrants me to skip one point in x but not more than one. x= numpy.arange(10.) delta = 3 print x[(x[1:] - x[:-1]) = delta] [] instead of the requested [0, 4, 8] True! Wanting to be fast always makes you ending with the wrong result :-/ -- Francesc Alted ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Given a certain value delta, I would like to get a subset of x, named y, where (y[i+1] - y[i]) = delta So in fact the problem is to find y such that (y[i(k)+n] - y[i(k)]) = delta for n = len(x) - 1 - i and i(0) = 0, i(k+1) = i(k) + n ? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers Am Mittwoch, den 09.06.2010, 10:14 +0200 schrieb V. Armando Solé: That was my first thought, but that only warrants me to skip one point in x but not more than one. x= numpy.arange(10.) delta = 3 print x[(x[1:] - x[:-1]) = delta] [] instead of the requested [0, 4, 8] Armando Francesc Alted wrote: A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: Well, this seems to be quite close to what I need y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) y = numpy.take(x, i1) Perhaps this is a bit shorter: y = x[(x[1:] - x[:-1]) = delta] ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Well happens. The problem description was not 100% clear thus I still think your line did solve the problem. A simple misunderstanding. So what do I learn from it?: Always look at the code, not the description :D Am Mittwoch, den 09.06.2010, 10:19 +0200 schrieb Francesc Alted: A Wednesday 09 June 2010 10:14:22 V. Armando Solé escrigué: That was my first thought, but that only warrants me to skip one point in x but not more than one. x= numpy.arange(10.) delta = 3 print x[(x[1:] - x[:-1]) = delta] [] instead of the requested [0, 4, 8] True! Wanting to be fast always makes you ending with the wrong result :-/ ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?
On Wed, Jun 9, 2010 at 4:16 PM, Francesc Alted fal...@pytables.org wrote: A Tuesday 08 June 2010 23:34:09 Anne Archibald escrigué: But the issue isn't one of efficiency, it's merely an arbitrarily chosen convention. (Does anyone know the history of the choices for FORTRAN and C, esp. why KR chose the opposite of what was already in common usage in FORTRAN? Just curious?) This is speculation, not knowledge, but it's worth pointing out that there are actually two ways to represent a multidimensional array in C: as a block of memory with appropriate type definitions, or as an array of pointers to subarrays. This latter approach is generally not used for numerical work, but is potentially useful for other applications. More relevantly, it already has a natural syntax; a[2][3][5] naturally follows the chain of pointers and gives you what you want; it also forces your last index to change most rapidly as you walk through memory. So it would be very odd if multidimensional arrays defined without pointers but using the same syntax were indexed the other way around. (Let's ignore abominations like 5[3[2[a]]].) Hey, maybe it is only speculation, but this is the most convincing argument for breaking Fortran convention that I've ever heard I think that arrays are just syntax on pointer is indeed the key reason for how C works here. Since a[b] really means a + b (which is why 5[a] and a[5] are the same), I don't see how to do it differently. (although I'm not sure if C was really breaking Fortran convention, as both languages should have born more or less in time, although I'd say that Fortran is a bit older). Fortran is the oldest language I am aware of - certainly the oldest still widely in use. it is even older than Lisp, the first version is from 1956-57, and was proposed by Backus to IBM in 53 according to wikipedia. It was created at a time where many people thought the very idea of a compiler did not make any sense and was impossible. So yes, Fortran is *much* older than C. cheers, David ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Newby question: best way to create a list of indices
Hi, I'm reading netCDF files using pupynere and I want to extract 22 values from a 1440x400 array. I have the indices of the values, they are: 92 832 92 833 91 832 91 833 91 834 90 835 90 832 90 833 90 834 89 832 89 833 89 834 88 832 88 833 88 834 87 832 87 833 87 834 86 832 86 833 86 834 85 833 What is the best way to store these indices so that I can programmatically extract the values? I have tried storing them in pairs - (index1='92,832') - and then I can use: - precipitation.data[int(index1[:2]),int(index1[3:])] - Is there a better way? The values are also not regular enough to use a nested loop, as far as I can see. Thanks Hanlie ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Newby question: best way to create a list of indices
On 9 June 2010 12:01, Hanlie Pretorius hanlie.pretor...@gmail.com wrote: I'm reading netCDF files using pupynere and I want to extract 22 values from a 1440x400 array. I have the indices of the values, they are: 92 832 92 833 91 832 91 833 ... What is the best way to store these indices so that I can programmatically extract the values? I have tried storing them in pairs - (index1='92,832') - and then I can use: - precipitation.data[int(index1[:2]),int(index1[3:])] - Is there a better way? The values are also not regular enough to use a nested loop, as far as I can see. This is a use case where NumPy is very convenient. You can use fancy indexing import numpy as np a = np.random.random((1440, 400)) rows = [832, 833, 832, 833] # first 4 rows cols = [92, 92, 91, 91] # first 4 cols a[rows, cols] array([ 0.56539344, 0.14711586, 0.40491459, 0.29997256]) a[832, 92] # check 0.56539343852732926 Read more here http://docs.scipy.org/doc/numpy/user/basics.indexing.html#index-arrays Cheers, Scott ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
/ / Given a certain value delta, I would like to get a subset of x, named // y, // where (y[i+1] - y[i])= delta / So in fact the problem is to find y such that (y[i(k)+n] - y[i(k)])= delta for n= len(x) - 1 - i and i(0) = 0, i(k+1) = i(k) + n ? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers Am Mittwoch, den 09.06.2010, 10:14 +0200 schrieb V. Armando Solé: / That was my first thought, but that only warrants me to skip one point // in x but not more than one. // // x= numpy.arange(10.) // delta = 3 // print x[(x[1:] - x[:-1])= delta] // [] // // instead of the requested [0, 4, 8] // // Armando // // Francesc Alted wrote: //A Wednesday 09 June 2010 10:00:50 V. Armando Solé escrigué: // //Well, this seems to be quite close to what I need // //y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) //i1 = numpy.nonzero(y[1:] y[:-1]) //y = numpy.take(x, i1) // // //Perhaps this is a bit shorter: // //y = x[(x[1:] - x[:-1])= delta] // // // // // ___ // NumPy-Discussion mailing list // NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion // http://mail.scipy.org/mailman/listinfo/numpy-discussion/ / Playing around with range/arange can be misleading as x[x%4==0] array([ 0., 4., 8.]) I don't know you really want because your first code x= numpy.arange(10.) delta=3 y=[x[0]] for value in x: ... if (y[-1] -value) delta: ...y.append(value) ... y [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] Which is not [0, 4, 8]. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
I suspect the author meant that instead (or a simple minus in front of the delta). Just posting because I wondered the same this morning after looking at it (after the first misunderstanding). It looks much better to me than the cumsum approach with its hidden test for true using .astype(numpy.int) x= numpy.arange(10.) delta=3 y=[x[0]] for value in x: if (value - y[-1]) delta: y.append(value) print y Am Mittwoch, den 09.06.2010, 09:24 -0500 schrieb Bruce Southey: Playing around with range/arange can be misleading as x[x%4==0] array([ 0., 4., 8.]) I don't know you really want because your first code x= numpy.arange(10.) delta=3 y=[x[0]] for value in x: ... if (y[-1] -value) delta: ...y.append(value) ... y [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0] Which is not [0, 4, 8]. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Correct. I thought just multiplying by -1 and inverting the logical condition would give me the same output. This makes exactly what I want: x= numpy.arange(10.) delta=3 y=[x[0]] for value in x: ... if (value-y[-1]) delta: ...y.append(value) ... y [0., 4., 8.] Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?
I think that arrays are just syntax on pointer is indeed the key reason for how C works here. Since a[b] really means a + b (which is why 5[a] and a[5] are the same), I don't see how to do it differently. Holy crap! You can do that in C?! ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] typo in docs
On Wed, Jun 9, 2010 at 02:19, Pierre GM pgmdevl...@gmail.com wrote: On Jun 8, 2010, at 4:37 AM, Sebastian Haase wrote: another note: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#arrays-indexing-rec should not say record array - because recarray a special in that they can even access the named fields via attribute access rather than using the dictionary-like syntax. Well, traditionally, any item of a structured array is called a record, so a ndarray w/ flexible dtype can also be called a record array... recarrays are just a particular case of structured (or record) arrays. In general usage, that is true. However, we have more or less settled on consistently using structured array in the documentation in order to avoid confusion with recarrays. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers The second is between 8 and ten times faster on my machine. import numpy import time x0 = numpy.arange(1.) niter = 2000 # I expect between 1 and 10 def option1(x, delta=0.2): y = [x[0]] for value in x: if (value - y[-1]) delta: y.append(value) return numpy.array(y) def option2(x, delta=0.2): y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) return numpy.take(x, i1) t0 = time.time() for i in range(niter): t = option1(x0) print Elapsed = , time.time() - t0 t0 = time.time() for i in range(niter): t = option2(x0) print Elapsed = , time.time() - t0 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
It gets even worse with data similar to those I will be using. With: x0 = numpy.linspace(-1,1, 1) niter = 2000 I get 24 seconds for option1 and 0.64 seconds for option2. Considering I expect between 5 and 50 times that number of iterations, the difference is already quite considerable. Armando Quoting Vicente Sole s...@esrf.fr: ? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers The second is between 8 and ten times faster on my machine. import numpy import time x0 = numpy.arange(1.) niter = 2000 # I expect between 1 and 10 def option1(x, delta=0.2): y = [x[0]] for value in x: if (value - y[-1]) delta: y.append(value) return numpy.array(y) def option2(x, delta=0.2): y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) return numpy.take(x, i1) t0 = time.time() for i in range(niter): t = option1(x0) print Elapsed = , time.time() - t0 t0 = time.time() for i in range(niter): t = option2(x0) print Elapsed = , time.time() - t0 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
On Wed, Jun 9, 2010 at 11:31 AM, Vicente Sole s...@esrf.fr wrote: It gets even worse with data similar to those I will be using. With: x0 = numpy.linspace(-1,1, 1) niter = 2000 I get 24 seconds for option1 and 0.64 seconds for option2. Considering I expect between 5 and 50 times that number of iterations, the difference is already quite considerable. but the two options don't produce the same result in general, the cumsum version doesn't restart from zero, I think try x0 = np.random.randint(5,size=30).cumsum() with delta=3 I don't see a way around recursive looping Josef Armando Quoting Vicente Sole s...@esrf.fr: ? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers The second is between 8 and ten times faster on my machine. import numpy import time x0 = numpy.arange(1.) niter = 2000 # I expect between 1 and 10 def option1(x, delta=0.2): y = [x[0]] for value in x: if (value - y[-1]) delta: y.append(value) return numpy.array(y) def option2(x, delta=0.2): y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) return numpy.take(x, i1) t0 = time.time() for i in range(niter): t = option1(x0) print Elapsed = , time.time() - t0 t0 = time.time() for i in range(niter): t = option2(x0) print Elapsed = , time.time() - t0 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Quoting josef.p...@gmail.com: but the two options don't produce the same result in general, the cumsum version doesn't restart from zero, I think try x0 = np.random.randint(5,size=30).cumsum() with delta=3 I don't see a way around recursive looping The x0 data are already sorted. It was one of the premises of the first post. The solution I proposed makes almost what I need and I will most likely use it. It just misses the first value but takes the next one what is fine for my application. For the simple example, it returns [1, 5, 9] instead of [0, 4, 8] but it should not disturb me. Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
On Wed, Jun 9, 2010 at 11:47 AM, Vicente Sole s...@esrf.fr wrote: Quoting josef.p...@gmail.com: but the two options don't produce the same result in general, the cumsum version doesn't restart from zero, I think try x0 = np.random.randint(5,size=30).cumsum() with delta=3 I don't see a way around recursive looping The x0 data are already sorted. It was one of the premises of the first post. The solution I proposed makes almost what I need and I will most likely use it. It just misses the first value but takes the next one what is fine for my application. For the simple example, it returns [1, 5, 9] instead of [0, 4, 8] but it should not disturb me. but unless you have pretty regular spacing, np.diff(y) can be anything e.g. in a random example, increments of y are if x0 is weakly increasing [[5 0 3 3 4 9 6 0 6 1 8 4]] and if x0 is strictly increasing [[ 2 4 7 1 9 9 5 4 6 2 5 11]] maybe you should check np.diff(y) for your case, in my random example (np.diff(y1)delta).mean() 0.0 (np.diff(y2)delta).mean() 0.25 Josef Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?
On Thu, Jun 10, 2010 at 12:09 AM, Benjamin Root ben.r...@ou.edu wrote: I think that arrays are just syntax on pointer is indeed the key reason for how C works here. Since a[b] really means a + b (which is why 5[a] and a[5] are the same), I don't see how to do it differently. Holy crap! You can do that in C?! Yes: #include stdio.h int main() { float a[2] = {1.0, 2.0}; printf(%f %f %f\n, a[1], *(a+1), 1[a]); } ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] C vs. Fortran order -- misleading documentation?
On Wed, Jun 9, 2010 at 9:00 AM, David Cournapeau courn...@gmail.com wrote: On Thu, Jun 10, 2010 at 12:09 AM, Benjamin Root ben.r...@ou.edu wrote: I think that arrays are just syntax on pointer is indeed the key reason for how C works here. Since a[b] really means a + b (which is why 5[a] and a[5] are the same), I don't see how to do it differently. Holy crap! You can do that in C?! Yes: #include stdio.h int main() { float a[2] = {1.0, 2.0}; printf(%f %f %f\n, a[1], *(a+1), 1[a]); } This is all _very_ educational (and I mean that sincerely), but can we please get back to the topic at hand ( :-) ). A specific proposal is on the table: we remove discussion of the whole C/Fortran ordering issue from basics.indexing.rst and promote it to a more advanced document TBD. DG ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
Hi Josef, I do not need regular spacing of the original data. I only need them to be sorted and that I get it with a previous numpy call. Then the algorithm using the cumsum does the trick without a explicit loop. Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
When I run import numpy as np a = np.ones((400, 50), dtype=np.float32) c = np.dot(a, a.T) produces a MemoryError on the 32-bit Enthought Python Distribution on 32-bit Vista. I understand this has to do with the 2GB limit with 32-bit python and the fact numpy wants a contiguous chunk of memory for an array. When I look at the memory use in the task manager though, it looks like it's trying to allocate enough for two 400x50 arrays. I guess it's explicitly forming a.T. Is there a way to avoid this? I tried c = scipy.lib.blas.fblas.dgemm(1.0, a, a, trans_b=1) but I get the same result. It appears to be using a lot of extra memory. Isn't this just a wrapper to the blas library that passes a pointer to the memory location of a? Why does it seem to be generating the transpose? Is there a way to do A*A.T without two copies of A? Thanks, Greg ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
greg whittier wrote: When I run import numpy as np a = np.ones((400, 50), dtype=np.float32) c = np.dot(a, a.T) produces a MemoryError on the 32-bit Enthought Python Distribution on 32-bit Vista. I understand this has to do with the 2GB limit with 32-bit python and the fact numpy wants a contiguous chunk of memory for an array. When I look at the memory use in the task manager though, it looks like it's trying to allocate enough for two 400x50 arrays. I guess it's explicitly forming a.T. Is there a way to avoid this? I tried c = scipy.lib.blas.fblas.dgemm(1.0, a, a, trans_b=1) but I get the same result. It appears to be using a lot of extra memory. Isn't this just a wrapper to the blas library that passes a pointer to the memory location of a? Why does it seem to be generating the transpose? Is there a way to do A*A.T without two copies of A? In such cases I create a matrix of zeros with the final size and I fill it with a loop of dot products of smaller chunks of the original a matrix. The MDP package also does something similar. Armando ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
On Wed, Jun 9, 2010 at 12:57 PM, V. Armando Solé s...@esrf.fr wrote: greg whittier wrote: a = np.ones((400, 50), dtype=np.float32) c = np.dot(a, a.T) In such cases I create a matrix of zeros with the final size and I fill it with a loop of dot products of smaller chunks of the original a matrix. The MDP package also does something similar. Armando Thanks. I've done that as a workaround, but I was hoping there was a more elegant, native solution. Thanks for the pointer to MDP by the way! Looks very interesting. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
On 6/9/2010 12:49 PM, greg whittier wrote: Is there a way to do A*A.T without two copies of A? Does this do what you want? Alan Isaac a array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) np.tensordot(a,a,axes=(-1,-1)) array([[ 1, 3, 5, 7, 9], [ 3, 13, 23, 33, 43], [ 5, 23, 41, 59, 77], [ 7, 33, 59, 85, 111], [ 9, 43, 77, 111, 145]]) ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
Alan G Isaac wrote: On 6/9/2010 12:49 PM, greg whittier wrote: Is there a way to do A*A.T without two copies of A? Does this do what you want? Alan Isaac a array([[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]) np.tensordot(a,a,axes=(-1,-1)) array([[ 1, 3, 5, 7, 9], [ 3, 13, 23, 33, 43], [ 5, 23, 41, 59, 77], [ 7, 33, 59, 85, 111], [ 9, 43, 77, 111, 145]]) In my case, this still uses twice as much memory as should be needed, just like using dot. Xavier ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] MemoryError with dot(A, A.T) where A is 800MB on 32-bit Vista
On Wed, Jun 9, 2010 at 1:16 PM, Alan G Isaac ais...@american.edu wrote: On 6/9/2010 12:49 PM, greg whittier wrote: Is there a way to do A*A.T without two copies of A? Does this do what you want? Alan Isaac np.tensordot(a,a,axes=(-1,-1)) This seems to suffer from the same problem. A temporary array is being creating somewhere causing a MemoryError. I did learn a new function though! Thanks, Greg ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] Tools / data structures for statistical analysis and related applications
Dear all, We've been having discussions on the pystatsmodels mailing list recently regarding data structures and other tools for statistics / other related data analysis applications. I believe we're trying to answer a number of different, but related questions: 1. What are the sets of functionality (and use cases) which would be desirable for the scientific (or statistical) Python programmer? Things like groupby (http://projects.scipy.org/numpy/browser/trunk/doc/neps/groupby_additions.rst) fall into this category. 2. Do we really need to build custom data structures (larry, pandas, tabular, etc.) or are structured ndarrays enough? (My conclusion is that we do need to, but others might disagree). If so, how much performance are we willing to trade for functionality? 3. What needs to happen for Python / NumPy / SciPy to really break in to the statistical computing field? In other words, could a Python-based stack one day be a competitive alternative to R? These are just some ideas for collecting community input. Of course as we're all working in different problem domains, the needs of users will vary quite a bit across the board. We've started to collect some thoughts, links, etc. on the scipy.org wiki: http://scipy.org/StatisticalDataStructures A lot of what's there already is commentary and comparison on the functionality provided by pandas and la / larry (since Keith and I wrote most of the stuff there). But I think we're trying to identify more generally the things that are lacking in NumPy/SciPy and related libraries for particular applications. At minimum it should be good fodder for the SciPy conferences this year and afterward (I am submitting a paper on this subject based on my experiences). - Wes ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Simple problem. Is it possible without a loop?
On 06/09/2010 10:24 AM, Vicente Sole wrote: ? Well a loop or list comparison seems like a good choice to me. It is much more obvious at the expense of two LOCs. Did you profile the two possibilities and are they actually performance-critical? cheers The second is between 8 and ten times faster on my machine. import numpy import time x0 = numpy.arange(1.) niter = 2000 # I expect between 1 and 10 def option1(x, delta=0.2): y = [x[0]] for value in x: if (value - y[-1]) delta: y.append(value) return numpy.array(y) def option2(x, delta=0.2): y = numpy.cumsum((x[1:]-x[:-1])/delta).astype(numpy.int) i1 = numpy.nonzero(y[1:] y[:-1]) return numpy.take(x, i1) t0 = time.time() for i in range(niter): t = option1(x0) print Elapsed = , time.time() - t0 t0 = time.time() for i in range(niter): t = option2(x0) print Elapsed = , time.time() - t0 ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion For integer arguments for delta, I don't see any different between using option1 and using the '%' operator. (x0[(x0*10)%2==0]-option1(x0)).sum() 0.0 Also option2 gives a different result than option1 so these are not equivalent functions. You can see that from the shapes option2(x0).shape (1, 9998) option1(x0).shape (1,) ((option1(x0)[:9998])-option2(x0)).sum() 0.0 So, allowing for shape difference, option2 is the same for most of output from option1 but it is still smaller than option1. Probably the main reason for the speed difference is that option2 is virtually pure numpy (and hence done in C) and option1 is using a lot of array lookups that are always slow. So keep it in numpy as most as possible. Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] NumPy re-factoring project
Hi everyone, This is a follow-up to Travis's message on the re-factoring project from May 25th and the subsequent discussion. For background, I am a developer at Enthought working on the NumPy re-factoring project with Travis and Scott. The immediate goal from our perspective is to re-factor the core of NumPy into two architectural layers: a true core that is CPython-independent and an interface layer that binds the core to CPython. A design proposal is now posted on the NumPy developer wiki: http://projects.scipy.org/numpy/wiki/NumPyRefactoring The write-up is a fairly high-level description of what we think the split will look like and how to deal with issues such as memory management. There are also placeholders listed as 'TBD' where more investigation is still needed and will be filled in over time. At the end of the page there is a section on the C API with a link to a function-by-function breakdown of the C API and whether the function belongs in the interface layer, the core, or need to be split between the two. All functions listed as 'core' will continue to have an interface-level wrapper with the same name to ensure source-compatibility. All of this, particularly the interface/core function designations, is a first analysis and in flux. The goal is to get the information out and elicit discussion and feedback from the community. Best regards, Jason Jason McCampbell Enthought, Inc. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
[Numpy-discussion] yet another scipy Install problem
Dear All .. I tried to install numpy and scipy ... basicaly I want to use OMPC I Installed SuiteSparse : ---START--- b...@bino:~/mydoc/buku/arduino/scipy/scipy-0.8.0b1$ ls -l /usr/local/lib/lib*a -rw-r--r-- 1 root root 51860 2010-06-10 10:59 /usr/local/lib/libamd.2.2.1.a lrwxrwxrwx 1 root root 14 2010-06-10 10:59 /usr/local/lib/libamd.a - libamd.2.2.1.a -rw-r--r-- 1 root root9466 2010-06-10 10:59 /usr/local/lib/libbtf.1.1.1.a lrwxrwxrwx 1 root root 14 2010-06-10 10:59 /usr/local/lib/libbtf.a - libbtf.1.1.1.a -rw-r--r-- 1 root root 53848 2010-06-10 10:59 /usr/local/lib/libcamd.2.2.1.a lrwxrwxrwx 1 root root 15 2010-06-10 10:59 /usr/local/lib/libcamd.a - libcamd.2.2.1.a -rw-r--r-- 1 root root 38264 2010-06-10 10:59 /usr/local/lib/libccolamd.2.7.2.a lrwxrwxrwx 1 root root 18 2010-06-10 10:59 /usr/local/lib/libccolamd.a - libccolamd.2.7.2.a -rw-r--r-- 1 root root 1032104 2010-06-10 10:59 /usr/local/lib/libcholmod.1.7.2.a lrwxrwxrwx 1 root root 18 2010-06-10 10:59 /usr/local/lib/libcholmod.a - libcholmod.1.7.2.a -rw-r--r-- 1 root root 26072 2010-06-10 10:59 /usr/local/lib/libcolamd.2.7.2.a lrwxrwxrwx 1 root root 17 2010-06-10 10:59 /usr/local/lib/libcolamd.a - libcolamd.2.7.2.a -rw-r--r-- 1 root root 361906 2010-06-10 10:59 /usr/local/lib/libcxsparse.2.2.4.a lrwxrwxrwx 1 root root 19 2010-06-10 10:59 /usr/local/lib/libcxsparse.a - libcxsparse.2.2.4.a -rw-r--r-- 1 root root 468572 2010-06-10 00:17 /usr/local/lib/libfblas.a -rw-r--r-- 1 root root 6227200 2010-06-10 00:00 /usr/local/lib/libflapack.a -rw-r--r-- 1 root root 475472 2010-05-20 10:16 /usr/local/lib/libgarmin.a -rw-r--r-- 1 root root 216520 2010-06-10 10:59 /usr/local/lib/libklu.1.1.1.a lrwxrwxrwx 1 root root 14 2010-06-10 10:59 /usr/local/lib/libklu.a - libklu.1.1.1.a -rw-r--r-- 1 root root5548 2010-06-10 10:59 /usr/local/lib/libldl.2.0.2.a lrwxrwxrwx 1 root root 14 2010-06-10 10:59 /usr/local/lib/libldl.a - libldl.2.0.2.a -rw-r--r-- 1 root root 28642 2010-06-10 10:59 /usr/local/lib/librbio.2.0.0.a lrwxrwxrwx 1 root root 15 2010-06-10 10:59 /usr/local/lib/librbio.a - librbio.2.0.0.a -rw-r--r-- 1 root root 255520 2010-06-10 10:59 /usr/local/lib/libspqr.1.2.0.a lrwxrwxrwx 1 root root 15 2010-06-10 10:59 /usr/local/lib/libspqr.a - libspqr.1.2.0.a -rw-r--r-- 1 root root1412 2010-06-10 10:59 /usr/local/lib/libufconfig.3.5.0.a lrwxrwxrwx 1 root root 19 2010-06-10 10:59 /usr/local/lib/libufconfig.a - libufconfig.3.5.0.a -rw-r--r-- 1 root root 1064532 2010-06-10 10:59 /usr/local/lib/libumfpack.5.5.0.a lrwxrwxrwx 1 root root 18 2010-06-10 10:59 /usr/local/lib/libumfpack.a - libumfpack.5.5.0.a b...@bino:~/mydoc/buku/arduino/scipy/scipy-0.8.0b1$ ls -l /usr/local/include/ total 736 -rw-r--r-- 1 root root 17293 2010-06-10 10:59 amd.h -rw-r--r-- 1 root root 12129 2010-06-10 10:59 btf.h -rw-r--r-- 1 root root 16975 2010-06-10 10:59 camd.h -rw-r--r-- 1 root root 10713 2010-06-10 10:59 ccolamd.h -rw-r--r-- 1 root root 13990 2010-06-10 10:59 cholmod_blas.h -rw-r--r-- 1 root root 14685 2010-06-10 10:59 cholmod_check.h -rw-r--r-- 1 root root 20091 2010-06-10 10:59 cholmod_cholesky.h -rw-r--r-- 1 root root 9380 2010-06-10 10:59 cholmod_complexity.h -rw-r--r-- 1 root root 3162 2010-06-10 10:59 cholmod_config.h -rw-r--r-- 1 root root 96574 2010-06-10 10:59 cholmod_core.h -rw-r--r-- 1 root root 3858 2010-06-10 10:59 cholmod.h -rw-r--r-- 1 root root 1710 2010-06-10 10:59 cholmod_io64.h -rw-r--r-- 1 root root 8606 2010-06-10 10:59 cholmod_matrixops.h -rw-r--r-- 1 root root 12969 2010-06-10 10:59 cholmod_modify.h -rw-r--r-- 1 root root 8850 2010-06-10 10:59 cholmod_partition.h -rw-r--r-- 1 root root 6435 2010-06-10 10:59 cholmod_supernodal.h -rw-r--r-- 1 root root 9217 2010-06-10 10:59 cholmod_template.h -rw-r--r-- 1 root root 8666 2010-06-10 10:59 colamd.h -rw-r--r-- 1 root root 30977 2010-06-10 10:59 cs.h -rw-r--r-- 1 root root 29336 2010-06-10 10:59 klu.h -rw-r--r-- 1 root root 3934 2010-06-10 10:59 ldl.h -rw-r--r-- 1 root root 6150 2010-05-20 10:16 libgarmin.h -rw-r--r-- 1 root root 11648 2010-06-10 10:59 RBio.h -rw-r--r-- 1 root root 34283 2010-06-10 10:59 spqr.hpp -rw-r--r-- 1 root root 9652 2010-06-10 10:59 SuiteSparseQR_C.h -rw-r--r-- 1 root root 2906 2010-06-10 10:59 SuiteSparseQR_definitions.h -rw-r--r-- 1 root root 24518 2010-06-10 10:59 SuiteSparseQR.hpp -rw-r--r-- 1 root root 5467 2010-06-10 10:59 UFconfig.h -rw-r--r-- 1 root root 3712 2010-06-10 10:59 umfpack_col_to_triplet.h -rw-r--r-- 1 root root 1892 2010-06-10 10:59 umfpack_defaults.h -rw-r--r-- 1 root root 1690 2010-06-10 10:59 umfpack_free_numeric.h -rw-r--r-- 1 root root 1708 2010-06-10 10:59 umfpack_free_symbolic.h -rw-r--r-- 1 root root 6270 2010-06-10 10:59 umfpack_get_determinant.h -rw-r--r-- 1 root root 3989 2010-06-10 10:59 umfpack_get_lunz.h -rw-r--r-- 1 root root 9019 2010-06-10 10:59 umfpack_get_numeric.h -rw-r--r-- 1 root root 13040