Re: [Numpy-discussion] merge_arrays is very slow; alternatives?
On Fri, 26 Nov 2010 20:57:30 +0100, Gerrit Holl wrote:
[clip]
> I wonder, am I missing something or have I really written a significant
> improvement in less than 10 LOC? Should I file a patch for this?

The implementation of merge_arrays doesn't look optimal -- it seems to
actually iterate over the data, which should not be needed. So yes,
rewriting the function would be useful. The main difficulty in the
rewrite seems to be appropriate mask handling, but slicing is a faster
way to do that.

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
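[Editor's note: Pauli's point about mask handling can be sketched with whole-column assignments on a masked output array. This is a hypothetical illustration of the idea, not the code that eventually went into numpy; the helper name merge_masked and the use of ma.masked_all are my own choices here.]

```python
import numpy as np
import numpy.ma as ma

def merge_masked(arr1, arr2):
    """Sketch: merge two masked structured arrays by copying whole
    columns (data and mask together), with no per-row iteration."""
    newdtype = np.dtype(arr1.dtype.descr + arr2.dtype.descr)
    out = ma.masked_all(arr1.shape, dtype=newdtype)
    for arr in (arr1, arr2):
        for field in arr.dtype.names:
            # Assigning a masked column to a field copies both the
            # data and the per-element mask for that field.
            out[field] = arr[field]
    return out
```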
Re: [Numpy-discussion] merge_arrays is very slow; alternatives?
On 26 November 2010 20:16, Gerrit Holl wrote:
> Hi,
>
> upon profiling my code, I found that
> numpy.lib.recfunctions.merge_arrays is extremely slow; it does some
> 7000 rows/second. This is not acceptable for me.
...
> How can I do this in a faster way?

Replying to my own code here. Either I have written a much faster
implementation of this, or I am missing something. I consider it
unlikely that I could write a much faster implementation of an
established numpy function with little effort, so I suspect I am
missing something here.

I wrote this implementation of the flattened version of merge_arrays:

def merge_arrays(arr1, arr2):
    t1 = arr1.dtype
    t2 = arr2.dtype
    newdtype = numpy.dtype(t1.descr + t2.descr)
    newarray = numpy.empty(shape=arr1.shape, dtype=newdtype)
    for field in t1.names:
        newarray[field] = arr1[field]
    for field in t2.names:
        newarray[field] = arr2[field]
    return newarray

and benchmarks show it's almost 100 times faster for a medium-sized array:

In [211]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:1], targetrows2[:1]], flatten=True)
1 loops, best of 3: 1.01 s per loop

In [212]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:1], targetrows2[:1])
100 loops, best of 3: 10.8 ms per loop

In [214]: (merged1.view(dtype=uint64).reshape(-1, 100) == merged2.view(dtype=uint64).reshape(-1, 100)).all()
Out[214]: True

# and still 4 times faster for a small array:

In [215]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:10], targetrows2[:10]], flatten=True)
1000 loops, best of 3: 1.31 ms per loop

In [216]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:10], targetrows2[:10])
1000 loops, best of 3: 344 us per loop

# and 15 times faster for a large array (1.5 million elements):

In [218]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)
1 loops, best of 3: 110 s per loop

In [217]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows, targetrows2)
1 loops, best of 3: 7.26 s per loop

I wonder, am I missing something or have I really written a significant
improvement in less than 10 LOC? Should I file a patch for this?

Gerrit.

-- 
Exploring space at http://gerrit-explores.blogspot.com/
Personal homepage at http://www.topjaklont.org/
Asperger Syndroom: http://www.topjaklont.org/nl/asperger.html
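[Editor's note: for readers who want to try Gerrit's approach, here is a self-contained toy version of the same idea, renamed merge_arrays_fast to avoid shadowing the numpy function. The example data and field names are my own; on these small structured arrays the result matches numpy.lib.recfunctions.merge_arrays(..., flatten=True) field by field.]

```python
import numpy as np
from numpy.lib import recfunctions

def merge_arrays_fast(arr1, arr2):
    # Build the combined dtype once, then copy whole columns;
    # no per-row Python iteration.
    newdtype = np.dtype(arr1.dtype.descr + arr2.dtype.descr)
    out = np.empty(arr1.shape, dtype=newdtype)
    for field in arr1.dtype.names:
        out[field] = arr1[field]
    for field in arr2.dtype.names:
        out[field] = arr2[field]
    return out

# Hypothetical example inputs:
a = np.array([(1, 2.5), (3, 4.5)], dtype=[('i', 'i8'), ('v', 'f8')])
b = np.array([('x',), ('y',)], dtype=[('s', 'U1')])
merged = merge_arrays_fast(a, b)
reference = recfunctions.merge_arrays([a, b], flatten=True)
```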
[Numpy-discussion] merge_arrays is very slow; alternatives?
Hi,

upon profiling my code, I found that
numpy.lib.recfunctions.merge_arrays is extremely slow; it does some
7000 rows/second. This is not acceptable for me.

I have two large record arrays, or arrays with a complicated dtype. All
I want to do is to merge them into one. I don't think that should have
to be a very slow operation: I don't need to copy anything, I just want
to view the two record arrays as one. How can I do this in a faster
way?

In [45]: cProfile.runctx("numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)", globals(), locals())
         225381902 function calls (150254635 primitive calls) in 166.620 CPU seconds

   Ordered by: standard name

   ncalls              tottime  percall  cumtime  percall  filename:lineno(function)
   1                     0.031    0.031  166.620  166.620  :1()
   68/1                  0.000    0.000    0.000    0.000  _internal.py:82(_array_descr)
   2                     0.000    0.000    0.000    0.000  numeric.py:286(asanyarray)
   2                     0.000    0.000    0.000    0.000  recfunctions.py:135(flatten_descr)
   1                     0.000    0.000    0.001    0.001  recfunctions.py:161(zip_descr)
   149165600/74038400  117.195    0.000  139.701    0.000  recfunctions.py:235(_izip_fields_flat)
   1088801              12.146    0.000  151.847    0.000  recfunctions.py:263(izip_records)
   3                     0.000    0.000    0.000    0.000  recfunctions.py:277(sentinel)
   1                     4.599    4.599  166.589  166.589  recfunctions.py:328(merge_arrays)
   3                     0.000    0.000    0.000    0.000  recfunctions.py:406()
   75127201             22.506    0.000   22.506    0.000  {isinstance}
   69                    0.000    0.000    0.000    0.000  {len}
   1                     0.000    0.000    0.000    0.000  {map}
   1                     0.000    0.000    0.000    0.000  {max}
   2                     0.000    0.000    0.000    0.000  {method '__array__' of 'numpy.ndarray' objects}
   136                   0.000    0.000    0.000    0.000  {method 'append' of 'list' objects}
   1                     0.000    0.000    0.000    0.000  {method 'disable' of '_lsprof.Profiler' objects}
   2                     0.000    0.000    0.000    0.000  {method 'extend' of 'list' objects}
   2                     0.000    0.000    0.000    0.000  {method 'pop' of 'list' objects}
   2                     0.000    0.000    0.000    0.000  {method 'ravel' of 'numpy.ndarray' objects}
   2                     0.000    0.000    0.000    0.000  {numpy.core.multiarray.array}
   1                    10.142   10.142   10.142   10.142  {numpy.core.multiarray.fromiter}
Gerrit.

-- 
Gerrit Holl
PhD student at Department of Space Science,
Luleå University of Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/
Re: [Numpy-discussion] numpy speed question
On Thursday 25 November 2010 11:13:49, Jean-Luc Menut wrote:
> Hello all,
>
> I have a little question about the speed of numpy vs IDL 7.0. I did a
> very simple little check by computing just a cosine in a loop. I was
> quite surprised to see an order of magnitude of difference between
> numpy and IDL; I would have thought that for such a basic function,
> the speed would be approximately the same.
>
> I suppose that some of the difference may come from the default data
> type of 64 bits in numpy and 32 bits in IDL. Is there a way to change
> the numpy default data type (without recompiling)?
>
> And I'm not an expert at all; maybe there is a better explanation,
> like a better use of the several CPU cores by IDL?

As others have already pointed out, you should make sure that you use
numpy.cos with arrays in order to get good performance. I don't know
whether IDL is using multiple cores or not, but if you are looking for
ultimate performance, you can always use Numexpr, which makes use of
multiple cores. For example, on a machine with 8 cores (w/
hyperthreading), we have:

>>> from math import pi
>>> import numpy as np
>>> import numexpr as ne
>>> i = np.arange(1e6)
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 85.2 ms per loop
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 8.28 ms per loop

If you don't have a machine with a lot of cores but still want good
performance, you can link Numexpr against Intel's VML (Vector Math
Library). For example, using Numexpr+VML with only one core (on another
machine):

>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 66.7 ms per loop
>>> ne.set_vml_num_threads(1)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 9.1 ms per loop

which also gives a pretty good speedup.

Curiously, Numexpr+VML is not that good at using multiple cores in this
case:

>>> ne.set_vml_num_threads(2)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
10 loops, best of 3: 14.7 ms per loop

I don't really know why Numexpr+VML takes more time using 2 threads
than only one, but it is probably due to Numexpr requiring better
fine-tuning in combination with VML :-/

-- 
Francesc Alted
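[Editor's note: for reference, the pure-numpy vectorized form of the loop discussed in this thread needs no Numexpr at all. This sketch collapses the inner i = 0..999 loop into a single array operation.]

```python
import numpy as np

# Vectorized form of the scalar loop a = cos(2*pi*i/100.) for i = 0..999:
# one ufunc call over an array replaces 1000 Python-level iterations.
i = np.arange(1000)
a = np.cos(2 * np.pi * i / 100.)
```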
[Numpy-discussion] Weibull analysis ?
Hello,

After careful Google searches, I was not successful in finding any
project dealing with Weibull analysis in Python, numpy, or scipy. So
before reinventing the wheel, I ask here whether any of you have
already started such a project and are eager to share.

Thanks,
David
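[Editor's note: in case it helps anyone starting such a project, a basic Weibull analysis (a probability-plot fit of the shape and scale parameters) takes only a few lines of numpy. This is my own sketch using Benard's median-rank approximation for the plotting positions, not an existing package.]

```python
import numpy as np

def weibull_fit(data):
    """Estimate Weibull shape k and scale lam by least squares on the
    linearized CDF: ln(-ln(1 - F)) = k*ln(x) - k*ln(lam)."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    # Median-rank plotting positions (Benard's approximation).
    F = (np.arange(1, n + 1) - 0.3) / (n + 0.4)
    lx = np.log(x)
    ly = np.log(-np.log(1.0 - F))
    k, c = np.polyfit(lx, ly, 1)   # slope = k, intercept = -k*ln(lam)
    lam = np.exp(-c / k)
    return k, lam
```

The least-squares fit on the probability plot is the classic graphical method; for serious work one would compare it against a maximum-likelihood fit.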
Re: [Numpy-discussion] numpy speed question
Although this was mentioned earlier, it's worth emphasizing that if you
need to use functions such as cosine with scalar arguments, you should
use math.cos(), not numpy.cos(). The numpy versions of these functions
are optimized for handling array arguments and are much slower than the
math versions for scalar arguments.

Bruce Sherwood

On Thu, Nov 25, 2010 at 2:34 PM, Gökhan Sever wrote:
> On Thu, Nov 25, 2010 at 4:13 AM, Jean-Luc Menut wrote:
>> Hello all,
>>
>> I have a little question about the speed of numpy vs IDL 7.0. I did a
>> very simple little check by computing just a cosine in a loop. I was
>> quite surprised to see an order of magnitude of difference between
>> numpy and IDL; I would have thought that for such a basic function,
>> the speed would be approximately the same.
>>
>> I suppose that some of the difference may come from the default data
>> type of 64 bits in numpy and 32 bits in IDL. Is there a way to change
>> the numpy default data type (without recompiling)?
>>
>> And I'm not an expert at all; maybe there is a better explanation,
>> like a better use of the several CPU cores by IDL?
>>
>> I'm working with windows 7 64 bits on a core i7.
>>
>> any hint is welcome.
>> Thanks.
>>
>> Here the IDL code:
>> Julian1 = SYSTIME( /JULIAN , /UTC )
>> for j=0, do begin
>>     for i=0,999 do begin
>>         a=cos(2*!pi*i/100.)
>>     endfor
>> endfor
>> Julian2 = SYSTIME( /JULIAN , /UTC )
>> print, (Julian2-Julian1)*86400.0
>> print,cpt
>> end
>>
>> result:
>> % Compiled module: $MAIN$.
>> 2.837
>>
>> The python code:
>> from numpy import *
>> from time import time
>> time1 = time()
>> for j in range(1):
>>     for i in range(1000):
>>         a=cos(2*pi*i/100.)
>> time2 = time()
>> print time2-time1
>>
>> result:
>> In [2]: run python_test_speed.py
>> 24.180943
>
> A vectorised numpy version already blows away these results.
>
> Here is what I get using the IDL version (with IDL v7.1):
>
> IDL> .r test_idl
> % Compiled module: $MAIN$.
> 4.185
>
> I[10]: time run test_python
> 43.305727005
>
> and using a Cythonized version:
>
> from math import pi
>
> cdef extern from "math.h":
>     float cos(float)
>
> cpdef float myloop(int n1, int n2, float n3):
>     cdef float a
>     cdef int i, j
>     for j in range(n1):
>         for i in range(n2):
>             a=cos(2*pi*i/n3)
>
> compiling via the setup.py file: python setup.py build_ext --inplace
> and importing the function into IPython:
>
> from mycython import myloop
>
> I[6]: timeit myloop(1, 1000, 100.0)
> 1 loops, best of 3: 2.91 s per loop
>
> -- 
> Gökhan
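[Editor's note: Bruce's scalar-argument point is easy to check with timeit. The exact ratio is machine-dependent, but math.cos on a Python float consistently beats numpy.cos on the same scalar, since the numpy ufunc pays array-dispatch overhead per call.]

```python
import timeit

# Micro-benchmark: scalar cosine via math vs numpy.
# Absolute times vary by machine; only the ordering matters here.
t_math = timeit.timeit("math.cos(1.2345)", setup="import math", number=200000)
t_numpy = timeit.timeit("numpy.cos(1.2345)", setup="import numpy", number=200000)
print("math.cos:  %.4f s" % t_math)
print("numpy.cos: %.4f s" % t_numpy)
```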