Re: [Numpy-discussion] merge_arrays is very slow; alternatives?

2010-11-26 Thread Pauli Virtanen
On Fri, 26 Nov 2010 20:57:30 +0100, Gerrit Holl wrote:
[clip]
> I wonder, am I missing something or have I really written a significant
> improvement in less than 10 LOC? Should I file a patch for this?

The implementation of merge_arrays doesn't look optimal -- it seems to 
actually iterate over the data, which should not be needed.

So yes, rewriting the function would be useful. The main difficulty in 
the rewrite seems to be appropriate mask handling, but slicing is a 
faster way to do that.
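
For illustration, a minimal sketch of what a slicing-based rewrite with
mask handling might look like (untested; merge_masked is a hypothetical
name, and mask propagation through plain field assignment would need
verifying):

import numpy as np
import numpy.ma as ma

def merge_masked(arr1, arr2):
    # Build the merged dtype once, then copy each field as a whole
    # slice; assigning a masked field should carry its data and its
    # mask over together, with no per-row iteration.
    newdtype = np.dtype(arr1.dtype.descr + arr2.dtype.descr)
    out = ma.empty(arr1.shape, dtype=newdtype)
    for src in (arr1, arr2):
        for field in src.dtype.names:
            out[field] = src[field]
    return out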

-- 
Pauli Virtanen


Re: [Numpy-discussion] merge_arrays is very slow; alternatives?

2010-11-26 Thread Gerrit Holl
On 26 November 2010 20:16, Gerrit Holl wrote:
> Hi,
>
> upon profiling my code, I found that
> numpy.lib.recfunctions.merge_arrays is extremely slow; it processes
> only some 7000 rows/second. This is not acceptable for me.
...
> How can I do this in a faster way?

Replying to my own post here. Either I have written a much faster
implementation of this, or I am missing something. I consider it
unlikely that I could write a much faster implementation of an
established numpy function with so little effort, so I suspect I am
missing something here.

I wrote this implementation of the flattened version of merge_arrays:

def merge_arrays(arr1, arr2):
    t1 = arr1.dtype
    t2 = arr2.dtype
    # Concatenate the two field descriptions into one compound dtype
    newdtype = numpy.dtype(t1.descr + t2.descr)
    newarray = numpy.empty(shape=arr1.shape, dtype=newdtype)
    # Copy each field with a single whole-array assignment,
    # avoiding any per-row iteration
    for field in t1.names:
        newarray[field] = arr1[field]
    for field in t2.names:
        newarray[field] = arr2[field]
    return newarray

and benchmarks show it's almost 100 times faster for a medium-sized array:

In [211]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:1], targetrows2[:1]], flatten=True)
1 loops, best of 3: 1.01 s per loop

In [212]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:1], targetrows2[:1])
100 loops, best of 3: 10.8 ms per loop

In [214]: (merged1.view(dtype=uint64).reshape(-1, 100) == merged2.view(dtype=uint64).reshape(-1, 100)).all()
Out[214]: True

# and still 4 times faster for a small array:

In [215]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows[:10], targetrows2[:10]], flatten=True)
1000 loops, best of 3: 1.31 ms per loop

In [216]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows[:10], targetrows2[:10])
1000 loops, best of 3: 344 us per loop

# and 15 times faster for a large array (1.5 million elements):

In [218]: %timeit merged1 = numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)
1 loops, best of 3: 110 s per loop

In [217]: %timeit merged2 = pyatmlab.tools.merge_arrays(metarows, targetrows2)
1 loops, best of 3: 7.26 s per loop

I wonder, am I missing something or have I really written a
significant improvement in less than 10 LOC? Should I file a patch for
this?

Gerrit.

-- 
Exploring space at http://gerrit-explores.blogspot.com/
Personal homepage at http://www.topjaklont.org/
Asperger Syndroom: http://www.topjaklont.org/nl/asperger.html


[Numpy-discussion] merge_arrays is very slow; alternatives?

2010-11-26 Thread Gerrit Holl
Hi,

upon profiling my code, I found that
numpy.lib.recfunctions.merge_arrays is extremely slow; it processes
only some 7000 rows/second. This is not acceptable for me.

I have two large record arrays (arrays with a complicated dtype).
All I want to do is merge them into one. I don't think that should
have to be a very slow operation; I don't need to copy anything, I
just want to view the two record arrays as one.

How can I do this in a faster way?

In [45]: cProfile.runctx("numpy.lib.recfunctions.merge_arrays([metarows, targetrows2], flatten=True)", globals(), locals())
         225381902 function calls (150254635 primitive calls) in 166.620 CPU seconds

   Ordered by: standard name

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.031    0.031  166.620  166.620 <string>:1(<module>)
     68/1    0.000    0.000    0.000    0.000 _internal.py:82(_array_descr)
        2    0.000    0.000    0.000    0.000 numeric.py:286(asanyarray)
        2    0.000    0.000    0.000    0.000 recfunctions.py:135(flatten_descr)
        1    0.000    0.000    0.001    0.001 recfunctions.py:161(zip_descr)
149165600/74038400  117.195    0.000  139.701    0.000 recfunctions.py:235(_izip_fields_flat)
  1088801   12.146    0.000  151.847    0.000 recfunctions.py:263(izip_records)
        3    0.000    0.000    0.000    0.000 recfunctions.py:277(sentinel)
        1    4.599    4.599  166.589  166.589 recfunctions.py:328(merge_arrays)
        3    0.000    0.000    0.000    0.000 recfunctions.py:406()
 75127201   22.506    0.000   22.506    0.000 {isinstance}
       69    0.000    0.000    0.000    0.000 {len}
        1    0.000    0.000    0.000    0.000 {map}
        1    0.000    0.000    0.000    0.000 {max}
        2    0.000    0.000    0.000    0.000 {method '__array__' of 'numpy.ndarray' objects}
      136    0.000    0.000    0.000    0.000 {method 'append' of 'list' objects}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        2    0.000    0.000    0.000    0.000 {method 'extend' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'pop' of 'list' objects}
        2    0.000    0.000    0.000    0.000 {method 'ravel' of 'numpy.ndarray' objects}
        2    0.000    0.000    0.000    0.000 {numpy.core.multiarray.array}
        1   10.142   10.142   10.142   10.142 {numpy.core.multiarray.fromiter}


Gerrit.

--
Gerrit Holl
PhD student at Department of Space Science, Luleå University of
Technology, Kiruna, Sweden
http://www.sat.ltu.se/members/gerrit/


Re: [Numpy-discussion] numpy speed question

2010-11-26 Thread Francesc Alted
On Thursday 25 November 2010 11:13:49, Jean-Luc Menut wrote:
> Hello all,
> 
> I have a little question about the speed of numpy vs IDL 7.0. I did a
> very simple little check by computing just a cosine in a loop. I was
> quite surprised to see an order of magnitude of difference between
> numpy and IDL; I would have thought that for such a basic function,
> the speed would be approximately the same.
> 
> I suppose that some of the difference may come from the default data
> type of 64 bits in numpy and 32 bits in IDL. Is there a way to change
> the numpy default data type (without recompiling)?
> 
> And I'm not an expert at all; maybe there is a better explanation,
> like a better use of the several CPU cores by IDL?

As others have already pointed out, you should make sure that you use 
numpy.cos with arrays in order to get good performance.
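
As an aside on the 64- vs 32-bit part of the question: numpy has no
global switch for its default float type, but you can request 32-bit
floats explicitly, which roughly matches IDL's default precision. A
minimal sketch (illustrative, not from the original exchange):

import numpy as np

i = np.arange(1e6, dtype=np.float32)   # 32-bit input array
# float32 scalars keep the whole computation in single precision
a = np.cos(np.float32(2 * np.pi) * i / np.float32(100.0))
print(a.dtype)   # float32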

I don't know whether IDL is using multiple cores or not, but if you are 
looking for ultimate performance, you can always use Numexpr, which makes 
use of multiple cores.  For example, on a machine with 8 cores (w/ 
hyperthreading), we have:

>>> from math import pi
>>> import numpy as np
>>> import numexpr as ne
>>> i = np.arange(1e6)
>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 85.2 ms per loop
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 8.28 ms per loop

If you don't have a machine with a lot of cores but still want to get 
good performance, you can link Numexpr against Intel's VML (Vector 
Math Library).  For example, using Numexpr+VML with only one core (on 
another machine):

>>> %timeit np.cos(2*pi*i/100.)
10 loops, best of 3: 66.7 ms per loop
>>> ne.set_vml_num_threads(1)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
100 loops, best of 3: 9.1 ms per loop

which also gives a pretty good speedup.  Curiously, Numexpr+VML is not 
that good at using multiple cores in this case:

>>> ne.set_vml_num_threads(2)
>>> %timeit ne.evaluate("cos(2*pi*i/100.)")
10 loops, best of 3: 14.7 ms per loop

I don't really know why Numexpr+VML takes more time with two threads 
than with only one, but it is probably due to Numexpr requiring better 
fine-tuning in combination with VML :-/

-- 
Francesc Alted


[Numpy-discussion] Weibull analysis ?

2010-11-26 Thread David Trémouilles
Hello,

   After careful Google searches, I was not able to find any
project dealing with Weibull analysis in Python, numpy, or scipy.
So before reinventing the wheel, I ask here whether any of you
have already started such a project and are eager to share.
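
For the basic distribution-fitting part, scipy.stats does ship Weibull
distributions; a minimal sketch of a fit (the data here are synthetic
and purely illustrative):

from scipy import stats

# Synthetic failure times drawn from a Weibull distribution
data = stats.weibull_min.rvs(1.5, loc=0, scale=2.0, size=1000)

# Recover shape and scale, holding the location parameter fixed at zero
shape, loc, scale = stats.weibull_min.fit(data, floc=0)
print("shape=%.3f scale=%.3f" % (shape, scale))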

Thanks,

David


Re: [Numpy-discussion] numpy speed question

2010-11-26 Thread Bruce Sherwood
Although this was mentioned earlier, it's worth emphasizing that if
you need to use functions such as cosine with scalar arguments, you
should use math.cos(), not numpy.cos(). The numpy versions of these
functions are optimized for handling array arguments and are much
slower than the math versions for scalar arguments.
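
A quick way to check this on any machine (a minimal sketch; the exact
numbers will vary):

import timeit

# math.cos skips numpy's array-dispatch machinery for plain Python
# floats, so it is typically several times faster on scalars.
t_math = timeit.timeit("cos(1.0)", setup="from math import cos", number=1000000)
t_numpy = timeit.timeit("cos(1.0)", setup="from numpy import cos", number=1000000)
print(t_math, t_numpy)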

Bruce Sherwood

On Thu, Nov 25, 2010 at 2:34 PM, Gökhan Sever wrote:
> On Thu, Nov 25, 2010 at 4:13 AM, Jean-Luc Menut wrote:
>> Hello all,
>>
>> I have a little question about the speed of numpy vs IDL 7.0. I did a
>> very simple little check by computing just a cosine in a loop. I was
>> quite surprised to see an order of magnitude of difference between numpy
>> and IDL; I would have thought that for such a basic function, the speed
>> would be approximately the same.
>>
>> I suppose that some of the difference may come from the default data
>> type of 64 bits in numpy and 32 bits in IDL. Is there a way to change the
>> numpy default data type (without recompiling)?
>>
>> And I'm not an expert at all; maybe there is a better explanation, like
>> a better use of the several CPU cores by IDL?
>>
>> I'm working with windows 7 64 bits on a core i7.
>>
>> any hint is welcome.
>> Thanks.
>>
>> Here the IDL code :
>> Julian1 = SYSTIME( /JULIAN , /UTC )
>> for j=0, do begin
>>   for i=0,999 do begin
>>     a=cos(2*!pi*i/100.)
>>   endfor
>> endfor
>> Julian2 = SYSTIME( /JULIAN , /UTC )
>> print, (Julian2-Julian1)*86400.0
>> print,cpt
>> end
>>
>> result:
>> % Compiled module: $MAIN$.
>>        2.837
>>
>>
>> The python code:
>> from numpy import *
>> from time import time
>> time1 = time()
>> for j in range(1):
>>     for i in range(1000):
>>         a=cos(2*pi*i/100.)
>> time2 = time()
>> print time2-time1
>>
>> result:
>> In [2]: run python_test_speed.py
>> 24.180943
>>
>
> A vectorised numpy version already blows these results away.
>
> Here is what I get using the IDL version (with IDL v7.1):
>
> IDL> .r test_idl
> % Compiled module: $MAIN$.
>       4.185
>
> I[10]: time run test_python
> 43.305727005
>
> and using a Cythonized version:
>
> from math import pi
>
> cdef extern from "math.h":
>    float cos(float)
>
> cpdef float myloop(int n1, int n2, float n3):
>    cdef float a
>    cdef int i, j
>    for j in range(n1):
>        for i in range(n2):
>            a=cos(2*pi*i/n3)
>
> compiling via the setup.py file (python setup.py build_ext --inplace)
> and importing the function into IPython:
>
> from mycython import myloop
>
> I[6]: timeit myloop(1, 1000, 100.0)
> 1 loops, best of 3: 2.91 s per loop
>
>
> --
> Gökhan
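
For completeness, a minimal sketch of the setup.py implied by the
quoted build step (the module name comes from the quoted import; the
.pyx file name is an assumption):

from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext

# Build mycython.pyx into an importable extension module
setup(cmdclass={'build_ext': build_ext},
      ext_modules=[Extension("mycython", ["mycython.pyx"])])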