Re: [Numpy-discussion] Odd-looking long double on windows 32 bit

2011-11-12 Thread Matthew Brett
Hi,

On Sat, Nov 12, 2011 at 11:35 PM, Matthew Brett  wrote:
> Hi,
>
> Sorry for my continued confusion here.  This is numpy 1.6.1 on windows
> XP 32 bit.
>
> In [2]: np.finfo(np.float96).nmant
> Out[2]: 52
>
> In [3]: np.finfo(np.float96).nexp
> Out[3]: 15
>
> In [4]: np.finfo(np.float64).nmant
> Out[4]: 52
>
> In [5]: np.finfo(np.float64).nexp
> Out[5]: 11
>
> If there are 52 bits of precision, 2**53+1 should not be
> representable, and sure enough:
>
> In [6]: np.float96(2**53)+1
> Out[6]: 9007199254740992.0
>
> In [7]: np.float64(2**53)+1
> Out[7]: 9007199254740992.0
>
> If the nexp is right, the max should be higher for the float96 type:
>
> In [9]: np.finfo(np.float64).max
> Out[9]: 1.7976931348623157e+308
>
> In [10]: np.finfo(np.float96).max
> Out[10]: 1.#INF
>
> I see that long double in C is 12 bytes wide, and double is the usual 8 bytes.

Sorry - sizeof(long double) is 12 using mingw.  I see that long double
is the same as double in MS Visual C++.

http://en.wikipedia.org/wiki/Long_double

but, as expected from the name:

In [11]: np.dtype(np.float96).itemsize
Out[11]: 12

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Odd-looking long double on windows 32 bit

2011-11-12 Thread Matthew Brett
Hi,

Sorry for my continued confusion here.  This is numpy 1.6.1 on windows
XP 32 bit.

In [2]: np.finfo(np.float96).nmant
Out[2]: 52

In [3]: np.finfo(np.float96).nexp
Out[3]: 15

In [4]: np.finfo(np.float64).nmant
Out[4]: 52

In [5]: np.finfo(np.float64).nexp
Out[5]: 11

If there are 52 bits of precision, 2**53+1 should not be
representable, and sure enough:

In [6]: np.float96(2**53)+1
Out[6]: 9007199254740992.0

In [7]: np.float64(2**53)+1
Out[7]: 9007199254740992.0

If the nexp is right, the max should be higher for the float96 type:

In [9]: np.finfo(np.float64).max
Out[9]: 1.7976931348623157e+308

In [10]: np.finfo(np.float96).max
Out[10]: 1.#INF

I see that long double in C is 12 bytes wide, and double is the usual 8 bytes.

So - now I am not sure what this float96 is.  I was expecting 80 bit
extended precision, but it doesn't look right for that...

Does anyone know what representation this is?

Thanks a lot,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Geoffrey Zhu
Hi Warren,

On Sat, Nov 12, 2011 at 9:31 AM, Warren Weckesser
 wrote:
>
>
> On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
>>
>> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
>> > Hi,
>> >
>> > I am playing with multiple ways to speed up the following expression
>> > (it is in the inner loop):
>> >
>> >
>> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >
>> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> > scalars.
>> >
>> > I played with numexpr, but it was way slower than directly using
>> > numpy. It would be nice if I could create a Mx3 matrix without copying
>> > memory and so I can use dot() to calculate the whole thing.
>> >
>> > Can anyone help with giving some advices to make this faster?
>>
>> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> conditions.
>
>
>
> As Josef pointed out, this is a convolution.  There are (at least)
> three convolution functions in numpy+scipy that you could use:
> numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
> beat the simple expression
>    a*x[2:] + b*x[1:-1] + c*x[:-2]
> Your idea of forming a matrix without copying memory can be done using
> "stride tricks", and for arrays of the size you are interested in, it
> computes the result faster than the simple expression (see below).
>
> Another fast alternative is to use one of the inline code generators.
> This example is a easy to implement with scipy.weave.blitz, and it gives
> a big speedup.
>
> Here's a test script:
>
> #- convolve1dtest.py -
>
>
> import numpy as np
> from numpy.lib.stride_tricks import as_strided
> from scipy.ndimage import convolve1d
> from scipy.weave import blitz
>
> # weighting coefficients
> a = -0.5
> b = 1.0
> c = -0.25
> w = np.array((a,b,c))
> # Reversed w:
> rw = w[::-1]
>
> # Length of C
> n = 250
>
> # The original version of the calculation:
> # Some dummy data
> C = np.arange(float(n))
> C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> # Save for comparison.
> C0 = C.copy()
>
> # Do it again using a matrix multiplication.
> C = np.arange(float(n))
> # The "virtual" matrix view of C.
> V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
> C[1:-1] = np.dot(V, rw)
> C1 = C.copy()
>
> # Again, with convolve1d this time.
> C = np.arange(float(n))
> C[1:-1] = convolve1d(C, w)[1:-1]
> C2 = C.copy()
>
> # scipy.weave.blitz
> C = np.arange(float(n))
> # Must work with a copy, D, in the formula, because blitz does not use
> # a temporary variable.
> D = C.copy()
> expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> blitz(expr, check_size=0)
> C3 = C.copy()
>
>
> # Verify that all the methods give the same result.
> print np.all(C0 == C1)
> print np.all(C0 == C2)
> print np.all(C0 == C3)
>
> #-
>
> And here's a snippet from an ipython session:
>
> In [51]: run convolve1dtest.py
> True
> True
> True
>
> In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 10 loops, best of 3: 16.5 us per loop
>
> In [53]: %timeit C[1:-1] = np.dot(V, rw)
> 10 loops, best of 3: 9.84 us per loop
>
> In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 10 loops, best of 3: 18.7 us per loop
>
> In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 4.91 us per loop
>
>
>
> scipy.weave.blitz is fastest (but note that blitz has already been called
> once, so the time shown does not include the compilation required in
> the first call).  You could also try scipy.weave.inline, cython.inline,
> or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>
> Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
> expression or convolve1d.  However, if you also have to set up V inside
> your inner loop, the speed gain will be lost.  The relative speeds also
> depend on the size of C.  For large C, the simple expression is faster
> than the matrix multiplication by V (but blitz is still faster).  In
> the following, I have changed n to 2500 before running convolve1dtest.py:
>
> In [56]: run convolve1dtest.py
> True
> True
> True
>
> In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 1 loops, best of 3: 29.5 us per loop
>
> In [58]: %timeit C[1:-1] = np.dot(V, rw)
> 1 loops, best of 3: 56.4 us per loop
>
> In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 1 loops, best of 3: 37.3 us per loop
>
> In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 10.3 us per loop
>
>
> blitz wins, the simple numpy expression is a distant second, and now
> the matrix multiplication is slowest.
>
> I hope that helps--I know I learned quite a bit. :)
>
>
> Warren

Thanks for the very helpful response. Fortunately I don't have to set
up the matrix every time as I can work on the same C over and over
again inside the loop. It's counterintuitive to see that the
performance of dot() degrades as n goes up. If I didn'

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Warren Weckesser
On Sat, Nov 12, 2011 at 11:16 AM,  wrote:

> On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser
>  wrote:
> >
> >
> > On Sat, Nov 12, 2011 at 9:59 AM,  wrote:
> >>
> >> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
> >>  wrote:
> >> >
> >> >
> >> > On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
> >> >>
> >> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu 
> >> >> wrote:
> >> >> > Hi,
> >> >> >
> >> >> > I am playing with multiple ways to speed up the following
> expression
> >> >> > (it is in the inner loop):
> >> >> >
> >> >> >
> >> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
> >> >> >
> >> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c
> are
> >> >> > scalars.
> >> >> >
> >> >> > I played with numexpr, but it was way slower than directly using
> >> >> > numpy. It would be nice if I could create a Mx3 matrix without
> >> >> > copying
> >> >> > memory and so I can use dot() to calculate the whole thing.
> >> >> >
> >> >> > Can anyone help with giving some advices to make this faster?
> >> >>
> >> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
> >> >> conditions.
> >> >
> >> >
> >> >
> >> > As Josef pointed out, this is a convolution.  There are (at least)
> >> > three convolution functions in numpy+scipy that you could use:
> >> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> >> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it
> doesn't
> >> > beat the simple expression
> >> >a*x[2:] + b*x[1:-1] + c*x[:-2]
> >> > Your idea of forming a matrix without copying memory can be done using
> >> > "stride tricks", and for arrays of the size you are interested in, it
> >> > computes the result faster than the simple expression (see below).
> >> >
> >> > Another fast alternative is to use one of the inline code generators.
> >> > This example is a easy to implement with scipy.weave.blitz, and it
> gives
> >> > a big speedup.
> >> >
> >> > Here's a test script:
> >> >
> >> > #- convolve1dtest.py -
> >> >
> >> >
> >> > import numpy as np
> >> > from numpy.lib.stride_tricks import as_strided
> >> > from scipy.ndimage import convolve1d
> >> > from scipy.weave import blitz
> >> >
> >> > # weighting coefficients
> >> > a = -0.5
> >> > b = 1.0
> >> > c = -0.25
> >> > w = np.array((a,b,c))
> >> > # Reversed w:
> >> > rw = w[::-1]
> >> >
> >> > # Length of C
> >> > n = 250
> >> >
> >> > # The original version of the calculation:
> >> > # Some dummy data
> >> > C = np.arange(float(n))
> >> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> >> > # Save for comparison.
> >> > C0 = C.copy()
> >> >
> >> > # Do it again using a matrix multiplication.
> >> > C = np.arange(float(n))
> >> > # The "virtual" matrix view of C.
> >> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
> >> > C.strides[0]))
> >> > C[1:-1] = np.dot(V, rw)
> >> > C1 = C.copy()
> >> >
> >> > # Again, with convolve1d this time.
> >> > C = np.arange(float(n))
> >> > C[1:-1] = convolve1d(C, w)[1:-1]
> >> > C2 = C.copy()
> >> >
> >> > # scipy.weave.blitz
> >> > C = np.arange(float(n))
> >> > # Must work with a copy, D, in the formula, because blitz does not use
> >> > # a temporary variable.
> >> > D = C.copy()
> >> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> >> > blitz(expr, check_size=0)
> >> > C3 = C.copy()
> >> >
> >> >
> >> > # Verify that all the methods give the same result.
> >> > print np.all(C0 == C1)
> >> > print np.all(C0 == C2)
> >> > print np.all(C0 == C3)
> >> >
> >> > #-
> >> >
> >> > And here's a snippet from an ipython session:
> >> >
> >> > In [51]: run convolve1dtest.py
> >> > True
> >> > True
> >> > True
> >> >
> >> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> >> > 10 loops, best of 3: 16.5 us per loop
> >> >
> >> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
> >> > 10 loops, best of 3: 9.84 us per loop
> >> >
> >> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> >> > 10 loops, best of 3: 18.7 us per loop
> >> >
> >> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> >> > 10 loops, best of 3: 4.91 us per loop
> >> >
> >> >
> >> >
> >> > scipy.weave.blitz is fastest (but note that blitz has already been
> >> > called
> >> > once, so the time shown does not include the compilation required in
> >> > the first call).  You could also try scipy.weave.inline,
> cython.inline,
> >> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
> >> >
> >> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the
> simple
> >> > expression or convolve1d.  However, if you also have to set up V
> inside
> >> > your inner loop, the speed gain will be lost.  The relative speeds
> also
> >> > depend on the size of C.  For large C, the simple expression is faster
> >> > than the matrix multiplication by V (but blitz is still faster).  In
> >> > the following, I have changed n to 2500 before running
> >> > convolve1dtest.py:
> >> >
> >> > In [56]: run

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 11:32 AM, Warren Weckesser
 wrote:
>
>
> On Sat, Nov 12, 2011 at 9:59 AM,  wrote:
>>
>> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
>>  wrote:
>> >
>> >
>> > On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
>> >>
>> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu 
>> >> wrote:
>> >> > Hi,
>> >> >
>> >> > I am playing with multiple ways to speed up the following expression
>> >> > (it is in the inner loop):
>> >> >
>> >> >
>> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >> >
>> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> >> > scalars.
>> >> >
>> >> > I played with numexpr, but it was way slower than directly using
>> >> > numpy. It would be nice if I could create a Mx3 matrix without
>> >> > copying
>> >> > memory and so I can use dot() to calculate the whole thing.
>> >> >
>> >> > Can anyone help with giving some advices to make this faster?
>> >>
>> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> >> conditions.
>> >
>> >
>> >
>> > As Josef pointed out, this is a convolution.  There are (at least)
>> > three convolution functions in numpy+scipy that you could use:
>> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
>> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
>> > beat the simple expression
>> >    a*x[2:] + b*x[1:-1] + c*x[:-2]
>> > Your idea of forming a matrix without copying memory can be done using
>> > "stride tricks", and for arrays of the size you are interested in, it
>> > computes the result faster than the simple expression (see below).
>> >
>> > Another fast alternative is to use one of the inline code generators.
>> > This example is a easy to implement with scipy.weave.blitz, and it gives
>> > a big speedup.
>> >
>> > Here's a test script:
>> >
>> > #- convolve1dtest.py -
>> >
>> >
>> > import numpy as np
>> > from numpy.lib.stride_tricks import as_strided
>> > from scipy.ndimage import convolve1d
>> > from scipy.weave import blitz
>> >
>> > # weighting coefficients
>> > a = -0.5
>> > b = 1.0
>> > c = -0.25
>> > w = np.array((a,b,c))
>> > # Reversed w:
>> > rw = w[::-1]
>> >
>> > # Length of C
>> > n = 250
>> >
>> > # The original version of the calculation:
>> > # Some dummy data
>> > C = np.arange(float(n))
>> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > # Save for comparison.
>> > C0 = C.copy()
>> >
>> > # Do it again using a matrix multiplication.
>> > C = np.arange(float(n))
>> > # The "virtual" matrix view of C.
>> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
>> > C.strides[0]))
>> > C[1:-1] = np.dot(V, rw)
>> > C1 = C.copy()
>> >
>> > # Again, with convolve1d this time.
>> > C = np.arange(float(n))
>> > C[1:-1] = convolve1d(C, w)[1:-1]
>> > C2 = C.copy()
>> >
>> > # scipy.weave.blitz
>> > C = np.arange(float(n))
>> > # Must work with a copy, D, in the formula, because blitz does not use
>> > # a temporary variable.
>> > D = C.copy()
>> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
>> > blitz(expr, check_size=0)
>> > C3 = C.copy()
>> >
>> >
>> > # Verify that all the methods give the same result.
>> > print np.all(C0 == C1)
>> > print np.all(C0 == C2)
>> > print np.all(C0 == C3)
>> >
>> > #-
>> >
>> > And here's a snippet from an ipython session:
>> >
>> > In [51]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 10 loops, best of 3: 16.5 us per loop
>> >
>> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
>> > 10 loops, best of 3: 9.84 us per loop
>> >
>> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 10 loops, best of 3: 18.7 us per loop
>> >
>> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
>> > 10 loops, best of 3: 4.91 us per loop
>> >
>> >
>> >
>> > scipy.weave.blitz is fastest (but note that blitz has already been
>> > called
>> > once, so the time shown does not include the compilation required in
>> > the first call).  You could also try scipy.weave.inline, cython.inline,
>> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>> >
>> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
>> > expression or convolve1d.  However, if you also have to set up V inside
>> > your inner loop, the speed gain will be lost.  The relative speeds also
>> > depend on the size of C.  For large C, the simple expression is faster
>> > than the matrix multiplication by V (but blitz is still faster).  In
>> > the following, I have changed n to 2500 before running
>> > convolve1dtest.py:
>> >
>> > In [56]: run convolve1dtest.py
>> > True
>> > True
>> > True
>> >
>> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
>> > 1 loops, best of 3: 29.5 us per loop
>> >
>> > In [58]: %timeit C[1:-1] = np.dot(V, rw)
>> > 1 loops, best of 3: 56.4 us per loop
>> >
>> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
>> > 1 loops, best 

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Warren Weckesser
On Sat, Nov 12, 2011 at 9:59 AM,  wrote:

> On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
>  wrote:
> >
> >
> > On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
> >>
> >> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu 
> wrote:
> >> > Hi,
> >> >
> >> > I am playing with multiple ways to speed up the following expression
> >> > (it is in the inner loop):
> >> >
> >> >
> >> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
> >> >
> >> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
> >> > scalars.
> >> >
> >> > I played with numexpr, but it was way slower than directly using
> >> > numpy. It would be nice if I could create a Mx3 matrix without copying
> >> > memory and so I can use dot() to calculate the whole thing.
> >> >
> >> > Can anyone help with giving some advices to make this faster?
> >>
> >> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
> >> conditions.
> >
> >
> >
> > As Josef pointed out, this is a convolution.  There are (at least)
> > three convolution functions in numpy+scipy that you could use:
> > numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> > Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
> > beat the simple expression
> >a*x[2:] + b*x[1:-1] + c*x[:-2]
> > Your idea of forming a matrix without copying memory can be done using
> > "stride tricks", and for arrays of the size you are interested in, it
> > computes the result faster than the simple expression (see below).
> >
> > Another fast alternative is to use one of the inline code generators.
> > This example is a easy to implement with scipy.weave.blitz, and it gives
> > a big speedup.
> >
> > Here's a test script:
> >
> > #- convolve1dtest.py -
> >
> >
> > import numpy as np
> > from numpy.lib.stride_tricks import as_strided
> > from scipy.ndimage import convolve1d
> > from scipy.weave import blitz
> >
> > # weighting coefficients
> > a = -0.5
> > b = 1.0
> > c = -0.25
> > w = np.array((a,b,c))
> > # Reversed w:
> > rw = w[::-1]
> >
> > # Length of C
> > n = 250
> >
> > # The original version of the calculation:
> > # Some dummy data
> > C = np.arange(float(n))
> > C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> > # Save for comparison.
> > C0 = C.copy()
> >
> > # Do it again using a matrix multiplication.
> > C = np.arange(float(n))
> > # The "virtual" matrix view of C.
> > V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
> C.strides[0]))
> > C[1:-1] = np.dot(V, rw)
> > C1 = C.copy()
> >
> > # Again, with convolve1d this time.
> > C = np.arange(float(n))
> > C[1:-1] = convolve1d(C, w)[1:-1]
> > C2 = C.copy()
> >
> > # scipy.weave.blitz
> > C = np.arange(float(n))
> > # Must work with a copy, D, in the formula, because blitz does not use
> > # a temporary variable.
> > D = C.copy()
> > expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> > blitz(expr, check_size=0)
> > C3 = C.copy()
> >
> >
> > # Verify that all the methods give the same result.
> > print np.all(C0 == C1)
> > print np.all(C0 == C2)
> > print np.all(C0 == C3)
> >
> > #-
> >
> > And here's a snippet from an ipython session:
> >
> > In [51]: run convolve1dtest.py
> > True
> > True
> > True
> >
> > In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> > 10 loops, best of 3: 16.5 us per loop
> >
> > In [53]: %timeit C[1:-1] = np.dot(V, rw)
> > 10 loops, best of 3: 9.84 us per loop
> >
> > In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> > 10 loops, best of 3: 18.7 us per loop
> >
> > In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> > 10 loops, best of 3: 4.91 us per loop
> >
> >
> >
> > scipy.weave.blitz is fastest (but note that blitz has already been called
> > once, so the time shown does not include the compilation required in
> > the first call).  You could also try scipy.weave.inline, cython.inline,
> > or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
> >
> > Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
> > expression or convolve1d.  However, if you also have to set up V inside
> > your inner loop, the speed gain will be lost.  The relative speeds also
> > depend on the size of C.  For large C, the simple expression is faster
> > than the matrix multiplication by V (but blitz is still faster).  In
> > the following, I have changed n to 2500 before running convolve1dtest.py:
> >
> > In [56]: run convolve1dtest.py
> > True
> > True
> > True
> >
> > In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> > 1 loops, best of 3: 29.5 us per loop
> >
> > In [58]: %timeit C[1:-1] = np.dot(V, rw)
> > 1 loops, best of 3: 56.4 us per loop
> >
> > In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> > 1 loops, best of 3: 37.3 us per loop
> >
> > In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> > 10 loops, best of 3: 10.3 us per loop
> >
> >
> > blitz wins, the simple numpy expression is a distant second, and now
> > the matrix multipli

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
 wrote:
>
>
> On Sat, Nov 12, 2011 at 6:43 AM,  wrote:
>>
>> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
>> > Hi,
>> >
>> > I am playing with multiple ways to speed up the following expression
>> > (it is in the inner loop):
>> >
>> >
>> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>> >
>> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
>> > scalars.
>> >
>> > I played with numexpr, but it was way slower than directly using
>> > numpy. It would be nice if I could create a Mx3 matrix without copying
>> > memory and so I can use dot() to calculate the whole thing.
>> >
>> > Can anyone help with giving some advices to make this faster?
>>
>> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
>> conditions.
>
>
>
> As Josef pointed out, this is a convolution.  There are (at least)
> three convolution functions in numpy+scipy that you could use:
> numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
> Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
> beat the simple expression
>    a*x[2:] + b*x[1:-1] + c*x[:-2]
> Your idea of forming a matrix without copying memory can be done using
> "stride tricks", and for arrays of the size you are interested in, it
> computes the result faster than the simple expression (see below).
>
> Another fast alternative is to use one of the inline code generators.
> This example is a easy to implement with scipy.weave.blitz, and it gives
> a big speedup.
>
> Here's a test script:
>
> #- convolve1dtest.py -
>
>
> import numpy as np
> from numpy.lib.stride_tricks import as_strided
> from scipy.ndimage import convolve1d
> from scipy.weave import blitz
>
> # weighting coefficients
> a = -0.5
> b = 1.0
> c = -0.25
> w = np.array((a,b,c))
> # Reversed w:
> rw = w[::-1]
>
> # Length of C
> n = 250
>
> # The original version of the calculation:
> # Some dummy data
> C = np.arange(float(n))
> C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> # Save for comparison.
> C0 = C.copy()
>
> # Do it again using a matrix multiplication.
> C = np.arange(float(n))
> # The "virtual" matrix view of C.
> V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
> C[1:-1] = np.dot(V, rw)
> C1 = C.copy()
>
> # Again, with convolve1d this time.
> C = np.arange(float(n))
> C[1:-1] = convolve1d(C, w)[1:-1]
> C2 = C.copy()
>
> # scipy.weave.blitz
> C = np.arange(float(n))
> # Must work with a copy, D, in the formula, because blitz does not use
> # a temporary variable.
> D = C.copy()
> expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
> blitz(expr, check_size=0)
> C3 = C.copy()
>
>
> # Verify that all the methods give the same result.
> print np.all(C0 == C1)
> print np.all(C0 == C2)
> print np.all(C0 == C3)
>
> #-
>
> And here's a snippet from an ipython session:
>
> In [51]: run convolve1dtest.py
> True
> True
> True
>
> In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 10 loops, best of 3: 16.5 us per loop
>
> In [53]: %timeit C[1:-1] = np.dot(V, rw)
> 10 loops, best of 3: 9.84 us per loop
>
> In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 10 loops, best of 3: 18.7 us per loop
>
> In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 4.91 us per loop
>
>
>
> scipy.weave.blitz is fastest (but note that blitz has already been called
> once, so the time shown does not include the compilation required in
> the first call).  You could also try scipy.weave.inline, cython.inline,
> or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
>
> Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
> expression or convolve1d.  However, if you also have to set up V inside
> your inner loop, the speed gain will be lost.  The relative speeds also
> depend on the size of C.  For large C, the simple expression is faster
> than the matrix multiplication by V (but blitz is still faster).  In
> the following, I have changed n to 2500 before running convolve1dtest.py:
>
> In [56]: run convolve1dtest.py
> True
> True
> True
>
> In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
> 1 loops, best of 3: 29.5 us per loop
>
> In [58]: %timeit C[1:-1] = np.dot(V, rw)
> 1 loops, best of 3: 56.4 us per loop
>
> In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
> 1 loops, best of 3: 37.3 us per loop
>
> In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
> 10 loops, best of 3: 10.3 us per loop
>
>
> blitz wins, the simple numpy expression is a distant second, and now
> the matrix multiplication is slowest.
>
> I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal convolve have a similar overhead as np.convolve1d ?

memory:
the blitz code doesn't include the array copy (D), so the timing might
be a bit misleading?
I assume the as_strided call doesn't allocate any memory yet, so the
timing s

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Warren Weckesser
On Sat, Nov 12, 2011 at 6:43 AM,  wrote:

> On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
> > Hi,
> >
> > I am playing with multiple ways to speed up the following expression
> > (it is in the inner loop):
> >
> >
> > C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
> >
> > where C is an array of about 200-300 elements, M=len(C), a, b, c are
> scalars.
> >
> > I played with numexpr, but it was way slower than directly using
> > numpy. It would be nice if I could create a Mx3 matrix without copying
> > memory and so I can use dot() to calculate the whole thing.
> >
> > Can anyone help with giving some advices to make this faster?
>
> looks like a np.convolve(C, [a,b,c])  to me except for the boundary
> conditions.
>



As Josef pointed out, this is a convolution.  There are (at least)
three convolution functions in numpy+scipy that you could use:
numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
beat the simple expression
   a*x[2:] + b*x[1:-1] + c*x[:-2]
Your idea of forming a matrix without copying memory can be done using
"stride tricks", and for arrays of the size you are interested in, it
computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators.
This example is a easy to implement with scipy.weave.blitz, and it gives
a big speedup.

Here's a test script:

#- convolve1dtest.py -


import numpy as np
from numpy.lib.stride_tricks import as_strided
from scipy.ndimage import convolve1d
from scipy.weave import blitz

# weighting coefficients
a = -0.5
b = 1.0
c = -0.25
w = np.array((a,b,c))
# Reversed w:
rw = w[::-1]

# Length of C
n = 250

# The original version of the calculation:
# Some dummy data
C = np.arange(float(n))
C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
# Save for comparison.
C0 = C.copy()

# Do it again using a matrix multiplication.
C = np.arange(float(n))
# The "virtual" matrix view of C.
V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
C[1:-1] = np.dot(V, rw)
C1 = C.copy()

# Again, with convolve1d this time.
C = np.arange(float(n))
C[1:-1] = convolve1d(C, w)[1:-1]
C2 = C.copy()

# scipy.weave.blitz
C = np.arange(float(n))
# Must work with a copy, D, in the formula, because blitz does not use
# a temporary variable.
D = C.copy()
expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
blitz(expr, check_size=0)
C3 = C.copy()


# Verify that all the methods give the same result.
print np.all(C0 == C1)
print np.all(C0 == C2)
print np.all(C0 == C3)

#-

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py
True
True
True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
10 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw)
10 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
10 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
10 loops, best of 3: 4.91 us per loop



scipy.weave.blitz is fastest (but note that blitz has already been called
once, so the time shown does not include the compilation required in
the first call).  You could also try scipy.weave.inline, cython.inline,
or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).

Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
expression or convolve1d.  However, if you also have to set up V inside
your inner loop, the speed gain will be lost.  The relative speeds also
depend on the size of C.  For large C, the simple expression is faster
than the matrix multiplication by V (but blitz is still faster).  In
the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py
True
True
True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
1 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw)
1 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
1 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
10 loops, best of 3: 10.3 us per loop


blitz wins, the simple numpy expression is a distant second, and now
the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)


Warren



> Josef
>
>
> >
> > Thanks,
> > G
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef . pktd
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu  wrote:
> Hi,
>
> I am playing with multiple ways to speed up the following expression
> (it is in the inner loop):
>
>
> C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
>
> where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.
>
> I played with numexpr, but it was way slower than directly using
> numpy. It would be nice if I could create a Mx3 matrix without copying
> memory and so I can use dot() to calculate the whole thing.
>
> Can anyone help with giving some advices to make this faster?

looks like a np.convolve(C, [a,b,c])  to me except for the boundary conditions.

Josef


>
> Thanks,
> G
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] speeding up the following expression

2011-11-12 Thread Geoffrey Zhu
Hi,

I am playing with multiple ways to speed up the following expression
(it is in the inner loop):


C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

I played with numexpr, but it was way slower than directly using
numpy. It would be nice if I could create a Mx3 matrix without copying
memory and so I can use dot() to calculate the whole thing.

Can anyone help with giving some advices to make this faster?

Thanks,
G
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion