Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef.pktd
On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu zyzhu2...@gmail.com wrote:
 Hi,

 I am playing with multiple ways to speed up the following expression
 (it is in the inner loop):


 C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])

 where C is an array of about 200-300 elements, M=len(C), a, b, c are scalars.

 I played with numexpr, but it was way slower than directly using
 numpy. It would be nice if I could create a Mx3 matrix without copying
 memory and so I can use dot() to calculate the whole thing.

 Can anyone give some advice on how to make this faster?

looks like np.convolve(C, [a, b, c]) to me, except for the boundary conditions.
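
For example, a quick sketch of that equivalence on the interior points
(coefficients made up for illustration; mode='valid' drops the two
boundary values):

import numpy as np

a, b, c = -0.5, 1.0, -0.25
C = np.arange(250.0)

direct = a * C[2:] + b * C[1:-1] + c * C[:-2]
# np.convolve flips the kernel, so passing [a, b, c] directly gives
# a*C[i+1] + b*C[i] + c*C[i-1] at each interior point i.
via_convolve = np.convolve(C, [a, b, c], mode='valid')

print(np.allclose(direct, via_convolve))   # True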

Josef



 Thanks,
 G


Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Warren Weckesser
On Sat, Nov 12, 2011 at 6:43 AM, josef.p...@gmail.com wrote:

 On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu zyzhu2...@gmail.com wrote:
  Hi,
 
  I am playing with multiple ways to speed up the following expression
  (it is in the inner loop):
 
 
  C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
 
  where C is an array of about 200-300 elements, M=len(C), a, b, c are
 scalars.
 
  I played with numexpr, but it was way slower than directly using
  numpy. It would be nice if I could create a Mx3 matrix without copying
  memory and so I can use dot() to calculate the whole thing.
 
  Can anyone give some advice on how to make this faster?

 looks like a np.convolve(C, [a,b,c])  to me except for the boundary
 conditions.




As Josef pointed out, this is a convolution.  There are (at least)
three convolution functions in numpy+scipy that you could use:
numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
beat the simple expression
   a*x[2:] + b*x[1:-1] + c*x[:-2]
Your idea of forming a matrix without copying memory can be done using
stride tricks, and for arrays of the size you are interested in, it
computes the result faster than the simple expression (see below).

Another fast alternative is to use one of the inline code generators.
This example is easy to implement with scipy.weave.blitz, and it gives
a big speedup.

Here's a test script:

#- convolve1dtest.py -


import numpy as np
from numpy.lib.stride_tricks import as_strided
from scipy.ndimage import convolve1d
from scipy.weave import blitz

# weighting coefficients
a = -0.5
b = 1.0
c = -0.25
w = np.array((a,b,c))
# Reversed w:
rw = w[::-1]

# Length of C
n = 250

# The original version of the calculation:
# Some dummy data
C = np.arange(float(n))
C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
# Save for comparison.
C0 = C.copy()

# Do it again using a matrix multiplication.
C = np.arange(float(n))
# The virtual matrix view of C.
V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0], C.strides[0]))
C[1:-1] = np.dot(V, rw)
C1 = C.copy()

# Again, with convolve1d this time.
C = np.arange(float(n))
C[1:-1] = convolve1d(C, w)[1:-1]
C2 = C.copy()

# scipy.weave.blitz
C = np.arange(float(n))
# Must work with a copy, D, in the formula, because blitz does not use
# a temporary variable.
D = C.copy()
expr = "C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]"
blitz(expr, check_size=0)
C3 = C.copy()


# Verify that all the methods give the same result.
print np.all(C0 == C1)
print np.all(C0 == C2)
print np.all(C0 == C3)

#-

And here's a snippet from an ipython session:

In [51]: run convolve1dtest.py
True
True
True

In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
10 loops, best of 3: 16.5 us per loop

In [53]: %timeit C[1:-1] = np.dot(V, rw)
10 loops, best of 3: 9.84 us per loop

In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
10 loops, best of 3: 18.7 us per loop

In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
10 loops, best of 3: 4.91 us per loop



scipy.weave.blitz is fastest (but note that blitz has already been called
once, so the time shown does not include the compilation required in
the first call).  You could also try scipy.weave.inline, cython.inline,
or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
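
For comparison, here is a rough sketch of what the same update might look
like with scipy.weave.inline and the blitz type converters (untested here;
like the blitz version, it reads from the copy D while writing into C):

import numpy as np
from scipy import weave
from scipy.weave import converters

a, b, c = -0.5, 1.0, -0.25
C = np.arange(250.0)
n = C.shape[0]

# Read from the copy D while writing into C, just as in the blitz version.
D = C.copy()
code = """
for (int i = 1; i < n - 1; ++i) {
    C(i) = a * D(i + 1) + b * D(i) + c * D(i - 1);
}
"""
# The first call compiles the C++ snippet; subsequent calls reuse the
# cached extension module.
weave.inline(code, ['C', 'D', 'a', 'b', 'c', 'n'],
             type_converters=converters.blitz)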

Also note that C[1:-1] = np.dot(V, rw) is faster than either the simple
expression or convolve1d.  However, if you also have to set up V inside
your inner loop, the speed gain will be lost.  The relative speeds also
depend on the size of C.  For large C, the simple expression is faster
than the matrix multiplication by V (but blitz is still faster).  In
the following, I have changed n to 2500 before running convolve1dtest.py:

In [56]: run convolve1dtest.py
True
True
True

In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
1 loops, best of 3: 29.5 us per loop

In [58]: %timeit C[1:-1] = np.dot(V, rw)
1 loops, best of 3: 56.4 us per loop

In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
1 loops, best of 3: 37.3 us per loop

In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
10 loops, best of 3: 10.3 us per loop


blitz wins, the simple numpy expression is a distant second, and now
the matrix multiplication is slowest.

I hope that helps--I know I learned quite a bit. :)


Warren





Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread josef.pktd
On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
warren.weckes...@enthought.com wrote:


 [...]

 blitz wins, the simple numpy expression is a distant second, and now
 the matrix multiplication is slowest.

 I hope that helps--I know I learned quite a bit. :)

Interesting, two questions

does scipy.signal.convolve have a similar overhead as np.convolve1d?

memory:
the blitz code doesn't include the array copy (D), so the timing might
be a bit misleading?
I assume the as_strided call doesn't allocate any new memory, so the
timing should be correct (or is this your comment about setting up V
in the inner loop?).
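
A quick check (just a sketch, not taken from the script above) suggests it
is indeed only a view -- no new data buffer is allocated:

import numpy as np
from numpy.lib.stride_tricks import as_strided

C = np.arange(250.0)
V = as_strided(C, shape=(C.size - 2, 3),
               strides=(C.strides[0], C.strides[0]))

print(np.may_share_memory(V, C))   # True -- V reuses C's buffer
C[0] = 123.0
print(V[0, 0])                     # 123.0 -- changes to C show up in V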

Josef



Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Warren Weckesser
On Sat, Nov 12, 2011 at 9:59 AM, josef.p...@gmail.com wrote:

 On Sat, Nov 12, 2011 at 10:31 AM, Warren Weckesser
 warren.weckes...@enthought.com wrote:
 
 
  On Sat, Nov 12, 2011 at 6:43 AM, josef.p...@gmail.com wrote:
 
  On Sat, Nov 12, 2011 at 3:36 AM, Geoffrey Zhu zyzhu2...@gmail.com
 wrote:
   Hi,
  
   I am playing with multiple ways to speed up the following expression
   (it is in the inner loop):
  
  
   C[1:(M - 1)]=(a * C[2:] + b * C[1:(M-1)] + c * C[:(M-2)])
  
   where C is an array of about 200-300 elements, M=len(C), a, b, c are
   scalars.
  
   I played with numexpr, but it was way slower than directly using
   numpy. It would be nice if I could create a Mx3 matrix without copying
   memory and so I can use dot() to calculate the whole thing.
  
   Can anyone help with giving some advices to make this faster?
 
  looks like a np.convolve(C, [a,b,c])  to me except for the boundary
  conditions.
 
 
 
  As Josef pointed out, this is a convolution.  There are (at least)
  three convolution functions in numpy+scipy that you could use:
  numpy.convolve, scipy.signal.convolve, and scipy.ndimage.convolve1d.
  Of these, scipy.ndimage.convolve1d is the fastest.  However, it doesn't
  beat the simple expression
 a*x[2:] + b*x[1:-1] + c*x[:-2]
  Your idea of forming a matrix without copying memory can be done using
  stride tricks, and for arrays of the size you are interested in, it
  computes the result faster than the simple expression (see below).
 
  Another fast alternative is to use one of the inline code generators.
  This example is a easy to implement with scipy.weave.blitz, and it gives
  a big speedup.
 
  Here's a test script:
 
  #- convolve1dtest.py -
 
 
  import numpy as np
  from numpy.lib.stride_tricks import as_strided
  from scipy.ndimage import convolve1d
  from scipy.weave import blitz
 
  # weighting coefficients
  a = -0.5
  b = 1.0
  c = -0.25
  w = np.array((a,b,c))
  # Reversed w:
  rw = w[::-1]
 
  # Length of C
  n = 250
 
  # The original version of the calculation:
  # Some dummy data
  C = np.arange(float(n))
  C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
  # Save for comparison.
  C0 = C.copy()
 
  # Do it again using a matrix multiplication.
  C = np.arange(float(n))
  # The virtual matrix view of C.
  V = as_strided(C, shape=(C.size-2, 3), strides=(C.strides[0],
 C.strides[0]))
  C[1:-1] = np.dot(V, rw)
  C1 = C.copy()
 
  # Again, with convolve1d this time.
  C = np.arange(float(n))
  C[1:-1] = convolve1d(C, w)[1:-1]
  C2 = C.copy()
 
  # scipy.weave.blitz
  C = np.arange(float(n))
  # Must work with a copy, D, in the formula, because blitz does not use
  # a temporary variable.
  D = C.copy()
  expr = C[1:-1] = a * D[2:] + b * D[1:-1] + c * D[:-2]
  blitz(expr, check_size=0)
  C3 = C.copy()
 
 
  # Verify that all the methods give the same result.
  print np.all(C0 == C1)
  print np.all(C0 == C2)
  print np.all(C0 == C3)
 
  #-
 
  And here's a snippet from an ipython session:
 
  In [51]: run convolve1dtest.py
  True
  True
  True
 
  In [52]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
  10 loops, best of 3: 16.5 us per loop
 
  In [53]: %timeit C[1:-1] = np.dot(V, rw)
  10 loops, best of 3: 9.84 us per loop
 
  In [54]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
  10 loops, best of 3: 18.7 us per loop
 
  In [55]: %timeit D = C.copy(); blitz(expr, check_size=0)
  10 loops, best of 3: 4.91 us per loop
 
 
 
  scipy.weave.blitz is fastest (but note that blitz has already been called
  once, so the time shown does not include the compilation required in
  the first call).  You could also try scipy.weave.inline, cython.inline,
  or np_inline (http://pages.cs.wisc.edu/~johnl/np_inline/).
 
  Also note that C[-1:1] = np.dot(V, rw) is faster than either the simple
  expression or convolve1d.  However, if you also have to set up V inside
  your inner loop, the speed gain will be lost.  The relative speeds also
  depend on the size of C.  For large C, the simple expression is faster
  than the matrix multiplication by V (but blitz is still faster).  In
  the following, I have changed n to 2500 before running convolve1dtest.py:
 
  In [56]: run convolve1dtest.py
  True
  True
  True
 
  In [57]: %timeit C[1:-1] = a * C[2:] + b * C[1:-1] + c * C[:-2]
  1 loops, best of 3: 29.5 us per loop
 
  In [58]: %timeit C[1:-1] = np.dot(V, rw)
  1 loops, best of 3: 56.4 us per loop
 
  In [59]: %timeit C[1:-1] = convolve1d(C, w)[1:-1]
  1 loops, best of 3: 37.3 us per loop
 
  In [60]: %timeit D = C.copy(); blitz(expr, check_size=0)
  10 loops, best of 3: 10.3 us per loop
 
 
  blitz wins, the simple numpy expression is a distant second, and now
  the matrix multiplication is slowest.
 
  I hope that helps--I know I learned quite a bit. :)

 Interesting, two questions

 does scipy.signal.convolve have a similar overhead as np.convolve1d?



Did you mean np.convolve?  There is no np.convolve1d.   Some of the tests
that 

Re: [Numpy-discussion] speeding up the following expression

2011-11-12 Thread Geoffrey Zhu
Hi Warren,

On Sat, Nov 12, 2011 at 9:31 AM, Warren Weckesser
warren.weckes...@enthought.com wrote:


 [...]

 I hope that helps--I know I learned quite a bit. :)


 Warren

Thanks for the very helpful response. Fortunately I don't have to set
up the matrix every time as I can work on the same C over and over
again inside the loop. It's counterintuitive to see that the
performance of dot() degrades as n goes up. If I didn't see the
results, I would have guessed otherwise.

Thanks again!

Geoffrey