On 06/09/2010 10:24 AM, Vicente Sole wrote:
> Well, a loop or list comparison seems like a good choice to me. It is
> much more obvious at the expense of two LOCs. Did you profile the two
> possibilities, and are they actually performance-critical?
>
> cheers


The second is between 8 and 10 times faster on my machine.

import numpy
import time
x0 = numpy.arange(10000.)
niter = 2000   # I expect between 10000 and 100000


def option1(x, delta=0.2):
    # Pure-Python loop: keep a value only if it exceeds the last kept value by more than delta
    y = [x[0]]
    for value in x:
        if (value - y[-1]) > delta:
            y.append(value)
    return numpy.array(y)

def option2(x, delta=0.2):
    # Vectorized: keep points where the integer part of the scaled cumulative difference increases
    y = numpy.cumsum((x[1:] - x[:-1]) / delta).astype(int)  # numpy.int was removed from NumPy
    i1 = numpy.nonzero(y[1:] > y[:-1])
    return numpy.take(x, i1)


t0 = time.time()
for i in range(niter):
    t = option1(x0)
print("Elapsed = ", time.time() - t0)
t0 = time.time()
for i in range(niter):
    t = option2(x0)
print("Elapsed = ", time.time() - t0)

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
For integer values of delta, I don't see any difference between using option1 and using the '%' operator.
>>> (x0[(x0*10)%2==0]-option1(x0)).sum()
0.0

Also, option2 gives a different result than option1, so these are not equivalent functions. You can see that from the shapes:
>>> option2(x0).shape
(1, 9998)
>>> option1(x0).shape
(10000,)
>>> ((option1(x0)[:9998])-option2(x0)).sum()
0.0

So, allowing for the shape difference, option2 matches the first 9998 elements of option1's output, but its result is still two elements shorter than option1's.
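
A minimal sketch of where the extra dimension probably comes from, assuming a current NumPy (so `int` stands in for `numpy.int`). numpy.nonzero returns a tuple of index arrays, and passing that tuple to numpy.take makes the index array 2-D. The name option2_flat is hypothetical and only illustrates taking the first element of the tuple; it still returns 9998 values, so it does not make the two functions equivalent.

```python
import numpy

def option2(x, delta=0.2):
    # Same logic as the posted option2
    y = numpy.cumsum((x[1:] - x[:-1]) / delta).astype(int)
    i1 = numpy.nonzero(y[1:] > y[:-1])   # a tuple holding one index array
    return numpy.take(x, i1)             # tuple becomes a (1, n) index array

def option2_flat(x, delta=0.2):
    # Hypothetical variant: index with the array inside the tuple
    y = numpy.cumsum((x[1:] - x[:-1]) / delta).astype(int)
    i1 = numpy.nonzero(y[1:] > y[:-1])[0]
    return numpy.take(x, i1)             # 1-D result

x0 = numpy.arange(10000.)
print(option2(x0).shape)       # 2-D, as in the session above
print(option2_flat(x0).shape)  # 1-D, same values
```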

Probably the main reason for the speed difference is that option2 is almost pure numpy (and hence runs in C), whereas option1 does a lot of element-by-element array lookups, which are always slow. So keep it in numpy as much as possible.
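
As a rough illustration of that point, here is a sketch that times the same difference computed with per-element lookups and with one vectorized expression. The function names and the repeat count are made up for this example, and the actual ratio will depend on the machine and the NumPy version.

```python
import timeit
import numpy

x0 = numpy.arange(10000.)

def loop_diff(x):
    # One Python-level array lookup pair per element
    return [x[i + 1] - x[i] for i in range(len(x) - 1)]

def numpy_diff(x):
    # A single vectorized subtraction
    return x[1:] - x[:-1]

loop_time = timeit.timeit(lambda: loop_diff(x0), number=20)
vec_time = timeit.timeit(lambda: numpy_diff(x0), number=20)
print("loop: ", loop_time)
print("numpy:", vec_time)
```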


Bruce