Never mind this, it was my own mistake as I expected :-) def __chunk(n,size): x = range(0,n,size) x.append(n) return zip(x[:-1],x[1:])
makes it a lot better :) Sturla Den 18.01.2012 06:26, skrev Sturla Molden: > While "playing" with a point-in-polygon test, I have discovered some a > failure mode that I cannot make sence of. > > The algorithm is vectorized for NumPy from a C and Python implementation > I found on the net (see links below). It is written to process a large > dataset in chunks. I'm rather happy with it, it can test 100,000 x,y > points against a non-convex pentagon in just 50 ms. > > Anyway, here is something very strange (or at least I think so): > > If I use a small chunk size, it sometimes fails. I know I shouldn't > blame it on NumPy, beacuse it is by all likelood my mistake. But it does > not make any sence, as the parameter should not affect the computation. > > Observed behavior: > > 1. Processing the whole dataset in one big chunk always works. > > 2. Processing the dataset in big chunks (e.g. 8192 points) always works. > > 3. Processing the dataset in small chunks (e.g. 32 points) sometimes fail. > > 4. Processing the dataset element-wise always work. > > 5. The scalar version behaves like the numpy version: fine for large > chunks, sometimes it fails for small. That is, when list comprehensions > is used for chunks. Big list comprehensions always work, small ones > might fail. > > It looks like the numerical robstness of the alorithm depends on a > parameter that has nothing to do with the algorithm at all. For example > in (5), we might think that calling a function from a nested loop makes > it fail, depending on the length of the inner loop. But calling it from > a single loop works just fine. > > ??? > > So I wonder: > > Could there be a bug in numpy that only shows up only when taking a huge > number of short slices? > > I don't know... But try it if you care. > > In the function "inpolygon", change the call that says __chunk(n,8192) > to e.g. __chunk(n,32) to see it fail (or at least it does on my > computer, running Enthought 7.2-1 on Win64). > > > Regards, > Sturla Molden > > > > > > def __inpolygon_scalar(x,y,poly): > > # Source code taken from: > # http://paulbourke.net/geometry/insidepoly > # http://www.ariel.com.au/a/python-point-int-poly.html > > n = len(poly) > inside = False > p1x,p1y = poly[0] > xinters = 0 > for i in range(n+1): > p2x,p2y = poly[i % n] > if y> min(p1y,p2y): > if y<= max(p1y,p2y): > if x<= max(p1x,p2x): > if p1y != p2y: > xinters = (y-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x or x<= xinters: > inside = not inside > p1x,p1y = p2x,p2y > return inside > > > # the rest is (C) Sturla Molden, 2012 > # University of Oslo > > def __inpolygon_numpy(x,y,poly): > """ numpy vectorized version """ > n = len(poly) > inside = np.zeros(x.shape[0], dtype=bool) > xinters = np.zeros(x.shape[0], dtype=float) > p1x,p1y = poly[0] > for i in range(n+1): > p2x,p2y = poly[i % n] > mask = (y> min(p1y,p2y))& (y<= max(p1y,p2y))& (x<= > max(p1x,p2x)) > if p1y != p2y: > xinters[mask] = (y[mask]-p1y)*(p2x-p1x)/(p2y-p1y)+p1x > if p1x == p2x: > inside[mask] = ~inside[mask] > else: > mask2 = x[mask]<= xinters[mask] > idx, = np.where(mask) > idx2, = np.where(mask2) > idx = idx[idx2] > inside[idx] = ~inside[idx] > p1x,p1y = p2x,p2y > return inside > > def __chunk(n,size): > x = range(0,n,size) > if (n%size): > x.append(n) > return zip(x[:-1],x[1:]) > > def inpolygon(x, y, poly): > """ > point-in-polygon test > x and y are numpy arrays > polygon is a list of (x,y) vertex tuples > """ > if np.isscalar(x) and np.isscalar(y): > return __inpolygon_scalar(x, y, poly) > else: > x = np.asarray(x) > y = np.asarray(y) > n = x.shape[0] > z = np.zeros(n, dtype=bool) > for i,j in __chunk(n,8192): # COMPARE WITH __chunk(n,32) ??? > if j-i> 1: > z[i:j] = __inpolygon_numpy(x[i:j], y[i:j], poly) > else: > z[i] = __inpolygon_scalar(x[i], y[i], poly) > return z > > > > if __name__ == "__main__": > > import matplotlib > import matplotlib.pyplot as plt > from time import clock > > n = 100000 > polygon = [(0.,.1), (1.,.1), (.5,1.), (0.,.75), (.5,.5), (0.,.1)] > xp = [x for x,y in polygon] > yp = [y for x,y in polygon] > x = np.random.rand(n) > y = np.random.rand(n) > t0 = clock() > inside = inpolygon(x,y,polygon) > t1 = clock() > print 'elapsed time %.3g ms' % ((t0-t1)*1E3,) > plt.figure() > plt.plot(x[~inside],y[~inside],'ob', xp, yp, '-g') > plt.axis([0,1,0,1]) > plt.show() > > > > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion