On 07/22/2010 06:47 AM, Robin Kraft wrote: > Hello all, > > The short version: For a given NxN array, is there an efficient way to use a > moving window to collect a summary statistic on a chunk of the array, and > insert it into another array?
Hi Robin, been wrestling with similar stuff myself, though I have no obvious answer... A lot depends on your data and the stats you want. Some thoughts: - though you want non-overlapping stats, you could take a look at scipy.ndimage. Lots of this is coded in C, and though it does overlapping moving window stats, its speed might outweigh the stuff you have to do otherwise to 'tile' your array. You would then just slice the result of the ndimage operation (e.g. [::2, ::2] or similar). - an other option would be some smart reshaping, which finally gives you a [y//2, x//2, 2, 2] array, which you could then reduce to calculate stats (mean, std, etc) on the last two axes. I *think* you'd have to first reshape both x and y axes, and then reposition the axes. An example: a = gdal_array.BandReadAsArray(blabla) y,x = a.shape #y and x need be divideable by 2! b = a.reshape(y/2, 2, x/2, x).transpose(0,2,1,3).reshape(y/2, x/2, 4) bMean = b.mean(axis=-1) bMax = ......etc. - a third option would be to create an index array, which has a unique value per 2x2 square, and then use histogram2d. This would, if you use its 'weight' functionality, at least enable you to get efficient counts and sums/means. Other stats might be hard, though. Good luck! Vincent Schut. > > The long version: I am trying to resample an image loaded with GDAL into an > NxN array. Note that this is for statistical purposes, so image quality > doesn't matter. For the curious, the image is derived from satellite imagery > and displays a map of hotspots of tropical deforestation at 500m resolution. > I need to assign a count of these deforestation 'hits' to each pixel in an > existing array of 1000m pixels. > > > Let's say the image looks like this: np.random.randint(0,2, 16).reshape(4,4) > > array([[0, 0, 0, 1], > [0, 0, 1, 1], > [1, 1, 0, 0], > [0, 0, 0, 0]]) > > I want to use a square, non-overlapping moving window for resampling, so that > I get a count of all the 1's in each 2x2 window. > > 0, 0, 0, 1 > 0, 0, 1, 1 0 3 > => 2 0 > 1, 1, 0, 0 > 0, 0, 0, 0 > > In another situation with similar data I'll need the average, or the maximum > value, etc.. > > My brute-force method is to loop through the rows and columns to get little > chunks of data to process. But must be a more elegant way to do this - it's > probably obvious too ... (inelegant way further down). > > Another way to do it would be to use np.tile to create an array called "arr" > filed with blocks of [[0,1],[2,3]]. I could then use something like this to > get the stats I want: > > d0 = img[np.where(arr==0)] > d1 = img[np.where(arr==1)] > d2 = img[np.where(arr==2)] > d3 = img[np.where(arr==3)] > > img_out = d0 + d1 + d2 + d3 > > This would be quite snappy if I could create arr efficiently. Unfortunately > np.tile does something akin to np.hstack to create this array, so it isn't > square and can't be reshaped appropriately > (np.tile(np.arange(2**2).reshape(2, 2), 4)): > > array([[0, 1, 0, 1, 0, 1, 0, 1], > [2, 3, 2, 3, 2, 3, 2, 3]]) > > Inefficient sample code below. Advice greatly appreciated! > > -Robin > > > import numpy as np > from math import sqrt > from time import time > > def rs(img_size=16, window_size=2): > w = window_size > > # making sure the numbers work > > if img_size % sqrt(img_size)> 0: > print "Square root of image size must be an integer." > print "Sqrt =", sqrt(img_size) > > return > > elif (img_size / sqrt(img_size)) % w> 0: > print "Square root of image size must be evenly divisible by", w > print "Sqrt =", sqrt(img_size) > print sqrt(img_size), "/", w, "=", sqrt(img_size)/w > > return > > else: > > rows = int(sqrt(img_size)) > cols = int(sqrt(img_size)) > > # create fake data: ones and zeros > img = np.random.randint(0, 2, img_size).reshape(rows, cols) > > # create empty array for resampled data > img_out = np.zeros((rows/w, cols/w), dtype=np.int) > > t = time() > > # retreive blocks of pixels in img > # insert summary into img_out > > for r in xrange(0, rows, w): # skip rows based on window size > for c in xrange(0, cols, w): # skip columns based on window size > > # calculate the sum of the values in moving window, > #insert value into corresponding pixel in img_out > > img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w])) > > t1= time() > elapsed = t1-t > print "img shape:", img.shape > print img, "\n" > print "img_out shape:", img_out.shape > print img_out > > print "%.7f seconds" % elapsed > > rs(imgage_size=16, window=2) > rs(81, 3) > rs(1000000) > #rs(4800*4800) # very slow _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion