Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Keith Goodman
On Thu, Jul 22, 2010 at 10:35 AM, Warren Weckesser
 wrote:
> Keith Goodman wrote:
>> On Thu, Jul 22, 2010 at 7:48 AM, Warren Weckesser
>>  wrote:
>>
>>
>>> Actually, because of the use of reshape(3,3,4), your second
>>> example does make a copy.
>>>
>>
>> When does reshape return a view and when does it return a copy?
>>
>>
>
> According to the numpy.reshape docstring, it returns
> a view when it can.  In the previous example, it is
> not possible to configure the strides so that the
> four elements in each 2x2 block can be represented
> in a single axis using the original memory layout,
> so the data must be copied to achieve the shape
> (3,3,4).
>
>> Here's a simple example that returns a view:
>>
>>
 x = np.array([1,2,3,4])
 y = x.reshape(2,2)
 y[0,0] = 9
 x

>>    array([9, 2, 3, 4])
>>
>> What's a simple example that returns a copy?
>>
>
> In [85]: x = np.array([[1,2],[3,4],[5,6]]).T  # Note the transpose.
>
> In [86]: x
> Out[86]:
> array([[1, 3, 5],
>       [2, 4, 6]])
>
> In [87]: y = x.reshape(6)
>
> In [88]: x[0,1] = 99
>
> In [89]: y
> Out[89]: array([1, 3, 5, 2, 4, 6])

Thanks, Warren. It's very useful to me to know that reshape can return
a copy. Now I know how to slow down the other guy's function in a
horse race.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Warren Weckesser
Keith Goodman wrote:
> On Thu, Jul 22, 2010 at 7:48 AM, Warren Weckesser
>  wrote:
>
>   
>> Actually, because of the use of reshape(3,3,4), your second
>> example does make a copy.
>> 
>
> When does reshape return a view and when does it return a copy?
>
>   

According to the numpy.reshape docstring, it returns
a view when it can.  In the previous example, it is
not possible to configure the strides so that the
four elements in each 2x2 block can be represented
in a single axis using the original memory layout,
so the data must be copied to achieve the shape
(3,3,4).

> Here's a simple example that returns a view:
>
>   
>>> x = np.array([1,2,3,4])
>>> y = x.reshape(2,2)
>>> y[0,0] = 9
>>> x
>>>   
>array([9, 2, 3, 4])
>
> What's a simple example that returns a copy?
>   

In [85]: x = np.array([[1,2],[3,4],[5,6]]).T  # Note the transpose.

In [86]: x
Out[86]:
array([[1, 3, 5],
   [2, 4, 6]])

In [87]: y = x.reshape(6)

In [88]: x[0,1] = 99

In [89]: y
Out[89]: array([1, 3, 5, 2, 4, 6])



Warren

> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>   

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Robin Kraft
Vincent, Pauli,


> From: Vincent Schut 

> - an other option would be some smart reshaping, which finally gives you 
> a [y//2, x//2, 2, 2] array, which you could then reduce to calculate 
> stats (mean, std, etc) on the last two axes.  I *think* you'd have to 
> first reshape both x and y axes, and then reposition the axes. An
> example:
> a = gdal_array.BandReadAsArray(blabla)
> y,x = a.shape #y and x need be divideable by 2!
> b = a.reshape(y/2, 2, x/2, x).transpose(0,2,1,3).reshape(y/2, x/2, 4)
> bMean = b.mean(axis=-1)
> bMax = ..etc.


You and Pauli agree on this - seems like a good option.


> - a third option would be to create an index array, which has a unique 
> value per 2x2 square, and then use histogram2d. This would, if you use 
> its 'weight' functionality, at least enable you to get efficient counts 
> and sums/means. Other stats might be hard, though.

H I don't get this, but I'll experiment. I've seen some really useful 
things done with histogram2d so it seems worthwhile to figure out.


> Message: 6
> From: Pauli Virtanen 

> > Let's say the image looks like this: np.random.randint(0,2,
> > 16).reshape(4,4)
> > 
> > array([[0, 0, 0, 1],
> >[0, 0, 1, 1],
> >[1, 1, 0, 0],
> >[0, 0, 0, 0]])
> > 
> > I want to use a square, non-overlapping moving window for resampling, so
> > that I get a count of all the 1's in each 2x2 window.
> > 
> > 0, 0,   0, 1
> > 0, 0,   1, 1 0  3
> > =>   2  0
> > 1, 1,   0, 0
> > 0, 0,   0, 0
> > 
> > In another situation with similar data I'll need the average, or the
> > maximum value, etc..
> 
> Non-overlapping windows can be done by reshaping:
> 
> x = np.array([[0, 0, 0, 1, 1, 1],
>   [0, 0, 1, 1, 0, 0],
>   [1, 1, 0, 0, 1, 1],
>   [0, 0, 0, 0, 1, 1],
>   [1, 0, 1, 0, 1, 1],
>   [0, 0, 1, 0, 0, 0]])
> 
> y = x.reshape(3,2,3,2)
> y2 = y.sum(axis=3).sum(axis=1)


This is perfect - and so fast! Thanks! Now I just have to understand why it 
works ...

Can anyone recommend a tutorial on working with (slicing, reshaping, etc.) 
multi-dimensional arrays? Stefan's slides are beautiful but my brain starts to 
hurt if I try to follow them line by line.

-Robin
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Keith Goodman
On Thu, Jul 22, 2010 at 7:48 AM, Warren Weckesser
 wrote:

> Actually, because of the use of reshape(3,3,4), your second
> example does make a copy.

When does reshape return a view and when does it return a copy?

Here's a simple example that returns a view:

>> x = np.array([1,2,3,4])
>> y = x.reshape(2,2)
>> y[0,0] = 9
>> x
   array([9, 2, 3, 4])

What's a simple example that returns a copy?
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Warren Weckesser
Pauli Virtanen wrote:
> Thu, 22 Jul 2010 00:47:20 -0400, Robin Kraft wrote:
> [clip]
>   
>> Let's say the image looks like this: np.random.randint(0,2,
>> 16).reshape(4,4)
>>
>> array([[0, 0, 0, 1],
>>[0, 0, 1, 1],
>>[1, 1, 0, 0],
>>[0, 0, 0, 0]])
>>
>> I want to use a square, non-overlapping moving window for resampling, so
>> that I get a count of all the 1's in each 2x2 window.
>>
>> 0, 0,   0, 1
>> 0, 0,   1, 1 0  3
>> =>   2  0
>> 1, 1,   0, 0
>> 0, 0,   0, 0
>>
>> In another situation with similar data I'll need the average, or the
>> maximum value, etc..
>> 
>
> Non-overlapping windows can be done by reshaping:
>
> x = np.array([[0, 0, 0, 1, 1, 1],
>   [0, 0, 1, 1, 0, 0],
>   [1, 1, 0, 0, 1, 1],
>   [0, 0, 0, 0, 1, 1],
>   [1, 0, 1, 0, 1, 1],
>   [0, 0, 1, 0, 0, 0]])
>
> y = x.reshape(3,2,3,2)
> y2 = y.sum(axis=3).sum(axis=1)
>
> # -> array([[0, 3, 2],
> #   [2, 0, 4],
> #   [1, 2, 2]])
>
> y2 = x.reshape(3,2,3,2).transpose(0,2,1,3).reshape(3,3,4).sum(axis=-1)
>
> # -> array([[0, 3, 2],
> #   [2, 0, 4],
> #   [1, 2, 2]])
>
>
> The above requires no copying of data, and should be relatively fast.

Actually, because of the use of reshape(3,3,4), your second
example does make a copy.

Warren

>  If 
> you need overlapping windows, those can be emulated with strides:
>
>   http://mentat.za.net/numpy/scipy2009/stefanv_numpy_advanced.pdf
>   http://conference.scipy.org/scipy2010/slides/tutorials
>   /stefan_vd_walt_numpy_advanced.pdf
>
>   

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Pauli Virtanen
Thu, 22 Jul 2010 00:47:20 -0400, Robin Kraft wrote:
[clip]
> Let's say the image looks like this: np.random.randint(0,2,
> 16).reshape(4,4)
> 
> array([[0, 0, 0, 1],
>[0, 0, 1, 1],
>[1, 1, 0, 0],
>[0, 0, 0, 0]])
> 
> I want to use a square, non-overlapping moving window for resampling, so
> that I get a count of all the 1's in each 2x2 window.
> 
> 0, 0,   0, 1
> 0, 0,   1, 1 0  3
> =>   2  0
> 1, 1,   0, 0
> 0, 0,   0, 0
> 
> In another situation with similar data I'll need the average, or the
> maximum value, etc..

Non-overlapping windows can be done by reshaping:

x = np.array([[0, 0, 0, 1, 1, 1],
  [0, 0, 1, 1, 0, 0],
  [1, 1, 0, 0, 1, 1],
  [0, 0, 0, 0, 1, 1],
  [1, 0, 1, 0, 1, 1],
  [0, 0, 1, 0, 0, 0]])

y = x.reshape(3,2,3,2)
y2 = y.sum(axis=3).sum(axis=1)

# -> array([[0, 3, 2],
#   [2, 0, 4],
#   [1, 2, 2]])

y2 = x.reshape(3,2,3,2).transpose(0,2,1,3).reshape(3,3,4).sum(axis=-1)

# -> array([[0, 3, 2],
#   [2, 0, 4],
#   [1, 2, 2]])


The above requires no copying of data, and should be relatively fast. If 
you need overlapping windows, those can be emulated with strides:

http://mentat.za.net/numpy/scipy2009/stefanv_numpy_advanced.pdf
http://conference.scipy.org/scipy2010/slides/tutorials
/stefan_vd_walt_numpy_advanced.pdf

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-22 Thread Vincent Schut
On 07/22/2010 06:47 AM, Robin Kraft wrote:
> Hello all,
>
> The short version: For a given NxN array, is there an efficient way to use a 
> moving window to collect a summary statistic on a chunk of the array, and 
> insert it into another array?

Hi Robin,

been wrestling with similar stuff myself, though I have no obvious 
answer... A lot depends on your data and the stats you want.

Some thoughts:

- though you want non-overlapping stats, you could take a look at 
scipy.ndimage. Lots of this is coded in C, and though it does 
overlapping moving window stats, its speed might outweigh the stuff you 
have to do otherwise to 'tile' your array. You would then just slice the 
result of the ndimage operation (e.g. [::2, ::2] or similar).
- an other option would be some smart reshaping, which finally gives you 
a [y//2, x//2, 2, 2] array, which you could then reduce to calculate 
stats (mean, std, etc) on the last two axes.  I *think* you'd have to 
first reshape both x and y axes, and then reposition the axes. An example:
a = gdal_array.BandReadAsArray(blabla)
y,x = a.shape #y and x need be divideable by 2!
b = a.reshape(y/2, 2, x/2, x).transpose(0,2,1,3).reshape(y/2, x/2, 4)
bMean = b.mean(axis=-1)
bMax = ..etc.
- a third option would be to create an index array, which has a unique 
value per 2x2 square, and then use histogram2d. This would, if you use 
its 'weight' functionality, at least enable you to get efficient counts 
and sums/means. Other stats might be hard, though.

Good luck!
Vincent Schut.
>
> The long version: I am trying to resample an image loaded with GDAL into an 
> NxN array. Note that this is for statistical purposes, so image quality 
> doesn't matter. For the curious, the image is derived from satellite imagery 
> and displays a map of hotspots of tropical deforestation at 500m resolution. 
> I need to assign a count of these deforestation 'hits' to each pixel in an 
> existing array of 1000m pixels.
>
>
> Let's say the image looks like this: np.random.randint(0,2, 16).reshape(4,4)
>
> array([[0, 0, 0, 1],
> [0, 0, 1, 1],
> [1, 1, 0, 0],
> [0, 0, 0, 0]])
>
> I want to use a square, non-overlapping moving window for resampling, so that 
> I get a count of all the 1's in each 2x2 window.
>
> 0, 0,   0, 1
> 0, 0,   1, 1 0  3
>  =>2  0
> 1, 1,   0, 0
> 0, 0,   0, 0
>
> In another situation with similar data I'll need the average, or the maximum 
> value, etc..
>
> My brute-force method is to loop through the rows and columns to get little 
> chunks of data to process. But must be a more elegant way to do this - it's 
> probably obvious too ... (inelegant way further down).
>
> Another way to do it would be to use np.tile to create an array called "arr" 
> filed with blocks of [[0,1],[2,3]]. I could then use something like this to 
> get the stats I want:
>
> d0 = img[np.where(arr==0)]
> d1 = img[np.where(arr==1)]
> d2 = img[np.where(arr==2)]
> d3 = img[np.where(arr==3)]
>
> img_out = d0 + d1 + d2 + d3
>
> This would be quite snappy if I could create arr efficiently. Unfortunately 
> np.tile does something akin to np.hstack to create this array, so it isn't 
> square and can't be reshaped appropriately 
> (np.tile(np.arange(2**2).reshape(2, 2), 4)):
>
> array([[0, 1, 0, 1, 0, 1, 0, 1],
> [2, 3, 2, 3, 2, 3, 2, 3]])
>
> Inefficient sample code below. Advice greatly appreciated!
>
> -Robin
>
>
> import numpy as np
> from math import sqrt
> from time import time
>
> def rs(img_size=16, window_size=2):
>  w = window_size
>
>  # making sure the numbers work
>
>  if img_size % sqrt(img_size)>  0:
>  print "Square root of image size must be an integer."
>  print "Sqrt =", sqrt(img_size)
>
>  return
>
>  elif (img_size / sqrt(img_size)) % w>  0:
>  print "Square root of image size must be evenly divisible by", w
>  print "Sqrt =", sqrt(img_size)
>  print sqrt(img_size), "/", w, "=", sqrt(img_size)/w
>
>  return
>
>  else:
>
>  rows = int(sqrt(img_size))
>  cols = int(sqrt(img_size))
>
>  # create fake data: ones and zeros
>  img = np.random.randint(0, 2, img_size).reshape(rows, cols)
>
>  # create empty array for resampled data
>  img_out = np.zeros((rows/w, cols/w), dtype=np.int)
>
>  t = time()
>
>  # retreive blocks of pixels in img
>  # insert summary into img_out
>
>  for r in xrange(0, rows, w): # skip rows based on window size
>  for c in xrange(0, cols, w): # skip columns based on window size
>
>  # calculate the sum of the values in moving window,
>  #insert value into corresponding pixel in img_out
>
>  img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w]))
>
>  t1= time()
>  elapsed = t1-t
>  print "img shape:", img.shape
>  print img, "\n"
>  print "img_out 

[Numpy-discussion] summarizing blocks of an array using a moving window

2010-07-21 Thread Robin Kraft
Hello all,

The short version: For a given NxN array, is there an efficient way to use a 
moving window to collect a summary statistic on a chunk of the array, and 
insert it into another array?

The long version: I am trying to resample an image loaded with GDAL into an NxN 
array. Note that this is for statistical purposes, so image quality doesn't 
matter. For the curious, the image is derived from satellite imagery and 
displays a map of hotspots of tropical deforestation at 500m resolution. I need 
to assign a count of these deforestation 'hits' to each pixel in an existing 
array of 1000m pixels.


Let's say the image looks like this: np.random.randint(0,2, 16).reshape(4,4)

array([[0, 0, 0, 1],
   [0, 0, 1, 1],
   [1, 1, 0, 0],
   [0, 0, 0, 0]])

I want to use a square, non-overlapping moving window for resampling, so that I 
get a count of all the 1's in each 2x2 window.

0, 0,   0, 1
0, 0,   1, 1 0  3
=>   2  0
1, 1,   0, 0
0, 0,   0, 0

In another situation with similar data I'll need the average, or the maximum 
value, etc..

My brute-force method is to loop through the rows and columns to get little 
chunks of data to process. But must be a more elegant way to do this - it's 
probably obvious too ... (inelegant way further down).

Another way to do it would be to use np.tile to create an array called "arr" 
filed with blocks of [[0,1],[2,3]]. I could then use something like this to get 
the stats I want:

d0 = img[np.where(arr==0)]
d1 = img[np.where(arr==1)]
d2 = img[np.where(arr==2)]
d3 = img[np.where(arr==3)]

img_out = d0 + d1 + d2 + d3

This would be quite snappy if I could create arr efficiently. Unfortunately 
np.tile does something akin to np.hstack to create this array, so it isn't 
square and can't be reshaped appropriately (np.tile(np.arange(2**2).reshape(2, 
2), 4)):

array([[0, 1, 0, 1, 0, 1, 0, 1],
   [2, 3, 2, 3, 2, 3, 2, 3]])

Inefficient sample code below. Advice greatly appreciated!

-Robin


import numpy as np
from math import sqrt
from time import time

def rs(img_size=16, window_size=2):
w = window_size

# making sure the numbers work

if img_size % sqrt(img_size) > 0:
print "Square root of image size must be an integer."
print "Sqrt =", sqrt(img_size)

return

elif (img_size / sqrt(img_size)) % w > 0:
print "Square root of image size must be evenly divisible by", w
print "Sqrt =", sqrt(img_size)
print sqrt(img_size), "/", w, "=", sqrt(img_size)/w

return

else:

rows = int(sqrt(img_size))
cols = int(sqrt(img_size))

# create fake data: ones and zeros
img = np.random.randint(0, 2, img_size).reshape(rows, cols)

# create empty array for resampled data
img_out = np.zeros((rows/w, cols/w), dtype=np.int)

t = time()

# retreive blocks of pixels in img
# insert summary into img_out

for r in xrange(0, rows, w): # skip rows based on window size
for c in xrange(0, cols, w): # skip columns based on window size

# calculate the sum of the values in moving window,
#insert value into corresponding pixel in img_out

img_out[r/w, c/w] = np.int(np.sum(img[r:r+w, c:c+w]))

t1= time()
elapsed = t1-t
print "img shape:", img.shape
print img, "\n"
print "img_out shape:", img_out.shape
print img_out

print "%.7f seconds" % elapsed

rs(imgage_size=16, window=2)
rs(81, 3)
rs(100)
#rs(4800*4800) # very slow
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion