Re: [Numpy-discussion] first recarray steps

2008-05-22 Thread Vincent Schut
Anne Archibald wrote:
> 2008/5/21 Vincent Schut <[EMAIL PROTECTED]>:
>> Christopher Barker wrote:
>>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>>> array, then that's rrr... Some image libs
>>> may give you that, I'm not sure.
>> My data is. In fact, this is a simplification of my situation; I'm
>> processing satellite data, which usually has more (and other) bands than
>> just rgb. But the data is definitely in shape (bands, y, x).
> 
> You may find your life becomes easier if you transpose the data in
> memory. This can make a big difference to efficiency. Years ago I was
> working with enormous (by the standards of the day) MATLAB files on
> disk, storing complex data. The way (that version of) MATLAB
> represented complex data was the way you describe: matrix of real
> parts, matrix of imaginary parts. This meant that to draw a single
> pixel, the disk needed to seek twice... depending on what sort of
> operations you're doing, transposing your data so that each pixel is
> all in one place may improve cache coherency as well as making the use
> of record arrays possible.
> 
> Anne

Anne, thanks for the thoughts. In most cases, you'll probably be right. 
In this case, however, it won't give me much (if any) speedup, maybe 
even slowdown. Satellite images often are stored on disk in a band 
sequential manner. The library I use for IO is GDAL, which is a highly 
optimized C library for reading/writing almost any kind of satellite 
data type. It also features an internal caching mechanism. And it gives 
me my data as (bands, y, x).
I'm not reading single pixels anyway. The amounts of data I have to 
process (enormous, even by the standards of today ;-)) require me to do 
this in chunks, in parallel, even on different cores/CPUs/computers. 
Every chunk usually is (chunkYSize, chunkXSize, allBands) with xsize and 
ysize being not so small (think from 64^2 to 1024^2) so that pretty much 
eliminates any performance issues regarding the data on disk. 
Furthermore, having to process on multiple computers forces me to have 
my data on networked storage. The latency and transfer rate of the 
network will probably eliminate any small speedup gained from my drive 
having to do fewer seeks...
Now for the recarray part, that would indeed ease my life a bit :) 
However, having to transpose the data in memory on every read and write 
does not sound very attractive. It will waste cycles, and memory, and be 
asking for bugs. I can live without recarrays, for sure. I only hoped 
they might make my life a bit easier and my code a bit more readable, 
without too much effort. Well, they won't, apparently... I'll just go on 
like I did before this little exercise.

Thanks all for the inputs.

Cheers,
Vincent.

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] first recarray steps

2008-05-21 Thread Anne Archibald
2008/5/21 Vincent Schut <[EMAIL PROTECTED]>:
> Christopher Barker wrote:
>>
>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>> array, then that's rrr... Some image libs
>> may give you that, I'm not sure.
>
> My data is. In fact, this is a simplification of my situation; I'm
> processing satellite data, which usually has more (and other) bands than
> just rgb. But the data is definitely in shape (bands, y, x).

You may find your life becomes easier if you transpose the data in
memory. This can make a big difference to efficiency. Years ago I was
working with enormous (by the standards of the day) MATLAB files on
disk, storing complex data. The way (that version of) MATLAB
represented complex data was the way you describe: matrix of real
parts, matrix of imaginary parts. This meant that to draw a single
pixel, the disk needed to seek twice... depending on what sort of
operations you're doing, transposing your data so that each pixel is
all in one place may improve cache coherency as well as making the use
of record arrays possible.

Anne


Re: [Numpy-discussion] first recarray steps

2008-05-21 Thread Vincent Schut
Robert Kern wrote:
> On Wed, May 21, 2008 at 2:03 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
>> Robert Kern wrote:
>>> On Wed, May 21, 2008 at 1:48 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
>>>> Christopher Barker wrote:
>>>>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>>>>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>>>>> array, then that's rrr... Some image libs
>>>>> may give you that, I'm not sure.
>>>> My data is. In fact, this is a simplification of my situation; I'm
>>>> processing satellite data, which usually has more (and other) bands than
>>>> just rgb. But the data is definitely in shape (bands, y, x).
>>> I don't think record arrays will help you much, then. Individual
>>> records need to be contiguous (bar padding). You can't interleave
>>> them.
>>>
>> Hmm, that was just what I was wondering about, when reading Stefan's
>> reply. So in fact, recarrays aren't just another way to view some data,
>> no matter in what shape it is.
>>
>> So his solution:
>> x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[:0:-1]).T won't work,
>> then. Or, at least, won't give me a view on my original data, but would
>> give me a recarray with a copy of my data.
> 
> Right.
> 
>> I guess I was misled by this text on the recarray wiki page:
>>
>> "We would like to represent a small colour image. The image is two
>> pixels high and two pixels wide. Each pixel has a red, green and blue
>> colour component, which is represented by a 32-bit floating point number
>> between 0 and 1.
>>
>> Intuitively, we could represent the image as a 3x2x2 array, where the
>> first dimension represents the color, and the last two the pixel
>> positions, i.e. "
>>
>> Note the "3x2x2", which suggested imho that this would work with an
>> image with (bands,y,x) shape, not with (x,y,bands) shape.
> 
> Yes, the tutorial goes on to use record arrays as a view onto an
> (x,y,bands) array and also make a (bands,x,y) view from that, too.
> That is, in fact, quite a confusing presentation of the subject.
> 
> Now, there is a way to use record arrays here; it's a bit ugly but can
> be quite useful when parsing data formats. Each item in the record can
> also be an array. So let's pretend we have a (3,nx,ny) RGB array.
> 
> nbands, nx, ny = a.shape
> dtype = numpy.dtype([
>   ('r', a.dtype, [nx, ny]),
>   ('g', a.dtype, [nx, ny]),
>   ('b', a.dtype, [nx, ny]),
> ])
> 
> # The flatten() is necessary to pre-empt numpy from
> # trying to do too much interpretation of a's shape.
> rec = a.flatten().view(dtype)
> print(rec['r'])
> print(rec['g'])
> print(rec['b'])
> 

Ah, now that is clarifying! Thanks a lot. I'll do some experiments to 
see whether this way of viewing my data is useful to me (in a sense that 
making my code more readable is already very useful).

Cheers,
Vincent.



Re: [Numpy-discussion] first recarray steps

2008-05-21 Thread Robert Kern
On Wed, May 21, 2008 at 2:03 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
> Robert Kern wrote:
>> On Wed, May 21, 2008 at 1:48 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
>>> Christopher Barker wrote:
>>
>>>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>>>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>>>> array, then that's rrr... Some image libs
>>>> may give you that, I'm not sure.
>>> My data is. In fact, this is a simplification of my situation; I'm
>>> processing satellite data, which usually has more (and other) bands than
>>> just rgb. But the data is definitely in shape (bands, y, x).
>>
>> I don't think record arrays will help you much, then. Individual
>> records need to be contiguous (bar padding). You can't interleave
>> them.
>>
> Hmm, that was just what I was wondering about, when reading Stefan's
> reply. So in fact, recarrays aren't just another way to view some data,
> no matter in what shape it is.
>
> So his solution:
> x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[:0:-1]).T won't work,
> then. Or, at least, won't give me a view on my original data, but would
> give me a recarray with a copy of my data.

Right.

> I guess I was misled by this text on the recarray wiki page:
>
> "We would like to represent a small colour image. The image is two
> pixels high and two pixels wide. Each pixel has a red, green and blue
> colour component, which is represented by a 32-bit floating point number
> between 0 and 1.
>
> Intuitively, we could represent the image as a 3x2x2 array, where the
> first dimension represents the color, and the last two the pixel
> positions, i.e. "
>
> Note the "3x2x2", which suggested imho that this would work with an
> image with (bands,y,x) shape, not with (x,y,bands) shape.

Yes, the tutorial goes on to use record arrays as a view onto an
(x,y,bands) array and also make a (bands,x,y) view from that, too.
That is, in fact, quite a confusing presentation of the subject.

Now, there is a way to use record arrays here; it's a bit ugly but can
be quite useful when parsing data formats. Each item in the record can
also be an array. So let's pretend we have a (3,nx,ny) RGB array.

import numpy

# 'a' is assumed to be an existing (3, nx, ny) array.
nbands, nx, ny = a.shape
dtype = numpy.dtype([
  ('r', a.dtype, (nx, ny)),
  ('g', a.dtype, (nx, ny)),
  ('b', a.dtype, (nx, ny)),
])

# The flatten() is necessary to pre-empt numpy from
# trying to do too much interpretation of a's shape.
# Note that flatten() always returns a copy of a.
rec = a.flatten().view(dtype)
print(rec['r'])
print(rec['g'])
print(rec['b'])

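A variant of the snippet above that avoids the copy can be sketched as follows. It is only an illustration under stated assumptions: the array `a` is a small made-up stand-in, it is assumed to be C-contiguous, and `reshape(-1)` is swapped in for `flatten()` so that the record shares memory with `a`.

```python
import numpy as np

# Made-up stand-in for a (3, nx, ny) band-first image (assumption).
a = np.arange(3 * 2 * 4, dtype=np.int16).reshape(3, 2, 4)
nbands, nx, ny = a.shape

# One record whose three fields are each a whole (nx, ny) band.
dt = np.dtype([(name, a.dtype, (nx, ny)) for name in ("r", "g", "b")])

# reshape(-1) returns a view for a C-contiguous array; flatten() would copy.
rec = a.reshape(-1).view(dt)   # shape (1,)
red = rec["r"][0]              # shape (nx, ny), still backed by a's buffer

red[0, 0] = 99                 # writes through to the original array
```

Because nothing is copied, `rec['r'][0]`, `rec['g'][0]` and `rec['b'][0]` are just named handles on `a[0]`, `a[1]` and `a[2]`. The result is a single record holding whole bands, not a (ysize, xsize) array of per-pixel records.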

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] first recarray steps

2008-05-21 Thread Vincent Schut
Robert Kern wrote:
> On Wed, May 21, 2008 at 1:48 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
>> Christopher Barker wrote:
> 
>>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>>> array, then that's rrr... Some image libs
>>> may give you that, I'm not sure.
>> My data is. In fact, this is a simplification of my situation; I'm
>> processing satellite data, which usually has more (and other) bands than
>> just rgb. But the data is definitely in shape (bands, y, x).
> 
> I don't think record arrays will help you much, then. Individual
> records need to be contiguous (bar padding). You can't interleave
> them.
> 
Hmm, that was just what I was wondering about, when reading Stefan's 
reply. So in fact, recarrays aren't just another way to view some data, 
no matter in what shape it is.

So his solution: 
x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[:0:-1]).T won't work, 
then. Or, at least, won't give me a view on my original data, but would 
give me a recarray with a copy of my data.

I guess I was misled by this text on the recarray wiki page:

"We would like to represent a small colour image. The image is two 
pixels high and two pixels wide. Each pixel has a red, green and blue 
colour component, which is represented by a 32-bit floating point number 
between 0 and 1.

Intuitively, we could represent the image as a 3x2x2 array, where the 
first dimension represents the color, and the last two the pixel 
positions, i.e. "

Note the "3x2x2", which suggested imho that this would work with an 
image with (bands,y,x) shape, not with (x,y,bands) shape. But I 
understand that it's not shape, but internal representation in memory 
(contiguous or not, C/Fortran, etc) that matters?
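A small experiment illustrates the distinction between shape and memory layout. This is a sketch on a made-up zero-filled array, not code from the thread: transposing changes the shape without moving any bytes, so the (y, x, bands)-shaped result still cannot be viewed as records until it is copied into contiguous order.

```python
import numpy as np

dt = np.dtype([("r", np.int8), ("g", np.int8), ("b", np.int8)])
a = np.zeros((3, 4, 5), dtype=np.int8)     # (bands, y, x), assumed sizes

t = a.transpose(1, 2, 0)                   # shape (4, 5, 3), no data moved
same_buffer = np.shares_memory(t, a)       # still the band-first buffer
is_contiguous = t.flags["C_CONTIGUOUS"]

try:
    t.view(dt)
    view_ok = True
except ValueError:
    # The three band samples of a pixel are not adjacent in memory.
    view_ok = False

c = np.ascontiguousarray(t)                # copies into (y, x, bands) order
rec = c.view(dt)[..., 0]                   # shape (4, 5), one record per pixel
```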

I know I can change the wiki text, but I'm afraid I still don't feel 
confident on this matter...



Re: [Numpy-discussion] first recarray steps

2008-05-20 Thread Robert Kern
On Wed, May 21, 2008 at 1:48 AM, Vincent Schut <[EMAIL PROTECTED]> wrote:
> Christopher Barker wrote:

>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>> array, then that's rrr... Some image libs
>> may give you that, I'm not sure.
>
> My data is. In fact, this is a simplification of my situation; I'm
> processing satellite data, which usually has more (and other) bands than
> just rgb. But the data is definitely in shape (bands, y, x).

I don't think record arrays will help you much, then. Individual
records need to be contiguous (bar padding). You can't interleave
them.
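The contiguity requirement can be made concrete by looking at strides. The sketch below uses made-up array sizes and compares the byte distance between the 'r' and 'g' samples of one pixel in the two layouts. Only the band-last layout keeps a pixel's samples adjacent, so only it can be viewed as per-pixel records without copying.

```python
import numpy as np

band_first = np.zeros((3, 4, 5), dtype=np.int16)   # (bands, y, x)
band_last = np.zeros((4, 5, 3), dtype=np.int16)    # (y, x, bands)

# Byte distance between the 'r' and 'g' samples of the same pixel.
gap_band_first = band_first.strides[0]   # a whole band: 4 * 5 * 2 bytes
gap_band_last = band_last.strides[2]     # one sample: 2 bytes

dt = np.dtype([("r", np.int16), ("g", np.int16), ("b", np.int16)])

# The view collapses the last axis to length 1; index it away.
pixels = band_last.view(dt)[..., 0]      # shape (4, 5), shares memory
```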

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
 -- Umberto Eco


Re: [Numpy-discussion] first recarray steps

2008-05-20 Thread Vincent Schut
Christopher Barker wrote:
> 
> Vincent Schut wrote:
>> Let's say I have an rgb image of arbitrary size, as a normal ndarray 
>> (that's what my image reading lib gives me). Thus shape is 
>> (3,ysize,xsize), dtype = int8. How would I convert/view this as a 
>> recarray of shape (ysize, xsize) with the first dimension split up into 
>> 'r', 'g', 'b' fields? No need for 'x' and 'y' fields.
> 
> Take a look in this list for a thread entitled "recarray fun" about a 
> month ago -- you'll find some more discussion of approaches.

Well, actually that thread was my inspiration to take a closer look into 
recarrays...
> 
> Also, if your image data is rgb, usually, that's a (width, height, 3) 
> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) 
> array, then that's rrr... Some image libs 
> may give you that, I'm not sure.

My data is. In fact, this is a simplification of my situation; I'm 
processing satellite data, which usually has more (and other) bands than 
just rgb. But the data is definitely in shape (bands, y, x).
> 
> Also, you probably want a uint8 dtype, giving you 0-255 for each byte.

Same story. In fact, in this case it's int16, but can actually be any 
data type, even floats, even complex.
But thanks for the thoughts :-)
> 
> -Chris
> 
> 
> 



Re: [Numpy-discussion] first recarray steps

2008-05-20 Thread Christopher Barker


Vincent Schut wrote:
> Let's say I have an rgb image of arbitrary size, as a normal ndarray 
> (that's what my image reading lib gives me). Thus shape is 
> (3,ysize,xsize), dtype = int8. How would I convert/view this as a 
> recarray of shape (ysize, xsize) with the first dimension split up into 
> 'r', 'g', 'b' fields? No need for 'x' and 'y' fields.

Take a look in this list for a thread entitled "recarray fun" about a 
month ago -- you'll find some more discussion of approaches.

Also, if your image data is rgb, usually, that's a (width, height, 3) 
array: rgbrgbrgbrgb... in memory. If you have a (3, width, height) 
array, then that's rrr... Some image libs 
may give you that, I'm not sure.

Also, you probably want a uint8 dtype, giving you 0-255 for each byte.

-Chris



-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] first recarray steps

2008-05-20 Thread Stéfan van der Walt
Hi Vincent

2008/5/20 Vincent Schut <[EMAIL PROTECTED]>:
> Hi, I'm trying to get into recarrays. Unfortunately documentation is a
> bit on the short side...
>
> Let's say I have an rgb image of arbitrary size, as a normal ndarray
> (that's what my image reading lib gives me). Thus shape is
> (3,ysize,xsize), dtype = int8. How would I convert/view this as a
> recarray of shape (ysize, xsize) with the first dimension split up into
> 'r', 'g', 'b' fields? No need for 'x' and 'y' fields.

First, you need to flatten the array so you have one (r,g,b) element
per row.  Say you have x with shape (3, 4, 4):

x = x.T.reshape((-1,3))

Then you can view it with your new dtype:

dt = np.dtype([('r',np.int8),('g',np.int8),('b',np.int8)])
x = x.view(dt)

Then you must reshape it back to your original pixel arrangement:

x = x.reshape((4,4)).T

Or you can do it all in one go:

x.T.reshape((-1,x.shape[0])).view(dt).reshape(x.shape[:0:-1]).T

Maybe someone else comes up with an easier way.
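One possibly simpler variant is sketched below. The helper name `bands_to_records` and the test image are hypothetical, and the approach deliberately makes a contiguous copy, so the result is not a view on the original data. It moves the band axis last, copies, and then views each pixel's samples as one record.

```python
import numpy as np

def bands_to_records(x, names=("r", "g", "b")):
    """Copy a (bands, y, x) array into a (y, x) structured array.

    Hypothetical helper for illustration; assumes len(names) == x.shape[0].
    """
    dt = np.dtype([(name, x.dtype) for name in names])
    # Move bands last, then force a C-contiguous copy so that each pixel's
    # band values sit next to each other in memory.
    pixels = np.ascontiguousarray(x.transpose(1, 2, 0))
    # The view collapses the last axis to length 1; index it away.
    return pixels.view(dt)[..., 0]

# Made-up non-square image: 3 bands, 2 rows, 4 columns.
img = np.arange(3 * 2 * 4, dtype=np.int8).reshape(3, 2, 4)
rec = bands_to_records(img)
```

With this layout `rec['r']` has shape (ysize, xsize) and equals `img[0]`, but changes to `rec` do not propagate back to `img`.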

Cheers
Stéfan


[Numpy-discussion] first recarray steps

2008-05-20 Thread Vincent Schut
Hi, I'm trying to get into recarrays. Unfortunately documentation is a 
bit on the short side...

Let's say I have an rgb image of arbitrary size, as a normal ndarray 
(that's what my image reading lib gives me). Thus shape is 
(3,ysize,xsize), dtype = int8. How would I convert/view this as a 
recarray of shape (ysize, xsize) with the first dimension split up into 
'r', 'g', 'b' fields? No need for 'x' and 'y' fields.

I tried creating a numpy dtype {'names': ('r','g','b'), 'formats': 
(numpy.int8,)*3}, but when I call raw_img.view(rgb_dtype) I get:
"ValueError: new type not compatible with array."
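The error can be reproduced with a small stand-in array. In the sketch below, `raw_img` and `rgb_dtype` follow the names used in the message, while the array sizes and the other names are assumptions made for illustration. view() regroups bytes along the last axis only, so a 3-byte record cannot be laid over rows that are 5 bytes long.

```python
import numpy as np

rgb_dtype = np.dtype({"names": ("r", "g", "b"), "formats": (np.int8,) * 3})

# Stand-in for the real image: 3 bands, 4 rows, 5 columns (assumed sizes).
raw_img = np.zeros((3, 4, 5), dtype=np.int8)

try:
    raw_img.view(rgb_dtype)
    view_failed = False
except ValueError:
    # Each row holds 5 bytes, which is not a multiple of the 3-byte record.
    view_failed = True

# If the row length happens to be a multiple of 3, the view succeeds, but
# each record then groups three neighbouring samples of the SAME band.
wide_img = np.zeros((3, 4, 6), dtype=np.int8)
misleading = wide_img.view(rgb_dtype)    # shape (3, 4, 2)
```

The second case is arguably the more dangerous one, since no exception is raised and the 'r', 'g', 'b' fields silently hold the wrong samples.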

Now this probably should not be too difficult, but I just don't see it...

Thanks,
Vincent.
