Anne Archibald wrote:
> 2008/5/21 Vincent Schut <[EMAIL PROTECTED]>:
>> Christopher Barker wrote:
>>> Also, if your image data is rgb, usually, that's a (width, height, 3)
>>> array: rgbrgbrgbrgb... in memory. If you have a (3, width, height)
>>> array, then that's rrrrrrr....gggggggg......bbbbbbbb. Some image libs
>>> may give you that, I'm not sure.
>> My data is. In fact, this is a simplification of my situation; I'm
>> processing satellite data, which usually has more (and other) bands than
>> just rgb. But the data is definitely in shape (bands, y, x).
>
> You may find your life becomes easier if you transpose the data in
> memory. This can make a big difference to efficiency. Years ago I was
> working with enormous (by the standards of the day) MATLAB files on
> disk, storing complex data. The way (that version of) MATLAB
> represented complex data was the way you describe: matrix of real
> parts, matrix of imaginary parts. This meant that to draw a single
> pixel, the disk needed to seek twice... depending on what sort of
> operations you're doing, transposing your data so that each pixel is
> all in one place may improve cache coherency as well as making the use
> of record arrays possible.
>
> Anne
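The layout difference discussed in the quoted messages can be seen directly in NumPy. This is a minimal sketch with a tiny made-up three-band array (the values and the field names `r`, `g`, `b` are illustrative only). It shows that a band-sequential (bands, y, x) array stores each band contiguously, that `transpose` by itself only returns a strided view, and that a record-per-pixel view needs a pixel-interleaved copy, made here with `np.ascontiguousarray`.

```python
import numpy as np

# Tiny band-sequential image: 3 bands x 2 rows x 2 columns (made-up values).
bsq = np.arange(12, dtype=np.float32).reshape(3, 2, 2)

# Memory order is rrrr gggg bbbb: band 0 is stored entirely before band 1.
print(bsq.ravel())

# transpose() only permutes strides: a (y, x, bands) view, no data is moved.
bip_view = bsq.transpose(1, 2, 0)
print(bip_view.flags["C_CONTIGUOUS"], np.shares_memory(bip_view, bsq))  # False True

# Getting rgbrgb... order in memory requires an actual copy.
bip = np.ascontiguousarray(bip_view)
print(bip.flags["C_CONTIGUOUS"], np.shares_memory(bip, bsq))  # True False

# Only the interleaved copy can be reinterpreted as one record per pixel.
pixel = np.dtype([("r", np.float32), ("g", np.float32), ("b", np.float32)])
rec = bip.view(pixel).reshape(bip.shape[:2]).view(np.recarray)
print(rec.g)  # same values as bsq[1], the second band
```

The `ascontiguousarray` step is where the cost of transposing lives: it allocates and fills a second array of the same size, which is what would have to happen on every read (and the reverse on every write) to use record arrays with band-sequential data.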
Anne, thanks for the thoughts. In most cases you'll probably be right. In this case, however, it won't give me much (if any) speedup, and it may even slow things down.

Satellite images are often stored on disk in a band-sequential manner. The library I use for IO is GDAL, a highly optimized C library for reading and writing almost any kind of satellite data format. It also features an internal caching mechanism. And it gives me my data as (bands, y, x).

I'm not reading single pixels anyway. The amounts of data I have to process (enormous, even by the standards of today ;-)) require me to do this in chunks, in parallel, even on different cores/CPUs/computers. Every chunk is usually (allBands, chunkYSize, chunkXSize), with xsize and ysize not being that small (think from 64^2 to 1024^2), so that pretty much eliminates any performance issues regarding the data on disk. Furthermore, having to process on multiple computers forces me to keep my data on networked storage. The latency and transfer rate of the network will probably swamp any small speedup gained from my drive having to do fewer seeks...

Now for the recarray part: that would indeed ease my life a bit :) However, having to transpose the data in memory on every read and write does not sound very attractive. It will waste cycles and memory, and be asking for bugs. I can live without recarrays, for sure. I only hoped they might make my life a bit easier and my code a bit more readable, without too much effort. Well, they won't, apparently... I'll just carry on as I did before this little exercise.

Thanks all for the inputs.

Cheers,
Vincent.

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion
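If the goal is only readable access to bands by name, plain views into a band-sequential chunk may get part of the way there without any transpose or copy. This is a sketch under stated assumptions, not the approach used in this thread: the four band names in `BAND_NAMES` are made up, the helpers `iter_windows` and `named_bands` are hypothetical, NDVI is just an example per-pixel computation, and an in-memory array stands in for chunks that in a real pipeline would come from something like GDAL's `Dataset.ReadAsArray(xoff, yoff, xsize, ysize)`.

```python
from typing import Dict, Iterator, Sequence, Tuple

import numpy as np

# Hypothetical band names; real satellite products use their own band sets.
BAND_NAMES = ("blue", "green", "red", "nir")


def iter_windows(ysize: int, xsize: int, chunk: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (yoff, xoff, rows, cols) windows that tile a ysize x xsize raster."""
    for yoff in range(0, ysize, chunk):
        for xoff in range(0, xsize, chunk):
            yield yoff, xoff, min(chunk, ysize - yoff), min(chunk, xsize - xoff)


def named_bands(chunk: np.ndarray, names: Sequence[str]) -> Dict[str, np.ndarray]:
    """Map band names to views of a (bands, y, x) chunk; no data is copied."""
    if chunk.shape[0] != len(names):
        raise ValueError("number of names must match number of bands")
    return {name: chunk[i] for i, name in enumerate(names)}


# Stand-in for the full image; in practice each window would be read from disk.
image = np.random.default_rng(0).random((4, 300, 500), dtype=np.float32)

ndvi = np.empty(image.shape[1:], dtype=np.float32)
for yoff, xoff, rows, cols in iter_windows(image.shape[1], image.shape[2], 256):
    chunk = image[:, yoff:yoff + rows, xoff:xoff + cols]  # (bands, rows, cols)
    b = named_bands(chunk, BAND_NAMES)
    # Example per-pixel computation across bands (NDVI).
    ndvi[yoff:yoff + rows, xoff:xoff + cols] = (b["nir"] - b["red"]) / (b["nir"] + b["red"])
```

Because each entry of the dict is an ordinary slice of the chunk, writing through `b["red"]` modifies the chunk itself, and the data keeps the band-sequential layout GDAL delivers.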