On Tue, Sep 8, 2009 at 12:53 PM, Christopher Barker <chris.bar...@noaa.gov> wrote:
> Skipper Seabold wrote:
>> Hmm, okay, well I came across this in trying to create a recarray like
>> data2 below, so I guess I should just combine the two questions.
>
> Key to understanding this is to understand what is going on under the
> hood in numpy. Travis O. gave a nice intro in an Enthought webcast a few
> months ago -- I'm not sure if those are recorded and up on the web, but
> it's worth a look. It was also discussed in the advanced numpy tutorial
> at SciPy this year -- and that is up on the web:
>
> http://www.archive.org/details/scipy09_advancedTutorialDay1_1
>

Thanks. I wasn't able to watch the Enthought webcasts on Linux, but
I've seen a few of the video tutorials. What a great resource. I'm
really glad this came together.

>
> Anyway, here is my minimal attempt to clarify:
>
>> import numpy as np
>>
>> data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> here we are using a standard array constructor -- it will look at the
> data you are passing in (a mixture of python floats and ints), and
> decide that they can best be represented by a numpy array of float64s.
>
> numpy arrays are essentially a pointer to a block of memory, and a bunch
> of attributes that describe how the bytes pointed to are to be
> interpreted. In this case, they are 9 C doubles, representing a 3x3
> array of doubles.
>
>> dt = np.dtype([('var1', '<f8'), ('var2', '<i8'), ('var3', '<i8')])
>
> (NOTE: I'm on a big-endian machine, so I've used:
> dt = np.dtype([('var1', '>f8'), ('var2', '>i8'), ('var3', '>i8')])
> )
>
> This is a data type descriptor that is analogous to a C struct,
> containing a float64 and two int64s
>
>> # Doesn't work, raises TypeError: expected a readable buffer object
>> data2 = data2.view(np.recarray)
>> data2.astype(dt)
>
> I don't understand that error either, but recarrays are about adding
> the ability to access parts of a structured array by name, but you still
> need the dtype to specify the types and names.
> This does seem to work
> (though may not be giving the results you expect):
>
> In [19]: data2 = data.copy()
> In [20]: data2 = data2.view(np.recarray)
> In [21]: data2 = data2.view(dtype=dt)
>
> or, indeed in the opposite order:
>
> In [24]: data2 = data.copy()
> In [25]: data2 = data2.view(dtype=dt)
> In [26]: data2 = data2.view(np.recarray)
>
>
> So you've done two operations, one is to change the dtype -- the
> interpretation of the bytes in the data buffer, and one is to make this
> a recarray, which allows you to access the "fields" by name:
>
> In [31]: data2['var1']
> Out[31]:
> array([[ 10.75],
>        [ 10.39],
>        [ 18.18]])
>
>> # Works without error (?) with unexpected result
>> data3 = data3.view(np.recarray)
>> data3.dtype = dt
>
> that all depends what you expect! I used "view" above, 'cause I think
> there is less magic, though it's the same thing. I suppose changing the
> dtype in place like that is a tiny bit more efficient -- if you use
> .view(), you are creating a new array pointing to the same data, rather
> than changing the array in place.
>
> But anyway, the dtype describes how the bytes in the memory block are to
> be interpreted, changing it by assigning the attribute or using .view()
> changes the interpretation, but does not change the bytes themselves at
> all, so in this case, you are taking the 8 bytes representing a float64
> of value: 1.0, and interpreting those bytes as an 8 byte int -- which is
> going to give you garbage, essentially.
>
>> # One correct (though IMHO unintuitive) way
>> data = np.rec.fromarrays(data.swapaxes(1,0), dtype=dt)
>
> This is using the np.rec.fromarrays constructor to build a new record
> array with the dtype you want, the data is being converted and copied,
> it won't change the original at all:
>
> So the question remains -- is there a way to convert the floats in
> "data" to ints in place?
>

Ah, ok. I understand roughly the above. But, yes, this is my question.

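The difference described above, between reinterpreting the bytes and converting the values, can be sketched roughly as follows. This is only an illustration, not code from the thread: the names `viewed` and `converted` are made up for the example, and the dtype uses native byte order ('f8', 'i8') instead of the explicit '<' or '>' prefixes so that it should behave the same on either kind of machine.

```python
import numpy as np

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# Assumption: native byte order, so no '<' or '>' prefix is needed.
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

# Reinterpretation: the bytes are untouched, only the dtype changes.
# Each 24-byte row of three float64s becomes a single record.
viewed = data.copy().view(dtype=dt)
print(viewed.shape)             # (3, 1)
print(viewed['var1'].ravel())   # still read as float64, so values survive
print(viewed['var3'].ravel())   # bytes of the float 1.0 read as int64

# Conversion: each column is cast into its field of a new record array.
converted = np.rec.fromarrays(data.T, dtype=dt)
print(converted.var2)           # integer values 1, 0, 0
print(converted.var3)           # integer values 1, 1, 1

# The original array is left alone by both operations.
print(data.dtype)               # float64
```

The view keeps the (3, 1) shape because each row collapses into one record, which is why `data2['var1']` above prints as a column.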
>
> This seems to work:
> In [78]: data = np.array([[10.75, 1, 1],[10.39, 0, 1],[18.18, 0, 1]])
>
> In [79]: data[:,1:3] = data[:,1:3].astype('>i8').view(dtype='>f8')
>
> In [80]: data.dtype = dt
>
> It is making a copy of the integer data in the process -- but I think that
> is required, as you are changing the value, not just the interpretation
> of the bytes. I suppose we could have an "astype_inplace" method, but
> that would only work if the two types were the same size, and I'm not
> sure it's a common enough use to be worth it.
>
> What is your real use case? I suspect that what you really should do
> here is define your dtype first, then create the array of data:
>

I have a function that eventually appends an ndarray of floats that
are 0 to 1 to a recarray, and I ran into it trying to debug. Then I
was just curious about the modification in place.

> data = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
>
> which does require that you use tuples, rather than lists to hold the
> "structs".
>

Ah yes, I have had a bit of trouble extending my same function to
structured arrays, but that's another thread if I can't figure it out.

Thanks for the help.

Cheers,

Skipper
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
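
For reference, the two approaches from the thread (overwriting the float columns with integer bit patterns, and defining the dtype before building the array) might be sketched like this. It is a rough illustration under a few assumptions: native byte order ('f8', 'i8') is used instead of the '>' prefixes, `.view()` stands in for assigning `data.dtype` (described above as the same thing), and the names `records` and `direct` are invented for the example.

```python
import numpy as np

# Assumption: native byte order instead of the explicit '<' / '>' prefixes.
dt = np.dtype([('var1', 'f8'), ('var2', 'i8'), ('var3', 'i8')])

data = np.array([[10.75, 1, 1], [10.39, 0, 1], [18.18, 0, 1]])

# Cast the last two columns to int64 (this makes a temporary copy), then
# view those integer bytes as float64 so they can be written back into
# the float64 buffer without being converted again.
data[:, 1:3] = data[:, 1:3].astype('i8').view('f8')

# Reinterpret the same buffer as records; reshape drops the length-1 axis.
records = data.view(dt).reshape(-1)
print(records['var1'])   # float values 10.75, 10.39, 18.18
print(records['var2'])   # integer values 1, 0, 0
print(records['var3'])   # integer values 1, 1, 1

# The simpler route: define the dtype first and pass tuples, one per record.
direct = np.array([(10.75, 1, 1), (10.39, 0, 1), (18.18, 0, 1)], dtype=dt)
print(direct['var2'])    # integer values 1, 0, 0
```

The first route only works because float64 and int64 are the same size, as noted above. The second avoids the reinterpretation entirely.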