Re: numpy question (fairly basic, I think)

2014-12-14 Thread Albert-Jan Roskam


- Original Message -

 From: Steven D'Aprano steve+comp.lang.pyt...@pearwood.info
 To: python-list@python.org
 Cc: 
 Sent: Sunday, December 14, 2014 12:52 AM
 Subject: Re: numpy question (fairly basic, I think)
 
 Albert-Jan Roskam wrote:
 
  Hi,
 
  I am new to numpy. I am reading binary data one record at a time (I have
  to) and I would like to store all the records in a numpy array which I
  pre-allocate. Below I try to fill the empty array with exactly one record,
  but it is filled with as many rows as there are columns. Why is this? It
  is probably something simple, but I am stuck! It is like the original
  record is not unpacked (as in tuple unpacking) into the array, so it
  remains one chunk, not an (nrows, ncols) structure.
 
 Can you simplify the example to something shorter that focuses on the issue
 at hand? It isn't clear to me which bits of the code you show are behaving
 the way you expect and which bits are not.


Hi Steven,

Thanks for replying. My code was so elaborate because I did not know which part 
made it go wrong. I think I have figured it out already. Numpy arrays (ndarrays) 
must be homogeneous with respect to their datatype (dtype), probably to make 
vectorization work (?).

However, a structured array (which I was using) *can* contain multiple dtypes, 
but its fields do not form a second dimension: each record is a single element 
(a tuple-like value), so n records give an array of shape (n,). In this sense, 
even a structured array is homogeneous. I was trying to change the one-dim array 
into a two-dim array so I could easily retrieve columns. I now use a pandas 
DataFrame to do that. If my sample data had contained *only* floats (or ints, 
or ...), my original approach would have worked.
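
A minimal sketch of what seems to happen in the original code, assuming a made-up three-field dtype (not the 22-field one from my post; the names grid and table are only for illustration). Allocating with shape (nrows, ncols) and a structured dtype makes every cell a whole record, so assigning one record to a row broadcasts it across all ncols cells. Allocating with shape (nrows,) gives one cell per record:

```python
import numpy as np

# Hypothetical small dtype standing in for the real 22-field one
dt = np.dtype([('v00', 'f4'), ('v01', 'f4'), ('v02', 'S8')])
record = np.array([(1.0, 27.0, b'DEC 2012')], dtype=dt)  # shape (1,)

nrows = 5
ncols = len(dt.names)  # 3 fields

# 2-D allocation: each of the nrows * ncols cells holds a full record
grid = np.zeros((nrows, ncols), dtype=dt)
grid[0, :] = record    # the one record is broadcast over the whole row

# 1-D allocation: one cell per record, fields are reached by name
table = np.zeros(nrows, dtype=dt)
table[0] = record[0]

print(grid.shape)      # (5, 3)
print(table.shape)     # (5,)
print(table['v01'])    # the 'v01' column across all records
```

With the 1-D layout a column is simply table['v01'], so no second axis is needed to retrieve columns.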


Thanks!

Albert-Jan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: numpy question (fairly basic, I think)

2014-12-14 Thread Gregory Ewing

Albert-Jan Roskam wrote:

I was trying to change the one-dim array into a two-dim array so
I could easily retrieve columns. I now use a pandas DataFrame to do that.


Numpy can do that, if I understand what you want correctly,
but it requires an unintuitive trick.

The trick is to index the array with the name of a field,
and *only* the name of the field. For example, the following
gives you a slice containing all the values from the 'v02'
fields of your data:

  array['v02']

Note: Indexing with a field name generally seems to follow
a different set of rules. Some things that you'd think would
work don't, e.g.

   array[:, 'v02']

fails with a rather confusing error message. Also,

   array[0, 'v02']

doesn't work either, and has to be written

   array[0]['v02']

instead.
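
The indexing rules above can be sketched on a small, made-up structured array (the field names and values are illustrative, not the original data). The exact exception raised by mixed indexing may differ between NumPy versions, so this sketch catches several types rather than assuming one:

```python
import numpy as np

# Hypothetical three-field records, pre-allocated as a 1-D array
dt = np.dtype([('v01', 'f4'), ('v02', 'f4'), ('v03', 'S8')])
array = np.zeros(3, dtype=dt)
array[0] = (1.0, 27.0, b'DEC 2012')
array[1] = (2.0, 28.0, b'JAN 2013')

# Field name alone: all 'v02' values, one per record
col = array['v02']

# Record first, then field: a single value
one = array[0]['v02']

# Mixing a slice and a field name in one index is rejected
try:
    array[:, 'v02']
    mixed_ok = True
except (IndexError, ValueError, TypeError):
    mixed_ok = False

print(col.shape)   # (3,)
print(one)
print(mixed_ok)    # False
```

array['v02'][0] should give the same value as array[0]['v02']; the field lookup and the positional lookup just have to be separate steps.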

--
Greg


numpy question (fairly basic, I think)

2014-12-13 Thread Albert-Jan Roskam
Hi,

I am new to numpy. I am reading binary data one record at a time (I have to) 
and I would like to store all the records in a numpy array which I 
pre-allocate. Below I try to fill the empty array with exactly one record, but 
it is filled with as many rows as there are columns. Why is this? It is 
probably something simple, but I am stuck! It is like the original record is 
not unpacked (as in tuple unpacking) into the array, so it remains one chunk, 
not an (nrows, ncols) structure.


from __future__ import print_function
import numpy as np


# one binary record
s = 
'\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00;@\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x80U\xe1@\x00\x00\x00\x00\x80\xd9\xe4@\x00\x00\x00\x00@\xa7\xe3@\xab\xaa\xaa\xaajG\xe3@\x00\x00\x00\x00\x80\xd9\xe4@\x00\x00\x00\x00\x00\x00;@\x00\x00\x00\x00\x00\x00;@\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\xa4DI\tBx
   qwertyuiopasdfghjklzxcvbnm,./
   
\x00\x00\x00\x00\x00\x00\xf0?\x00\x00\x00\x00\x00p\x9f@\x00\x00\x00\x00\x00\x00\x10@\x00\x00\x00\x00\x00\x00(@DEC
 2012'


# read it into a structured array
formats = ['d', 'd', 'd', 'd', 'd', 'd', 'd', 'd', 'd', 'd', 'd', 
'd', 'd', 'd', 'd', 'a8', 'a104', 'd', 'd', 'd', 'd', 'a8']
names = ["v%02d" % i for i in range(len(formats))]
dt = np.dtype({'formats': formats, 'names': names})
record = np.fromstring(s, dtype=dt)


# make it more compact
trunc_formats = ['f4', 'f4', 'f4', 'f4', 'f4', 'f4', 'f4', 'f4', 'f4', 'f4', 
'f4', 'f4', 'f4', 'f4', 'f4', 'a1', 'a100', 'f4', 'f4', 'f4', 'f4', 'a8']
trunc_dt = np.dtype({'formats': trunc_formats, 'names': names})
record = record.astype(trunc_dt)
print(record.shape)  # (1,), but it needs to be (1, 22)??
#record = np.asarray(*tuple(record.astype(trunc_dt)), dtype=np.object)
#record = np.expand_dims(record, axis=0)


# initialize an empty array and fill it with one record

nrows = 50  # arbitrary number
ncols = len(formats) # 22

array = np.zeros((nrows, ncols), trunc_dt)
array[0,:] = record
print(array)

# output: why is the record repeated ncols times?
[[ (1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')
(1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')
(1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')
...,
(1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')
(1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')
(1.0, 27.0, 1.0, 35500.0, 42700.0, 40250.0, 39483.33203125, 42700.0, 27.0, 
27.0, 0.0, 1.0, 1.0, 1.0, 13575427072.0, 'x', 'qwertyuiopasdfghjklzxcvbnm,./
   ', 1.0, 
2012.0, 4.0, 12.0, 'DEC 2012')]
[ (0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 
'', '', 0.0, 0.0, 0.0, 0.0, '')
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '', 
'', 0.0, 0.0, 0.0, 0.0, '')
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '', 
'', 0.0, 0.0, 0.0, 0.0, '')
...,
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '', 
'', 0.0, 0.0, 0.0, 0.0, '')
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '', 
'', 0.0, 0.0, 0.0, 0.0, '')
(0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '', 
'', 0.0, 0.0, 0.0, 0.0, '')]
etc
etc

 
Thank you in advance!


Regards,

Albert-Jan




~~

All right, but apart from the sanitation, the medicine, education, wine, public 
order, irrigation, roads, a fresh water system, and public health, what have 
the Romans ever done for us?

~~ 


Re: numpy question (fairly basic, I think)

2014-12-13 Thread Steven D'Aprano
Albert-Jan Roskam wrote:

 Hi,
 
 I am new to numpy. I am reading binary data one record at a time (I have
 to) and I would like to store all the records in a numpy array which I
 pre-allocate. Below I try to fill the empty array with exactly one record,
 but it is filled with as many rows as there are columns. Why is this? It
 is probably something simple, but I am stuck! It is like the original
 record is not unpacked (as in tuple unpacking) into the array, so it
 remains one chunk, not an (nrows, ncols) structure.

Can you simplify the example to something shorter that focuses on the issue
at hand? It isn't clear to me which bits of the code you show are behaving
the way you expect and which bits are not.


To get you started, here is what I got working:


import numpy as np

# one binary record
s = '\x00\x01\x00\xff'*2  # eight bytes makes one C double
# read it into a structured array
formats = ['d']
names = ["v%02d" % i for i in range(len(formats))]
dt = np.dtype({'formats': formats, 'names': names})
record = np.fromstring(s, dtype=dt)


which gives this for record:

array([(-5.4874686660912e+303,)],
  dtype=[('v00', 'f8')])


Is that what you expected? If not, what did you expect?

Now modify the example the *least amount possible* to demonstrate the issue.
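
For anyone running this on a current NumPy: binary-mode np.fromstring is deprecated there, and np.frombuffer on a bytes object is the usual replacement. A hedged sketch of the same minimal example, assuming a little-endian double ('<f8' is my choice here, to make the byte order explicit; the code above used the native 'd'):

```python
import numpy as np

# one binary record: eight bytes make one C double
s = b'\x00\x01\x00\xff' * 2

# structured dtype with a single little-endian double field
formats = ['<f8']
names = ["v%02d" % i for i in range(len(formats))]
dt = np.dtype({'formats': formats, 'names': names})

# frombuffer reads the bytes without copying; the result is read-only
record = np.frombuffer(s, dtype=dt)

print(record.shape)   # (1,)
print(record['v00'])
```

Call record.copy() if a writable array is needed.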



-- 
Steven
