On Feb 5, 2:48 pm, MRAB <goo...@mrabarnett.plus.com> wrote:
> Lionel wrote:
> > Hello,
> >
> > I have data stored in binary files. Some of these files are
> > huge, upwards of 2 gigs. They consist of 32-bit float complex
> > numbers where the first 32 bits of the file are the real component,
> > the second 32 bits are the imaginary component, the third 32 bits
> > are the real component of the second number, etc.
> >
> > I'd like to be able to read in just the real components and load them
> > into a numpy.ndarray, then read the imaginary components and load them
> > into a second numpy.ndarray. I need the real and imaginary components
> > stored in separate arrays; they cannot be in a single array of complex
> > numbers except temporarily. I'm trying to avoid temporary storage,
> > though, because of the size of the files.
> >
> > I'm currently reading the file scanline-by-scanline to extract rows of
> > complex numbers, which I then loop over and load into the real/
> > imaginary arrays as follows:
> >
> >     self._realData = numpy.empty((Rows, Columns), dtype = numpy.float32)
> >     self._imaginaryData = numpy.empty((Rows, Columns), dtype = numpy.float32)
> >
> >     floatData = array.array('f')
> >
> >     for CurrentRow in range(Rows):
> >
> >         floatData.fromfile(DataFH, (Columns*2))
> >
> >         position = 0
> >         for CurrentColumn in range(Columns):
> >
> >             self._realData[CurrentRow, CurrentColumn] = floatData[position]
> >             self._imaginaryData[CurrentRow, CurrentColumn] = floatData[position+1]
> >             position = position + 2
> >
> > The above code works but is much too slow. If I comment out the body
> > of the "for CurrentColumn in range(Columns)" loop, the performance is
> > perfectly adequate, i.e. the function call overhead associated with the
> > "fromfile(...)" call is not very bad at all. What seems to be most
> > time-consuming are the simple assignment statements in the
> > "CurrentColumn" for-loop.
> >
> [snip]
> Try array slicing. floatData[0::2] will return the real parts and
> floatData[1::2] will return the imaginary parts. You'll have to read up
> on how to assign to a slice of the numpy array (it might be
> "self._realData[CurrentRow] = real_parts" or
> "self._realData[CurrentRow, :] = real_parts").
>
> BTW, it's not the function call overhead of fromfile() which takes the
> time, but actually reading data from the file.
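
For reference, the slicing suggestion quoted above might look roughly like the sketch below. It is only an illustration under some assumptions: the function name read_split_complex and its parameters are made up for this example, the file is assumed to hold float32 values in the machine's native byte order, and numpy.fromfile is swapped in for array.array.fromfile (which appends to the existing array on each call instead of replacing its contents).

```python
import numpy as np


def read_split_complex(fh, rows, columns):
    """Read interleaved float32 (real, imag) pairs into two separate arrays.

    Hypothetical helper for illustration; `fh` is a file opened in binary mode.
    """
    real = np.empty((rows, columns), dtype=np.float32)
    imag = np.empty((rows, columns), dtype=np.float32)
    for r in range(rows):
        # One scanline: `columns` complex values = 2 * columns float32 values.
        row = np.fromfile(fh, dtype=np.float32, count=2 * columns)
        if row.size != 2 * columns:
            raise ValueError("file ended early at row %d" % r)
        real[r, :] = row[0::2]  # even positions hold the real parts
        imag[r, :] = row[1::2]  # odd positions hold the imaginary parts
    return real, imag
```

Only one scanline is held in temporary storage at a time, and each row is copied with two slice assignments rather than a Python-level loop over columns.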

Very nice! I like that! I'll post the improvement (if any).

L

--
http://mail.python.org/mailman/listinfo/python-list