On Feb 5, 2:22 pm, Lionel <lionel.ke...@gmail.com> wrote:
> Hello,
>
> I have data stored in binary files. Some of these files are
> huge...2 gigs or more. They consist of 32-bit float complex
> numbers where the first 32 bits of the file are the real component,
> the second 32 bits are the imaginary component, the third 32 bits
> are the real component of the second number, etc.
>
> I'd like to be able to read in just the real components, load them
> into a numpy.ndarray, then read the imaginary components and load
> them into a second numpy.ndarray. I need the real and imaginary
> components stored in separate arrays; they cannot be in a single
> array of complex numbers except temporarily. I'm trying to avoid
> temporary storage, though, because of the size of the files.
>
> I'm currently reading the file scanline-by-scanline to extract rows of
> complex numbers, which I then loop over and load into the real/
> imaginary arrays as follows:
>
>     self._realData = numpy.empty((Rows, Columns), dtype=numpy.float32)
>     self._imaginaryData = numpy.empty((Rows, Columns), dtype=numpy.float32)
>
>     for CurrentRow in range(Rows):
>
>         # Fresh buffer per row: array.fromfile() appends to the array.
>         floatData = array.array('f')
>         floatData.fromfile(DataFH, (Columns*2))
>
>         position = 0
>         for CurrentColumn in range(Columns):
>
>             self._realData[CurrentRow, CurrentColumn] = floatData[position]
>             self._imaginaryData[CurrentRow, CurrentColumn] = floatData[position+1]
>             position = position + 2
>
> The above code works but is much too slow. If I comment out the body
> of the "for CurrentColumn in range(Columns)" loop, the performance is
> perfectly adequate, i.e. the function call overhead associated with
> the "fromfile(...)" call is not very bad at all. What seems to be most
> time-consuming are the simple assignment statements in the
> "CurrentColumn" for-loop.
>
> Does anyone see any ways of speeding this up at all?
> Reading everything into a complex64 ndarray in one fell swoop would
> certainly be easier and faster, but at some point I'll need to split
> this array into two parts (real / imaginary). I'd like to have that
> done initially to keep the memory usage down since the files are so
> ginormous.
>
> Psyco is out because I need 64 bits, and I didn't see anything on the
> forums regarding a method that reads every other 32-bit chunk from
> a file into an array. I'm not sure what else to try.
>
> Thanks in advance.
> L

Hmmm... I've just discovered "weave.inline()". Maybe I'll just do the
assignments in C. Still soliciting advice, of course. :-)
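
Before dropping into C, one thing that might be worth trying is to let
slicing do the de-interleaving, so the per-element Python loop goes away
while still reading only one row at a time. Below is a minimal sketch
using only the standard library. The helper name split_rows and the
in-memory demo file are made up for illustration, and it assumes the
file holds Rows * Columns complex values in native byte order.

```python
import array
import io


def split_rows(fh, rows, columns):
    """Yield (real, imag) for each row of interleaved 32-bit floats.

    Hypothetical helper, not from the original code. Assumes the data
    is stored as real, imag, real, imag, ... in native byte order.
    """
    for _ in range(rows):
        # Fresh buffer per row: array.fromfile() appends to the array.
        row = array.array('f')
        row.fromfile(fh, columns * 2)
        # Extended slices copy every other item without a Python-level
        # loop over the columns.
        yield row[0::2], row[1::2]


# Tiny in-memory stand-in for the data file: 2 rows x 2 columns.
demo = io.BytesIO(
    array.array('f', [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]).tobytes()
)
for real, imag in split_rows(demo, rows=2, columns=2):
    print(list(real), list(imag))
# prints:
# [1.0, 3.0] [2.0, 4.0]
# [5.0, 7.0] [6.0, 8.0]
```

In the original loop, each yielded pair could presumably be assigned a
whole row at a time, e.g. to self._realData[CurrentRow, :] and
self._imaginaryData[CurrentRow, :]. That last step is untested here. The
same [0::2] / [1::2] slicing also works on a float32 ndarray, so the
temporary storage never needs to be bigger than one scanline.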