Anand Patil (on 2007-10-31 at 17:53:17 -0700) said::
> I have a file full of 32-bit floats, in binary format, compressed with zip.
> I'd like to get it into a PyTables array, but this:
>
> Z = ZipFile('data_file.zip')
> binary_data = Z.read('data_file')
> numpy_array = numpy.fromstring(binary_data, dtype=float32)
> h5file.createArray('/', 'data', numpy_array)
>
> won't work because I don't have enough memory for the intermediate stages.
> Is there an easy way to do this piece-by-piece or in a 'streaming' fashion?

First of all, I'd avoid using an ``Array`` object for storing such a big
array. ``CArray`` and ``EArray`` objects are better suited for that:
they are chunked, so you can write your data little by little instead of
having the whole array in memory when creating the node, and disk space
is only allocated for a chunk when it is really needed. The former have
a fixed shape, while the latter are enlargeable.
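To make the difference concrete, here is a minimal sketch that creates
both kinds of container and fills them piecewise. It assumes PyTables
3.x and NumPy are installed and uses the 3.x method names
``create_carray``/``create_earray`` (``createCArray``/``createEArray``
in 2.x); the node names and the demo file are made up for illustration.
The CArray is declared large but only one slice is written, while the
EArray starts with a zero-length dimension and grows with each
``append()``.

```python
import os
import tempfile

import numpy as np
import tables


def chunked_containers_demo(path):
    """Create one CArray and one EArray and fill them piecewise."""
    atom = tables.Float32Atom()
    with tables.open_file(path, mode="w") as h5:
        # CArray: the full shape is fixed up front, but only one slice is
        # written; untouched elements read back as the atom default (0.0).
        carr = h5.create_carray("/", "fixed", atom, shape=(1000000,))
        carr[0:100] = np.arange(100, dtype=np.float32)

        # EArray: the 0 in the shape marks the enlargeable dimension.
        earr = h5.create_earray("/", "growing", atom, shape=(0,),
                                expectedrows=1000000)
        earr.append(np.arange(100, dtype=np.float32))
        earr.append(np.arange(50, dtype=np.float32))

        return carr.shape, earr.shape, float(carr[99]), float(carr[500000])


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(chunked_containers_demo(os.path.join(tmp, "demo.h5")))
```

The ``expectedrows`` hint lets PyTables pick a sensible chunk size for
the EArray even though it starts out empty.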

I guess the big obstacle would be extracting data from the zip file
incrementally. Since the ``ZipFile`` interface doesn't allow this, you
may unzip ``data_file`` to disk, then open it and read chunks of data
from it. Something like this::
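
    import os
    import zipfile

    import numpy
    import tables

    # Method names are the PyTables >= 3.0 ones (2.x used openFile,
    # createCArray and createEArray). 'data.h5' is just an example name.
    h5file = tables.open_file('data.h5', mode='w')
    use_earray = False  # choose CArray (fixed shape) or EArray (enlargeable)

    nptype = numpy.float32
    atom = tables.Atom.from_sctype(nptype)

    # Extract data_file from data_file.zip to disk; extract() copies it in
    # pieces, so the whole file is never held in memory.
    with zipfile.ZipFile('data_file.zip') as zf:
        zf.extract('data_file')
    total_rows = os.stat('data_file').st_size // atom.itemsize

    if use_earray:
        array = h5file.create_earray('/', 'data', atom, shape=(0,),
                                     expectedrows=total_rows)
    else:
        array = h5file.create_carray('/', 'data', atom,
                                     shape=(total_rows,))

    # We will be reading blocks as big as a chunk.
    rows_to_read = array.chunkshape[0]
    bytes_to_read = rows_to_read * atom.itemsize

    base = 0  # only used in the CArray case
    with open('data_file', 'rb') as dfile:  # binary files need mode 'rb'
        data = dfile.read(bytes_to_read)
        while data:
            # frombuffer replaces the deprecated fromstring for binary data
            arr = numpy.frombuffer(data, dtype=nptype)
            if use_earray:
                array.append(arr)
            else:
                array[base:base + len(arr)] = arr
                base += len(arr)
            data = dfile.read(bytes_to_read)

    array.flush()
    h5file.close()
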
This is untested, but I hope you get the idea.
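
The obstacle above is specific to the ``zipfile`` module of the time:
starting with Python 2.6, ``ZipFile.open()`` returns a file-like object
that decompresses on demand, so the member can be read in blocks without
unzipping it to disk first. A minimal, stdlib-only sketch under that
assumption follows; ``iter_float32_blocks`` is a hypothetical helper
name, and it assumes the floats were written in the machine's native
byte order.

```python
import array
import io
import zipfile


def iter_float32_blocks(zip_source, member, rows_per_block):
    """Yield array('f') blocks read straight from a zip member.

    zip_source may be a path or a binary file object. ZipFile.open()
    decompresses on demand, so only one block is in memory at a time.
    Assumes the floats are stored in native byte order.
    """
    itemsize = array.array('f').itemsize  # 4 bytes on common platforms
    with zipfile.ZipFile(zip_source) as zf:
        with zf.open(member) as stream:
            while True:
                data = stream.read(rows_per_block * itemsize)
                if not data:
                    break
                block = array.array('f')
                block.frombytes(data)
                yield block


if __name__ == "__main__":
    # Build a small in-memory zip with ten float32 values and stream it back.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, 'w', zipfile.ZIP_DEFLATED) as zf:
        zf.writestr('data_file', array.array('f', range(10)).tobytes())
    buf.seek(0)
    for block in iter_float32_blocks(buf, 'data_file', 4):
        print(len(block), list(block))
```

In the loop from the message, each block could be converted with
``numpy.frombuffer(block, dtype=numpy.float32)`` and handed to the
EArray's ``append()``, so the temporary ``data_file`` on disk would no
longer be needed.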
Cheers,
::
Ivan Vilata i Balaguer >qo< http://www.carabos.com/
Cárabos Coop. V. V V Enjoy Data
""
_______________________________________________
Pytables-users mailing list
https://lists.sourceforge.net/lists/listinfo/pytables-users
