> Anand Patil (on 2007-10-31 at 17:53:17 -0700) said::
>
> > I have a file full of 32-bit floats, in binary format, compressed
> > with zip. I'd like to get it into a PyTables array, but this:
> >
> > Z = ZipFile('data_file.zip')
> > binary_data = Z.read('data_file')
> > numpy_array = numpy.fromstring(binary_data, dtype=numpy.float32)
> > h5file.createArray('/', 'data', numpy_array)
> >
> > won't work because I don't have enough memory for the intermediate
> > stages. Is there an easy way to do this piece-by-piece or in a
> > 'streaming' fashion?
>
> First of all, I'd avoid using an ``Array`` object for storing such a
> big array. ``CArray`` or ``EArray`` objects are better suited, since
> they are chunked and hence much more memory-efficient. Both let you
> store your data little by little, because disk space is only allocated
> for a chunk when it is actually needed. The former has a fixed shape,
> while the latter is enlargeable.
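>
> A quick, untested sketch of the difference (the file and node names
> are just placeholders):
>
> import numpy, tables
>
> f = tables.openFile('demo.h5', 'w')
> atom = tables.Float32Atom()
> # CArray: shape is fixed at creation; storage is allocated per chunk.
> c = f.createCArray('/', 'c', atom, shape=(1000000,))
> # EArray: starts empty and grows along its 0-length dimension.
> e = f.createEArray('/', 'e', atom, shape=(0,))
> e.append(numpy.arange(10, dtype=numpy.float32))
> print e.shape # prints (10,) -- it has grown
> f.close()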
>
> I guess the big obstacle is extracting data from the zip file
> incrementally. Since the ``ZipFile`` interface doesn't allow this
> (but see the side note after the snippet), you can unzip ``data_file``
> to disk, then open it and read chunks of data from it. Something like
> this:
>
> import os
> import subprocess
> import numpy
> import tables
>
> nptype = numpy.float32
> atom = tables.Atom.from_sctype(nptype)
>
> # Extract data_file from data_file.zip to disk (here with the
> # external unzip tool), then size the array from the uncompressed file.
> subprocess.check_call(['unzip', '-o', 'data_file.zip', 'data_file'])
> total_rows = os.stat('data_file').st_size // atom.itemsize
>
> # h5file is an already open tables.File, as in your snippet.
> array = h5file.createCArray('/', 'data', atom, shape=(total_rows,))
> # or, for the enlargeable variant:
> # array = h5file.createEArray('/', 'data', atom, shape=(0,),
> #                             expectedrows=total_rows)
>
> # We will be reading blocks as big as a chunk.
> rows_to_read = array.chunkshape[0]
> bytes_to_read = rows_to_read * atom.itemsize
>
> dfile = open('data_file', 'rb')
> data = dfile.read(bytes_to_read)
> base = 0  # only needed for the CArray case
> while data:
>     arr = numpy.fromstring(data, dtype=nptype)
>     # CArray case: fill the preallocated slots.
>     array[base:base + len(arr)] = arr
>     base += len(arr)
>     # EArray case: array.append(arr)
>     data = dfile.read(bytes_to_read)
> array.flush()
> dfile.close()
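>
> By the way, if your Python's ``zipfile`` has ``ZipFile.open()`` (it
> was added in Python 2.6), that method returns a file-like object, so
> you could stream straight from the archive and skip the extraction
> step altogether. A sketch of just the reading loop, reusing the names
> from the snippet above:
>
> import zipfile
>
> zf = zipfile.ZipFile('data_file.zip')
> # Uncompressed size, if needed: zf.getinfo('data_file').file_size
> stream = zf.open('data_file')  # file-like; decompresses on demand
> data = stream.read(bytes_to_read)
> while data:
>     arr = numpy.fromstring(data, dtype=nptype)
>     array.append(arr)  # EArray case
>     data = stream.read(bytes_to_read)
> zf.close()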
>
> This is untested, but I hope you get the idea.
>
> Cheers,
>
> ::
>
> Ivan Vilata i Balaguer >qo< http://www.carabos.com/
> Cárabos Coop. V. V V Enjoy Data
Got it, thanks!