Re: Memory efficient tuple storage

Kurt Smith Fri, 13 Mar 2009 10:14:26 -0700

On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith <kwmsm...@gmail.com> wrote:
[snip OP]
>
> Assuming your data is in a plaintext file something like
> 'genomedata.txt' below, the following will load it into a numpy array
> with a customized dtype.  You can access the different fields by name
> ('chromo', 'position', and 'dpoint' -- change to your liking).  Don't
> know if this works or not; might give it a try.


To clarify -- I don't know if this will work for your particular
problem, but I do know that it will read in the array correctly and
cut down on memory usage in the final array size.

Specifically, if you use a dtype with 'S50', 'i4' and 'f8' (see the
numpy dtype docs) -- that's 50 bytes for your chromosome string, 4
bytes for the position and 8 bytes for the data point -- each entry
will use just 50 + 4 + 8 = 62 bytes, and the numpy array will allocate
just enough memory for all of these records.  The values stored in the
array will be a fixed-width char array for the string, a 32-bit C int
and a C double; it won't use the corresponding python objects, each of
which carries its own per-object overhead (type pointer, reference
count and so on).
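
Here is a minimal sketch of the size arithmetic above.  It is not the
code from my earlier message: the three sample rows are made up, the
in-memory StringIO stands in for a file like 'genomedata.txt', and
np.loadtxt is just one way to do the load, assuming whitespace-separated
columns.  The field names are the ones mentioned above.

```python
import io

import numpy as np

# Record layout: 50-byte string, 4-byte int, 8-byte float.
record_dtype = np.dtype([("chromo", "S50"),
                         ("position", "i4"),
                         ("dpoint", "f8")])

# Made-up sample rows standing in for the real data file.
sample = io.StringIO("chr1 100 0.5\n"
                     "chr1 200 1.25\n"
                     "chrX 300 2.0\n")

records = np.loadtxt(sample, dtype=record_dtype)

print(record_dtype.itemsize)   # bytes per record: 50 + 4 + 8
print(records.nbytes)          # itemsize * number of records
print(records["position"])     # access a field by name
```

The string field comes back as bytes (e.g. b'chrX'), and any chromosome
name longer than 50 bytes is silently truncated, so size the 'S' field
to your longest name.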

Hope this helps,

Kurt
--
http://mail.python.org/mailman/listinfo/python-list
