On Fri, Mar 13, 2009 at 11:33 AM, Kurt Smith <[email protected]> wrote:
[snip OP]
>
> Assuming your data is in a plaintext file something like
> 'genomedata.txt' below, the following will load it into a numpy array
> with a customized dtype. You can access the different fields by name
> ('chromo', 'position', and 'dpoint' -- change to your liking). Don't
> know if this works or not; might give it a try.
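
For reference, a minimal sketch of that kind of load. The column layout (whitespace-separated chromosome, position, data point) and the sample values are assumptions made up for illustration, and an in-memory buffer stands in for 'genomedata.txt'; adjust both to match the real file.

```python
import io

import numpy as np

# Hypothetical stand-in for 'genomedata.txt'. The real file's layout is an
# assumption here: one record per line, whitespace-separated columns.
sample = io.StringIO(
    "chr1 1000 0.25\n"
    "chr1 2000 0.50\n"
    "chr2 1500 1.75\n"
)

# Field names taken from the message above; change them to your liking.
genome_dtype = np.dtype([
    ("chromo", "S50"),    # fixed-length byte string, 50 bytes
    ("position", "i4"),   # 32-bit integer, 4 bytes
    ("dpoint", "f8"),     # 64-bit float, 8 bytes
])

# For the real data, pass the filename instead of the in-memory buffer.
data = np.loadtxt(sample, dtype=genome_dtype)

# Fields are accessed by name and come back as plain numpy arrays.
print(data["chromo"])
print(data["position"])
print(data["dpoint"].mean())
```

Note that the 'S50' field holds bytes, so under Python 3 comparisons need byte strings, e.g. data[data["chromo"] == b"chr1"].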

To clarify -- I don't know if this will work for your particular problem, but I do know that it will read in the array correctly and cut down on the memory used by the final array.

Specifically, if you use a dtype with 'S50', 'i4' and 'f8' (see the numpy dtype docs) -- that's 50 bytes for your chromosome string, 4 bytes for the position and 8 bytes for the data point -- each entry will use just 50 + 4 + 8 = 62 bytes, and the numpy array will have just enough memory allocated for all of these records.

The values stored in the array will be a fixed-length char array for the string, a 32-bit C int and a C double; they are not stored as the corresponding Python objects, each of which carries a bunch of extra memory overhead.

Hope this helps,

Kurt
--
http://mail.python.org/mailman/listinfo/python-list
