Hi,

what about pytables? It's built for big data collections and it doesn't clog up the memory.

Am 17.01.2011 23:54, schrieb Dan Stromberg:
On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger
<jake.biesin...@gmail.com>  wrote:
Hi all,

Using numpy, I can create large 2-dimensional arrays quite easily.
import numpy
mylist = numpy.zeros((100000000,2), dtype=numpy.int32)
Unfortunately, my target audience may not have numpy so I'd prefer not to use 
it.

Similarly, a list-of-tuples using standard python syntax.
mylist = [(0,0) for i in xrange(100000000)
but this method uses way too much memory (>4GB for 100 million items, compared 
to 1.5GB for numpy method).

Since I want to keep the two elements together during a sort, I *can't* use 
array.array.
mylist = [array.array('i',xrange(100000000)), 
array.array('i',xrange(100000000))]
If I knew the size in advance, I could use ctypes arrays.
from ctypes import *
class myStruct(Structure):
     _fields_ = [('x',c_int),('y',c_int)]
mylist_type = myStruct * 100000000
mylist = mylist_type()
but I don't know that size (and it can vary between 1 million-200 million), so 
preallocating doesn't seem to be an option.

Is there a python standard library way of creating *efficient* 2-dimensional 
lists/arrays, still allowing me to sort and append?

Thanks!
--
http://mail.python.org/mailman/listinfo/python-list

I recently had need of a two dimensional array also, but I only needed
small ones, so I just used a dictionary indexed by tuples.

If you need to sort a row as an aggregate type, it seems that you
probably either want to:
1) Use a list of lists, where the outer list is the easier one to sort
by - this is analogous to the array of pointers in C - sorting by the
outer would want to compare "elements" by looking at the 0th inner,
then 1st inner, etc.  So if you can organize things this way, things
might be pretty easy.
2) Use a list of instances, where the instances (rows) might just include a list
3) Use array.array with a custom sort so that you can move multiple
things when a more common sort would move "one" (possibly an
aggregate).

If you want some sorting code to start from for #3, I have a variety
of sorts written in Python (pure python or cython, your option, each
automatically derived from the same m4 file for a given sort) at
http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even
the cython timsort provided there doesn't perform quite as well as the
standard library's timsort.

HTH

--
http://mail.python.org/mailman/listinfo/python-list

Reply via email to