Re: Efficient python 2-d arrays?
Thanks for the great suggestions! On a related note, is there no way to efficiently sort a python array? x = array('f', xrange(1000)) x.sort() --- AttributeErrorTraceback (most recent call last) /home/wbiesing/src/python-sort/ipython console in module() AttributeError: 'array.array' object has no attribute 'sort' type(sorted(x)) type 'list' Seems like a common enough task, but no way to do it efficiently without scipy/numpy. -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On 1/20/11 1:36 PM, Jake Biesinger wrote: Thanks for the great suggestions! On a related note, is there no way to efficiently sort a python array? No. x = array('f', xrange(1000)) x.sort() --- AttributeErrorTraceback (most recent call last) /home/wbiesing/src/python-sort/ipython console inmodule() AttributeError: 'array.array' object has no attribute 'sort' type(sorted(x)) type 'list' Seems like a common enough task, but no way to do it efficiently without scipy/numpy. It's not really common. The array module is mostly used for data accumulation and interchange. Most people who need to compute things over efficiently-implemented homogeneous arrays use numpy. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger jake.biesin...@gmail.com wrote: Thanks for the great suggestions! On a related note, is there no way to efficiently sort a python array? x = array('f', xrange(1000)) x.sort() --- AttributeError Traceback (most recent call last) /home/wbiesing/src/python-sort/ipython console in module() AttributeError: 'array.array' object has no attribute 'sort' type(sorted(x)) type 'list' Seems like a common enough task, but no way to do it efficiently without scipy/numpy. Tap, tap: Is this thing on? I keep sending messages to this list, but no one seems to hear what I'm writing. Anyway, I have a bunch of sorts in python (pure or cython, your option) at http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn The pure python versions should all work out of the box with an array.array. I tested the timsort on an array.array, and it worked well. Also, I just made a trivial modification (not checked in! ...but I did test it) to make the cythonized timsort support sorting arrays - just change the list type declarations to object (or perhaps array.array) prior to compiling. I also just changed the license on them from GPL v3 to Apache v2. timsort truly is a somewhat supernatural algorithm. -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Thu, Jan 20, 2011 at 12:48 PM, Dan Stromberg drsali...@gmail.com wrote: On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger jake.biesin...@gmail.com wrote: Thanks for the great suggestions! On a related note, is there no way to efficiently sort a python array? x = array('f', xrange(1000)) x.sort() --- AttributeError Traceback (most recent call last) /home/wbiesing/src/python-sort/ipython console in module() AttributeError: 'array.array' object has no attribute 'sort' type(sorted(x)) type 'list' Seems like a common enough task, but no way to do it efficiently without scipy/numpy. Tap, tap: Is this thing on? I keep sending messages to this list, but no one seems to hear what I'm writing. Hey Dan, indeed thanks for the code; I pulled it down and played around with timsort a while ago. Timsort is indeed a beast. My current solution is to do an argsort. Since I can't roll with your cython implementations (pure python *only*), this seems a bit faster than sorting the original two lists using a modified pure-python timsort. Something along the lines of: args = sorted(range(len(seq1)), key=seq1.__getitem__) seq1 = [seq1[i] for i in args] seq2 = [seq2[i] for i in args] but the method suffers from a fairly high memory usage (big list of int's for args). If this barrier is too high, I will be switching over to your pure-python timsort. Anyway, I have a bunch of sorts in python (pure or cython, your option) at http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn The pure python versions should all work out of the box with an array.array. I tested the timsort on an array.array, and it worked well. your pylint runs fail for me, saying you should use disable, not disable-msg. pylint 0.23.0, astng 0.21.1, common 0.54.0 Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5] Also, I just made a trivial modification (not checked in! ...but I did test it) to make the cythonized timsort support sorting arrays - just change the list type declarations to object (or perhaps array.array) prior to compiling. I'm not very familiar with cython syntax-- how do you bring in the array module? cimport array? import array? I see you need to change cdef list to cdef array.array or something like that. Perhaps you could send a diff? Thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Thu, Jan 20, 2011 at 4:07 PM, Jacob Biesinger jake.biesin...@gmail.com wrote: On Thu, Jan 20, 2011 at 12:48 PM, Dan Stromberg drsali...@gmail.com wrote: On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger jake.biesin...@gmail.com wrote: Thanks for the great suggestions! On a related note, is there no way to efficiently sort a python array? x = array('f', xrange(1000)) x.sort() --- AttributeError Traceback (most recent call last) /home/wbiesing/src/python-sort/ipython console in module() AttributeError: 'array.array' object has no attribute 'sort' type(sorted(x)) type 'list' Seems like a common enough task, but no way to do it efficiently without scipy/numpy. Tap, tap: Is this thing on? I keep sending messages to this list, but no one seems to hear what I'm writing. Hey Dan, indeed thanks for the code; I pulled it down and played around with timsort a while ago. Timsort is indeed a beast. My current solution is to do an argsort. Since I can't roll with your cython implementations (pure python *only*), this seems a bit faster than sorting the original two lists using a modified pure-python timsort. Something along the lines of: args = sorted(range(len(seq1)), key=seq1.__getitem__) seq1 = [seq1[i] for i in args] seq2 = [seq2[i] for i in args] but the method suffers from a fairly high memory usage (big list of int's for args). If this barrier is too high, I will be switching over to your pure-python timsort. Oh, cool. I guess I'm not a ghost after all. :) And an argsort sounds like a nice solution. Anyway, I have a bunch of sorts in python (pure or cython, your option) at http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn The pure python versions should all work out of the box with an array.array. I tested the timsort on an array.array, and it worked well. your pylint runs fail for me, saying you should use disable, not disable-msg. pylint 0.23.0, astng 0.21.1, common 0.54.0 Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56) [GCC 4.4.5] Earlier today I updated the code to deal with newer pylints - including the disable vs disable-msg thing. If you svn update, you should get these changes. Also, I just made a trivial modification (not checked in! ...but I did test it) to make the cythonized timsort support sorting arrays - just change the list type declarations to object (or perhaps array.array) prior to compiling. I'm not very familiar with cython syntax-- how do you bring in the array module? cimport array? import array? I see you need to change cdef list to cdef array.array or something like that. Perhaps you could send a diff? http://stromberg.dnsalias.org/svn/sorts/compare/branches/arrays now has a version of timsort that expects objects (IOW, about anything, duck typed) instead of lists. I tried briefly to declare some of the types to be array.array, but perhaps Cython doesn't like that - I was getting compile-time errors from it. HTH. -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Jan 17, 10:20 pm, Jake Biesinger jake.biesin...@gmail.com wrote: Hi all, Using numpy, I can create large 2-dimensional arrays quite easily. import numpy mylist = numpy.zeros((1,2), dtype=numpy.int32) Unfortunately, my target audience may not have numpy so I'd prefer not to use it. Similarly, a list-of-tuples using standard python syntax. mylist = [(0,0) for i in xrange(1) but this method uses way too much memory (4GB for 100 million items, compared to 1.5GB for numpy method). Since I want to keep the two elements together during a sort, I *can't* use array.array. mylist = [array.array('i',xrange(1)), array.array('i',xrange(1))] If I knew the size in advance, I could use ctypes arrays. from ctypes import * class myStruct(Structure): _fields_ = [('x',c_int),('y',c_int)] mylist_type = myStruct * 1 mylist = mylist_type() but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option. Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append? Thanks! Since you can't depend on your users installing the dependencies, is it vital that your users run from source? You could bundle up your application along with numpy and other dependencies using py2Exe or similar. This also means you wouldn't have to require users to have the right (or any) version of Python installed. -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On 1/17/11 7:32 PM, Jake Biesinger wrote: On Monday, January 17, 2011 4:12:51 PM UTC-8, OAN wrote: Hi, what about pytables? It's built for big data collections and it doesn't clog up the memory. I thought PyTables depends on NumPy? It does. -- Robert Kern I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth. -- Umberto Eco -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
Since you can't depend on your users installing the dependencies, is it vital that your users run from source? You could bundle up your application along with numpy and other dependencies using py2Exe or similar. This also means you wouldn't have to require users to have the right (or any) version of Python installed. It's a good suggestion, though I am far from familiar with the process. I've just finished implementing another alternative-- I'm doing a merge sort, where the array chunks are zipped together and then sorted using python's builtin sort then unzipped back to their original arrays. This seems fast enough and the reduced memory requirement for 2 arrays vs 1 list-of-tuples is substantial (1.5 GB vs 4GB). Thanks for the great suggestions! -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
Without using third party libraries, no not really. numpy has it covered so there's not really a lot of demand for it. If your users are loading 1.5 GB arrays into memory, it's probably not unreasonable to expect them to have numpy installed. My users are biologists, and I can't expect them to have numpy (and the barrier to entry for this group is particularly high). -- http://mail.python.org/mailman/listinfo/python-list
Efficient python 2-d arrays?
Hi all, Using numpy, I can create large 2-dimensional arrays quite easily. import numpy mylist = numpy.zeros((1,2), dtype=numpy.int32) Unfortunately, my target audience may not have numpy so I'd prefer not to use it. Similarly, a list-of-tuples using standard python syntax. mylist = [(0,0) for i in xrange(1) but this method uses way too much memory (4GB for 100 million items, compared to 1.5GB for numpy method). Since I want to keep the two elements together during a sort, I *can't* use array.array. mylist = [array.array('i',xrange(1)), array.array('i',xrange(1))] If I knew the size in advance, I could use ctypes arrays. from ctypes import * class myStruct(Structure): _fields_ = [('x',c_int),('y',c_int)] mylist_type = myStruct * 1 mylist = mylist_type() but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option. Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append? Thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger jake.biesin...@gmail.com wrote: Hi all, Using numpy, I can create large 2-dimensional arrays quite easily. import numpy mylist = numpy.zeros((1,2), dtype=numpy.int32) Unfortunately, my target audience may not have numpy so I'd prefer not to use it. Similarly, a list-of-tuples using standard python syntax. mylist = [(0,0) for i in xrange(1) but this method uses way too much memory (4GB for 100 million items, compared to 1.5GB for numpy method). Since I want to keep the two elements together during a sort, I *can't* use array.array. mylist = [array.array('i',xrange(1)), array.array('i',xrange(1))] If I knew the size in advance, I could use ctypes arrays. from ctypes import * class myStruct(Structure): _fields_ = [('x',c_int),('y',c_int)] mylist_type = myStruct * 1 mylist = mylist_type() but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option. Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append? Thanks! -- http://mail.python.org/mailman/listinfo/python-list I recently had need of a two dimensional array also, but I only needed small ones, so I just used a dictionary indexed by tuples. If you need to sort a row as an aggregate type, it seems that you probably either want to: 1) Use a list of lists, where the outer list is the easier one to sort by - this is analogous to the array of pointers in C - sorting by the outer would want to compare elements by looking at the 0th inner, then 1st inner, etc. So if you can organize things this way, things might be pretty easy. 2) Use a list of instances, where the instances (rows) might just include a list 3) Use array.array with a custom sort so that you can move multiple things when a more common sort would move one (possibly an aggregate). If you want some sorting code to start from for #3, I have a variety of sorts written in Python (pure python or cython, your option, each automatically derived from the same m4 file for a given sort) at http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even the cython timsort provided there doesn't perform quite as well as the standard library's timsort. HTH -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
Hi, what about pytables? It's built for big data collections and it doesn't clog up the memory. Am 17.01.2011 23:54, schrieb Dan Stromberg: On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger jake.biesin...@gmail.com wrote: Hi all, Using numpy, I can create large 2-dimensional arrays quite easily. import numpy mylist = numpy.zeros((1,2), dtype=numpy.int32) Unfortunately, my target audience may not have numpy so I'd prefer not to use it. Similarly, a list-of-tuples using standard python syntax. mylist = [(0,0) for i in xrange(1) but this method uses way too much memory (4GB for 100 million items, compared to 1.5GB for numpy method). Since I want to keep the two elements together during a sort, I *can't* use array.array. mylist = [array.array('i',xrange(1)), array.array('i',xrange(1))] If I knew the size in advance, I could use ctypes arrays. from ctypes import * class myStruct(Structure): _fields_ = [('x',c_int),('y',c_int)] mylist_type = myStruct * 1 mylist = mylist_type() but I don't know that size (and it can vary between 1 million-200 million), so preallocating doesn't seem to be an option. Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append? Thanks! -- http://mail.python.org/mailman/listinfo/python-list I recently had need of a two dimensional array also, but I only needed small ones, so I just used a dictionary indexed by tuples. If you need to sort a row as an aggregate type, it seems that you probably either want to: 1) Use a list of lists, where the outer list is the easier one to sort by - this is analogous to the array of pointers in C - sorting by the outer would want to compare elements by looking at the 0th inner, then 1st inner, etc. So if you can organize things this way, things might be pretty easy. 2) Use a list of instances, where the instances (rows) might just include a list 3) Use array.array with a custom sort so that you can move multiple things when a more common sort would move one (possibly an aggregate). If you want some sorting code to start from for #3, I have a variety of sorts written in Python (pure python or cython, your option, each automatically derived from the same m4 file for a given sort) at http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even the cython timsort provided there doesn't perform quite as well as the standard library's timsort. HTH -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
Using numpy, I can create large 2-dimensional arrays quite easily. IIUC (please confirm), you don't need a generic two-dimensional array, but rather an Nx2 array, where N may be large (but the other dimension will always have a magnitude of 2). Since I want to keep the two elements together during a sort I assume (please confirm) that you want to sort by one of the numbers, and keep the other one together with the sort key. I suggest that you implement your own sorting algorithm, or reuse heapsort (actually, any of the sorting algorithms would do). You then use array.array, and take the elements at indices 2k and 2k+1 together. Sorting would then only look at even indices, but any swapping you do during the sorting would always swap two numbers with two other numbers. Regards, Martin -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Monday, January 17, 2011 4:12:51 PM UTC-8, OAN wrote: Hi, what about pytables? It's built for big data collections and it doesn't clog up the memory. I thought PyTables depends on NumPy? Otherwise I would indeed use their carray module. Thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
IIUC (please confirm), you don't need a generic two-dimensional array, but rather an Nx2 array, where N may be large (but the other dimension will always have a magnitude of 2). Yes, that's right, Nx2 not NxM. Since I want to keep the two elements together during a sort I assume (please confirm) that you want to sort by one of the numbers, and keep the other one together with the sort key. Again, yes. I suggest that you implement your own sorting algorithm, or reuse heapsort (actually, any of the sorting algorithms would do). You then use array.array, and take the elements at indices 2k and 2k+1 together. Sorting would then only look at even indices, but any swapping you do during the sorting would always swap two numbers with two other numbers. Yes, this is an option. Writing a custom sort function just didn't sound very appealing and I was hoping for a more generic solution. Thanks! -- http://mail.python.org/mailman/listinfo/python-list
Re: Efficient python 2-d arrays?
On Jan 17, 2:20 pm, Jake Biesinger jake.biesin...@gmail.com wrote: Is there a python standard library way of creating *efficient* 2-dimensional lists/arrays, still allowing me to sort and append? Without using third party libraries, no not really. numpy has it covered so there's not really a lot of demand for it. If your users are loading 1.5 GB arrays into memory, it's probably not unreasonable to expect them to have numpy installed. Other than that you're probably stuck emulating 2D with the array module. Carl Banks -- http://mail.python.org/mailman/listinfo/python-list