Re: Efficient python 2-d arrays?

2011-01-20 Thread Jake Biesinger
 Thanks for the great suggestions!

On a related note, is there no way to efficiently sort a python array?


 x = array('f', xrange(1000))
 x.sort()
---
AttributeErrorTraceback (most recent call last)

/home/wbiesing/src/python-sort/ipython console in module()

AttributeError: 'array.array' object has no attribute 'sort'

 type(sorted(x))
type 'list'

Seems like a common enough task, but no way to do it efficiently without 
scipy/numpy.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-20 Thread Robert Kern

On 1/20/11 1:36 PM, Jake Biesinger wrote:

Thanks for the great suggestions!


On a related note, is there no way to efficiently sort a python array?


No.


x = array('f', xrange(1000))
x.sort()

---
AttributeErrorTraceback (most recent call last)

/home/wbiesing/src/python-sort/ipython console  inmodule()

AttributeError: 'array.array' object has no attribute 'sort'


type(sorted(x))

type 'list'

Seems like a common enough task, but no way to do it efficiently without 
scipy/numpy.


It's not really common. The array module is mostly used for data accumulation 
and interchange. Most people who need to compute things over 
efficiently-implemented homogeneous arrays use numpy.


--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-20 Thread Dan Stromberg
On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger
jake.biesin...@gmail.com wrote:
 Thanks for the great suggestions!

 On a related note, is there no way to efficiently sort a python array?


 x = array('f', xrange(1000))
 x.sort()
 ---
 AttributeError                            Traceback (most recent call last)

 /home/wbiesing/src/python-sort/ipython console in module()

 AttributeError: 'array.array' object has no attribute 'sort'

 type(sorted(x))
 type 'list'

 Seems like a common enough task, but no way to do it efficiently without 
 scipy/numpy.

Tap, tap: Is this thing on?  I keep sending messages to this list, but
no one seems to hear what I'm writing.

Anyway, I have a bunch of sorts in python (pure or cython, your
option) at 
http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn

The pure python versions should all work out of the box with an
array.array.  I tested the timsort on an array.array, and it worked
well.

Also, I just made a trivial modification (not checked in! ...but I did
test it) to make the cythonized timsort support sorting arrays - just
change the list type declarations to object (or perhaps
array.array) prior to compiling.

I also just changed the license on them from GPL v3 to Apache v2.

timsort truly is a somewhat supernatural algorithm.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-20 Thread Jacob Biesinger
On Thu, Jan 20, 2011 at 12:48 PM, Dan Stromberg drsali...@gmail.com wrote:
 On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger
 jake.biesin...@gmail.com wrote:
 Thanks for the great suggestions!

 On a related note, is there no way to efficiently sort a python array?


 x = array('f', xrange(1000))
 x.sort()
 ---
 AttributeError                            Traceback (most recent call last)

 /home/wbiesing/src/python-sort/ipython console in module()

 AttributeError: 'array.array' object has no attribute 'sort'

 type(sorted(x))
 type 'list'

 Seems like a common enough task, but no way to do it efficiently without 
 scipy/numpy.

 Tap, tap: Is this thing on?  I keep sending messages to this list, but
 no one seems to hear what I'm writing.

Hey Dan, indeed thanks for the code; I pulled it down and played
around with timsort a while ago. Timsort is indeed a beast. My current
solution is to do an argsort.  Since I can't roll with your cython
implementations (pure python *only*), this seems a bit faster than
sorting the original two lists using a modified pure-python timsort.
Something along the lines of:

args = sorted(range(len(seq1)), key=seq1.__getitem__)
seq1 = [seq1[i] for i in args]
seq2 = [seq2[i] for i in args]

but the method suffers from a fairly high memory usage (big list of
int's for args).  If this barrier is too high, I will be switching
over to your pure-python timsort.

 Anyway, I have a bunch of sorts in python (pure or cython, your
 option) at 
 http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn

 The pure python versions should all work out of the box with an
 array.array.  I tested the timsort on an array.array, and it worked
 well.

your pylint runs fail for me, saying you should use disable, not disable-msg.
pylint 0.23.0,
astng 0.21.1, common 0.54.0
Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
[GCC 4.4.5]

 Also, I just made a trivial modification (not checked in! ...but I did
 test it) to make the cythonized timsort support sorting arrays - just
 change the list type declarations to object (or perhaps
 array.array) prior to compiling.

I'm not very familiar with cython syntax-- how do you bring in the array module?
cimport array? import array? I see you need to change cdef list to
cdef array.array or something like that.  Perhaps you could send a
diff?

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-20 Thread Dan Stromberg
On Thu, Jan 20, 2011 at 4:07 PM, Jacob Biesinger
jake.biesin...@gmail.com wrote:
 On Thu, Jan 20, 2011 at 12:48 PM, Dan Stromberg drsali...@gmail.com wrote:
 On Thu, Jan 20, 2011 at 11:36 AM, Jake Biesinger
 jake.biesin...@gmail.com wrote:
 Thanks for the great suggestions!

 On a related note, is there no way to efficiently sort a python array?


 x = array('f', xrange(1000))
 x.sort()
 ---
 AttributeError                            Traceback (most recent call last)

 /home/wbiesing/src/python-sort/ipython console in module()

 AttributeError: 'array.array' object has no attribute 'sort'

 type(sorted(x))
 type 'list'

 Seems like a common enough task, but no way to do it efficiently without 
 scipy/numpy.

 Tap, tap: Is this thing on?  I keep sending messages to this list, but
 no one seems to hear what I'm writing.

 Hey Dan, indeed thanks for the code; I pulled it down and played
 around with timsort a while ago. Timsort is indeed a beast. My current
 solution is to do an argsort.  Since I can't roll with your cython
 implementations (pure python *only*), this seems a bit faster than
 sorting the original two lists using a modified pure-python timsort.
 Something along the lines of:

 args = sorted(range(len(seq1)), key=seq1.__getitem__)
 seq1 = [seq1[i] for i in args]
 seq2 = [seq2[i] for i in args]

 but the method suffers from a fairly high memory usage (big list of
 int's for args).  If this barrier is too high, I will be switching
 over to your pure-python timsort.

Oh, cool.  I guess I'm not a ghost after all.  :)

And an argsort sounds like a nice solution.

 Anyway, I have a bunch of sorts in python (pure or cython, your
 option) at 
 http://stromberg.dnsalias.org/cgi-bin/viewvc.cgi/sorts/compare/trunk/?root=svn

 The pure python versions should all work out of the box with an
 array.array.  I tested the timsort on an array.array, and it worked
 well.

 your pylint runs fail for me, saying you should use disable, not disable-msg.
 pylint 0.23.0,
 astng 0.21.1, common 0.54.0
 Python 2.6.6 (r266:84292, Sep 15 2010, 16:22:56)
 [GCC 4.4.5]

Earlier today I updated the code to deal with newer pylints -
including the disable vs disable-msg thing.  If you svn update, you
should get these changes.

 Also, I just made a trivial modification (not checked in! ...but I did
 test it) to make the cythonized timsort support sorting arrays - just
 change the list type declarations to object (or perhaps
 array.array) prior to compiling.

 I'm not very familiar with cython syntax-- how do you bring in the array 
 module?
 cimport array? import array? I see you need to change cdef list to
 cdef array.array or something like that.  Perhaps you could send a
 diff?

http://stromberg.dnsalias.org/svn/sorts/compare/branches/arrays now
has a version of timsort that expects objects (IOW, about anything,
duck typed) instead of lists.  I tried briefly to declare some of the
types to be array.array, but perhaps Cython doesn't like that - I was
getting compile-time errors from it.

HTH.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-18 Thread Jonathan Hartley
On Jan 17, 10:20 pm, Jake Biesinger jake.biesin...@gmail.com wrote:
 Hi all,

 Using numpy, I can create large 2-dimensional arrays quite easily.

  import numpy
  mylist = numpy.zeros((1,2), dtype=numpy.int32)

 Unfortunately, my target audience may not have numpy so I'd prefer not to use 
 it.

 Similarly, a list-of-tuples using standard python syntax.

  mylist = [(0,0) for i in xrange(1)

 but this method uses way too much memory (4GB for 100 million items, 
 compared to 1.5GB for numpy method).

 Since I want to keep the two elements together during a sort, I *can't* use 
 array.array.

  mylist = [array.array('i',xrange(1)), 
  array.array('i',xrange(1))]

 If I knew the size in advance, I could use ctypes arrays.

  from ctypes import *
  class myStruct(Structure):
      _fields_ = [('x',c_int),('y',c_int)]
  mylist_type = myStruct * 1
  mylist = mylist_type()

 but I don't know that size (and it can vary between 1 million-200 million), 
 so preallocating doesn't seem to be an option.

 Is there a python standard library way of creating *efficient* 2-dimensional 
 lists/arrays, still allowing me to sort and append?

 Thanks!



Since you can't depend on your users installing the dependencies, is
it vital that your users run from source? You could bundle up your
application along with numpy and other dependencies using py2Exe or
similar. This also means you wouldn't have to require users to have
the right (or any) version of Python installed.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-18 Thread Robert Kern

On 1/17/11 7:32 PM, Jake Biesinger wrote:

On Monday, January 17, 2011 4:12:51 PM UTC-8, OAN wrote:

Hi,

what about pytables? It's built for big data collections and it doesn't
clog up the memory.


I thought PyTables depends on NumPy?


It does.

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth.
  -- Umberto Eco

--
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-18 Thread Jake Biesinger
 Since you can't depend on your users installing the dependencies, is
 it vital that your users run from source? You could bundle up your
 application along with numpy and other dependencies using py2Exe or
 similar. This also means you wouldn't have to require users to have
 the right (or any) version of Python installed.

It's a good suggestion, though I am far from familiar with the process.

I've just finished implementing another alternative-- I'm doing a merge sort, 
where the array chunks are zipped together and then sorted using python's 
builtin sort then unzipped back to their original arrays. This seems fast 
enough and the reduced memory requirement for 2 arrays vs 1 list-of-tuples is 
substantial (1.5 GB vs 4GB).

Thanks for the great suggestions!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-18 Thread Jake Biesinger
 Without using third party libraries, no not really.  numpy has it
 covered so there's not really a lot of demand for it.  If your users
 are loading 1.5 GB arrays into memory, it's probably not unreasonable
 to expect them to have numpy installed.

My users are biologists, and I can't expect them to have numpy (and the barrier 
to entry for this group is particularly high).
-- 
http://mail.python.org/mailman/listinfo/python-list


Efficient python 2-d arrays?

2011-01-17 Thread Jake Biesinger
Hi all,

Using numpy, I can create large 2-dimensional arrays quite easily.
 import numpy
 mylist = numpy.zeros((1,2), dtype=numpy.int32)

Unfortunately, my target audience may not have numpy so I'd prefer not to use 
it.

Similarly, a list-of-tuples using standard python syntax.
 mylist = [(0,0) for i in xrange(1)

but this method uses way too much memory (4GB for 100 million items, compared 
to 1.5GB for numpy method).

Since I want to keep the two elements together during a sort, I *can't* use 
array.array.
 mylist = [array.array('i',xrange(1)), 
 array.array('i',xrange(1))]

If I knew the size in advance, I could use ctypes arrays.
 from ctypes import *
 class myStruct(Structure):
 _fields_ = [('x',c_int),('y',c_int)]
 mylist_type = myStruct * 1
 mylist = mylist_type()

but I don't know that size (and it can vary between 1 million-200 million), so 
preallocating doesn't seem to be an option.

Is there a python standard library way of creating *efficient* 2-dimensional 
lists/arrays, still allowing me to sort and append?

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread Dan Stromberg
On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger
jake.biesin...@gmail.com wrote:
 Hi all,

 Using numpy, I can create large 2-dimensional arrays quite easily.
 import numpy
 mylist = numpy.zeros((1,2), dtype=numpy.int32)

 Unfortunately, my target audience may not have numpy so I'd prefer not to use 
 it.

 Similarly, a list-of-tuples using standard python syntax.
 mylist = [(0,0) for i in xrange(1)

 but this method uses way too much memory (4GB for 100 million items, 
 compared to 1.5GB for numpy method).

 Since I want to keep the two elements together during a sort, I *can't* use 
 array.array.
 mylist = [array.array('i',xrange(1)), 
 array.array('i',xrange(1))]

 If I knew the size in advance, I could use ctypes arrays.
 from ctypes import *
 class myStruct(Structure):
     _fields_ = [('x',c_int),('y',c_int)]
 mylist_type = myStruct * 1
 mylist = mylist_type()

 but I don't know that size (and it can vary between 1 million-200 million), 
 so preallocating doesn't seem to be an option.

 Is there a python standard library way of creating *efficient* 2-dimensional 
 lists/arrays, still allowing me to sort and append?

 Thanks!
 --
 http://mail.python.org/mailman/listinfo/python-list


I recently had need of a two dimensional array also, but I only needed
small ones, so I just used a dictionary indexed by tuples.

If you need to sort a row as an aggregate type, it seems that you
probably either want to:
1) Use a list of lists, where the outer list is the easier one to sort
by - this is analogous to the array of pointers in C - sorting by the
outer would want to compare elements by looking at the 0th inner,
then 1st inner, etc.  So if you can organize things this way, things
might be pretty easy.
2) Use a list of instances, where the instances (rows) might just include a list
3) Use array.array with a custom sort so that you can move multiple
things when a more common sort would move one (possibly an
aggregate).

If you want some sorting code to start from for #3, I have a variety
of sorts written in Python (pure python or cython, your option, each
automatically derived from the same m4 file for a given sort) at
http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even
the cython timsort provided there doesn't perform quite as well as the
standard library's timsort.

HTH
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread OAN

Hi,

what about pytables? It's built for big data collections and it doesn't 
clog up the memory.


Am 17.01.2011 23:54, schrieb Dan Stromberg:

On Mon, Jan 17, 2011 at 2:20 PM, Jake Biesinger
jake.biesin...@gmail.com  wrote:

Hi all,

Using numpy, I can create large 2-dimensional arrays quite easily.

import numpy
mylist = numpy.zeros((1,2), dtype=numpy.int32)

Unfortunately, my target audience may not have numpy so I'd prefer not to use 
it.

Similarly, a list-of-tuples using standard python syntax.

mylist = [(0,0) for i in xrange(1)

but this method uses way too much memory (4GB for 100 million items, compared 
to 1.5GB for numpy method).

Since I want to keep the two elements together during a sort, I *can't* use 
array.array.

mylist = [array.array('i',xrange(1)), 
array.array('i',xrange(1))]

If I knew the size in advance, I could use ctypes arrays.

from ctypes import *
class myStruct(Structure):
 _fields_ = [('x',c_int),('y',c_int)]
mylist_type = myStruct * 1
mylist = mylist_type()

but I don't know that size (and it can vary between 1 million-200 million), so 
preallocating doesn't seem to be an option.

Is there a python standard library way of creating *efficient* 2-dimensional 
lists/arrays, still allowing me to sort and append?

Thanks!
--
http://mail.python.org/mailman/listinfo/python-list


I recently had need of a two dimensional array also, but I only needed
small ones, so I just used a dictionary indexed by tuples.

If you need to sort a row as an aggregate type, it seems that you
probably either want to:
1) Use a list of lists, where the outer list is the easier one to sort
by - this is analogous to the array of pointers in C - sorting by the
outer would want to compare elements by looking at the 0th inner,
then 1st inner, etc.  So if you can organize things this way, things
might be pretty easy.
2) Use a list of instances, where the instances (rows) might just include a list
3) Use array.array with a custom sort so that you can move multiple
things when a more common sort would move one (possibly an
aggregate).

If you want some sorting code to start from for #3, I have a variety
of sorts written in Python (pure python or cython, your option, each
automatically derived from the same m4 file for a given sort) at
http://stromberg.dnsalias.org/svn/sorts/compare/trunk/ - however, even
the cython timsort provided there doesn't perform quite as well as the
standard library's timsort.

HTH


--
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread Martin v. Loewis
 Using numpy, I can create large 2-dimensional arrays quite easily.

IIUC (please confirm), you don't need a generic two-dimensional
array, but rather an Nx2 array, where N may be large (but the other
dimension will always have a magnitude of 2).

 Since I want to keep the two elements together during a sort

I assume (please confirm) that you want to sort by one of the numbers,
and keep the other one together with the sort key.

I suggest that you implement your own sorting algorithm, or reuse
heapsort (actually, any of the sorting algorithms would do).

You then use array.array, and take the elements at indices 2k and 2k+1
together. Sorting would then only look at even indices, but any swapping
you do during the sorting would always swap two numbers with two other
numbers.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread Jake Biesinger
On Monday, January 17, 2011 4:12:51 PM UTC-8, OAN wrote:
 Hi,
 
 what about pytables? It's built for big data collections and it doesn't 
 clog up the memory.

I thought PyTables depends on NumPy?  Otherwise I would indeed use their carray 
module.

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread Jake Biesinger
 IIUC (please confirm), you don't need a generic two-dimensional
 array, but rather an Nx2 array, where N may be large (but the other
 dimension will always have a magnitude of 2).

Yes, that's right, Nx2 not NxM.

  Since I want to keep the two elements together during a sort
 
 I assume (please confirm) that you want to sort by one of the numbers,
 and keep the other one together with the sort key.

Again, yes.
 
 I suggest that you implement your own sorting algorithm, or reuse
 heapsort (actually, any of the sorting algorithms would do).
 
 You then use array.array, and take the elements at indices 2k and 2k+1
 together. Sorting would then only look at even indices, but any swapping
 you do during the sorting would always swap two numbers with two other
 numbers.

Yes, this is an option. Writing a custom sort function just didn't sound very 
appealing and I was hoping for a more generic solution.  

Thanks!
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Efficient python 2-d arrays?

2011-01-17 Thread Carl Banks
On Jan 17, 2:20 pm, Jake Biesinger jake.biesin...@gmail.com wrote:
 Is there a python standard library way of creating *efficient* 2-dimensional 
 lists/arrays, still allowing me to sort and append?

Without using third party libraries, no not really.  numpy has it
covered so there's not really a lot of demand for it.  If your users
are loading 1.5 GB arrays into memory, it's probably not unreasonable
to expect them to have numpy installed.

Other than that you're probably stuck emulating 2D with the array
module.


Carl Banks
-- 
http://mail.python.org/mailman/listinfo/python-list