Hi folks,
Thanks for all the help. I tried running the various options, and
here is what I found:

    from array import array
    from time import time

    def f1(recs, cols):
        for r in recs:
            for i,v in enumerate(r):
                cols[i].append(v)

    def f2(recs, cols):
        for r in recs:
            for v,c in zip(r, cols):
                c.append(v)

    def f3(recs, cols):
        for r in recs:
            map(list.append, cols, r)

    def f4(recs):
        return zip(*recs)

    records = [ tuple(range(10)) for i in xrange(1000000) ]

    columns = tuple([] for i in xrange(10))
    t = time()
    f1(records, columns)
    print 'f1: ', time()-t

    columns = tuple([] for i in xrange(10))
    t = time()
    f2(records, columns)
    print 'f2: ', time()-t

    columns = tuple([] for i in xrange(10))
    t = time()
    f3(records, columns)
    print 'f3: ', time()-t

    t = time()
    columns = f4(records)
    print 'f4: ', time()-t

    f1: 5.10132408142
    f2: 5.06787180901
    f3: 4.04700708389
    f4: 19.13633203506

So there is some benefit in using map(list.append): f3 is roughly 20%
faster than f1 and f2. f4 is very clever and cool, but at nearly four
times the run time of f1 it doesn't seem to scale.
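
For anyone following along, here is a minimal sketch of what the map(list.append) and zip(*recs) approaches do on a tiny input. The sample records are made up for illustration. It is written for Python 3, where map and zip are lazy, so both are wrapped in list() to force them to run; the Python 2 versions above run eagerly and don't need that.

```python
# Two made-up records with three fields each.
records = [(1, 2, 3), (4, 5, 6)]

# One independent list per column.
columns = tuple([] for _ in range(3))

for r in records:
    # Calls list.append(columns[i], r[i]) for each i.
    # list() forces the lazy Python 3 map to actually run.
    list(map(list.append, columns, r))

print(columns)      # ([1, 4], [2, 5], [3, 6])

# The zip(*recs) approach transposes in one call, but yields tuples.
transposed = list(zip(*records))
print(transposed)   # [(1, 4), (2, 5), (3, 6)]
```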
Incidentally, it took me a while to figure out why the following
initialization doesn't work:

    columns = ([],)*10

Apparently you end up with 10 references to the same list, so every
append shows up in all ten columns.
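
A small sketch of that pitfall, using three columns instead of ten to keep it short (the names shared and separate are just for illustration):

```python
# Multiplying the tuple repeats the reference, not the list itself.
shared = ([],) * 3
shared[0].append(1)
print(shared)                   # ([1], [1], [1])
print(shared[0] is shared[1])   # True: one list, three references

# A generator expression evaluates [] once per column instead.
separate = tuple([] for _ in range(3))
separate[0].append(1)
print(separate)                 # ([1], [], [])
```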
Finally, in my case the output columns are integer arrays (to save
memory). I can still use array.append, but it's a little slower, so the
difference between f1-f3 gets even smaller. f4 is not an option with
arrays, since zip produces tuples.
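
Roughly what the array variant could look like, as a sketch rather than my exact code: it assumes the 'i' (signed int) typecode and uses made-up sample records, with the same loop shape as f2.

```python
from array import array

# Made-up records; the real data has 10 fields and a million rows.
records = [(1, 2, 3), (4, 5, 6)]

# One integer array per column; 'i' is an assumed typecode.
columns = tuple(array('i') for _ in range(3))

for r in records:
    for v, c in zip(r, columns):
        c.append(v)   # array.append takes the place of list.append

print([list(c) for c in columns])   # [[1, 4], [2, 5], [3, 6]]
```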
--
http://mail.python.org/mailman/listinfo/python-list