An instructive lesson in YAGNI ("you aren't going to need it"), premature optimization, and not making assumptions about Python data structure implementations.

I need a 1000 x 1000 two-dimensional array of objects. (Since they are instances of application classes, the array module is no use; and since I am using Python 3.1, I can't, among other things, use numpy or its relatives.) The usage pattern is that the array is first completely filled with objects. Later, objects are sometimes accessed individually by row and column, and often the entire array is iterated over.

Worried (unnecessarily, as it turns out) by the prospect of a 1,000,000-element list, I started by constructing a dictionary with the keys 0 through 999, each of which had as its value another dictionary, also keyed 0 through 999. The actual array values were stored as the values of the second-level dictionaries.

Using numbers to fill the array, to minimize the effect of creating my more complex objects, and running Python 3.1.1 on an 8-core Mac Pro with 8 GB of memory, I tried the following:

import time

# create and fill the array:
t1 = time.time()
d2 = {}
for j in range(1000):
    d2[j] = dict()
    for k in range(1000):
        d2[j][k] = k
print(round(time.time() - t1, 2))

0.41

# access each element of the array:
t1 = time.time()
for j in range(1000):
    for k in range(1000):
        elt = d2[j][k]
print(round(time.time() - t1, 2))

0.55

My program was too slow, so I started investigating whether I could improve on the two-level dictionary, which got used a lot. To get another baseline I tried a pure 1,000,000-element list, expecting the times to be horrendous, but look!

# fill a list using append
t1 = time.time()
lst = []
for n in range(1000000):
    lst.append(n)
print(round(time.time() - t1, 2))

0.26

# access every element of a list
t1 = time.time()
for n in range(1000000):
    elt = lst[n]
print(round(time.time() - t1, 2))

0.25

What a shock! I could save half the execution time, and spare myself all the clever work and awkward double-layer dictionary expressions, just by using a list!

Even better, look what happens using a comprehension to create the list instead of a loop with list.append:

t1 = time.time()
lst = [n for n in range(1000000)]
print(round(time.time() - t1, 2))

0.11

That halves the list-creation time yet again.

Iterating over the whole list is easier and faster than iterating over the double-level dictionary, in particular because it doesn't involve a two-level loop; a quick comparison of the two iteration styles follows.
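This snippet is my addition rather than one of the original timings; it reuses the d2 and lst built above and just walks every element both ways:

# iterate the flat list: a single loop, no index arithmetic
t1 = time.time()
for elt in lst:
    pass
print(round(time.time() - t1, 2))

# iterate the two-level dictionary: a nested loop over the inner dicts
t1 = time.time()
for row in d2.values():
    for elt in row.values():
        pass
print(round(time.time() - t1, 2))

But what about individual access given a row and a column?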

t1 = time.time()
for j in range(1000):
    for k in range(1000):
        elt = lst[j * 1000 + k]
print(round(time.time() - t1, 2))

0.45

This is about the same as for the two-level dictionary.

I tried a two-level list and a few other things, but I still haven't found anything that works better than a single long list -- just like 2-D arrays are coded in old-style languages, with indices computed as offsets from the beginning of the linear sequence of all the values. What's amazing is that creating and accessing a 1,000,000-element list in Python is so efficient.

The usual moral obtains: start simple, analyze problems (functional or performance) as they arise, decide whether they are worth the cost of change, then change in very limited ways. And of course abstract and modularize so that, in my case for example, none of the program's code would be affected by the change from a two-level dictionary representation to a single long list.
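To make that last point concrete, here is a minimal sketch of the kind of wrapper I mean. The class name Grid2D and its methods are my own illustration, not code from the program:

class Grid2D:
    # A fixed-size 2-D grid stored as one flat list, hidden behind
    # row/column accessors so the representation can change freely.
    def __init__(self, nrows, ncols, fill=None):
        self.nrows = nrows
        self.ncols = ncols
        self._cells = [fill] * (nrows * ncols)

    def get(self, row, col):
        return self._cells[row * self.ncols + col]

    def set(self, row, col, value):
        self._cells[row * self.ncols + col] = value

    def __iter__(self):
        # iterate over every cell in row-major order
        return iter(self._cells)

# usage: the caller never sees the flat list
grid = Grid2D(1000, 1000, fill=0)
grid.set(3, 7, "hello")
print(grid.get(3, 7))

Switching the internal storage from a flat list to a two-level dictionary (or anything else) would then touch only this class.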

I realize there are many directions an analysis like this could take, and many factors affecting it, including patterns of use. I just wanted to demonstrate the basics for a situation I recently encountered. In particular, if the array were sparse, rather than completely full, the two-level dictionary implementation would be the natural representation.
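For completeness, a rough sketch of that sparse case (again my illustration, with made-up names and values) might store only the occupied cells:

# nested plain dicts: row -> {col: value}; only filled cells are stored
sparse = {}
sparse.setdefault(500, {})[250] = "something"

# individual access, returning None for cells that were never filled
value = sparse.get(500, {}).get(250)   # "something"
empty = sparse.get(10, {}).get(99)     # None

# iterate only over the cells that actually exist
for row, cols in sparse.items():
    for col, value in cols.items():
        pass  # process (row, col, value)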