Dear pythonistas,

I am writing a tiny utility to produce a file consisting of a specified number of lines of a given length of random ASCII characters. I am hoping to find a more time- and memory-efficient way that is still fairly simple, clear, and _pythonic_.
I would like to have something that I can use at both extremes of data: 32M chars per line * 100 lines, or 5 chars per line * 1e8 lines. E.g., the output of bigrand.py for 10 characters, 2 lines might be:

    gw2+M/5t&.
    S[[db/l?Vx

I'm using Python 2.7.0 on Linux. I need to use only out-of-the-box modules, since this has to work on a bunch of different computers. At this point I'm especially concerned with the case of a few very long lines, since that seems to use a lot of memory and take a long time. The characters are a slight subset of the printable ASCII characters, specified in the examples below.

My first naive try was:

    from sys import stdout
    import random

    nchars = 32000000
    rows = 10
    avail_chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%& \'()*+,-./:;<=>?@[\\]^_`{}'

    def make_varchar(nchars):
        # build one whole line of nchars random characters, then join it
        return ''.join([random.choice(avail_chrs) for i in range(nchars)])

    for l in range(rows):
        stdout.write(make_varchar(nchars))
        stdout.write('\n')

This version used about 1.2GB resident / 1.2GB virtual memory and ran for 3 min 38 s.

My second try uses much less RAM but more CPU time, and seems rather, umm, un-pythonic (the array module always seems a little un-pythonic...):

    from sys import stdout
    from array import array
    import random

    nchars = 32000000
    rows = 10
    avail_chrs = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%& \'()*+,-./:;<=>?@[\\]^_`{}'

    # one reusable character buffer, overwritten in place for every row
    a = array('c', 'X' * nchars)

    for l in range(rows):
        for i in xrange(nchars):
            a[i] = random.choice(avail_chrs)
        a.tofile(stdout)
        stdout.write('\n')

This version, using array, took 4 min 29 s and used 34MB resident / 110MB virtual. So it is much smaller than the first attempt, but a bit slower.

Can someone suggest better code? And help me understand the performance issues here?

--
George
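One way to bound memory for the few-very-long-lines case is to build each line in fixed-size chunks and write every chunk as soon as it is ready, so only one chunk is held at a time. Below is a minimal sketch under stated assumptions. The first version's memory use probably comes from materializing full 32M-element lists before writing: `range(nchars)` returns a list in Python 2, and so does the list comprehension inside `join`. In both versions the running time is likely dominated by one `random.choice` call per character, so this sketch mainly addresses memory, not speed. The names `write_random_line`, `write_random_file`, `AVAIL_CHRS`, and `CHUNK`, and the 64K chunk size, are hypothetical choices, not anything from the post. The code avoids Python-3-only features so it should also run on 2.7.

```python
import random

# Alphabet copied from the post; the space after '&' is kept as it appears there
# (it may be a line-wrap artifact of the mail, so check it against your needs).
AVAIL_CHRS = ('0123456789abcdefghijklmnopqrstuvwxyz'
              'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
              '!"#$%& \'()*+,-./:;<=>?@[\\]^_`{}')

CHUNK = 1 << 16  # hypothetical chunk size: 65536 characters per write() call


def write_random_line(out, nchars, chars=AVAIL_CHRS, chunk=CHUNK, rng=random):
    """Write nchars random characters drawn from chars, then a newline, to out.

    Roughly one chunk's worth of characters is held in memory at a time,
    no matter how long the line is.
    """
    choice = rng.choice  # bind once to avoid repeated attribute lookups
    remaining = nchars
    while remaining > 0:
        n = min(chunk, remaining)
        # range(n) is at most `chunk` long, so this list stays small on 2.7 too
        out.write(''.join([choice(chars) for _ in range(n)]))
        remaining -= n
    out.write('\n')


def write_random_file(out, nchars, rows, **kwargs):
    """Write `rows` lines of `nchars` random characters each to out."""
    for _ in range(rows):
        write_random_line(out, nchars, **kwargs)
```

For example, `write_random_file(sys.stdout, nchars=32000000, rows=10)` would produce output shaped like the first version's. On Python 3.6 or later (not 2.7), `''.join(rng.choices(chars, k=n))` could replace the list comprehension and would likely be faster, since `random.choices` draws all `k` items in a single call.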