Re: performance of tight loop
Thank you for the explanation, Ryan!

Uli

--
Domino Laser GmbH
Managing Director: Thorsten Föcking, Amtsgericht Hamburg HR B62 932
--
http://mail.python.org/mailman/listinfo/python-list
Re: performance of tight loop
gry georgeryo...@gmail.com writes:
> ...
>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>     for i in range(i,i+rows):
> ...

One thing that immediately comes to mind is to use xrange instead of range.

Also, instead of

    first = ['%d' % i]
    rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
    return first + rest

you might save some copying with:

    # Use j here: in Python 2 the comprehension variable leaks and
    # would otherwise overwrite the row number i.
    rest = ['%d' % randint(1, mx) for j in xrange(wd)]
    rest[0] = '%d' % i
    return rest

That's uglier than the old-fashioned

    rest = ['%d' % i]
    for i in xrange(wd-1):
        rest.append('%d' % randint(1, mx))

I think a generator would be cleanest, but maybe slowest:

    def row(i, wd, mx):
        yield '%d' % i
        for j in xrange(wd-1):
            yield ('%d' % randint(1, mx))
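The three row-building variants discussed above can be compared directly with timeit. The following is only a sketch, written for Python 3 (where range is already lazy, like xrange in the thread); the function names row_concat, row_inplace, row_gen and compare are made up for illustration and do not come from the original posts.

```python
import random
import timeit


def row_concat(i, wd, mx):
    """Original approach: build two lists and concatenate them."""
    first = ['%d' % i]
    rest = ['%d' % random.randint(1, mx) for _ in range(wd - 1)]
    return first + rest


def row_inplace(i, wd, mx):
    """Build one list of wd cells, then overwrite slot 0 with the row id."""
    cells = ['%d' % random.randint(1, mx) for _ in range(wd)]
    cells[0] = '%d' % i
    return cells


def row_gen(i, wd, mx):
    """Generator variant: yields cells one by one, no intermediate list."""
    yield '%d' % i
    for _ in range(wd - 1):
        yield '%d' % random.randint(1, mx)


def compare(wd=500, mx=2**63 - 1, number=20):
    """Return {function name: seconds} for building and joining rows."""
    results = {}
    for fn in (row_concat, row_inplace, row_gen):
        results[fn.__name__] = timeit.timeit(
            lambda: ','.join(fn(7, wd, mx)), number=number)
    return results


if __name__ == '__main__':
    for name, secs in sorted(compare().items()):
        print('%-12s %.4f s' % (name, secs))
```

Timings will vary by machine and interpreter version, so no expected numbers are given here. Note that row_inplace draws one more random number per row than the other two.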
Re: performance of tight loop
gry wrote:
> [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
> I have a little data generator that I'd like to go faster... any
> suggestions?
> maxint is usually 9223372036854775808(max 64bit int), but could
> occasionally be 99.
> width is usually 500 or 1600, rows ~ 5000.
>
> from random import randint
>
> def row(i, wd, mx):
>     first = ['%d' % i]
>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>     return first + rest
> ...
>     while True:
>         print "copy %s from stdin direct delimiter ',';" % table_name
>         for i in range(i,i+rows):
>             print ','.join(row(i, width, maxint))
>         print '\.'

I see the biggest potential in inlining randint. Unfortunately you did not provide an executable script and I had to make it up:

$ cat gry.py
from random import randint
import sys

def row(i, wd, mx):
    first = ['%d' % i]
    rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
    return first + rest

def main():
    table_name = "unknown"
    maxint = sys.maxint
    width = 500
    rows = 1000
    offset = 0

    print "copy %s from stdin direct delimiter ',';" % table_name
    for i in range(offset, offset+rows):
        print ','.join(row(i, width, maxint))
    print '\.'

if __name__ == "__main__":
    main()
$ time python gry.py > /dev/null

real    0m5.280s
user    0m5.230s
sys     0m0.050s
$

$ cat gry_inline.py
import random
import math
import sys

def make_rand(n):
    if n < 1 << random.BPF:
        # n fits into the bits of a float: scale random() directly
        def rand(random=random.random):
            return int(n*random())+1
    else:
        # larger n: draw k random bits and retry until the value is below n
        k = int(1.1 + math.log(n-1, 2.0))
        def rand(getrandbits=random.getrandbits):
            r = getrandbits(k)
            while r >= n:
                r = getrandbits(k)
            return r+1
    return rand

def row(i, wd, rand):
    first = ['%d' % i]
    rest = ['%d' % rand() for i in range(wd - 1)]
    return first + rest

def main():
    table_name = "unknown"
    maxint = sys.maxint
    width = 500
    rows = 1000
    offset = 0

    rand = make_rand(maxint)

    print "copy %s from stdin direct delimiter ',';" % table_name
    for i in range(offset, offset+rows):
        print ','.join(row(i, width, rand))
    print '\.'

if __name__ == "__main__":
    main()
$ time python gry_inline.py > /dev/null

real    0m2.004s
user    0m2.000s
sys     0m0.000s
$

Disclaimer: the code in random.py is complex enough that I cannot guarantee I snatched the right pieces.

Peter
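For readers on a current interpreter, the inlining idea above might look roughly like this in Python 3. This is a sketch under assumptions, not a port of random.py: FLOAT_BITS is a local stand-in for random.BPF, and int.bit_length() replaces the math.log computation used in the message.

```python
import random

# Assumption: 53 is the number of mantissa bits in a C double, which is
# the value random.BPF holds in the thread's Python 2 code.
FLOAT_BITS = 53


def make_rand(n):
    """Return a zero-argument function producing integers in [1, n]."""
    if n < 1:
        raise ValueError('n must be at least 1')
    if n < (1 << FLOAT_BITS):
        # Small range: scale a float from [0.0, 1.0) and truncate.
        def rand(_random=random.random):
            return int(n * _random()) + 1
    else:
        # Large range: rejection sampling on k random bits.
        k = n.bit_length()

        def rand(_getrandbits=random.getrandbits):
            r = _getrandbits(k)
            while r >= n:          # retry until r falls in [0, n)
                r = _getrandbits(k)
            return r + 1
    return rand


if __name__ == '__main__':
    rand = make_rand(2**63 - 1)
    print(','.join('%d' % rand() for _ in range(5)))
```

Binding random.random and random.getrandbits as default arguments keeps the lookups local, which is the same trick the message uses to avoid per-call attribute access.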
Re: performance of tight loop
Peter Otten wrote:
> gry wrote:
>> [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
>> I have a little data generator that I'd like to go faster... any
>> suggestions?
>> maxint is usually 9223372036854775808(max 64bit int), but could
>> occasionally be 99.
>> width is usually 500 or 1600, rows ~ 5000.
>>
>> from random import randint
>>
>> def row(i, wd, mx):
>>     first = ['%d' % i]
>>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>>     return first + rest
>> ...
>>     while True:
>>         print "copy %s from stdin direct delimiter ',';" % table_name
>>         for i in range(i,i+rows):
>>             print ','.join(row(i, width, maxint))
>>         print '\.'
>
> I see the biggest potential in inlining randint. Unfortunately you did not
> provide an executable script and I had to make it up:
>
> $ time python gry_inline.py > /dev/null
>
> real    0m2.004s
> user    0m2.000s
> sys     0m0.000s

On second thought, if you have numpy available:

$ cat gry_numpy.py
from numpy.random import randint
import sys

def row(i, wd, mx):
    first = ['%d' % i]
    rest = ['%d' % i for i in randint(1, mx, wd - 1)]
    return first + rest

def main():
    table_name = "unknown"
    maxint = sys.maxint
    width = 500
    rows = 1000
    offset = 0

    print "copy %s from stdin direct delimiter ',';" % table_name
    for i in range(offset, offset+rows):
        print ','.join(row(i, width, maxint))
    print '\.'

if __name__ == "__main__":
    main()
$ time python gry_numpy.py > /dev/null

real    0m1.024s
user    0m1.010s
sys     0m0.010s
$

Argh

Peter
performance of tight loop
[python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]

I have a little data generator that I'd like to go faster... any suggestions?
maxint is usually 9223372036854775808(max 64bit int), but could occasionally be 99.
width is usually 500 or 1600, rows ~ 5000.

from random import randint

def row(i, wd, mx):
    first = ['%d' % i]
    rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
    return first + rest
...
    while True:
        print "copy %s from stdin direct delimiter ',';" % table_name
        for i in range(i,i+rows):
            print ','.join(row(i, width, maxint))
        print '\.'
Re: performance of tight loop
On Mon, 13 Dec 2010 18:50:38 -0800, gry wrote:
> [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
> I have a little data generator that I'd like to go faster... any
> suggestions?
> maxint is usually 9223372036854775808(max 64bit int), but could
> occasionally be 99.
> width is usually 500 or 1600, rows ~ 5000.
>
> from random import randint
>
> def row(i, wd, mx):
>     first = ['%d' % i]
>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>     return first + rest
> ...
>     while True:
>         print "copy %s from stdin direct delimiter ',';" % table_name
>         for i in range(i,i+rows):
>             print ','.join(row(i, width, maxint))
>         print '\.'

This isn't entirely clear to me. Why is the while loop indented? I assume it's part of some other function that you haven't shown us, rather than part of the function row().

Assuming this, I would say that the overhead of I/O (the print commands) will likely be tens or hundreds of times greater than the overhead of the loop, so you're not likely to see much appreciable benefit. You might shave a few seconds off something that runs for many minutes. I don't see the point, really.

If the print statements are informative rather than necessary, I would print every tenth (say) line rather than every line. That should save *lots* of time.

Replacing "while True" with "while 1" may save a tiny bit of overhead. Whether it is significant or not is another thing.

Replacing range with xrange should also make a difference, especially if rows is a large number.

Moving the code from row() inline, replacing string interpolation with calls to str(), may also help. Making local variables of any globals may also help a tiny bit. But as I said, you're shaving microseconds of overhead and spending milliseconds printing -- the difference will be tiny.

But for what it's worth, I'd try this:

    # Avoid globals in favour of locals.
    from random import randint
    _maxint = maxint
    loop = xrange(i, i+rows)  # Where does i come from?
    inner_loop = xrange(width)  # Note 1 more than before.

    while 1:
        print "copy %s from stdin direct delimiter ',';" % table_name
        for i in loop:
            row = [str(randint(1, _maxint)) for _ in inner_loop]
            row[0] = str(i)  # replace in place
            print ','.join(row)
        print '\.'

Hope it helps.

--
Steven
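Whether generation or output dominates depends on where the output goes, so it may be worth measuring the two phases separately before optimizing either. A rough sketch for Python 3 follows; the helper name time_phases and its parameters are hypothetical, and the default in-memory buffer is only a stand-in for a real destination.

```python
import io
import random
import time


def time_phases(rows=200, width=500, mx=2**63 - 1, out=None):
    """Return (generate_seconds, write_seconds) for `rows` CSV lines."""
    if out is None:
        # In-memory stand-in; pass a real file object to measure real I/O.
        out = io.StringIO()

    t0 = time.perf_counter()
    lines = []
    for i in range(rows):
        cells = [str(random.randint(1, mx)) for _ in range(width)]
        cells[0] = str(i)              # replace in place, as suggested above
        lines.append(','.join(cells))
    t1 = time.perf_counter()

    for line in lines:
        out.write(line)
        out.write('\n')
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1


if __name__ == '__main__':
    gen, wr = time_phases()
    print('generate: %.4f s, write: %.4f s' % (gen, wr))
```

The timings posted elsewhere in this thread were taken with output redirected to /dev/null and still showed several seconds for 1000 rows, which suggests generation cost is not negligible in that setup; a terminal or a slow pipe could shift the balance.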
Re: performance of tight loop
gry wrote:
> I have a little data generator that I'd like to go faster... any
> suggestions?
> maxint is usually 9223372036854775808(max 64bit int), but could
> occasionally be 99.
> width is usually 500 or 1600, rows ~ 5000.
>
> from random import randint
>
> def row(i, wd, mx):
>     first = ['%d' % i]
>     rest = ['%d' % randint(1, mx) for i in range(wd - 1)]
>     return first + rest

A few things here:
 * If you can, don't convert the ints to strings. I'm not 100% sure about Python 2.4, but newer versions will automatically yield long instead of int if the range exceeds that of an int, so even with large numbers that should be safe.
 * Replace range with xrange.
 * Instead of creating and appending lists, you could also use a generator expression.

>         print ','.join(row(i, width, maxint))

All you do here is take a list of strings, build a single string from them and then print the string. Why not iterate over the list (or, as suggested, the generator) and print the elements?

Summary: Avoid unnecessary conversions. This includes int to string, but also logical sequences into arrays.

Uli
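The suggestion to iterate over a generator and emit the elements directly, instead of joining them first, could be sketched as follows for Python 3. The names cells and write_row are invented for this example, and whether this beats ','.join() in practice is an open question that would need measuring.

```python
import io
import random


def cells(i, wd, mx):
    """Yield the row id followed by wd - 1 random values, as strings."""
    yield str(i)
    for _ in range(wd - 1):
        yield str(random.randint(1, mx))


def write_row(out, i, wd, mx):
    """Write one comma-separated row to `out` cell by cell, without join()."""
    sep = ''
    for cell in cells(i, wd, mx):
        out.write(sep)
        out.write(cell)
        sep = ','          # every cell after the first is preceded by a comma
    out.write('\n')


if __name__ == '__main__':
    buf = io.StringIO()
    write_row(buf, 0, 5, 99)
    print(buf.getvalue(), end='')
```

This avoids building the intermediate list and the joined string, at the cost of many small write calls, so the file object's buffering does most of the work.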
Re: performance of tight loop
Steven D'Aprano wrote:
> Replacing "while True" with "while 1" may save a tiny bit of overhead.
> Whether it is significant or not is another thing.

Is this the price for an intentional complexity or just a well-known optimizer deficiency?

Just curious...

Uli
Re: performance of tight loop
On Tue, 2010-12-14 at 08:08 +0100, Ulrich Eckhardt wrote:
> Steven D'Aprano wrote:
> > Replacing "while True" with "while 1" may save a tiny bit of overhead.
> > Whether it is significant or not is another thing.
>
> Is this the price for an intentional complexity or just a well-known
> optimizer deficiency?

At least on older pythons, you can assign to the name True, so it's not possible to optimize that loop - you must look up the name True on each iteration. For example, in python 2.6 this loop will exit after one iteration:

>>> while True:
...     True = False
...

To see the difference, take a look at the bytecode python generates for the two types of loop:

>>> import dis
>>> def while1():
...     while 1:
...         pass
...
>>> def whileTrue():
...     while True:
...         pass
...
>>> dis.dis(while1)
  2           0 SETUP_LOOP               3 (to 6)

  3           3 JUMP_ABSOLUTE            3
              6 LOAD_CONST               0 (None)
              9 RETURN_VALUE
>>> dis.dis(whileTrue)
  2           0 SETUP_LOOP              12 (to 15)
              3 LOAD_GLOBAL              0 (True)
              6 JUMP_IF_FALSE            4 (to 13)
              9 POP_TOP

  3          10 JUMP_ABSOLUTE            3
             13 POP_TOP
             14 POP_BLOCK
             15 LOAD_CONST               0 (None)
             18 RETURN_VALUE

Still, I just can't bring myself to write "while 1" in favour of "while True" in code.

Python 3 does away with this madness entirely:

>>> while True:
...     True = False
...
  File "<stdin>", line 2
SyntaxError: assignment to keyword

Looking at the bytecode shows that in Python 3, "while 1" and "while True" are indeed identical.

Cheers,

Ryan

--
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au        |  http://www.rfk.id.au/ramblings/gpg/ for details
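The Python 3 claim can be checked without reading raw disassembly by listing the opcode names of both loops. This is a small sketch using dis.get_instructions (available since Python 3.4); the helper name opnames is made up for the example, and the exact opcodes printed differ between interpreter versions.

```python
import dis


def while1():
    while 1:
        pass


def while_true():
    while True:
        pass


def opnames(func):
    """Return the list of opcode names in func's bytecode (func is not run)."""
    return [ins.opname for ins in dis.get_instructions(func)]


if __name__ == '__main__':
    print('while 1:   ', opnames(while1))
    print('while True:', opnames(while_true))
    # In Python 3, True is a keyword, so neither loop needs a name lookup.
    print('global lookups:',
          'LOAD_GLOBAL' in opnames(while1) + opnames(while_true))
```

The two functions are only compiled and inspected, never called, so the infinite loops do not run.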