Re: Speed quirk: redundant line gives six-fold speedup
Raymond Hettinger wrote: Mark Dickinson wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! Thanks for your post. It is cute, brilliant, and interesting. I haven't had time to investigate but can point at the likely cause. All of the global names are stored in a single hash table. Search time is dictated by two factors, the sparseness of the hash table and the distribution of hash values. With respect to sparseness, whenever that table becomes 2/3 full, it grows by a factor of four and becomes only 1/6 full (leading to many fewer collisions). With respect to distribution, it should be noted that string hash values are decidely non-random and your variable names likely congested consecutive spaces in a nearly full table (resulting in seven times as many search probes to find a global value). When the extra value was added, it likely resized the table four-fold and redistributed the hash values into fewer consecutive positions. If that's the cause, then here's another reason to use long, descriptive names instead of C64-BASIC style a, b, c, i, j... - with long names the chances of hash collisions are pretty low. Or everyone will start optimizing their programs by using long, *random* names ;) -- Benjamin Niemann Email: pink at odahoda dot de WWW: http://www.odahoda.de/ -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
[Raymond Hettinger] With respect to distribution, it should be noted that string hash values are decidely non-random and your variable names likely congested consecutive spaces in a nearly full table (resulting in seven times as many search probes to find a global value). When the extra value was added, it likely resized the table four-fold and redistributed the hash values into fewer consecutive positions. [Benjamin Niemann] If that's the cause, then here's another reason to use long, descriptive names instead of C64-BASIC style a, b, c, i, j... - with long names the chances of hash collisions are pretty low. Or everyone will start optimizing their programs by using long, *random* names ;) The wink applied to second line but could have also applied to the first. For the most part, the non-randomness of string hashes tends to work in your favor -- it can result in collision free hash tables. For optimization, the best approach is to use local variables in your innermost loops. Alternatively, if you want to be tricky, try something like this: globals().update(globals()) What this does and why it can improve search times is left as an exercise for the reader :-) Raymond -- http://mail.python.org/mailman/listinfo/python-list
Speed quirk: redundant line gives six-fold speedup
I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! (Actually, I had to add 29 dummy lines at the beginning of the code to get the speed increase; if any one of these lines is removed the running time reverts to around 15 seconds again.) Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? I'm running Python 2.4.1 on a 1.2Ghz iBook G4: Python 2.4.1 (#1, May 21 2005, 19:56:42) [GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin I've posted the code below, with some trepidation, since it's not a work of art and wasn't really intended to be seen by other human beings. It's necessarily quite long: any attempt to shorten it significantly seems to cancel the speed gain. Any clues as to what might be going on would be greatly appreciated! Mark # code starts here dummy0 = 47 dummy1 = 47 dummy2 = 47 dummy3 = 47 dummy4 = 47 dummy5 = 47 dummy6 = 47 dummy7 = 47 dummy8 = 47 dummy9 = 47 dummy10 = 47 dummy11 = 47 dummy12 = 47 dummy13 = 47 dummy14 = 47 dummy15 = 47 dummy16 = 47 dummy17 = 47 dummy18 = 47 dummy19 = 47 dummy20 = 47 dummy21 = 47 dummy22 = 47 dummy23 = 47 dummy24 = 47 dummy25 = 47 dummy26 = 47 dummy27 = 47 dummy28 = 47 # Sudoku puzzle solver via Knuth's method of `dancing links'. # Initial data: list of constraints, list of moves, and map from moves to lists of constraints template = ( | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([ +---+---+---+\n] * 4) div_nums = range(9) symbols = 123456789 constraints = [position %d, %d % (i, j) for i in div_nums for j in div_nums] + \ [%s in %s %d % (i, j, k) for i in symbols for j in [row, column, block] for k in div_nums] satisfies = dict(((s, i, j), [position %d, %d % (i, j), %s in row %d % (s, i), %s in column %d % (s, j), %s in block %d % (s, i-i%3+j//3)]) for s in symbols for i in div_nums for j in div_nums) moves = satisfies.keys() class LLentry(object): pass # First set up the data objects and column objects def rowhead(obj): obj.L = obj.R = obj.M = obj def colhead(obj): obj.U = obj.D = obj.C = obj obj.S = 0 def rowinsert(obj, pos): # insert into doubly-linked list with header pos posL = pos.L obj.R = pos pos.L = obj obj.L = posL posL.R = obj obj.M = pos def colinsert(obj, pos): # as above posU = pos.U obj.D = pos pos.U = obj obj.U = posU posU.D = obj obj.C = pos pos.S += 1 def rowitems(pos): c = pos.R while c is not pos: yield c c = c.R def move(m): cc = m.R while cc is not m: c = cc.C c.R.L = c.L; c.L.R = c.R r = c.D while r is not c: j = r.R while j is not r: j.D.U = j.U j.U.D = j.D j.C.S -= 1 j = j.R r = r.D cc = cc.R moves_so_far.append(m) h = LLentry() rowhead(h); colhead(h) constraint_from_name = {} for name in constraints: obj = LLentry() obj.N = name; constraint_from_name[name] = obj rowinsert(obj, h); colhead(obj) obj.S = 0 move_from_name = {} for m in satisfies.keys(): # we must assume that each move satisfies at least one constraint obj = LLentry() obj.N = m; move_from_name[m] = obj colinsert(obj, h); rowhead(obj) ones = [(move_from_name[m], constraint_from_name[c]) for m, cc in satisfies.items() for c in cc] for m, c in ones: obj = LLentry() rowinsert(obj, m) colinsert(obj, c) moves_so_far = [] # everything's now set up to start the search def search(): if h.L is h: data = dict(((i, j), s) for s, i, j in (m.N for m in moves_so_far)) yield template % tuple(data[i, j] for i in div_nums for j in div_nums) else: mm = min((c.S, c) for c in rowitems(h))[1].D while mm is not mm.C: m = mm.M cc = m.R while cc is not m: c = cc.C c.R.L = c.L c.L.R = c.R r = c.D while r is not c: j = r.R while j is not r: j.D.U = j.U j.U.D = j.D j.C.S -= 1
Re: Speed quirk: redundant line gives six-fold speedup
On Thu, Aug 25, 2005 at 04:44:24PM +, Mark Dickinson wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! (Actually, I had to add 29 dummy lines at the beginning of the code to get the speed increase; if any one of these lines is removed the running time reverts to around 15 seconds again.) Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? I get the same thing. (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? It seems to be related to the number of globals. I get the fast version with 30 to 120 globals and the slow version with less than 30 or more than 130. It actually gets even slower for higher numbers of globals. Here is a snippet to adjust the number of globals for (i) in range(100): globals()['dummy%d' % (i)] = 1 (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? Yes, module level globals have bad lookup times compared to function local names. If you refactor your code to pass around the data currently at the global module level you should see times at least as fast as the current 'fast' one. That said, I'm very surprised that the lookup times jump around so much. Your code does bazillions of namespace lookups, so a small difference in lookup times is getting multiplied into some really big numbers. -jackdied -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Mark Dickinson wrote: Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? I see no difference in execution times, as expected. The most likely explanation is simply that other things were going on on your system when you ran the first test, but not the second test, resulting in the discrepancy. In other words, the speed change had nothing to do with your dummy lines. -- Erik Max Francis [EMAIL PROTECTED] http://www.alcyone.com/max/ San Jose, CA, USA 37 20 N 121 53 W AIM erikmaxfrancis There never was a good war or a bad peace. -- Benjamin Franklin -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On 8/25/05, Mark Dickinson [EMAIL PROTECTED] wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! (Actually, I had to add 29 dummy lines at the beginning of the code to get the speed increase; if any one of these lines is removed the running time reverts to around 15 seconds again.) Questions: snip One of my own: what in the world made you think maybe I'll add 29 dummy global variables to speed things up? It seems to work (19x speedup on my machine), I'm just curious what path you followed to get there. And, finally, you should forward this to the python-dev list, if somebody hasn't already. There are more people who know a ton about python internals there. Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote: Mark Dickinson wrote: Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? I see no difference in execution times, as expected. The most likely explanation is simply that other things were going on on your system when you ran the first test, but not the second test, resulting in the discrepancy. In other words, the speed change had nothing to do with your dummy lines. Unlikely; 2 people have confirmed these results already. I did find, though, that if I remove all print statements from the program, the dummy and non-dummy variable versions take indentical time. Can others reproduce this? I'm Investigating further... Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Bill Mill wrote: Unlikely; 2 people have confirmed these results already. I did find, though, that if I remove all print statements from the program, the dummy and non-dummy variable versions take indentical time. Can others reproduce this? Yes, it's obviously a real effect given the other sightings. I don't see any speed difference, myself (Pentium IV 3.0 GHz running Slackware Linux). -- Erik Max Francis [EMAIL PROTECTED] http://www.alcyone.com/max/ San Jose, CA, USA 37 20 N 121 53 W AIM erikmaxfrancis There never was a good war or a bad peace. -- Benjamin Franklin -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Bill Mill wrote: Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results for fast and slow on my machine (these won't look decent except in a fixed-width font): snip profiles Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the slow version, does not appear at all in the fast profile. I can't figure out why - both printed out their data, so template must have been called somewhere. OK, I'm getting somewhere now. When I replace: template = ( | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([ +---+---+---+\n] * 4) wtih: template =| %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n Then the non-dummy version is faster than the dummy version (very slightly, presumably because it doesn't need to allocate 28 dummy variables). Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote: On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote: Mark Dickinson wrote: Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? I did find, though, that if I remove all print statements from the program, the dummy and non-dummy variable versions take indentical time. Can others reproduce this? I'm Investigating further... I'm getting similarly freakish results. I tried a little ghetto debugging by putting a printf in dictobject.c's resize method and recompiling python. Sadly I can't get the problem to reproduce itself with the new binary (with or without the printf). The Ubuntu default 2.4.1 is sometimes fast, my hand compiled one (./configure make) is always slow. There are some very arcane low level things going on here. sprat:~/src/Python-2.4.1# time ./python /tmp/odd.py /dev/null 7.876u 0.008s 0:07.91 99.4% 0+0k 0+0io 0pf+0w sprat:~/src/Python-2.4.1# time python /tmp/odd.py /dev/null 1.813u 0.004s 0:01.77 102.2%0+0k 0+0io 0pf+0w sprat:~/src/Python-2.4.1# ./python Python 2.4.1 (#5, Aug 25 2005, 13:55:44) [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2 Type help, copyright, credits or license for more information. sprat:~/src/Python-2.4.1# python Python 2.4.1 (#2, Mar 30 2005, 21:51:10) [GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2 Type help, copyright, credits or license for more information. No-idea-ly, -jackdied -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
In article [EMAIL PROTECTED], Bill Mill [EMAIL PROTECTED] wrote: One of my own: what in the world made you think maybe I'll add 29 dummy global variables to speed things up? You mean this isn't a well-known optimization technique? :) I was refactoring the code, and after making a particular function redundant I noticed that removing the code for that function produced the slow down described. Then I naturally experimented to try to figure out what I had to put back in to recover the original speed. Not that I really care about the speed itself, but I'd dearly like to understand what's at work here. Mark -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Mark Dickinson wrote: Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? yes. I get 7 sec vs 1 sec on my laptop. (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? I don't think there are any optimizations at play here. (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? When I start a python interpreter and import the module the speed difference disappears and it actually gets about two times faster. YMMV. I don't have a solution but I admire the problem. [1] ... jay [1] Google tells me that this is probably attributable to Ashleigh Brilliant. -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On Thu, Aug 25, 2005 at 01:55:48PM -0400, Bill Mill wrote: Bill Mill wrote: Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results for fast and slow on my machine (these won't look decent except in a fixed-width font): snip profiles Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the slow version, does not appear at all in the fast profile. I can't figure out why - both printed out their data, so template must have been called somewhere. OK, I'm getting somewhere now. When I replace: template = ( | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([ +---+---+---+\n] * 4) wtih: template =| %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n | %s %s %s | %s %s %s | %s %s %s |\n +---+---+---+\n Then the non-dummy version is faster than the dummy version (very slightly, presumably because it doesn't need to allocate 28 dummy variables). I think this is just another tweaking that hits the magic spot (or avoids the bad spot) in some lookup table somewhere. Perhaps it has to do with the number interned strings and some worst case behavior? varnames are interened, so changing the number of globals would change the number of interened strings as well. -jackdied -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On 8/25/05, Jack Diederich [EMAIL PROTECTED] wrote: On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote: On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote: Mark Dickinson wrote: Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? I did find, though, that if I remove all print statements from the program, the dummy and non-dummy variable versions take indentical time. Can others reproduce this? I'm Investigating further... I'm getting similarly freakish results. I tried a little ghetto debugging by putting a printf in dictobject.c's resize method and recompiling python. Sadly I can't get the problem to reproduce itself with the new binary (with or without the printf). The Ubuntu default 2.4.1 is sometimes fast, my hand compiled one (./configure make) is always slow. There are some very arcane low level things going on here. agreed. Also, either I was temporarily insane, or the version with the explicit template no longer runs faster for me, so I hope nobody spends a lot of time on that. Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Mark Dickinson wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! (Actually, I had to add 29 dummy lines at the beginning of the code to get the speed increase; if any one of these lines is removed the running time reverts to around 15 seconds again.) Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? Hi. I haven't been able to reproduce this but I had a similar case before (a program that some times crashed and some times worked perfectly and the difference was whitespace in the comments!!!). After lots of wondering and thinking that I must be dreaming (luckily I had pyvm which also crashed but for different amounts of whitespace), it was solved. The explanation is this: hash and comparison of objects depends on the state of the memory allocator. A sample case is this: class A: pass dummy0=47 # comment this to get a different result for min a=A() b=A() print min (a, b) the result of 'min' is not only non-deterministic but also depends on whether other things have been allocated before. The same thing can happen for 'dictionary.keys()' if the keys are objects and 'iterate-over-set' when the set contains objects. In the sudoku solver, there is a min (number, object) which is probably what's affected by the extistance of the dummy variable. Now, in sudoku puzzles some times the algorithm has to suppose that in a box the right solution is one of the possible, try it and if it fails then try the other one. I guess that the result of from the different 'min' leads the solver to first try the correct solution in the fast case and therefore need not attempt the wrong one. Or at least this is what I think that happens. By the way, since we started posting code, here is a more pythonic sudoku solver. #The 'man with scissors runs around shifting barrels algorithm' from sys import argv SEMANTIC = 1'SEM' in argv class Impossible: pass Patterns = [] for r in range (9): for c in range (9): p = 27*(r/3) + 3*(c/3) pl = set (range (9*r, 9*r+9) + range (c, 81, 9) + [p+x for x in (0,1,2,9,10,11,18,19,20)]) pl.remove (9*r+c) Patterns.append (tuple (sorted (list (pl def initboard (): x = range (1, 10) return [ x [:] for i in xrange (9*9) ] def printboard (board): if not SEMANTIC: return print 30*'-' for i in range (9): for j in board [9*i:9*(i+1)]: if type (j) is list: #print 'X', print ''.join (map (str, j)), else: print j, print print 30*'-' def dupboard (board): B = [] for i in board: if type (i) is list: B.append (i [:]) else: B.append (i) return B def solve (board, coords): while coords: p, v = coords.pop () board [p] = v for i in Patterns [p]: if type (board [i]) is list: if v in board [i]: board [i].remove (v) if len (board [i]) == 1: board [i] = board [i][0] coords.append ((i, board [i])) else: if board [i] == v: raise Impossible for p, i in enumerate (board): if type (i) is list: for j in i: try: return solve (dupboard (board), [(p, j)]) except Impossible: pass raise Impossible return board PP = [ [ 7xx19, 46x19, xxx6827x4, x9xx7, xxx3xx4x5, xx67x, xx1xx, 2xxx74xxx, xxx2xx3xx, ] ] def puz2coord (P): if len (P) != 9: print P must have 9 rows raise SystemExit coords = [] for r, i in enumerate (P): if len (i) != 9: print Row [%s] doesn't have 9 columns %i raise SystemExit for c, j in enumerate (list (i)): if j != 'x': coords.append ((9*r + c, int (j))) return coords try: if SEMANTIC: for i in xrange (10): for P in PP: printboard (solve (initboard (), puz2coord (P))) else: for i in xrange (TIMES): for P in PP: printboard (solve (initboard (),
Re: Speed quirk: redundant line gives six-fold speedup
On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote: Bill Mill wrote: Unlikely; 2 people have confirmed these results already. I did find, though, that if I remove all print statements from the program, the dummy and non-dummy variable versions take indentical time. Can others reproduce this? Yes, it's obviously a real effect given the other sightings. I don't see any speed difference, myself (Pentium IV 3.0 GHz running Slackware Linux). Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results for fast and slow on my machine (these won't look decent except in a fixed-width font): Slow: 6766494 function calls (6737594 primitive calls) in 45.740 CPU seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 3322320 20.5390.000 31.1520.000 test.py:135(generator expression ) 27520 10.6410.000 41.7920.002 :0(min) 3322320 10.6130.000 10.6130.000 test.py:81(rowitems) 28100/203.6200.000 45.6332.282 test.py:130(search) 275450.1130.0000.1130.000 :0(append) 275200.0980.0000.0980.000 :0(pop) 10.0410.041 45.736 45.736 test.py:36(?) Fast: 540174 function calls (536514 primitive calls) in 3.506 CPU seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 2596401.5160.0002.3030.000 test.py:135(generator expression ) 22800.7910.0003.0940.001 :0(min) 2596400.7880.0000.7880.000 test.py:81(rowitems) 2860/200.2690.0003.3910.170 test.py:130(search) 10.0450.0453.4993.499 test.py:2(?) 36450.0210.0000.0210.000 test.py:71(colinsert) 32400.0190.0000.0190.000 test.py:62(rowinsert) 23050.0100.0000.0100.000 :0(append) Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the slow version, does not appear at all in the fast profile. I can't figure out why - both printed out their data, so template must have been called somewhere. Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote: The explanation is this: hash and comparison of objects depends on the state of the memory allocator. A sample case is this: class A: pass dummy0=47 # comment this to get a different result for min a=A() b=A() print min (a, b) the result of 'min' is not only non-deterministic but also depends on whether other things have been allocated before. The same thing can happen for 'dictionary.keys()' if the keys are objects and 'iterate-over-set' when the set contains objects. In the sudoku solver, there is a min (number, object) which is probably what's affected by the extistance of the dummy variable. Now, in sudoku puzzles some times the algorithm has to suppose Doh, I feel silly. Without an 'import random' in the program I assumed it was deterministic. I would have also expected the comparison to be a TypeError, and it sometimes is. mynum = 7 myob = object() cmp(mynum, myob) -1 mynum.__cmp__(myob) Traceback (most recent call last): File stdin, line 1, in ? TypeError: int.__cmp__(x,y) requires y to be a 'int', not a 'object' Thanks for the explanation, -jackdied -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
On 8/25/05, Jack Diederich [EMAIL PROTECTED] wrote: On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote: The explanation is this: hash and comparison of objects depends on the state of the memory allocator. A sample case is this: class A: pass dummy0=47 # comment this to get a different result for min a=A() b=A() print min (a, b) the result of 'min' is not only non-deterministic but also depends on whether other things have been allocated before. The same thing can happen for 'dictionary.keys()' if the keys are objects and 'iterate-over-set' when the set contains objects. I'm also pretty sure I've caught a bug in his code, though I'm not sure how it works exactly. I replaced the 'min' built-in with my own min, and he's going to get nondeterministic results from this line: mm = min((c.S, c) for c in rowitems(h))[1].D because 'c' is often the exact same object. A snippet from my debugging version of 'min', which prints out the tuple its handed: (1, __main__.LLentry object at 0x00969710) (1, __main__.LLentry object at 0x00969710) (4, __main__.LLentry object at 0x00969710) snip more 4s from the same object (4, __main__.LLentry object at 0x00969710) (3, __main__.LLentry object at 0x00969710) (3, __main__.LLentry object at 0x00969710) (3, __main__.LLentry object at 0x00969710) (2, __main__.LLentry object at 0x00969710) snip more 2s, same object Although they appear in order here, they don't always. Often, multiple objects have a value of 1, and he's going to get one of them at random as the 'min' object. I'm pretty sure. Mark, can you confirm that this is/isn't a bug? (btw, it still runs fast with and slow without the dummies with my custom min() func) Peace Bill Mill bill.mill at gmail.com -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
In article [EMAIL PROTECTED], Stelios Xanthakis [EMAIL PROTECTED] wrote: In the sudoku solver, there is a min (number, object) which is probably what's affected by the extistance of the dummy variable. Now, in sudoku puzzles some times the algorithm has to suppose that in a box the right solution is one of the possible, try it and if it fails then try the other one. I guess that the result of from the different 'min' leads the solver to first try the correct solution in the fast case and therefore need not attempt the wrong one. Or at least this is what I think that happens. Thank you! That does indeed seem to be the explanation. The min() line looks for the shortest `constraint'; that is, the box with the fewest possible symbols, or the symbol with the smallest number of possible positions within a given row, column or block; the code below that line then tries all possibilities for this constraint. The line should really be something like min(c for c in rowitems(h), key = lambda c: c.S) but that isn't going to work until Python 2.5 comes out, and I was too lazy to expand the whole thing properly instead of using the cheap trick I did. Thanks again for clearing up this confusion. Mark -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
In article [EMAIL PROTECTED], Bill Mill [EMAIL PROTECTED] wrote: I'm also pretty sure I've caught a bug in his code, though I'm not sure how it works exactly. I replaced the 'min' built-in with my own min, and he's going to get nondeterministic results from this line: mm = min((c.S, c) for c in rowitems(h))[1].D because 'c' is often the exact same object. A snippet from my debugging version of 'min', which prints out the tuple its handed: (1, main .LLentry object at 0x00969710) (1, main .LLentry object at 0x00969710) (4, main .LLentry object at 0x00969710) snip more 4s from the same object (4, main .LLentry object at 0x00969710) (3, main .LLentry object at 0x00969710) (3, main .LLentry object at 0x00969710) (3, main .LLentry object at 0x00969710) (2, main .LLentry object at 0x00969710) snip more 2s, same object The c's returned by any one call to rowitems(h) should all be distinct. This seems to work in my code---I can't reproduce your results above, and I don't *think* there's a bug there. Although they appear in order here, they don't always. Often, multiple objects have a value of 1, and he's going to get one of them at random as the 'min' object. I'm pretty sure. I agree entirely---this is where my bug, and the source of all the confusion comes from. I was briefly aware as I wrote this that the result would be nondeterministic, but persuaded myself after two second's thought that it didn't matter, then forgot all about it. So mild changes in the rest of the program give rise to a different `random' choice in the min, and choosing to eunumerate all 3 possibilities for position 5,7 instead of the 3 possibilities for position 2, 1 (say) makes a huge diffference to the running time. I'm still surprised by the magnitude of the differences, though. I've learnt my lesson :) Thank you for your help, and apologies for wasting other people's time with this as well as my own! Mark -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Mark Dickinson wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! Thanks for your post. It is cute, brilliant, and interesting. I haven't had time to investigate but can point at the likely cause. All of the global names are stored in a single hash table. Search time is dictated by two factors, the sparseness of the hash table and the distribution of hash values. With respect to sparseness, whenever that table becomes 2/3 full, it grows by a factor of four and becomes only 1/6 full (leading to many fewer collisions). With respect to distribution, it should be noted that string hash values are decidely non-random and your variable names likely congested consecutive spaces in a nearly full table (resulting in seven times as many search probes to find a global value). When the extra value was added, it likely resized the table four-fold and redistributed the hash values into fewer consecutive positions. Raymond P.S. To analyze it further, start with something like this: len(set(hash('dummy%d' %i) 31 for i in xrange(29))) 26 len(set(hash('dummy%d' %i) 127 for i in xrange(29))) 29 -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Stelios Xanthakis wrote: Mark Dickinson wrote: I have a simple 192-line Python script that begins with the line: dummy0 = 47 The script runs in less than 2.5 seconds. The variable dummy0 is never referenced again, directly or indirectly, by the rest of the script. Here's the surprise: if I remove or comment out this first line, the script takes more than 15 seconds to run. So it appears that adding a redundant line produces a spectacular six-fold increase in speed! (Actually, I had to add 29 dummy lines at the beginning of the code to get the speed increase; if any one of these lines is removed the running time reverts to around 15 seconds again.) Questions: (1) Can anyone else reproduce this behaviour, or is it just some quirk of my setup? (2) Any possible explanations? Is there some optimization that kicks in at a certain number of lines, or at a certain length of bytecode? (3) If (2), is there some way to force the optimization, so that I can get the speed increase without having to add the extra lines? Hi. I haven't been able to reproduce this but I had a similar case before (a program that some times crashed and some times worked perfectly and the difference was whitespace in the comments!!!). After lots of wondering and thinking that I must be dreaming (luckily I had pyvm which also crashed but for different amounts of whitespace), it was solved. The explanation is this: hash and comparison of objects depends on the state of the memory allocator. A sample case is this: class A: pass dummy0=47 # comment this to get a different result for min a=A() b=A() print min (a, b) the result of 'min' is not only non-deterministic but also depends on whether other things have been allocated before. The same thing can happen for 'dictionary.keys()' if the keys are objects and 'iterate-over-set' when the set contains objects. I do not get the point here: isn't min comparing the adress in memory as there is nothing else to compare? [python 2.4.1 on ubuntu linux] On 10 runs from within emacs I had about 50% for `a' and 50% for `b' returned by min (a,b). It was the same without the dummy0=47. On 10 runs from the command line I had always `a' returned (with and without the dummy). The only difference was the address of b's object. (For all the run of each case the two addresses were exactly the same) For your first post: On 10 tests run for each case (average time, but each time was never different one from the other more that .02s) 0.554 when the 28 dummies are in place 6.679 when commented out, removed or put in a single multiline string 2.576 when putting the 28 dummies in a list or a dict using a for loop 7.195 when creating the 28 dummies using an exec in a for loop What about memory allocation that would be performed chunk by chunk by the interpreter? Then being at the end or at the beginning of a chunk may not be the same for processing? All the instructions of the program would then be in the cpu cache for example in the same block while in the other case theyr would be in two distinct block - thus less caching for the cpu... [this particular problem arose when I was working several year ago on a persistent object manager for a distributed operating system] -- rafi Imagination is more important than knowledge. (Albert Einstein) -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
Two observations: 1 - The difference in run time with and without the dummy* globals is due to a difference in the number of invokations of the search() function: 1,140 resp. 27,530 in my environment. To verify, just change the line def search(): to searches = 0 def search(): global searches searches += 1 and add at the very end print searches, searches 2 - The run times with and without the dummy* variables is equal(ly slow) if the LLentry() class and min() function call are modified to be independent of the object value. Change line class LLentry: pass to LLinst = 0 class LLentry(object): def __init__(self): global LLinst LLinst += 1 self.I = LLinst and change line mm = min((c.S, c) for c in rowitems(h))[1].D to mm = min((c.S, c.I, c) for c in rowitems(h))[2].D /Jean Brouwers -- http://mail.python.org/mailman/listinfo/python-list
Re: Speed quirk: redundant line gives six-fold speedup
I've learnt my lesson :) Thank you for your help, and apologies for wasting other people's time with this as well as my own! I've learnt my lesson reading through this thread, too. I am glad to be given the chance of wasting my time with it and very happy and thankful, that you posted your problem here before I have bumped into similar one myself. I suppose, that there are many others on this newsgroup who appreciated your posting, too. Claudio -- http://mail.python.org/mailman/listinfo/python-list