Re: Speed quirk: redundant line gives six-fold speedup

2005-08-26 Thread Benjamin Niemann
Raymond Hettinger wrote:

 
 Mark Dickinson wrote:
 I have a simple 192-line Python script that begins with the line:

 dummy0 = 47

 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.

 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!
 
 Thanks for your post.  It is cute, brilliant, and interesting.
 
 I haven't had time to investigate but can point at the likely cause.
 All of the global names are stored in a single hash table.  Search time
 is dictated by two factors, the sparseness of the hash table and the
 distribution of hash values.  With respect to sparseness, whenever that
 table becomes 2/3 full, it grows by a factor of four and becomes only
 1/6 full (leading to many fewer collisions).  With respect to
 distribution, it should be noted that string hash values are decidely
 non-random and your variable names likely congested consecutive spaces
 in a nearly full table (resulting in seven times as many search probes
 to find a global value).
 
 When the extra value was added, it likely resized the table four-fold
 and redistributed the hash values into fewer consecutive positions.

If that's the cause, then here's another reason to use long, descriptive
names instead of C64-BASIC style a, b, c, i, j... - with long names the
chances of hash collisions are pretty low.
Or everyone will start optimizing their programs by using long, *random*
names ;)


-- 
Benjamin Niemann
Email: pink at odahoda dot de
WWW: http://www.odahoda.de/
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-26 Thread Raymond Hettinger
[Raymond Hettinger]
  With respect to
  distribution, it should be noted that string hash values are decidely
  non-random and your variable names likely congested consecutive spaces
  in a nearly full table (resulting in seven times as many search probes
  to find a global value).
 
  When the extra value was added, it likely resized the table four-fold
  and redistributed the hash values into fewer consecutive positions.

[Benjamin Niemann]
 If that's the cause, then here's another reason to use long, descriptive
 names instead of C64-BASIC style a, b, c, i, j... - with long names the
 chances of hash collisions are pretty low.
 Or everyone will start optimizing their programs by using long, *random*
 names ;)

The wink applied to second line but could have also applied to the
first.  For the most part, the non-randomness of string hashes tends to
work in your favor -- it can result in collision free hash tables.

For optimization, the best approach is to use local variables in your
innermost loops.

Alternatively, if you want to be tricky, try something like this:

   globals().update(globals())

What this does and why it can improve search times is left as an
exercise for the reader :-)


Raymond

-- 
http://mail.python.org/mailman/listinfo/python-list


Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Mark Dickinson
I have a simple 192-line Python script that begins with the line:

dummy0 = 47

The script runs in less than 2.5 seconds.  The variable dummy0 is never
referenced again, directly or indirectly, by the rest of the script.

Here's the surprise: if I remove or comment out this first line, the
script takes more than 15 seconds to run.  So it appears that adding a
redundant line produces a spectacular six-fold increase in speed!

(Actually, I had to add 29 dummy lines at the beginning of the code to
get the speed increase; if any one of these lines is removed the
running time reverts to around 15 seconds again.)

Questions:

(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations?  Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?

I'm running Python 2.4.1 on a 1.2Ghz iBook G4:

Python 2.4.1 (#1, May 21 2005, 19:56:42) 
[GCC 3.3 20030304 (Apple Computer, Inc. build 1495)] on darwin

I've posted the code below, with some trepidation, since it's not a
work of art and wasn't really intended to be seen by other human
beings.  It's necessarily quite long: any attempt to shorten it
significantly seems to cancel the speed gain.  Any clues as to what
might be going on would be greatly appreciated!

Mark

# code starts here
dummy0 = 47
dummy1 = 47
dummy2 = 47
dummy3 = 47
dummy4 = 47
dummy5 = 47
dummy6 = 47
dummy7 = 47
dummy8 = 47
dummy9 = 47
dummy10 = 47
dummy11 = 47
dummy12 = 47
dummy13 = 47
dummy14 = 47
dummy15 = 47
dummy16 = 47
dummy17 = 47
dummy18 = 47
dummy19 = 47
dummy20 = 47
dummy21 = 47
dummy22 = 47
dummy23 = 47
dummy24 = 47
dummy25 = 47
dummy26 = 47
dummy27 = 47
dummy28 = 47

# Sudoku puzzle solver via Knuth's method of `dancing links'.

# Initial data: list of constraints, list of moves, and map from moves to lists 
of constraints

template = (  | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([  
+---+---+---+\n] * 4)
div_nums = range(9)
symbols = 123456789

constraints = [position %d, %d % (i, j) for i in div_nums for j in div_nums] 
+ \
  [%s in %s %d % (i, j, k) for i in symbols for j in [row, 
column, block] for k in div_nums]

satisfies = dict(((s, i, j),
  [position %d, %d % (i, j),
   %s in row %d % (s, i),
   %s in column %d % (s, j),
   %s in block %d % (s, i-i%3+j//3)]) for s in symbols for i 
in div_nums for j in div_nums)

moves = satisfies.keys()

class LLentry(object): pass

# First set up the data objects and column objects

def rowhead(obj):
obj.L = obj.R = obj.M = obj

def colhead(obj):
obj.U = obj.D = obj.C = obj
obj.S = 0

def rowinsert(obj, pos):
# insert into doubly-linked list with header pos
posL = pos.L
obj.R = pos
pos.L = obj
obj.L = posL
posL.R = obj
obj.M = pos

def colinsert(obj, pos):
# as above
posU = pos.U
obj.D = pos
pos.U = obj
obj.U = posU
posU.D = obj
obj.C = pos
pos.S += 1

def rowitems(pos):
c = pos.R
while c is not pos:
yield c
c = c.R

def move(m):
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L; c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1
j = j.R
r = r.D
cc = cc.R
moves_so_far.append(m)

h = LLentry()
rowhead(h); colhead(h)

constraint_from_name = {}
for name in constraints:
obj = LLentry()
obj.N = name; constraint_from_name[name] = obj
rowinsert(obj, h); colhead(obj)
obj.S = 0

move_from_name = {}
for m in satisfies.keys():
# we must assume that each move satisfies at least one constraint
obj = LLentry()
obj.N = m; move_from_name[m] = obj
colinsert(obj, h); rowhead(obj)

ones = [(move_from_name[m], constraint_from_name[c]) for m, cc in 
satisfies.items() for c in cc]
for m, c in ones:
obj = LLentry()
rowinsert(obj, m)
colinsert(obj, c)

moves_so_far = []
# everything's now set up to start the search

def search():
if h.L is h:
data = dict(((i, j), s) for s, i, j in (m.N for m in moves_so_far))
yield template % tuple(data[i, j] for i in div_nums for j in div_nums)
else:
mm = min((c.S, c) for c in rowitems(h))[1].D
while mm is not mm.C:
m = mm.M
cc = m.R
while cc is not m:
c = cc.C
c.R.L = c.L
c.L.R = c.R
r = c.D
while r is not c:
j = r.R
while j is not r:
j.D.U = j.U
j.U.D = j.D
j.C.S -= 1

Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Jack Diederich
On Thu, Aug 25, 2005 at 04:44:24PM +, Mark Dickinson wrote:
 I have a simple 192-line Python script that begins with the line:
 
 dummy0 = 47
 
 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.
 
 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!
 
 (Actually, I had to add 29 dummy lines at the beginning of the code to
 get the speed increase; if any one of these lines is removed the
 running time reverts to around 15 seconds again.)
 
 Questions:
 
 (1) Can anyone else reproduce this behaviour, or is it just some quirk
 of my setup?

I get the same thing.

 (2) Any possible explanations?  Is there some optimization that kicks
 in at a certain number of lines, or at a certain length of
 bytecode?

It seems to be related to the number of globals.  I get the fast
version with 30 to 120 globals and the slow version with less than 30
or more than 130.  It actually gets even slower for higher numbers
of globals.

Here is a snippet to adjust the number of globals

for (i) in range(100):
  globals()['dummy%d' % (i)] = 1

 (3) If (2), is there some way to force the optimization, so that I can
 get the speed increase without having to add the extra lines?

Yes, module level globals have bad lookup times compared to function
local names.  If you refactor your code to pass around the data
currently at the global module level you should see times at least
as fast as the current 'fast' one.

That said, I'm very surprised that the lookup times jump around so much.
Your code does bazillions of namespace lookups, so a small difference
in lookup times is getting multiplied into some really big numbers.

-jackdied

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Erik Max Francis
Mark Dickinson wrote:

 Questions:
 
 (1) Can anyone else reproduce this behaviour, or is it just some quirk
 of my setup?
 (2) Any possible explanations?  Is there some optimization that kicks
 in at a certain number of lines, or at a certain length of
 bytecode?
 (3) If (2), is there some way to force the optimization, so that I can
 get the speed increase without having to add the extra lines?

I see no difference in execution times, as expected.  The most likely 
explanation is simply that other things were going on on your system 
when you ran the first test, but not the second test, resulting in the 
discrepancy.  In other words, the speed change had nothing to do with 
your dummy lines.

-- 
Erik Max Francis  [EMAIL PROTECTED]  http://www.alcyone.com/max/
San Jose, CA, USA  37 20 N 121 53 W  AIM erikmaxfrancis
   There never was a good war or a bad peace.
   -- Benjamin Franklin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
On 8/25/05, Mark Dickinson [EMAIL PROTECTED] wrote:
 I have a simple 192-line Python script that begins with the line:
 
 dummy0 = 47
 
 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.
 
 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!
 
 (Actually, I had to add 29 dummy lines at the beginning of the code to
 get the speed increase; if any one of these lines is removed the
 running time reverts to around 15 seconds again.)
 
 Questions:
snip

One of my own: what in the world made you think maybe I'll add 29
dummy global variables to speed things up?

It seems to work (19x speedup on my machine), I'm just curious what
path you followed to get there.

And, finally, you should forward this to the python-dev list, if
somebody hasn't already. There are more people who know a ton about
python internals there.

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote:
 Mark Dickinson wrote:
 
  Questions:
 
  (1) Can anyone else reproduce this behaviour, or is it just some quirk
  of my setup?
  (2) Any possible explanations?  Is there some optimization that kicks
  in at a certain number of lines, or at a certain length of
  bytecode?
  (3) If (2), is there some way to force the optimization, so that I can
  get the speed increase without having to add the extra lines?
 
 I see no difference in execution times, as expected.  The most likely
 explanation is simply that other things were going on on your system
 when you ran the first test, but not the second test, resulting in the
 discrepancy.  In other words, the speed change had nothing to do with
 your dummy lines.
 

Unlikely; 2 people have confirmed these results already.

I did find, though, that if I remove all print statements from the
program, the dummy and non-dummy variable versions take indentical
time. Can others reproduce this?

I'm Investigating further...

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Erik Max Francis
Bill Mill wrote:

 Unlikely; 2 people have confirmed these results already.
 
 I did find, though, that if I remove all print statements from the
 program, the dummy and non-dummy variable versions take indentical
 time. Can others reproduce this?

Yes, it's obviously a real effect given the other sightings.  I don't 
see any speed difference, myself (Pentium IV 3.0 GHz running Slackware 
Linux).

-- 
Erik Max Francis  [EMAIL PROTECTED]  http://www.alcyone.com/max/
San Jose, CA, USA  37 20 N 121 53 W  AIM erikmaxfrancis
   There never was a good war or a bad peace.
   -- Benjamin Franklin
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
Bill Mill wrote:
 
 Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
 for fast and slow on my machine (these won't look decent except in a
 fixed-width font):
 
snip profiles

 Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
 slow version, does not appear at all in the fast profile. I can't
 figure out why - both printed out their data, so template must have
 been called somewhere.
 

OK, I'm getting somewhere now. When I replace:

template = (  | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([ 
+---+---+---+\n] * 4)

wtih:

template =| %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  +---+---+---+\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  +---+---+---+\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  | %s %s %s | %s %s %s | %s %s %s |\n
  +---+---+---+\n

Then the non-dummy version is faster than the dummy version (very
slightly, presumably because it doesn't need to allocate 28 dummy
variables).

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Jack Diederich
On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote:
 On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote:
  Mark Dickinson wrote:
  
   Questions:
  
   (1) Can anyone else reproduce this behaviour, or is it just some quirk
   of my setup?
   (2) Any possible explanations?  Is there some optimization that kicks
   in at a certain number of lines, or at a certain length of
   bytecode?
   (3) If (2), is there some way to force the optimization, so that I can
   get the speed increase without having to add the extra lines?
  
 
 I did find, though, that if I remove all print statements from the
 program, the dummy and non-dummy variable versions take indentical
 time. Can others reproduce this?
 
 I'm Investigating further...

I'm getting similarly freakish results.  I tried a little ghetto debugging
by putting a printf in dictobject.c's resize method and recompiling python.
Sadly I can't get the problem to reproduce itself with the new binary
(with or without the printf).  The Ubuntu default 2.4.1 is sometimes fast,
my hand compiled one (./configure  make) is always slow.

There are some very arcane low level things going on here.

sprat:~/src/Python-2.4.1# time ./python /tmp/odd.py  /dev/null
7.876u 0.008s 0:07.91 99.4% 0+0k 0+0io 0pf+0w
sprat:~/src/Python-2.4.1# time python /tmp/odd.py  /dev/null
1.813u 0.004s 0:01.77 102.2%0+0k 0+0io 0pf+0w

sprat:~/src/Python-2.4.1# ./python
Python 2.4.1 (#5, Aug 25 2005, 13:55:44) 
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type help, copyright, credits or license for more information.
 

sprat:~/src/Python-2.4.1# python
Python 2.4.1 (#2, Mar 30 2005, 21:51:10) 
[GCC 3.3.5 (Debian 1:3.3.5-8ubuntu2)] on linux2
Type help, copyright, credits or license for more information.
 

No-idea-ly,

-jackdied
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Mark Dickinson
In article [EMAIL PROTECTED], Bill 
Mill [EMAIL PROTECTED] wrote:

 One of my own: what in the world made you think maybe I'll add 29
 dummy global variables to speed things up?

You mean this isn't a well-known optimization technique? :)

I was refactoring the code, and after making a particular function
redundant I noticed that removing the code for that function produced
the slow down described.  Then I naturally experimented to try to figure
out what I had to put back in to recover the original speed.

Not that I really care about the speed itself, but I'd dearly like to
understand what's at work here.

Mark
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread jay graves
Mark Dickinson wrote:
 Questions:
 (1) Can anyone else reproduce this behaviour, or is it just some quirk
 of my setup?

yes.  I get 7 sec vs 1 sec on my laptop.

 (2) Any possible explanations?  Is there some optimization that kicks
 in at a certain number of lines, or at a certain length of
 bytecode?

I don't think there are any optimizations at play here.

 (3) If (2), is there some way to force the optimization, so that I can
 get the speed increase without having to add the extra lines?

When I start a python interpreter and import the module the speed
difference disappears and it actually gets about two times faster.
YMMV.

I don't have a solution but I admire the problem. [1]

...
jay

[1] Google tells me that this is probably attributable to Ashleigh
Brilliant.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Jack Diederich
On Thu, Aug 25, 2005 at 01:55:48PM -0400, Bill Mill wrote:
 Bill Mill wrote:
  
  Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
  for fast and slow on my machine (these won't look decent except in a
  fixed-width font):
  
 snip profiles
 
  Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
  slow version, does not appear at all in the fast profile. I can't
  figure out why - both printed out their data, so template must have
  been called somewhere.
  
 
 OK, I'm getting somewhere now. When I replace:
 
 template = (  | %s %s %s | %s %s %s | %s %s %s |\n * 3).join([ 
 +---+---+---+\n] * 4)
 
 wtih:
 
 template =| %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   +---+---+---+\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   +---+---+---+\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   | %s %s %s | %s %s %s | %s %s %s |\n
   +---+---+---+\n
 
 Then the non-dummy version is faster than the dummy version (very
 slightly, presumably because it doesn't need to allocate 28 dummy
 variables).

I think this is just another tweaking that hits the magic spot (or
avoids the bad spot) in some lookup table somewhere.  Perhaps it
has to do with the number interned strings and some worst case
behavior? varnames are interened, so changing the number of globals
would change the number of interened strings as well.

-jackdied

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
On 8/25/05, Jack Diederich [EMAIL PROTECTED] wrote:
 On Thu, Aug 25, 2005 at 01:35:04PM -0400, Bill Mill wrote:
  On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote:
   Mark Dickinson wrote:
  
Questions:
   
(1) Can anyone else reproduce this behaviour, or is it just some quirk
of my setup?
(2) Any possible explanations?  Is there some optimization that kicks
in at a certain number of lines, or at a certain length of
bytecode?
(3) If (2), is there some way to force the optimization, so that I can
get the speed increase without having to add the extra lines?
  
 
  I did find, though, that if I remove all print statements from the
  program, the dummy and non-dummy variable versions take indentical
  time. Can others reproduce this?
 
  I'm Investigating further...
 
 I'm getting similarly freakish results.  I tried a little ghetto debugging
 by putting a printf in dictobject.c's resize method and recompiling python.
 Sadly I can't get the problem to reproduce itself with the new binary
 (with or without the printf).  The Ubuntu default 2.4.1 is sometimes fast,
 my hand compiled one (./configure  make) is always slow.
 
 There are some very arcane low level things going on here.
 

agreed. Also, either I was temporarily insane, or the version with the
explicit template no longer runs faster for me, so I hope nobody
spends a lot of time on that.

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Stelios Xanthakis
Mark Dickinson wrote:
 I have a simple 192-line Python script that begins with the line:
 
 dummy0 = 47
 
 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.
 
 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!
 
 (Actually, I had to add 29 dummy lines at the beginning of the code to
 get the speed increase; if any one of these lines is removed the
 running time reverts to around 15 seconds again.)
 
 Questions:
 
 (1) Can anyone else reproduce this behaviour, or is it just some quirk
 of my setup?
 (2) Any possible explanations?  Is there some optimization that kicks
 in at a certain number of lines, or at a certain length of
 bytecode?
 (3) If (2), is there some way to force the optimization, so that I can
 get the speed increase without having to add the extra lines?


Hi.

I haven't been able to reproduce this but I had a similar case
before (a program that some times crashed and some times worked
perfectly and the difference was whitespace in the comments!!!).

After lots of wondering and thinking that I must be dreaming
(luckily I had pyvm which also crashed but for different amounts
of whitespace), it was solved.  The explanation is this: hash
and comparison of objects depends on the state of the memory
allocator.  A sample case is this:

class A: pass
dummy0=47  # comment this to get a different result for min
a=A()
b=A()
print min (a, b)

the result of 'min' is not only non-deterministic but also depends
on whether other things have been allocated before.  The same
thing can happen for 'dictionary.keys()' if the keys are objects
and 'iterate-over-set' when the set contains objects.

In the sudoku solver, there is a min (number, object) which is
probably what's affected by the extistance of the dummy variable.
Now, in sudoku puzzles some times the algorithm has to suppose
that in a box the right solution is one of the possible, try it
and if it fails then try the other one.  I guess that the result
of from the different 'min' leads the solver to first try the
correct solution in the fast case and therefore need not attempt
the wrong one.  Or at least this is what I think that happens.

By the way, since we started posting code, here is a more pythonic
sudoku solver.

#The 'man with scissors runs around shifting barrels algorithm'

from sys import argv
SEMANTIC = 1'SEM' in argv

class Impossible:
 pass

Patterns = []
for r in range (9):
 for c in range (9):
p = 27*(r/3) + 3*(c/3)
pl = set (range (9*r, 9*r+9) + range (c, 81, 9) + [p+x for x in 
(0,1,2,9,10,11,18,19,20)])
pl.remove (9*r+c)
Patterns.append (tuple (sorted (list (pl

def initboard ():
 x = range (1, 10)
 return [ x [:] for i in xrange (9*9) ]

def printboard (board):
 if not SEMANTIC:
 return
 print 30*'-'
 for i in range (9):
for j in board [9*i:9*(i+1)]:
if type (j) is list:
#print 'X',
print ''.join (map (str, j)),
else: print j,
print
 print 30*'-'

def dupboard (board):
 B = []
 for i in board:
if type (i) is list:
B.append (i [:])
else:
B.append (i)
 return B

def solve (board, coords):
 while coords:
p, v = coords.pop ()
board [p] = v
for i in Patterns [p]:
if type (board [i]) is list:
if v in board [i]:
board [i].remove (v)
if len (board [i]) == 1:
board [i] = board [i][0]
coords.append ((i, board [i]))
else:
if board [i] == v:
raise Impossible
 for p, i in enumerate (board):
if type (i) is list:
for j in i:
try:
return solve (dupboard (board), [(p, j)])
except Impossible:
pass
raise Impossible
 return board


PP = [
[
7xx19,
46x19,
xxx6827x4,
x9xx7,
xxx3xx4x5,
xx67x,
xx1xx,
2xxx74xxx,
xxx2xx3xx,
]
]

def puz2coord (P):
 if len (P) != 9:
print P must have 9 rows
raise SystemExit
 coords = []
 for r, i in enumerate (P):
if len (i) != 9:
print Row [%s] doesn't have 9 columns %i
raise SystemExit
for c, j in enumerate (list (i)):
if j != 'x':
coords.append ((9*r + c, int (j)))
 return coords

try:
   if SEMANTIC:
for i in xrange (10):
 for P in PP:
 printboard (solve (initboard (), puz2coord (P)))
   else:
for i in xrange (TIMES):
 for P in PP:
 printboard (solve (initboard (), 

Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
On 8/25/05, Erik Max Francis [EMAIL PROTECTED] wrote:
 Bill Mill wrote:
 
  Unlikely; 2 people have confirmed these results already.
 
  I did find, though, that if I remove all print statements from the
  program, the dummy and non-dummy variable versions take indentical
  time. Can others reproduce this?
 
 Yes, it's obviously a real effect given the other sightings.  I don't
 see any speed difference, myself (Pentium IV 3.0 GHz running Slackware
 Linux).
 

Pentium M 1.8 GHz Windows 2k. Here's the top of the profile results
for fast and slow on my machine (these won't look decent except in a
fixed-width font):

Slow:

 6766494 function calls (6737594 primitive calls) in 45.740 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  3322320   20.5390.000   31.1520.000 test.py:135(generator expression
)
27520   10.6410.000   41.7920.002 :0(min)
  3322320   10.6130.000   10.6130.000 test.py:81(rowitems)
 28100/203.6200.000   45.6332.282 test.py:130(search)
275450.1130.0000.1130.000 :0(append)
275200.0980.0000.0980.000 :0(pop)
10.0410.041   45.736   45.736 test.py:36(?)

Fast:

 540174 function calls (536514 primitive calls) in 3.506 CPU seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   2596401.5160.0002.3030.000 test.py:135(generator expression
)
 22800.7910.0003.0940.001 :0(min)
   2596400.7880.0000.7880.000 test.py:81(rowitems)
  2860/200.2690.0003.3910.170 test.py:130(search)
10.0450.0453.4993.499 test.py:2(?)
 36450.0210.0000.0210.000 test.py:71(colinsert)
 32400.0190.0000.0190.000 test.py:62(rowinsert)
 23050.0100.0000.0100.000 :0(append)

Interestingly, the test.py:36 line, which takes 45 seconds (!!) in the
slow version, does not appear at all in the fast profile. I can't
figure out why - both printed out their data, so template must have
been called somewhere.

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Jack Diederich
On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote:
 The explanation is this: hash
 and comparison of objects depends on the state of the memory
 allocator.  A sample case is this:
 
   class A: pass
   dummy0=47  # comment this to get a different result for min
   a=A()
   b=A()
   print min (a, b)
 
 the result of 'min' is not only non-deterministic but also depends
 on whether other things have been allocated before.  The same
 thing can happen for 'dictionary.keys()' if the keys are objects
 and 'iterate-over-set' when the set contains objects.
 
 In the sudoku solver, there is a min (number, object) which is
 probably what's affected by the extistance of the dummy variable.
 Now, in sudoku puzzles some times the algorithm has to suppose

Doh, I feel silly.  Without an 'import random' in the program I
assumed it was deterministic.  I would have also expected the
comparison to be a TypeError, and it sometimes is.

 mynum = 7
 myob = object()
 cmp(mynum, myob)
-1
 mynum.__cmp__(myob)
Traceback (most recent call last):
  File stdin, line 1, in ?
TypeError: int.__cmp__(x,y) requires y to be a 'int', not a 'object'


Thanks for the explanation,
-jackdied
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Bill Mill
On 8/25/05, Jack Diederich [EMAIL PROTECTED] wrote:
 On Thu, Aug 25, 2005 at 09:23:09PM +0300, Stelios Xanthakis wrote:
  The explanation is this: hash
  and comparison of objects depends on the state of the memory
  allocator.  A sample case is this:
 
class A: pass
dummy0=47  # comment this to get a different result for min
a=A()
b=A()
print min (a, b)
 
  the result of 'min' is not only non-deterministic but also depends
  on whether other things have been allocated before.  The same
  thing can happen for 'dictionary.keys()' if the keys are objects
  and 'iterate-over-set' when the set contains objects.
 

I'm also pretty sure I've caught a bug in his code, though I'm not
sure how it works exactly. I replaced the 'min' built-in with my own
min, and he's going to get nondeterministic results from this line:

   mm = min((c.S, c) for c in rowitems(h))[1].D

because 'c' is often the exact same object. A snippet from my
debugging version of 'min', which prints out the tuple its handed:

(1, __main__.LLentry object at 0x00969710)
(1, __main__.LLentry object at 0x00969710)
(4, __main__.LLentry object at 0x00969710)
snip more 4s from the same object
(4, __main__.LLentry object at 0x00969710)
(3, __main__.LLentry object at 0x00969710)
(3, __main__.LLentry object at 0x00969710)
(3, __main__.LLentry object at 0x00969710)
(2, __main__.LLentry object at 0x00969710)
snip more 2s, same object

Although they appear in order here, they don't always. Often, multiple
objects have a value of 1, and he's going to get one of them at random
as the 'min' object. I'm pretty sure.

Mark, can you confirm that this is/isn't a bug?

(btw, it still runs fast with and slow without the dummies with my
custom min() func)

Peace
Bill Mill
bill.mill at gmail.com
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Mark Dickinson
In article [EMAIL PROTECTED], 
Stelios Xanthakis [EMAIL PROTECTED] wrote:
 
 In the sudoku solver, there is a min (number, object) which is
 probably what's affected by the extistance of the dummy variable.
 Now, in sudoku puzzles some times the algorithm has to suppose
 that in a box the right solution is one of the possible, try it
 and if it fails then try the other one.  I guess that the result
 of from the different 'min' leads the solver to first try the
 correct solution in the fast case and therefore need not attempt
 the wrong one.  Or at least this is what I think that happens.

Thank you!  That does indeed seem to be the explanation.  
The min() line looks for the shortest `constraint';  that is, the box 
with the fewest possible symbols, or the symbol with the smallest 
number of possible positions within a given row, column or block;  the
code below that line then tries all possibilities for this constraint.
The line should really be something like

  min(c for c in rowitems(h), key = lambda c: c.S)

but that isn't going to work until Python 2.5 comes out, and I was
too lazy to expand the whole thing properly instead of using the 
cheap trick I did.

Thanks again for clearing up this confusion.

Mark
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Mark Dickinson
In article [EMAIL PROTECTED], Bill 
Mill [EMAIL PROTECTED] wrote:
 
 I'm also pretty sure I've caught a bug in his code, though I'm not
 sure how it works exactly. I replaced the 'min' built-in with my own
 min, and he's going to get nondeterministic results from this line:
 
mm = min((c.S, c) for c in rowitems(h))[1].D
 
 because 'c' is often the exact same object. A snippet from my
 debugging version of 'min', which prints out the tuple its handed:
 
 (1,   main  .LLentry object at 0x00969710)
 (1,   main  .LLentry object at 0x00969710)
 (4,   main  .LLentry object at 0x00969710)
 snip more 4s from the same object
 (4,   main  .LLentry object at 0x00969710)
 (3,   main  .LLentry object at 0x00969710)
 (3,   main  .LLentry object at 0x00969710)
 (3,   main  .LLentry object at 0x00969710)
 (2,   main  .LLentry object at 0x00969710)
 snip more 2s, same object

The c's returned by any one call to rowitems(h) should all be distinct. 
This seems to work in my code---I can't reproduce your results above, and
I don't *think* there's a bug there.

 Although they appear in order here, they don't always. Often, multiple
 objects have a value of 1, and he's going to get one of them at random
 as the 'min' object. I'm pretty sure.

I agree entirely---this is where my bug, and the source of 
all the confusion comes from.  I was briefly aware as I wrote this that the
result would be nondeterministic, but persuaded myself after two second's
thought that it didn't matter, then forgot all about it.

So mild changes in the rest of the program give rise to a different `random'
choice in the min, and choosing to eunumerate all 3 possibilities for position 
5,7
instead of the 3 possibilities for position 2, 1 (say) makes a huge diffference
to the running time.  I'm still surprised by the magnitude of the differences, 
though.

I've learnt my lesson :)  Thank you for your help, and apologies 
for wasting other people's time with this as well as my own!

Mark
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Raymond Hettinger

Mark Dickinson wrote:
 I have a simple 192-line Python script that begins with the line:

 dummy0 = 47

 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.

 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!

Thanks for your post.  It is cute, brilliant, and interesting.

I haven't had time to investigate but can point at the likely cause.
All of the global names are stored in a single hash table.  Search time
is dictated by two factors, the sparseness of the hash table and the
distribution of hash values.  With respect to sparseness, whenever that
table becomes 2/3 full, it grows by a factor of four and becomes only
1/6 full (leading to many fewer collisions).  With respect to
distribution, it should be noted that string hash values are decidely
non-random and your variable names likely congested consecutive spaces
in a nearly full table (resulting in seven times as many search probes
to find a global value).

When the extra value was added, it likely resized the table four-fold
and redistributed the hash values into fewer consecutive positions.


Raymond


P.S.  To analyze it further, start with something like this:

 len(set(hash('dummy%d' %i)  31 for i in xrange(29)))
26
 len(set(hash('dummy%d' %i)  127 for i in xrange(29)))
29

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread rafi
Stelios Xanthakis wrote:
 Mark Dickinson wrote:
 
 I have a simple 192-line Python script that begins with the line:

 dummy0 = 47

 The script runs in less than 2.5 seconds.  The variable dummy0 is never
 referenced again, directly or indirectly, by the rest of the script.

 Here's the surprise: if I remove or comment out this first line, the
 script takes more than 15 seconds to run.  So it appears that adding a
 redundant line produces a spectacular six-fold increase in speed!

 (Actually, I had to add 29 dummy lines at the beginning of the code to
 get the speed increase; if any one of these lines is removed the
 running time reverts to around 15 seconds again.)

 Questions:

 (1) Can anyone else reproduce this behaviour, or is it just some quirk
 of my setup?
 (2) Any possible explanations?  Is there some optimization that kicks
 in at a certain number of lines, or at a certain length of
 bytecode?
 (3) If (2), is there some way to force the optimization, so that I can
 get the speed increase without having to add the extra lines?

 
 Hi.
 
 I haven't been able to reproduce this but I had a similar case
 before (a program that some times crashed and some times worked
 perfectly and the difference was whitespace in the comments!!!).
 
 After lots of wondering and thinking that I must be dreaming
 (luckily I had pyvm which also crashed but for different amounts
 of whitespace), it was solved.  The explanation is this: hash
 and comparison of objects depends on the state of the memory
 allocator.  A sample case is this:
 
 class A: pass
 dummy0=47  # comment this to get a different result for min
 a=A()
 b=A()
 print min (a, b)
 
 the result of 'min' is not only non-deterministic but also depends
 on whether other things have been allocated before.  The same
 thing can happen for 'dictionary.keys()' if the keys are objects
 and 'iterate-over-set' when the set contains objects.

I do not get the point here: isn't min comparing the adress in memory as 
there is nothing else to compare?

[python 2.4.1 on ubuntu linux]

On 10 runs from within emacs I had about 50% for `a' and 50% for `b' 
returned by min (a,b). It was the same without the dummy0=47.

On 10 runs from the command line I had always `a' returned (with and 
without the dummy). The only difference was the address of b's object.
(For all the run of each case the two addresses were exactly the same)

For your first post:

On 10 tests run for each case (average time, but each time was never 
different one from the other more that .02s)

0.554 when the 28 dummies are in place
6.679 when commented out, removed or put in a single multiline string
2.576 when putting the 28 dummies in a list or a dict using a for loop
7.195 when creating the 28 dummies using an exec in a for loop

What about memory allocation that would be performed chunk by chunk by 
the interpreter? Then being at the end or at the beginning of a chunk 
may not be the same for processing? All the instructions of the program 
would then be in the cpu cache for example in the same block while in 
the other case theyr would be in two distinct block - thus less caching 
for the cpu... [this particular problem arose when I was working several 
year ago on a persistent object manager for a distributed operating system]

-- 
rafi

Imagination is more important than knowledge.
(Albert Einstein)
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread MrJean1
Two observations:

1 - The difference in run time with and without the dummy* globals is
due to a difference in the number of invokations of the search()
function: 1,140 resp. 27,530 in my environment.

To verify, just change the line

  def search():
  

to

  searches = 0
  def search():
  global searches
  searches += 1
  

and add at the very end

  print searches, searches


2 - The run times with and without the dummy* variables is equal(ly
slow) if the LLentry() class and min() function call are modified to be
independent of the object value.

Change line

  class LLentry: pass

to

  LLinst = 0
  class LLentry(object):
  def __init__(self):
  global LLinst
  LLinst += 1
  self.I = LLinst

and change line

mm = min((c.S, c) for c in rowitems(h))[1].D

to

mm = min((c.S, c.I, c) for c in rowitems(h))[2].D


/Jean Brouwers

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed quirk: redundant line gives six-fold speedup

2005-08-25 Thread Claudio Grondi
 I've learnt my lesson :)  Thank you for your help, and apologies
 for wasting other people's time with this as well as my own!

I've learnt my lesson reading through this thread, too.

I am glad to be given the chance of  wasting my time
with it and very happy and thankful, that you posted
your problem here before I have bumped into similar
one myself. I suppose, that there are many others
on this  newsgroup who appreciated your posting,
too.

Claudio


-- 
http://mail.python.org/mailman/listinfo/python-list