Re: performance of tight loop

2010-12-14 Thread Ulrich Eckhardt
Thank you for the explanation, Ryan!

Uli

-- 
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-14 Thread Paul Rubin
gry georgeryo...@gmail.com writes:
...
 rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
 for i in range(i,i+rows): ...

One thing that immediately comes to mind is use xrange instead of range.

Also, instead of 
first = ['%d' % i]
rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
return first + rest

you might save some copying with:

rest =  ['%d' % randint(1, mx) for i in xrange(wd)]
rest[0] = '%d'%i
return rest

That's uglier than the old-fashioned
  rest = ['%d'%i]
  for i in xrange(wd-1):
 rest.append('%d' % randint(1, mx))

I think a generator would be cleanest, but maybe slowest:

   def row(i, wd, mx):
  yield '%d' % i
  for j in xrange(wd-1):
  yield ('%d' % randint(1, mx))
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-14 Thread Peter Otten
gry wrote:

 [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
 I have a little data generator that I'd like to go faster... any
 suggestions?
 maxint is usually 9223372036854775808(max 64bit int), but could
 occasionally be 99.
 width is usually 500 or 1600, rows ~ 5000.
 
 from random import randint
 
 def row(i, wd, mx):
 first = ['%d' % i]
 rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
 return first + rest
 ...
 while True:
 print copy %s from stdin direct delimiter ','; % table_name
 for i in range(i,i+rows):
 print ','.join(row(i, width, maxint))
 print '\.'

I see the biggest potential in inlining randint. Unfortunately you did not 
provide an executable script and I had to make it up:

$ cat gry.py
from random import randint
import sys

def row(i, wd, mx):
first = ['%d' % i]
rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
return first + rest

def main():
table_name = unknown
maxint = sys.maxint
width = 500
rows = 1000
offset = 0

print copy %s from stdin direct delimiter ','; % table_name
for i in range(offset, offset+rows):
print ','.join(row(i, width, maxint))
print '\.'

if __name__ == __main__:
main()
$ time python gry.py  /dev/null

real0m5.280s
user0m5.230s
sys 0m0.050s
$

$ cat gry_inline.py
import random
import math
import sys

def make_rand(n):
if n  1  random.BPF:
def rand(random=random.random):
return int(n*random())+1
else:
k = int(1.1 + math.log(n-1, 2.0))
def rand(getrandbits=random.getrandbits):
r = getrandbits(k)
while r = n:
r = getrandbits(k)
return r+1
return rand

def row(i, wd, rand):
first = ['%d' % i]
rest =  ['%d' % rand() for i in range(wd - 1)]
return first + rest

def main():
table_name = unknown
maxint = sys.maxint
width = 500
rows = 1000
offset = 0

rand = make_rand(maxint)

print copy %s from stdin direct delimiter ','; % table_name
for i in range(offset, offset+rows):
print ','.join(row(i, width, rand))
print '\.'

if __name__ == __main__:
main()
$ time python gry_inline.py  /dev/null

real0m2.004s
user0m2.000s
sys 0m0.000s
$

Disclaimer: the code in random.py is complex enough that I cannot guarantee 
I snatched the right pieces.

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-14 Thread Peter Otten
Peter Otten wrote:

 gry wrote:
 
 [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
 I have a little data generator that I'd like to go faster... any
 suggestions?
 maxint is usually 9223372036854775808(max 64bit int), but could
 occasionally be 99.
 width is usually 500 or 1600, rows ~ 5000.
 
 from random import randint
 
 def row(i, wd, mx):
 first = ['%d' % i]
 rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
 return first + rest
 ...
 while True:
 print copy %s from stdin direct delimiter ','; % table_name
 for i in range(i,i+rows):
 print ','.join(row(i, width, maxint))
 print '\.'
 
 I see the biggest potential in inlining randint. Unfortunately you did not
 provide an executable script and I had to make it up:

 $ time python gry_inline.py  /dev/null
 
 real0m2.004s
 user0m2.000s
 sys 0m0.000s

On second thought, if you have numpy available:

$ cat gry_numpy.py
from numpy.random import randint
import sys

def row(i, wd, mx):
first = ['%d' % i]
rest =  ['%d' % i for i in randint(1, mx, wd - 1)]
return first + rest

def main():
table_name = unknown
maxint = sys.maxint
width = 500
rows = 1000
offset = 0

print copy %s from stdin direct delimiter ','; % table_name
for i in range(offset, offset+rows):
print ','.join(row(i, width, maxint))
print '\.'

if __name__ == __main__:
main()
$ time python gry_numpy.py  /dev/null

real0m1.024s
user0m1.010s
sys 0m0.010s
$

Argh

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list


performance of tight loop

2010-12-13 Thread gry
[python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram]
I have a little data generator that I'd like to go faster... any
suggestions?
maxint is usually 9223372036854775808(max 64bit int), but could
occasionally be 99.
width is usually 500 or 1600, rows ~ 5000.

from random import randint

def row(i, wd, mx):
first = ['%d' % i]
rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
return first + rest
...
while True:
print copy %s from stdin direct delimiter ','; % table_name
for i in range(i,i+rows):
print ','.join(row(i, width, maxint))
print '\.'

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-13 Thread Steven D'Aprano
On Mon, 13 Dec 2010 18:50:38 -0800, gry wrote:

 [python-2.4.3, rh CentOS release 5.5 linux, 24 xeon cpu's, 24GB ram] I
 have a little data generator that I'd like to go faster... any
 suggestions?
 maxint is usually 9223372036854775808(max 64bit int), but could
 occasionally be 99.
 width is usually 500 or 1600, rows ~ 5000.
 
 from random import randint
 
 def row(i, wd, mx):
 first = ['%d' % i]
 rest =  ['%d' % randint(1, mx) for i in range(wd - 1)] 
 return first + rest
 ...
 while True:
 print copy %s from stdin direct delimiter ','; % table_name
 for i in range(i,i+rows):
 print ','.join(row(i, width, maxint))
 print '\.'


This isn't entirely clear to me. Why is the while loop indented? I assume 
it's part of some other function that you haven't shown us, rather than 
part of the function row().

Assuming this, I would say that the overhead of I/O (the print commands) 
will likely be tens or hundreds of times greater than the overhead of the 
loop, so you're probably not likely to see much appreciable benefit. You 
might save off a few seconds from something that runs for many minutes. I 
don't see the point, really.

If the print statements are informative rather than necessary, I would 
print every tenth (say) line rather than every line. That should save 
*lots* of time.

Replacing while True with while 1 may save a tiny bit of overhead. 
Whether it is significant or not is another thing.

Replacing range with xrange should also make a difference, especially if 
rows is a large number.

Moving the code from row() inline, replacing string interpolation with 
calls to str(), may also help. Making local variables of any globals may 
also help a tiny bit. But as I said, you're shaving microseconds of 
overhead and spending millseconds printing -- the difference will be tiny.

But for what it's worth, I'd try this:


# Avoid globals in favour of locals.
from random import randint
_maxint = maxint
loop = xrange(i, i+rows)  # Where does i come from?
inner_loop = xrange(width)  # Note 1 more than before.
while 1:
print copy %s from stdin direct delimiter ','; % table_name
for i in loop:
row = [str(randint(1, _maxint)) for _ in inner_loop]
row[0] = str(i)  # replace in place
print ','.join(row)
print '\.'



Hope it helps.



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-13 Thread Ulrich Eckhardt
gry wrote:
 I have a little data generator that I'd like to go faster... any
 suggestions?
 maxint is usually 9223372036854775808(max 64bit int), but could
 occasionally be 99.
 width is usually 500 or 1600, rows ~ 5000.
 
 from random import randint
 
 def row(i, wd, mx):
 first = ['%d' % i]
 rest =  ['%d' % randint(1, mx) for i in range(wd - 1)]
 return first + rest

A few things here:
 * If you can, don't convert the ints to strings. I'm not 100% sure about
Python 2.4, but newer versions will automatically yield long instead of int
if the range exceeds that of an int, so even with large numbers that should
be safe.
 * Replace range with xrange.
 * Instead of creating and appending lists, you could also use a generator
expression.

 print ','.join(row(i, width, maxint))

All you do here is take a list of strings, build a single string from them
and then print the string. Why not iterate over the list (or, as suggested,
the generator) and print the elements?

Summary: Avoid unnecessary conversions. This includes int to string, but
also logical sequences into arrays.

Uli

-- 
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-13 Thread Ulrich Eckhardt
Steven D'Aprano wrote:
 Replacing while True with while 1 may save a tiny bit of overhead.
 Whether it is significant or not is another thing.

Is this the price for an intentional complexity or just a well-known
optimizer deficiency?

Just curious...

Uli

-- 
Domino Laser GmbH
Geschäftsführer: Thorsten Föcking, Amtsgericht Hamburg HR B62 932

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: performance of tight loop

2010-12-13 Thread Ryan Kelly
On Tue, 2010-12-14 at 08:08 +0100, Ulrich Eckhardt wrote:
 Steven D'Aprano wrote:
  Replacing while True with while 1 may save a tiny bit of overhead.
  Whether it is significant or not is another thing.
 
 Is this the price for an intentional complexity or just a well-known
 optimizer deficiency?

At least on older pythons, you can assign to the name True so it's not
possible to optimize that loop - you must look up the name True on
each iteration.  For example, in python 2.6 this loop will exit after
one iteration:


 while True:
... True = False
...


To see the difference, take a look at the bytecode python generators for
the type types of loop:


 import dis
 def while1():
... while 1:
... pass
... 
 def whileTrue():
... while True:
... pass
... 
 dis.dis(while1)
  2   0 SETUP_LOOP   3 (to 6)

  3 3 JUMP_ABSOLUTE3
6 LOAD_CONST   0 (None)
  9 RETURN_VALUE

 dis.dis(whileTrue)
  2   0 SETUP_LOOP  12 (to 15)
3 LOAD_GLOBAL  0 (True)
  6 JUMP_IF_FALSE4 (to 13)
  9 POP_TOP 

  3  10 JUMP_ABSOLUTE3
   13 POP_TOP 
 14 POP_BLOCK
   15 LOAD_CONST   0 (None)
 18 RETURN_VALUE



Still, I just can't bring myself to write while 1 in favour of while
True in code.


Python 3 does away with this madness entirely:


 while True:
... True = False
... 
  File stdin, line 2
SyntaxError: assignment to keyword
 

Looking at the bytecode shows that in Python 3, while 1 and while
True are indeed identical.


  Cheers,

 Ryan


-- 
Ryan Kelly
http://www.rfk.id.au  |  This message is digitally signed. Please visit
r...@rfk.id.au|  http://www.rfk.id.au/ramblings/gpg/ for details



signature.asc
Description: This is a digitally signed message part
-- 
http://mail.python.org/mailman/listinfo/python-list