On 05/30/2013 06:41 PM Nathan Hurst wrote:
On Thu, May 30, 2013 at 10:23:17AM +0200, Armin Rigo wrote:
Hi all,
Some people learn about PyPy, and the first program they try to
measure speed with is something like this:
def factorial(n):
res = 1
for i in range(1, n + 1):
res *= i
return res
print factorial(25000)
It may not be completely obvious a priori, but this is as bogus as it
gets. This is by now only 50% slower in PyPy than in CPython thanks
to efforts from various people. The issue is of course that it's an
algo which, in CPython or in PyPy, spends most of its time in C code
computing with rather large "long" objects. (No, PyPy doesn't contain
magic to speed up C code 10 times.) In fact, this program spends more
than 2/3rd of its time in the final repr() of the result! Converting
a long to base 10 is a quadratic operation.
It doesn't have to be quadratic, it's easy to come up with a splitting
algorithm:
def reclongtostr(x):
if x< 0: return "-"+reclongtostr(-x)
x = long(x) # expect a long
min_digits = 9 # fits in 32 bits, there may be a better choice for
this
pts = [10**min_digits]
while pts[-1]< x:
pts.append(pts[-1]**2)
pts.pop() # remove first 10**2**i greater than x
output = []
def spl(x,i):
if i< 0: # bottomed out with max_digit sized pieces
if output or x> 0:
s = str(x)
output.append("0"*(min_digits - len(s)) + s) # note
that this appends in inorder
else:
top,bot = divmod(x, pts[i]) # split the number
spl(top,i-1)
spl(bot,i-1)
spl(x,len(pts)-1)
# strip leading zeros, we can probably do this more elegantly
while output[0][0] == "0":
output[0] = output[0][1:]
return ''.join(output)
which benchmarks factorial(25000) like this:
import time
s = time.time()
x = factorial(25000)
print "factorial", time.time() - s
sx = str(x) # give pypy a chance to compile
s = time.time()
sx = str(x)
print "Str time", time.time() - s
rsx = reclongtostr(x) # give pypy a chance to compile
s = time.time()
rsx = reclongtostr(x)
print "my rec time", time.time() - s
print "equal string:", sx == rsx
factorial 0.182402133942
Str time 0.505062818527
my rec time 0.0678248405457
equal string: True
I'm sure a better programmer than I could make this faster by avoiding
saving intermediate results and various micro optimisations. But
beating the builtin C implementation by a factor of 7.5 seems a
reasonable outcome for pypy.
I think I could come up with a linear time two pass algorithm working
on intdigits if this were important to pypy.
Does it still make sense to add programs like this to our benchmarks?
So far, our benchmarks are "real-life" examples. The benchmarks like
above are completely missing the point of PyPy, as they don't stress
at all the Python interpreter part. There are also other cases where
PyPy's performance is very bad, like cpyext on an extension module
with lots of small C API calls. I believe that it would still make
sense to list such cases in the official benchmark, and have the
descriptions of the benchmarks explain what's wrong with them.
I agree that you should include them, I disagree that they are
'wrong'. They measure the overhead of a C call. Why should a C call
be slower in pypy than cpython? Presumably it could be compiled down
to the appropriate instructions and then out-perform cpy.
Now that the topic of benchmarks has come up, I came across this
benchmark recently:
http://dalkescientific.com/writings/diary/archive/2009/11/15/100000_tasklets.html
The same benchmark took 8.5s on pypy 2beta2 and takes 7.5s on pypy
2.0.1. Is there any obvious reasons why pypy's tasklets are so
slow to switch? (Is it the scheduler?) This is important for my
adoption of pypy at work.
njh
I remember doing something similar to convert long to string,
way back in .. let's see .. googling google comp.lang.python ..
oy, 2001, it's been a while ;-)
https://groups.google.com/forum/?hl=sv&fromgroups#!searchin/comp.lang.python/richter$20strL.py/comp.lang.python/6HYHojX7ZlA/Wizytwby71QJ
(from top, search to first instance of strL.py)
I guess it's not too long to post a copy here:
.. (Hm, I see I should have imported time from time
instead of clock, since the latter has just scheduler
tick resolution, and time is high resolution.)
____________________________________________________
A couple of lines wrapped...
_______________________________________________________________
# strL.py -- recursive long to decimal string conversion
# 2001-07-10 bokr
#
p10d={}
def p10(n):
if not p10d.has_key(n):
p = p10d[n] = 10L**n
return p
return p10d[n]
def strLR(n,w=0): # w>0 is known width, w<0 is searching guess
if w == 0: return []
if w > 0:
if w <=9: return [('%0'+chr(w+48)+'d') % n]
wr = w/2
nl,nr = divmod(n,p10(wr))
return strLR(nl,w-wr)+strLR(nr,wr)
else:
nl,nr = divmod(n,p10(-w))
if nl:
return strLR(nl, 2*w) + strLR(nr,-w)
else:
if w >= -9:
return ['%d' % n]
else:
return strLR(nr,w/2)
def strL(n):
if n<0:
return ''.join(['-']+strLR(-n,-9))
else:
return ''.join(strLR(n,-9))
from time import clock
import sys
def main():
def pr(x):
print x
def psl(x):
s = strL(x)
sys.stdout.write(s)
sys.stdout.write('\n')
dt={'str':str,'strL':strL,'repr':repr, 'print':pr, 'printStrL':psl
}
try:
x=long( eval(sys.argv[2]) )
fn=sys.argv[1]
fcn=dt[fn]
except:
sys.stderr.write("usage: %s [str strL repr print printStrL]
<const expr>\n" % sys.argv[0])
sys.exit(2)
t0=clock()
fcn(x)
t1=clock()
print "%s took %9.6f seconds" % (fn,t1-t0)
if __name__ == "__main__":
main()
____________________________________________________
Got curious, so I added to your benchmark:
from strL import strL ## bokr mod
and
## bokr mod: add timing of strL, like above
rsb = strL(x) # give pypy a chance to compile
s = time.time()
rsx = strL(x)
print "strL rec time", time.time() - s
print "equal string:", sx == rsb
then ran it (with old python and pypy on old laptop:
Version info:
[14:49 ~/wk/py/clp]$ python
Python 2.7.2 (default, Jul 8 2011, 23:38:53)
[GCC 4.1.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
[14:49 ~/wk/py/clp]$ pypy
pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required
by pypy)
pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by
pypy)
Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38)
[PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``nothing is true''
>>>>
The bench run:
[14:49 ~/wk/py/clp]$ python NathanHurst.py
factorial 1.36516094208
Str time 2.93479013443
my rec time 1.45956683159
equal string: True
strL rec time 1.34501504898
equal string: True
[14:50 ~/wk/py/clp]$ pypy NathanHurst.py
pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required
by pypy)
pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by
pypy)
factorial 2.29024791718
Str time 3.14243102074
my rec time 1.25054502487
equal string: True
strL rec time 1.12671113014
equal string: True
[14:50 ~/wk/py/clp]$
... Hm, wonder why your factorial was slower on pypy .. prob due to old version?
Regards,
Bengt Richter
_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev