Antoine Pitrou, 20.03.2011 15:51:
> On Sun, 20 Mar 2011 14:39:20 +0100, Stefan Behnel wrote:
>> If anyone knows about a good benchmark for a currently pure Python standard
>> library module, preferably a smaller, self-contained one that's somewhat
>> computationally intensive, I'd be happy to hear about it.
>
> You can take a look at difflib, sre_compile, weak containers
> (WeakrefSet, WeakValueDict, WeakKeyDict).
Ok, I skimmed through the difflib module and it looked like a reasonable
candidate. So I gave its Py3.2 version a run and, except for triggering one
class scoping bug in Cython, it compiled cleanly with our latest generators
branch. Working around that bug was easy (renaming the affected variables)
and allowed me to run a diff on a short text file (68 lines, Cython's own
README.txt). For simplicity, and to actually get a diff, I doubled the list
of lines and compared it with its sorted copy. Here are the timings:
Python 3.2 (locally compiled with suitable CFLAGS, Linux 64bit):
$ PYTHONPATH=py python3 -m timeit \
-s 'from difflib import ndiff; \
t1= 2 * open("../README.txt").readlines(); t2=sorted(t1);' \
'list(ndiff(t1, t2))'
100 loops, best of 3: 3.36 msec per loop
Cython:
$ PYTHONPATH=so python3 -m timeit \
-s 'from difflib import ndiff; \
t1= 2 * open("../README.txt").readlines(); t2=sorted(t1);' \
'list(ndiff(t1, t2))'
100 loops, best of 3: 2.08 msec per loop
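For anyone who wants to repeat this without the README.txt at hand, here is a
minimal sketch of the same measurement using the timeit module from within
Python. The input lines are a synthetic stand-in (an assumption, not the file
used above), so the absolute numbers will not match the ones quoted here.

```python
import difflib
import timeit

# Synthetic stand-in for the 68-line README.txt (assumption: any short text
# file with mostly distinct lines would do for the real measurement).
lines = []
for i in range(68):
    letter = chr(ord("a") + (i * 7) % 26)
    lines.append(letter * (10 + i % 5) + " %d\n" % i)

# Same setup as the command line above: doubled lines vs. their sorted copy.
t1 = 2 * lines
t2 = sorted(t1)


def run_diff():
    # ndiff() is a generator, so list() forces the whole diff to be computed.
    return list(difflib.ndiff(t1, t2))


# timeit.repeat() returns the total time in seconds for each repetition.
loops = 5
timings = timeit.repeat(run_diff, number=loops, repeat=3)
best_msec = min(timings) / loops * 1000
print("%d loops, best of 3: %.2f msec per loop" % (loops, best_msec))
```

Running the same script once against the pure Python difflib and once against
the compiled one (by switching PYTHONPATH as above) gives the comparison.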
That cuts the run time by more than a third (about 38%), without any
additional code optimisations or typing. It's not the first time I've seen
Cython-compiled plain Python code become about 30% faster, so I guess that's
roughly the gain that computational code can expect before any static types
are added.
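To make the arithmetic behind these figures explicit, here is a small sketch
that derives both ways of stating the gain from the two timings above. The
variable names are only illustrative.

```python
# Timings quoted above, in milliseconds per loop.
cpython_msec = 3.36
cython_msec = 2.08

# Fraction of the original run time that was saved.
time_saved = 1 - cython_msec / cpython_msec

# How many times faster the compiled module runs.
speedup = cpython_msec / cython_msec

print("run time reduced by %.0f%%" % (time_saved * 100))    # 38%
print("work per unit time up by %.0f%%" % ((speedup - 1) * 100))  # 62%
```

Depending on which of the two you call "faster", the same measurement reads
as either 38% or 62%, so it helps to say which one is meant.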
I did a bit of profiling and added a couple of type declarations to the
top-2 methods, which grew the speed-up to about 50%, but there's still quite
some room for improvement here, especially by turning class definitions into
extension types (which would speed up attribute access and method calls in
Cython). Also, a couple of hand optimisations in the code prevent Cython
from doing its own optimisations, e.g. there's critical code that pulls
bound methods such as "some_list.__contains__" or "some_dict.get" into local
variables. I didn't change those. They are faster in CPython, but Cython
could otherwise translate them into the corresponding C-API calls directly.
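To illustrate the idiom in question, here is a minimal sketch with two
hypothetical functions (they are not taken from difflib). The first caches
the bound method in a local name, the second uses the plain operator, which
is the form a compiler can recognise and replace with a direct C-level call.

```python
def count_known_hoisted(words, known):
    # Hand optimisation for CPython: look the bound method up once, so the
    # loop avoids a repeated attribute lookup. A compiler only sees a call
    # to an opaque Python object here.
    is_known = known.__contains__
    count = 0
    for word in words:
        if is_known(word):
            count += 1
    return count


def count_known_plain(words, known):
    # Plain form: the "in" operator is visible in the source, so a compiler
    # that knows (or is told) the container type can call into the C-API
    # directly instead of going through a Python-level method call.
    count = 0
    for word in words:
        if word in known:
            count += 1
    return count


words = ["a", "b", "c", "a", "d"]
known = ["a", "c"]
print(count_known_hoisted(words, known), count_known_plain(words, known))
```

Both functions return the same result; they only differ in how much of the
operation is left visible to the compiler.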
I currently don't have psyco available to compare, but I think that would
be interesting as well.
Stefan