Re: [Python-Dev] VM and Language summit info for those not at Pycon (and those that are!)

Stefan Behnel Sun, 20 Mar 2011 14:01:17 -0700

[warning, long post ahead]

Guido van Rossum, 20.03.2011 17:17:

Hi Stefan,

Hi!

I'm glad to see Cython picking up steam and trying to compete with
CPython, PyPy, and possibly others.

We do, although our main focus is much more on targeted manual optimisation rather than whole applications. For example, we currently only boot-strap seven out of some 60 modules of Cython itself into C code, as those are the most critical parts (mostly related to parsing and syntax tree traversal) and the rest simply takes too long to run through the C compiler for too little overall benefit. Compiling only the top-4 pure Python modules of the Cython compiler, optimised with externally provided type annotations, lets the overall compiler run about twice as fast.

It's true that few in the core
development group know much about Cython -- essentially my own
understanding is still that it's like Pyrex, which was a
mostly-Python-compatible syntax for writing C extensions, but
certainly not able to compile all existing Python code successfully,
due to small syntactic incompatibilities (e.g. new reserved words).

We decide the available syntax/keywords based on the file extension (.py vs. .pyx), and you can compile (most) Python 3 code by putting a comment at the top of your source file that specifies the language level (or by passing "-3" on the command line).
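
As a minimal sketch of the language-level comment mentioned above: the directive below is Cython's `language_level` setting, written as a special comment on the first line. The module content and the function name `halve` are made up for illustration. CPython treats the line as an ordinary comment, so the same file still runs unmodified as plain Python.

```python
# cython: language_level=3
# The line above is read by the Cython compiler; CPython ignores it.


def halve(n):
    # With Python 3 semantics, "/" is true division even for two ints.
    return n / 2


print(halve(7))
```

Passing "-3" on the command line is the alternative to the comment and should select the same language level for the whole compilation.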

A nice feature is that we support CPython 2.3 through 3.2 with the same generated C code file, so you can ship the quality assured generated C code and then put your trust into a C compiler to do the right thing for a given runtime on a given machine, without requiring normal users to have (the right version of) Cython installed on their side.

Cython always works at a per-module level, does a bit of function/closure local static type inference and optimistic optimisations and has a few intentional semantic differences compared to Python, such as mostly fixed builtins (unless re-assigned directly inside of a module, or deemed uninteresting, such as "open"). This allows the compiler to apply a huge set of optimisations to the syntax tree and the generated C code that substantially speed up the usage of builtin types as well as common idioms for looping and iteration.

We still can't compile "all existing Python code", though, as there are several known bugs regarding Python semantics. However, we recently managed to get pretty close to Python language feature completeness and hope to finish that up soon.


In short: if it works, it works, and it usually only gets better. ;-)

I also thought that Cython was mostly popular in scientific computing
circles. But I may well be wrong on all counts. Please enlighten me.

It does have a large and quickly growing user base in scientific computing. I would even say that Cython is growing into one of the main features that Python has in that field, right next to NumPy/SciPy, which are also starting to migrate parts of their code to Cython to improve its portability and maintainability. Cython is a great way to get numeric code up to speed without descending all the way down into a low-level language.

Cython is IMHO also the most advanced FFI that CPython has, much faster than any other wrapper tool and also more natural to use than ctypes. We natively support interaction with C and C++ code, and there's external support (fwrap) for Fortran as well, as that's extremely important in the high-performance computing area.

Finally, Cython is a programming language in its own right, that aims to close the not-so-small gap between Python and C/C++, both in terms of performance and usability. For example, we support many Python programming idioms on C data types, such as for-loop iteration over sliced pointers, and tightly adapt the code generated for a given language construct to the data types it operates on. While code can often be rewritten in a more C-ish way if performance truly demands it, we rather try to avoid that need by constantly improving and tailoring the generated code instead.


'nough sold? :-)

There are also Cython language semantics that we are still working on, such as support for some tricky C++ features. We generally try to follow Python semantics whenever possible, but need to accept in some cases that C/C++ have language semantics that we cannot hide from the user or that are actually helpful for users (yes, that really happens :). Advancing the language in these fields is pretty interesting business.

I'd like to hear more about Cython's compatibility -- e.g. does it
compile Django?

Never been that ambitious. As I said, Cython works at the module level, and you'd likely only compile some performance critical modules of an application anyway, in order to keep the overall code development more flexible.

A common approach is to profile an application, take the top-k modules and try to compile them in Cython. If they fail to compile, adapt the code as needed to get them compiling, then profile the application again to see what that gave you (Cython supports cProfile). Optimise the top-j functions/classes/methods by adding static type declarations to drop the code deeper into C, until it's fast enough. If you can't get it fast enough that way, change the code in well selected places to make it more C-ish. If that's not enough either, rewrite the critical part of it in C or maybe Fortran, then call that from Cython code. The last step obviously leaves Python code compatibility, but you can usually get away with a separate wrapper module and a conditional import, which is simple enough.
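
The first and last steps of that workflow might look roughly like the sketch below. It uses only the standard library's cProfile and pstats to find hot spots, then shows the conditional import pattern. The function `slow_sum` and the module name `_hotspots_compiled` are hypothetical placeholders, not part of Cython or of any real project.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    # Deliberately naive loop, standing in for an application hot spot.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run():
    # Stand-in for "the application" being profiled.
    return [slow_sum(20000) for _ in range(20)]


# Step 1: profile and list the most expensive calls by cumulative time.
profiler = cProfile.Profile()
profiler.runcall(run)

stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)

# Last step: prefer a compiled replacement if one has been built,
# otherwise fall back to the pure Python implementation.
try:
    from _hotspots_compiled import slow_sum as fast_sum  # hypothetical module
except ImportError:
    fast_sum = slow_sum

print(fast_sum(10))
```

The fallback keeps the application importable on machines where the compiled module was never built, which is the point of the separate wrapper module.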

OTOH, you can also use "pyximport" to integrate Cython as a JIT-like compiler that tries to compile a Python module on import and if that works, use the compiled version instead of the plain Python version. I might try that with the Django benchmark once Cython's current feature branches are merged into mainline.
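
A hedged sketch of how that could be wired up: pyximport ships with Cython, and its `install()` function accepts a `pyimport` flag for plain .py modules. The keyword arguments used here should be checked against the installed Cython version, and the two helper function names are made up for this example. Nothing is installed into the import system until the helper is called explicitly.

```python
import importlib.util


def pyximport_available():
    # find_spec only looks the module up; it does not import it.
    return importlib.util.find_spec("pyximport") is not None


def enable_import_time_compilation():
    """Try to hook Cython into the import system; return True on success."""
    if not pyximport_available():
        return False
    import pyximport
    # pyimport=True also covers plain .py modules; the second flag asks for
    # a fallback to the normal Python module if compilation fails.
    pyximport.install(pyimport=True, load_py_module_on_import_failure=True)
    return True


if pyximport_available():
    print("pyximport found; call enable_import_time_compilation() "
          "before importing the modules to compile")
else:
    print("pyximport not installed; modules will run as plain Python")
```

Calling the helper early in an application's start-up code, before the target modules are imported, is what gives the JIT-like behaviour; a working C compiler is still needed at that point.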

How does it do on the benchmark suite used by
PyPy (originated with Unladen Swallow)?

As I already mentioned, I only tried some of the simpler modules so far. It's usually quite a bit faster than CPython for what I tried, especially the numeric computation ones from Debian's shootout can be made to run a couple of hundred times faster (more or less as fast as C code), *if* you apply manual code modifications or at least externally provide static types and drop Python classes into optimised extension types (which can also be done externally). So it's usually the required manual work that stops us from getting better results (and it also smells like cheating if you change the benchmarked code).
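
To illustrate what providing static types can look like without changing the function body, here is a small sketch using Cython's pure Python mode decorator `cython.locals`. The function `harmonic` is an invented example, and the fallback shim class is an assumption added only so the file also runs where Cython is not installed; it is not part of Cython.

```python
try:
    import cython  # when run uncompiled, the decorators act as no-ops
except ImportError:
    class _CythonShim:
        # Minimal stand-in for this sketch only, not Cython's real API surface.
        int = int
        double = float

        @staticmethod
        def locals(**_types):
            return lambda func: func

    cython = _CythonShim()


@cython.locals(n=cython.int, i=cython.int, total=cython.double)
def harmonic(n):
    # The body is plain Python; only the decorator carries the C types.
    total = 0.0
    for i in range(1, n + 1):
        total += 1.0 / i
    return total


print(harmonic(4))
```

Run under CPython this behaves like the undecorated function; the declared types only matter once the module is compiled, where they would let the loop work on C variables.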

Without manual interaction, speed-ups commonly only range from 10-30% compared to CPython, with the lower speed-ups often due to an extensive usage of Python classes and CPython specific optimisation tricks that Cython could do better if it understood their intention.

IMO it's up to Cython to prove its worth.

It also depends on what you consider it worth *for*. CPython could start using Cython gradually in very fine steps, and I'd argue that the benefits for CPython are far beyond plain "run&win" performance improvements. I think the main selling point for Cython code in CPython is that it opens up an extremely wide field of code optimisations without requiring C code to be written and maintained.

Even for non-CPython runtimes, Cython code would likely be easier to port than C code, as it has a much better signal-to-noise ratio.


Stefan

