[warning, long post ahead]

Guido van Rossum, 20.03.2011 17:17:
> Hi Stefan,

Hi!

> I'm glad to see Cython picking up steam and trying to compete with
> CPython, PyPy, and possibly others.

We do, although our main focus is much more on targeted manual optimisation than on whole applications. For example, we currently bootstrap only seven of the roughly 60 modules of Cython itself into C code, as those are the most critical parts (mostly related to parsing and syntax tree traversal); the rest simply takes too long to run through the C compiler for too little overall benefit. Compiling only the top four pure Python modules of the Cython compiler, optimised with externally provided type annotations, makes the overall compiler run about twice as fast.


> It's true that few in the core
> development group know much about Cython -- essentially my own
> understanding is still that it's like Pyrex, which was a
> mostly-Python-compatible syntax for writing C extensions, but
> certainly not able to compile all existing Python code successfully,
> due to small syntactic incompatibilities (e.g. new reserved words).

We decide the available syntax/keywords based on the file extension (.py vs .pyx), and you can compile (most) Python 3 code by putting a comment at the top of your source file that specifies the language level (or by passing "-3" on the command line).
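
For illustration, here is a minimal module that uses such a header comment. CPython treats the directive line as an ordinary comment, so the file also runs unchanged as plain Python. The function `mean` is a made-up example and is not part of Cython.

```python
# cython: language_level=3
# The header comment above is read by Cython as a compiler directive that
# selects Python 3 semantics for this file (passing -3 to the cython
# command is the command-line equivalent). CPython ignores it.


def mean(values):
    # Under Python 3 semantics "/" is true division, even for two ints.
    return sum(values) / len(values)


print(mean([1, 2]))  # print() is a function under Python 3 semantics
```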

A nice feature is that we support CPython 2.3 through 3.2 with the same generated C code file. You can therefore ship the quality-assured generated C code and trust the C compiler to do the right thing for a given runtime on a given machine, without requiring normal users to have (the right version of) Cython installed.

Cython always works at the module level. It does a bit of static type inference local to functions and closures, applies optimistic optimisations, and has a few intentional semantic differences from Python, such as mostly fixed builtins (unless a builtin is re-assigned directly inside the module, or is deemed uninteresting, such as "open"). This allows the compiler to apply a huge set of optimisations to the syntax tree and the generated C code, which substantially speed up the use of builtin types as well as common idioms for looping and iteration.
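
To make the "mostly fixed builtins" difference concrete, the snippet below shows only the CPython side. CPython looks up a name like `len` in the builtins namespace on every call, so patching `builtins.len` at runtime changes the behaviour of code that is already loaded. Going by the description above, a Cython-compiled `total_length` would typically not see such a patch. This is an illustration only and does not test Cython's behaviour. `total_length` is a hypothetical helper.

```python
import builtins


def total_length(seqs):
    # CPython resolves "len" through the builtins namespace at call time.
    return sum(len(s) for s in seqs)


original_len = builtins.len
builtins.len = lambda s: 0  # patch the builtin for the whole process
try:
    patched = total_length([[1, 2], [3]])
finally:
    builtins.len = original_len  # always restore the real builtin

print(patched)                      # 0 under CPython: the patch is seen
print(total_length([[1, 2], [3]]))  # 3 once the builtin is restored
```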

We still can't compile "all existing Python code", though, as there are several known bugs regarding Python semantics. However, we recently managed to get pretty close to Python language feature completeness and hope to finish that up soon.

In short: if it works, it works, and it usually only gets better. ;-)


> I also thought that Cython was mostly popular in scientific computing
> circles. But I may well be wrong on all counts. Please enlighten me.

It does have a large and quickly growing user base in scientific computing. I would even say that Cython is growing into one of the main features that Python has in that field, right next to NumPy/SciPy, which are also starting to migrate parts of their code to Cython to improve its portability and maintainability. Cython is a great way to get numeric code up to speed without descending all the way down into a low-level language.

Cython is IMHO also the most advanced FFI that CPython has, much faster than any other wrapper tool and also more natural to use than ctypes. We natively support interaction with C and C++ code, and there's external support (fwrap) for Fortran as well, as that's extremely important in the high-performance computing area.

Finally, Cython is a programming language in its own right, one that aims to close the not-so-small gap between Python and C/C++ in both performance and usability. For example, we support many Python programming idioms on C data types, such as for-loop iteration over sliced pointers, and we tightly adapt the code generated for a given language construct to the data types it operates on. While code can often be rewritten in a more C-ish way if performance truly demands it, we prefer to avoid that need by constantly improving and tailoring the generated code instead.
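
In Cython, `for x in p[:n]` over a typed pointer such as `double *p` is compiled into a C-level loop. Plain Python has only a rough analogue in ctypes, shown here purely for illustration. Slicing a ctypes pointer first copies the values into a Python list, so this version gains none of Cython's speed. `sum_doubles` is a made-up helper.

```python
import ctypes


def sum_doubles(ptr, n):
    # Iterate over the first n values behind a C double pointer.
    # ctypes materialises ptr[:n] as a Python list of floats first.
    total = 0.0
    for x in ptr[:n]:
        total += x
    return total


buf = (ctypes.c_double * 4)(1.0, 2.5, 3.0, 4.5)          # C array of 4 doubles
ptr = ctypes.cast(buf, ctypes.POINTER(ctypes.c_double))  # view it as a pointer
print(sum_doubles(ptr, 4))  # 11.0
```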

'nough sold? :-)

There are also Cython language semantics that we are still working on, such as support for some tricky C++ features. We generally try to follow Python semantics whenever possible, but need to accept in some cases that C/C++ have language semantics that we cannot hide from the user or that are actually helpful for users (yes, that really happens :). Advancing the language in these fields is pretty interesting business.


> I'd like to hear more about Cython's compatibility -- e.g. does it
> compile Django?

Never been that ambitious. As I said, Cython works at the module level, and you'd likely only compile some performance critical modules of an application anyway, in order to keep the overall code development more flexible.

A common approach is to profile an application, take the top-k modules and try to compile them with Cython. If they fail to compile, adapt the code as needed to get them compiling, then profile the application again to see what that gave you (Cython supports cProfile). Optimise the top-j functions/classes/methods by adding static type declarations that move the code deeper into C, until it's fast enough. If you can't get it fast enough that way, change the code in well-selected places to make it more C-ish. If that's not enough either, rewrite the critical part in C or maybe Fortran, then call that from Cython code. The last step obviously gives up Python code compatibility, but you can usually get away with a separate wrapper module and a conditional import, which is simple enough.
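
The first step, picking the top-k modules, can be sketched with the standard library's cProfile and pstats. This version sums each function's own time per source file as a rough proxy for per-module cost. `top_files` and `workload` are hypothetical names, and the workload is only a stand-in for a real application entry point.

```python
import cProfile
import pstats
from collections import defaultdict


def top_files(func, k=4):
    """Profile func() and return up to k (filename, own_seconds) pairs,
    most expensive first, as candidates for compiling with Cython."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()

    per_file = defaultdict(float)
    # Stats.stats maps (filename, lineno, funcname) to
    # (primitive calls, total calls, own time, cumulative time, callers).
    stats = pstats.Stats(profiler).stats
    for (filename, _line, _name), (_cc, _nc, own, _cum, _callers) in stats.items():
        if filename == "~":  # built-in C functions have no source file
            continue
        per_file[filename] += own
    ranked = sorted(per_file.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


def workload():
    # Stand-in for "the application"; replace with a real entry point.
    return sum(i * i for i in range(200_000))


for filename, seconds in top_files(workload, k=3):
    print("%8.4fs  %s" % (seconds, filename))
```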

OTOH, you can also use "pyximport" to integrate Cython as a JIT-like compiler that tries to compile a Python module on import and if that works, use the compiled version instead of the plain Python version. I might try that with the Django benchmark once Cython's current feature branches are merged into mainline.
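
Without pyximport, the "compiled version if available, plain Python otherwise" choice is commonly written out as a conditional import. The same pattern covers the separate wrapper module mentioned above. In this minimal sketch, `_fastparse` and `parse_line` are hypothetical names and not part of Cython.

```python
try:
    # Hypothetical compiled (e.g. Cython-built) extension module.
    from _fastparse import parse_line
except ImportError:
    # Plain Python fallback with the same interface, used when the
    # compiled module is not available on this installation.
    def parse_line(line):
        """Split one comma-separated line into stripped fields."""
        return [field.strip() for field in line.split(",")]


print(parse_line(" a, b ,c "))  # ['a', 'b', 'c'] with the fallback
```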


> How does it do on the benchmark suite used by
> PyPy (originated with Unladen Swallow)?

As I already mentioned, I have only tried some of the simpler modules so far. For what I tried, Cython is usually quite a bit faster than CPython. In particular, the numeric computation benchmarks from Debian's shootout can be made to run a couple of hundred times faster (more or less as fast as C code), *if* you apply manual code modifications, or at least externally provide static types and turn Python classes into optimised extension types (which can also be done externally). So it's usually the required manual work that stops us from getting better results (and changing the benchmarked code also smells like cheating).
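
For a sense of what such a numeric benchmark looks like, here is a plain Python sketch of the shootout's spectral-norm kernel. The comments mark where the manual static typing mentioned above would go. No Cython timings are claimed here, and the function names are this sketch's own.

```python
import math


def eval_a(i, j):
    # Entry (i, j) of the infinite matrix A used by spectral-norm.
    # (i + j) * (i + j + 1) is always even, so integer division is exact.
    return 1.0 / ((i + j) * (i + j + 1) // 2 + i + 1)


def a_times_u(u):
    n = len(u)
    return [sum(eval_a(i, j) * u[j] for j in range(n)) for i in range(n)]


def at_times_u(u):
    n = len(u)
    return [sum(eval_a(j, i) * u[j] for j in range(n)) for i in range(n)]


def ata_times_u(u):
    return at_times_u(a_times_u(u))


def spectral_norm(n):
    # In a typed Cython version, n, i and j would be declared as C ints and
    # eval_a would return a C double (in the code or externally provided);
    # this plain Python version has no such declarations.
    u = [1.0] * n
    for _ in range(10):
        v = ata_times_u(u)
        u = ata_times_u(v)
    vbv = sum(ue * ve for ue, ve in zip(u, v))
    vv = sum(ve * ve for ve in v)
    return math.sqrt(vbv / vv)


print("%.9f" % spectral_norm(100))  # 1.274219991
```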

Without manual interaction, speed-ups commonly range only from 10 to 30% compared to CPython. The lower speed-ups are often due to extensive use of Python classes and of CPython-specific optimisation tricks, which Cython could handle better if it understood their intention.


> IMO it's up to Cython to prove its worth.

It also depends on what you consider it worth *for*. CPython could start using Cython gradually in very fine steps, and I'd argue that the benefits for CPython are far beyond plain "run&win" performance improvements. I think the main selling point for Cython code in CPython is that it opens up an extremely wide field of code optimisations without requiring C code to be written and maintained.

Even for non-CPython runtimes, Cython code would likely be easier to port than C code, as it has a much better signal-to-noise ratio.

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev