[long post ahead, again]

Guido van Rossum, 21.03.2011 03:46:
> Thanks for the clarifications. I now have a much better understanding
> of what Cython is. But I'm not sold. For one, your attitude about
> strict language compatibility worries me when it comes to the stdlib.

Not sure what you mean exactly. Given our large user base, we do worry a lot about things like backwards compatibility, for example.

If you are referring to compatibility with Python, I don't think anyone in the project really targets Cython as a drop-in replacement for a Python runtime. We aim to compile Python code, yes, and there's a hand-wavy idea in the back of our heads that we may want a plain Python compatibility mode at some point, one that would disable several important optimisations. But there's no real drive for that, simply because Cython users usually care a lot more about speed than about strict Python language compliance in dangerous areas like overridden builtins (such as 'range'). And Cython users know that they also have CPython available, which allows them to easily get 100% compatibility if they need it, be it through an import or by calling "exec".
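To make the 'range' case concrete: CPython looks builtins up by name at call time, so replacing one is honoured immediately. The sketch below is plain Python 3, and `sum_with_traced_range` is a hypothetical function for illustration only. Cython-compiled modules typically look builtins up once at import time (and may turn a typed `range()` loop into a plain C loop), so an override like this one may simply not be seen by them.

```python
import builtins


def sum_with_traced_range(n):
    """Temporarily replace builtins.range and show that CPython uses it.

    CPython resolves the name 'range' at call time (module globals first,
    then the builtins module), so the replacement is picked up right away.
    """
    calls = []
    original_range = builtins.range

    def traced_range(*args):
        # Record each call, then defer to the real builtin.
        calls.append(args)
        return original_range(*args)

    builtins.range = traced_range
    try:
        total = 0
        for i in range(n):  # this name lookup finds traced_range
            total += i
    finally:
        builtins.range = original_range  # always restore the real builtin
    return total, calls


print(sum_with_traced_range(3))  # (3, [(3,)])
```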

That being said, we do consider any deviation from Python language semantics a bug, and try to fix at least those with a user impact. Compatibility has improved a lot since the early days.


> Also, I don't know how big it is

It's not small. The compiler is getting close to some 50,000 lines of Python code.


> but it seems putting the cart before
> the horse to use it to optimize the stdlib. Cython feels much less
> mature than CPython;

Neither the language nor the compiler is completely stable yet, but Cython has been used for production code ever since the project started (building on the code it originally inherited from Pyrex).

There are parts of the language that we are still fleshing out, but we try hard to keep the user impact low and to adhere to the "expected" Python semantics as closely as possible whenever we design new language features. Much of what we need to fix these days is actually due to diverging language semantics that originally appeared in Pyrex, or to differences between Python 2 and Python 3 that make it tricky for users to write portable code.


> but the latter should only have dependencies that
> themselves change even slower than CPython.

I understand that. C is certainly evolving a *lot* slower than Cython.

Personally, I wouldn't consider Cython a dependency even if CPython started using code written in Cython. It's more like a development tool: users won't need Cython at all as long as the generated C sources ship with the distribution. Only those who want to build from the hg sources, and distributors who patch the affected sources of a release, will have to install the corresponding Cython version. Shipping tested C sources is certainly the recommended way of using Cython.
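One way to wire this into a build, as a minimal sketch: prefer the .pyx file when Cython is installed and fall back to the shipped .c file otherwise. The helper name `extension_sources` and the convention that the generated .c file sits next to the .pyx file are assumptions for illustration, not part of any Cython or setuptools API.

```python
import importlib.util
import os


def extension_sources(pyx_path, have_cython=None):
    """Pick the sources for one extension module (hypothetical helper).

    Developers building from a repository checkout with Cython installed
    compile the .pyx file; everyone else builds the pre-generated .c file
    that is assumed to ship in the release tarball.
    """
    if have_cython is None:
        # Auto-detect: is the Cython package importable here?
        have_cython = importlib.util.find_spec("Cython") is not None
    if have_cython:
        return [pyx_path]
    # Same path, with the .pyx extension swapped for .c
    return [os.path.splitext(pyx_path)[0] + ".c"]
```

In a setup.py, a .pyx source list would still be passed through `Cython.Build.cythonize()`, whereas a .c list can go straight into a setuptools `Extension`.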


> I also am unclear on how
> exactly you're supporting the different semantics in Python 2 vs. 3
> without recompiling.

We try to make it easy for users to write portable code by keeping the code semantics fixed as much as possible once it's compiled. However, there are things that we don't currently fix. For example, we only try to keep builtins compatible as far as we consider reasonable. If you write

    x = range(5)

in your Cython code, you will get a list in Py2 and a (lazy) range object in Py3. If you write "xrange(5)", however, you will get an xrange object in Py2 and a range object in Py3. The same applies to "unicode" etc. We also don't change the API of the bytes type (which returns integers on indexing in Python 3), even though it represents a major portability hassle for our users and also prevents several optimisations (and language features) that Cython could otherwise provide.
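For reference, this is what the two runtime behaviours look like in plain CPython 3 (assumed here). `range(5)` returns a lazy range object, a sequence that supports `len()` and indexing but is not itself an iterator, and indexing `bytes` yields integers:

```python
# Python 3 behaviour of range()
r = range(5)
print(type(r).__name__)   # range: a lazy sequence, not a list
print(len(r), r[2])       # 5 2: supports len() and indexing
print(iter(r) is r)       # False: iterable, but not itself an iterator
print(list(r))            # [0, 1, 2, 3, 4]: materialise it explicitly

# Python 3 behaviour of bytes
data = b"abc"
print(data[0])            # 97: indexing bytes yields an int
print(data[0:1])          # b'a': slicing still yields bytes
```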

String semantics are actually quite complex inside the Cython compiler (as is the cross-Python/C/C++ type system in general) and were the subject of major design and usability discussions in the past. We basically have three Python string types: bytes (Py2/3 bytes), unicode (Py2 unicode, Py3 str) and str (Py2/3 str), as well as C types like char/char* or Py_UCS4. The 'str' type is needed because parts of CPython, its stdlib and external libraries actually require bytes in Python 2 (where it's sort of the "native" string type for ASCII text), but require unicode text in Python 3. To write portable code, you can use unprefixed string literals in Cython code, which will become the respective 'str' type in each of the runtime environments. Our users appreciate that feature a lot; it is obviously modelled after 2to3.
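The same "native str" idea can be sketched in pure Python for values that arrive at run time rather than as literals. `to_native_str` is a hypothetical helper and the ASCII default encoding is an assumption; it only illustrates which type counts as 'str' on each major version.

```python
import sys


def to_native_str(s, encoding="ascii"):
    """Convert bytes or text to the runtime's native 'str' type.

    On Python 2 the native str is a byte string; on Python 3 it is
    unicode text. Values that already have the native type pass through.
    """
    if sys.version_info[0] >= 3:
        # Python 3: native str is text, so decode any bytes.
        return s.decode(encoding) if isinstance(s, bytes) else s
    # Python 2: native str is bytes, so encode any unicode text.
    return s.encode(encoding) if isinstance(s, unicode) else s  # noqa: F821
```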

However, since the API of 'str' isn't portable, you will only get a performance boost when you use the unicode type (or, for portable operations, the bytes type), especially for looping, 'in' tests, etc. Typing a string that way basically allows Cython to 'unbox' it into a C array, with the obvious optimisations such as working on unboxed Unicode characters. As I said, quite a complex type system.

Cython is actually a pretty cool tool for text processing these days. For example, this

    for c in some_typed_unicode_string:
        if c == u'X': ...
        elif c in u' \t\r\n': ...
        elif c in u'AB12UV': ...
        else: ...

will turn into a C pointer loop around a C switch statement. I have also heard of some fast bindings to C/C++ regex libraries that are being written. Shipping those with a PyCapsule-based C-API (Cython can generate and import those) and buffer interface support would provide a really speedy way to use them from other Cython modules, without sacrificing the Python language feel.
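A runnable pure-Python version of that loop, with the `...` branches replaced by hypothetical character counting. Under CPython it runs as ordinary bytecode; the C switch translation described above only applies when Cython compiles it with the string statically typed as `unicode` (for example through a `cdef unicode` declaration, which is an assumption about how the string gets typed).

```python
def classify_chars(text):
    """Count how many characters fall into each branch of the loop."""
    counts = {"X": 0, "whitespace": 0, "token": 0, "other": 0}
    for c in text:
        if c == u'X':
            counts["X"] += 1
        elif c in u' \t\r\n':     # c is a single character here
            counts["whitespace"] += 1
        elif c in u'AB12UV':
            counts["token"] += 1
        else:
            counts["other"] += 1
    return counts


print(classify_chars(u"XA 1\n?"))
# {'X': 1, 'whitespace': 2, 'token': 2, 'other': 1}
```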


> OTOH I think you've got the perfect audience in the scientific Python
> world.

Partly, but they are joined by the majority of our users, who use Cython as an FFI or for "speeding up CPython code". The scientific Python world is (obviously) very focused on numeric computation. Cython is much more versatile than that.


> Have you tried replacing selected stdlib modules with their
> Cython-optimized equivalents in some of the NumPy/SciPy distros? (E.g.
> what about Enthought's Python distros?) Depending on how well that
> goes I might warm up to Cython more!

Hmm, I hadn't heard about that before. I'll ask on our mailing list if anyone's aware of them. I doubt that the stdlib participates in the critical parts of scientific computation code. Maybe alternative CSV parsers or something like that, but I'd be surprised if they were compatible with what's in the stdlib.

Stefan

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev