Re: [Python-Dev] VM and Language summit info for those not at Pycon (and those that are!)

Stefan Behnel Mon, 21 Mar 2011 04:02:10 -0700

[long post ahead, again]

Guido van Rossum, 21.03.2011 03:46:

> Thanks for the clarifications. I now have a much better understanding
> of what Cython is. But I'm not sold. For one, your attitude about
> strict language compatibility worries me when it comes to the stdlib.

Not sure what you mean exactly. Given our large user base, we do worry a lot about things like backwards compatibility, for example.

If you are referring to compatibility with Python, I don't think anyone in the project really targets Cython as a drop-in replacement for a Python runtime. We aim to compile Python code, yes, and there's a hand-wavy idea in the back of our heads that we may want a plain Python compatibility mode at some point that will disable several important optimisations. But there's no real drive for that, simply because Cython users usually care a lot more about speed than about strict Python language compliance in dangerous areas like overridden builtins (such as 'range'). And Cython users know that they also have CPython available, which allows them to easily get 100% compatibility if they need it, be it through an import or by calling "exec".

That being said, we do consider any deviation from Python language semantics a bug, and try to fix at least those with a user impact. Compatibility has improved a lot since the early days.
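
To illustrate the overridden-builtins case mentioned above, here is a minimal sketch in plain Python 3. The names `source`, `ns` and `total` are made up for this example. It only shows what CPython itself does (including the "exec" route); whether compiled Cython code honours such an override depends on the optimisations in effect.

```python
import builtins

# Hypothetical module source, run through the regular interpreter via exec().
source = """
def total(n):
    # 'range' is looked up at call time: module globals first, then builtins.
    return sum(range(n))
"""

ns = {}
exec(source, ns)
baseline = ns["total"](5)       # uses the real builtin: 0+1+2+3+4

# Shadow the builtin in that module's namespace with a version that skips
# every other number. CPython picks this up on the next call.
ns["range"] = lambda n: builtins.range(0, n, 2)
overridden = ns["total"](5)     # now sums 0+2+4

print(baseline, overridden)
```

A compiler that wants to turn `range(n)` loops into C loops has to assume nobody does this, which is the trade-off described above.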

> Also, I don't know how big it is

It's not small. The compiler is getting close to some 50,000 lines of Python code.

> but it seems putting the cart before
> the horse to use it to optimize the stdlib. Cython feels much less
> mature than CPython;

It certainly is not completely stable, neither the language nor the compiler, but it has been used for production code ever since the project started (from Pyrex' original inheritance).

There are parts of the language that we are still fleshing out, but we try hard to keep the user impact low and to adhere to the "expected" Python semantics as closely as possible whenever we design new language features. Much of what we need to fix these days is actually due to different language semantics that originally appeared in Pyrex, or to differences between Python 2 and Python 3 that make it tricky for users to write portable code.

> but the latter should only have dependencies that
> themselves change even slower than CPython.


I understand that. C is certainly evolving a *lot* slower than Cython.

Personally, I wouldn't consider Cython a dependency even if CPython started using code written in Cython. It's more like a development tool, as users won't have to care if the generated C sources ship with the distribution. Only those who want to build from hg sources and distributors that patch impacted release sources will have to take care to install the corresponding Cython version. Shipping tested C sources is certainly the recommended way of using Cython.

> I also am unclear on how
> exactly you're supporting the different semantics in Python 2 vs. 3
> without recompiling.

We try to make it easy for users to write portable code by keeping the code semantics fixed as much as possible once it's compiled. However, there are things that we don't currently fix. For example, we only try to keep builtins compatible as far as we consider reasonable. If you write


   x = range(5)

in your Cython code, you will get a list in Py2 and a lazy, iterable range object in Py3. If you write "xrange(5)", however, you will get an xrange object in Py2 and a range object in Py3. Same for "unicode" etc. We also don't change the API of the bytes type (returning integers on indexing in Python 3), even though it represents a major portability hassle for our users and also prevents several optimisations (and language features) that Cython could otherwise provide.
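
The two runtime differences named above can be checked from plain Python. This is a small sketch assuming a Python 3 interpreter; the variable names are only for illustration, and the Py2 behaviour is given in comments rather than executed.

```python
x = range(5)
# Python 3: a lazy range object. Python 2: a real list [0, 1, 2, 3, 4].
kind = type(x).__name__
as_list = list(x)            # forces a list on either version

data = b"abc"
first = data[0]              # Python 3: the integer 97. Python 2: the 1-byte string 'a'.
first_slice = data[0:1]      # slicing gives a length-1 bytes object on both

print(kind, as_list, first, first_slice)
```

Code that has to behave the same on both versions typically sticks to the portable forms, such as `list(range(n))` and slicing instead of indexing.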

String semantics are actually quite complex inside of the Cython compiler (as is the cross-Python/C/C++ type system in general) and were subject to major design/usability discussions in the past. We basically have three Python string types: bytes (Py2/3 bytes), unicode (Py2 unicode, Py3 str) and str (Py2/3 str), as well as C types like char/char* or Py_UCS4. The 'str' type is needed because parts of CPython, its stdlib and external libraries actually require bytes in Python 2 (and it's sort-of the "native" string type for ASCII text there), but require unicode text in Python 3. To write portable code, you can use unprefixed string constants in Cython code, which will become the respective 'str' type in each of the runtime environments. That's a feature our users appreciate very much, and it is obviously modelled after 2to3.

However, since the API of 'str' isn't portable, you will only get a performance boost when you use the unicode (and, for portable operations, bytes) type, especially for looping, 'in' tests, etc. That will basically allow Cython to 'unbox' the strings into a C array, with the obvious optimisations like unboxed Unicode characters etc. As I said, quite a complex type system.
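
As a rough illustration of the three Python-level string kinds listed above, here is how the corresponding literals look from plain Python. The variable names are made up, and the results noted in the comments assume Python 3 and no `unicode_literals` import.

```python
raw = b'abc'      # bytes: byte string in Py2 and Py3
text = u'abc'     # unicode: 'unicode' in Py2, 'str' in Py3
native = 'abc'    # unprefixed: the runtime's own 'str' (bytes in Py2, text in Py3)

# Which of the other two does the native string line up with on this runtime?
native_is_text = isinstance(native, type(text))
native_is_bytes = isinstance(native, type(raw))

print(type(raw).__name__, type(text).__name__, type(native).__name__)
print(native_is_text, native_is_bytes)
```

On Python 2 the last two answers flip, which is why the API of the native 'str' cannot be relied on in portable code.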

Cython is actually a pretty cool tool for text processing these days. For example, this


    for c in some_typed_unicode_string:
        if c == u'X': ...
        elif c in u' \t\r\n': ...
        elif c in u'AB12UV': ...
        else: ...

will turn into a C pointer loop around a C switch statement. And I heard of some fast bindings to C/C++ regex libs etc. that are getting written. Shipping those with a PyCapsule based C-API (Cython can generate and import those) and buffer interface support would provide a really speedy way to use them from other Cython modules, without sacrificing the Python language feeling.
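
For reference, the loop above can be filled in as ordinary Python that runs unchanged in CPython. This is only a sketch: the function name `classify` and the counter keys are invented here, and in Cython the argument would additionally be declared as a unicode string so that the compiler knows the type.

```python
def classify(text):
    """Count characters per branch of the if/elif chain shown above."""
    counts = {"x": 0, "whitespace": 0, "in_set": 0, "other": 0}
    for c in text:
        if c == u'X':
            counts["x"] += 1
        elif c in u' \t\r\n':
            counts["whitespace"] += 1
        elif c in u'AB12UV':
            counts["in_set"] += 1
        else:
            counts["other"] += 1
    return counts

result = classify(u"X A\tz1")
print(result)
```

Each branch compares a single character against constants known at compile time, which is what makes the translation into a C switch statement possible.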

> OTOH I think you've got the perfect audience in the scientific Python
> world.

Partly, but joined by the majority of FFI and "speeding up CPython code" users. The scientific Python world is (obviously) very focussed on numeric computation. Cython is much more versatile than that.

> Have you tried replacing selected stdlib modules with their
> Cython-optimized equivalents in some of the NumPy/SciPy distros? (E.g.
> what about Enthought's Python distros?) Depending on how well that
> goes I might warm up to Cython more!

Hmm, I hadn't heard about that before. I'll ask on our mailing list if anyone's aware of them. I doubt that the stdlib participates in the critical parts of scientific computation code. Maybe alternative CSV parsers or something like that, but I'd be surprised if they were compatible with what's in the stdlib.


Stefan
