On Apr 15, 2008, at 1:55 AM, Stefan Behnel wrote:
> Hi,
>
> one of the goals of Cython is to "compile Python code". I think we  
> should be
> clearer here. I would opt for making Python 2.6 the target syntax and
> eventually write a separate/enhanced/whatever parser for Python 3.0  
> syntax and
> semantics (unicode/bytes literals, new keywords, etc.).
>
> This has several advantages, the most important one being code  
> compatibility.
> While it will be work to migrate from Py2 to Py3, it shouldn't  
> affect Cython
> users and the existing Cython code.

For the near future, the target Python syntax is certainly 2.6.
Python 3.0 hasn't even been finalized yet, and even when it is, I
anticipate it will be quite a while before the majority of projects
migrate over.

Hopefully Fabrizio's GSoC project gets approved and supporting  
another syntax will be as easy as reading in another grammar file. On  
the other end of things, I would really like to output .c files that  
can be compiled and linked into either 2.x or 3.x extensions without  
having to re-run Cython (modulo, perhaps, new builtins).

> Also, I really like the fact that "test" is a plain byte string in  
> Cython that
> can directly be converted to a C char*, depending on its use. This  
> shouldn't
> change, even if Py3 dictates that this literal becomes a Unicode  
> string.
> Cython positions itself between Python and C, and that's a place  
> where the
> plain string literal semantics make perfect sense. Supporting the  
> b"test"
> bytes syntax *in addition* is ok with me, as is the u"unicode"  
> syntax, which
> Python2 and Cython currently use. I think it makes sense to be  
> explicit about
> unicode objects in the context of Cython.

Using PEP 263 to determine the encoding of string literals seems the
right thing to do. I don't want to lose the ability to do cdef char*
s = "test" (stored as an ASCII string), nor do I want to make the
behavior dependent on the runtime system. Treating "xxx" as a char*
if it is pure ASCII, and as a unicode object otherwise, seems like
the obvious thing to do. What hasn't been resolved is conversions:

     cdef object o = s # s is a char*

If s is not pure ASCII, should a runtime error be raised, or should  
an encoding be chosen (at compile time?) Could one specify an  
encoding, or do any decoding manually via a bytes object? Should it  
be a unicode or a str? Should that depend on whether or not it's  
compiled with 3k syntax, or linked against 3k to create the .so file?
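
To make the literal rule and the first open question concrete, here is a
minimal sketch in plain Python 3 (not Cython, and not how the compiler
does or would do it). The names source_encoding, classify_literal and
char_ptr_to_object_strict are hypothetical, bytes stands in for char*,
and str stands in for a unicode object. Note also that
tokenize.detect_encoding falls back to UTF-8 when no PEP 263 cookie is
present, whereas Python 2 fell back to ASCII.

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the encoding named by a PEP 263 cookie, or the default."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


def classify_literal(raw_literal, encoding):
    """Model the proposed rule: pure ASCII stays bytes, the rest is unicode."""
    try:
        raw_literal.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII literal: decode with the source file's declared encoding.
        return raw_literal.decode(encoding)
    # Pure ASCII literal: usable directly as a C char*.
    return raw_literal


def char_ptr_to_object_strict(data):
    """One possible answer: raise at runtime if the data is not pure ASCII."""
    return data.decode("ascii")
```

With this model, classify_literal(b"test", "utf-8") stays a byte string,
while a literal containing non-ASCII bytes comes back as a unicode
object decoded with the file's declared encoding.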

     cdef char* s = o # o is a Python unicode object (or, equivalently, the result of str(o))

Should this raise a compile-time error? (That would break a lot of
code, including really nice code like declaring a function argument
to be char*.) Should it raise a runtime error if o is not pure ASCII?
Or what encoding should be used? Currently it gets a pointer to the
data, which is very convenient, but wouldn't work for a unicode object.

Perhaps we should just choose an internal Cython encoding (preferably
UTF-8, so ASCII strings are handled normally and everything is
terminated with \0 as expected). Conversion between char* and
unicode would always go via UTF-8. One could manually create a bytes
object to use other conversions, but most of the time this probably
wouldn't even be needed. The user experience from Python would not be
impacted, and anyone interfacing with external C libraries that use
non-ASCII char* would probably be forced to think about things
explicitly anyway.

Whatever happens, I think <object><char*>o == o and <char*><object>s  
== s are important.
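
As a rough illustration of why UTF-8 satisfies both identities, the
following plain Python 3 sketch models the two casts. The function names
are hypothetical, a bytes buffer ending in \0 stands in for a char*, and
the embedded-NUL check is my own assumption about how the edge case
might be handled, not something settled in this thread.

```python
def object_to_char_ptr(text):
    """Model <char*>o: encode as UTF-8 and append the C terminator."""
    encoded = text.encode("utf-8")
    if b"\0" in encoded:
        # UTF-8 only emits a zero byte for U+0000, which C would treat
        # as the end of the string.
        raise ValueError("embedded NUL would truncate the C string")
    return encoded + b"\0"


def char_ptr_to_object(buffer):
    """Model <object>s: read up to the first NUL and decode as UTF-8."""
    end = buffer.index(b"\0")
    return buffer[:end].decode("utf-8")


# Round trips in both directions, for ASCII and non-ASCII text.
for sample in ("test", "gr\u00fc\u00df", "\u65e5\u672c\u8a9e"):
    assert char_ptr_to_object(object_to_char_ptr(sample)) == sample

raw = "gr\u00fc\u00df".encode("utf-8") + b"\0"
assert object_to_char_ptr(char_ptr_to_object(raw)) == raw

# Pure ASCII text keeps exactly the bytes a C programmer would expect.
assert object_to_char_ptr("test") == b"test\0"
```

The second identity only holds for buffers that are valid UTF-8; data in
any other encoding would still need the manual bytes route.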

> Having a separate Cython frontend (cython3? or a command line  
> option "-3"?)
> and a distutils Extension option for compiling Python3 code with  
> Python3
> semantics might be a way to deal with the syntax issue. But I would  
> actually
> prefer a different source file extension (.cy3) or a special  
> comment in the
> first code line, or something like that. The language level is an  
> integral
> part of the source file, not so much of the build system. Even
>
>     from __future__ import python3
>
> might work. ;)
>
> Any comments on this?

I like Dag's "lang: ..." proposal, though I'm hesitant about the idea
of "plugins" (in the sense that one would have to look at the contents
of the files to determine dependencies, and I don't want it to
fracture into multiple dialects depending on the exact set of lang
parameters specified). I think the default language should be
determined by the runtime environment of the compiler (which can
always be overridden, either globally or file-by-file, but probably
won't need to be most of the time).
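
The lookup order could be sketched like this in plain Python. Everything
here is hypothetical: the "# lang: N" first-line marker is only a guess
at what such a declaration might look like, resolve_language_level is
not an existing Cython function, and letting the per-file setting win
over the global one is an assumption.

```python
import re
import sys

# Hypothetical marker syntax; the thread only refers to a "lang: ..." proposal.
LANG_MARKER = re.compile(r"^#\s*lang:\s*(\d+)\s*$")


def resolve_language_level(source_text, global_override=None, runtime_major=None):
    """Pick a level: per-file marker, then global option, then the runtime."""
    lines = source_text.splitlines()
    first_line = lines[0] if lines else ""
    match = LANG_MARKER.match(first_line)
    if match:
        # File-by-file override.
        return int(match.group(1))
    if global_override is not None:
        # Global override, e.g. from a command-line option.
        return global_override
    if runtime_major is None:
        # Default: the major version of the Python running the compiler.
        runtime_major = sys.version_info[0]
    return runtime_major
```

For example, resolve_language_level("# lang: 3\nx = 1", global_override=2)
picks the per-file value, while a file without a marker and no global
override falls back to the interpreter's own major version.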

- Robert
