Re: [Cython] target language syntax of Cython: Py2.6 or Py3.0?

Dag Sverre Seljebotn Tue, 15 Apr 2008 07:29:31 -0700

Yes,

cdef char c = "a"


works, but it is rather new ("Allow single-character ascii strings to be 
treated as c character literals", Robert, Feb 28).

> I want to have distinct behaviour between byte sequences and unicode character
> sequences.
>
> If you use a byte (string) literal in your code, Cython must not alter it
> (except for PEP 263 input encoding) and must support any conversion from and
> to a char*. This works fine with current Cython as long as you use the same
> input encoding for Cython code and C code.
>   

Yes, you are absolutely right when it comes to Python 2, and Python 3 
*does* come into it. Sorry.

(My experiments indicate that with a non-unicode string, no PEP 263 
conversion happens. What character set would there be to convert to?)

Still, I think I disagree with this:

==

Also, I really like the fact that "test" is a plain byte string in Cython that
can directly be converted to a C char*, depending on its use. This shouldn't
change, even if Py3 dictates that this literal becomes a Unicode string.
==


Because in my mind this change in Python 3 fixes what I consider a 
real deficiency in Python 2, which is that the source input encoding 
matters. There's a strong tendency already to let Python semantics play a 
strong role, and in this area Python 3 is a real improvement over how C 
and Python 2 handle things.

(At least keep compatibility with Python 3 when compiling a pure Python 
3 file -- what happens with C interfacing is less important, and I 
suppose you could do both.)

Most recent C libraries will happily pass through char* buffers in the 
current runtime encoding as strings, and if one is crazy enough to write 
Python code like:

# note: Python 3 code against libc in Cython
handle = libc.stdio.fopen("Fødselsår.txt", "r")

...then having automatic conversion to char*, using the platform default 
encoding at runtime, will make this work on different systems. With your 
suggestion, however, it will break on systems whose encoding differs from 
the source file's. One can always use the "b" literal if the behaviour you 
describe is what is wanted. (When the Python community didn't make Python 3 
source backwards compatible with Python 2, I don't think we can do a 
better job of it..)
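
To make the difference concrete, here is a minimal Python 3 sketch. It is 
only an illustration, not how Cython implements anything: the helper name 
to_c_bytes is hypothetical, and treating the locale's preferred encoding as 
the "platform default" is an assumption.

```python
import locale


def to_c_bytes(text, encoding=None):
    """Encode text roughly the way an automatic str -> char* step might."""
    if encoding is None:
        # Assumption: "platform default" means the locale's preferred encoding.
        encoding = locale.getpreferredencoding(False)
    return text.encode(encoding)


# "Fødselsår.txt", written with escapes so the source encoding cannot matter.
name = "F\u00f8dsels\u00e5r.txt"

utf8_bytes = to_c_bytes(name, "utf-8")          # what a UTF-8 system expects
latin1_bytes = to_c_bytes(name, "iso-8859-1")   # what a Latin-1 system expects

# A byte string literal pins one of these down at compile time.
fixed_literal = b"F\xf8dsels\xe5r.txt"

print(len(utf8_bytes), len(latin1_bytes))   # 15 13
print(fixed_literal == latin1_bytes)        # True
print(fixed_literal == utf8_bytes)          # False
```

The text literal yields the right file name bytes on either system, while 
the byte literal matches only the system whose encoding it was written for.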

(One could also parametrize the char type for C libraries that don't 
use the platform default, i.e. something like

... external import header etc.
   cdef foo(char("iso-8859-1")* s)

but I think I can see the Cython community recoiling in collective 
disgust already :-) Perhaps another word than "char"...).

===
As for using a C library on different encodings, consider the following 
example on my UTF-8 machine:


$ touch åå
$ ./checkfile åå
C3 A5 C3 A5  -> fopen: 6295568

Contents of checkfile.c:

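#include <stdio.h>

int main(int argc, char* argv[]) {
    char* ch;
    if (argc < 2) return 1;  /* expects a file name argument */
    /* Dump the raw bytes of the argument as received from the shell. */
    for (ch = argv[1]; *ch != 0; ++ch) {
        printf("%hhX ", *ch);
    }
    /* Print the FILE* as a number; nonzero means fopen succeeded. */
    printf(" -> fopen: %ld\n", (long)fopen(argv[1], "r"));
    return 0;
}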
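
For comparison, the byte dump above can be reproduced in Python 3 without 
touching the file system. This is a rough sketch; the function name 
hex_dump is made up for the illustration.

```python
def hex_dump(text, encoding="utf-8"):
    """Return the encoded bytes of text as space-separated uppercase hex."""
    return " ".join(format(byte, "X") for byte in text.encode(encoding))


name = "\u00e5\u00e5"  # "åå", written with escapes to avoid source-encoding issues

print(hex_dump(name))                 # C3 A5 C3 A5
print(hex_dump(name, "iso-8859-1"))   # E5 E5
```

The first line matches what checkfile prints on a UTF-8 machine; the second 
shows the bytes the same name would have under ISO-8859-1.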


-- 
Dag Sverre

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
