Yes,
cdef char c = "a"
works, but it is rather new ("Allow single-character ascii strings to be
treated as c character literals", Robert, Feb 28).
> I want to have distinct behaviour between byte sequences and unicode character
> sequences.
>
> If you use a byte (string) literal in your code, Cython must not alter it
> (except for PEP 263 input encoding) and must support any conversion from and
> to a char*. This works fine with current Cython as long as you use the same
> input encoding for Cython code and C code.
>
Yes, you are absolutely right when it comes to Python 2, and Python 3
*does* come into it. Sorry.
(My experiments indicate that with a non-unicode string, no PEP 263
conversion happens. What character set would there be to convert to?)
Still, I think I disagree about this:
==
Also, I really like the fact that "test" is a plain byte string in Cython that
can directly be converted to a C char*, depending on its use. This shouldn't
change, even if Py3 dictates that this literal becomes a Unicode string.
==
Because in my mind this change in Python 3 fixes what I consider a
real deficiency in Python 2, which is that the source input encoding
matters. There's a strong tendency already to let Python semantics play a
strong role, and in this area Python 3 is a real improvement over how C
and Python 2 handle things.
(At least keep compatibility with Python 3 when compiling a pure Python
3 file -- what happens with C interfacing is less important, and I
suppose you could do both.)
Most recent C libraries will happily pass through char* buffers in the
current runtime encoding as strings, and if one is crazy enough to write
Python code like:
# note: Python 3 code against libc in Cython
handle = libc.stdio.fopen("Fødselsår.txt", "r")
...then having automatic conversion to char*, dependent on the runtime
platform default encoding, will make this work on different systems.
With your suggestion, however, it will break on systems whose encoding
differs from that of the source file. One can always use the "b"
literal if your behaviour is what is wanted. (When the Python community
didn't make Python 3 source backwards compatible with Python 2, I
don't think we can do a better job of it.)
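A minimal Python 3 sketch of the portability point (plain Python, not Cython, and not a description of how Cython converts anything). The helper name `to_c_bytes` and its explicit encoding argument are assumptions made up for illustration; only the filename comes from the example above.

```python
# Sketch: the same text yields different char* contents depending on
# which encoding is used for the conversion.
FILENAME = "Fødselsår.txt"


def to_c_bytes(text, encoding):
    """Hypothetical helper: encode text as a runtime conversion might."""
    return text.encode(encoding)


def as_hex(data):
    """Render bytes as space-separated uppercase hex (ASCII-only output)."""
    return " ".join("%02X" % byte for byte in data)


utf8_bytes = to_c_bytes(FILENAME, "utf-8")
latin1_bytes = to_c_bytes(FILENAME, "iso-8859-1")

# A byte literal fixed at compile time from a UTF-8 source file matches
# utf8_bytes only; a Latin-1 system would need latin1_bytes instead.
print(as_hex(utf8_bytes))
print(as_hex(latin1_bytes))
```

The two byte strings decode to the same text but differ in length and content, which is why a literal frozen in the source encoding names the right file only on systems that share that encoding.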
(One could also parametrize the char type for C libraries that don't
use the platform default, i.e. something like
... external import header etc.
cdef foo(char("iso-8859-1")* s)
but I think I can see the Cython community recoiling in collective
disgust already :-) Perhaps another word than "char"...).
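To give a feel for what such a per-function encoding declaration would do, here is a rough Python 3 analogue. Everything in it is hypothetical: `char_encoding` is an invented decorator, and `foo` is a stand-in that simply returns the bytes a real C function would receive as char*.

```python
import functools


def char_encoding(encoding):
    """Hypothetical decorator: encode str arguments before the call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Text is converted with the declared encoding; bytes are
            # passed through unchanged, like a "b" literal would be.
            converted = [a.encode(encoding) if isinstance(a, str) else a
                         for a in args]
            return func(*converted)
        return wrapper
    return decorate


@char_encoding("iso-8859-1")
def foo(s):
    # Stand-in for a C function taking char*; returns what it was given.
    return s


print(foo("å"))             # text, encoded as ISO 8859-1
print(foo(b"\xc3\xa5"))     # bytes, left untouched
```

The design choice being illustrated is that the encoding belongs to the declaration of the C function rather than to each call site.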
===
As for using a C library with different encodings, consider the following
example on my UTF-8 machine:
$ touch åå
$ ./checkfile åå
C3 A5 C3 A5 -> fopen: 6295568
Contents of checkfile.c:
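#include <stdio.h>

int main(int argc, char* argv[]) {
    char* ch;
    if (argc < 2) return 1;  /* need a filename argument */
    /* Print each byte of the filename as it arrived from the shell. */
    for (ch = argv[1]; *ch != 0; ++ch) {
        printf("%hhX ", *ch);
    }
    /* Print the FILE* from fopen as a number; non-zero means it opened. */
    printf(" -> fopen: %ld\n", (long)fopen(argv[1], "r"));
    return 0;
}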
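For comparison, the same byte dump can be sketched in Python 3. The function name `hex_dump` is made up here; the second call shows what the filename bytes would look like under ISO 8859-1, which is an illustration and was not part of the run above.

```python
def hex_dump(data):
    """Format bytes like checkfile's loop: uppercase hex, space separated."""
    return " ".join("%X" % byte for byte in data)


name = "åå"
print(hex_dump(name.encode("utf-8")))       # C3 A5 C3 A5
print(hex_dump(name.encode("iso-8859-1")))  # E5 E5
```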
--
Dag Sverre
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev