Hi,
Dag Sverre Seljebotn wrote:
>> "In the face of ambiguity, refuse the temptation to guess." :)
>>
>> Somehow "inferring" the difference between str and unicode literals is the
>> wrong thing to do.
>>
> I don't think I explained my question well enough; I'll try again.
>
> The thing is, this kind of inferring already happens; you can do
>
> cdef char c = "c"
Isn't this illegal?
> Somehow the "natural" thing to do for Py3 is to
> continue allowing "direct" assignments to char* of the type above; but
> generate unicode objects on coercion to Python object.
Assuming we have a well-defined source code encoding (i.e. PEP 263).
> (Hmm. So the
> problem is that one can no longer auto-coerce from Python string objects
> to char*...)
Right.
> Hmm. This might come from a wrong understanding of the problem, but from
> my limited knowledge, it looks like the reason we get this problem is
> because the current Cython behaviour is wrong, even in a Python 2.6
> context. Suggestion:
>
> - Support PEP 263 as you say. This is for *input* from Cython source
> *only*; the whole point is that whether you edit your source files on a
> UTF-8 or BIG-5 system shouldn't impact anything about runtime behaviour
> as long as you declare the encoding of the source file.
That would be required by the implementation, yes. In practice, all that
matters here are string literals, both bytes and unicode.
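To illustrate the PEP 263 point, here is a minimal Python 3 sketch (not Cython's actual implementation). It assumes the same literal is saved once as UTF-8 and once as Latin-1, each with a matching coding declaration. `decode_source` is a hypothetical helper name; `tokenize.detect_encoding` is the standard library function that reads the declaration.

```python
import io
import tokenize


def decode_source(raw: bytes) -> str:
    """Decode source bytes using the PEP 263 coding declaration (hypothetical helper)."""
    encoding, _ = tokenize.detect_encoding(io.BytesIO(raw).readline)
    return raw.decode(encoding)


# The literal below is u"øåæ", written with escapes so this sketch is ASCII-only.
literal = 's = u"\xf8\xe5\xe6"\n'

# The same source text as two different editors might store it on disk.
as_utf8 = b"# -*- coding: utf-8 -*-\n" + literal.encode("utf-8")
as_latin1 = b"# -*- coding: latin-1 -*-\n" + literal.encode("latin-1")

# The bytes on disk differ, but the decoded source text does not.
text_from_utf8 = decode_source(as_utf8).splitlines()[1]
text_from_latin1 = decode_source(as_latin1).splitlines()[1]
```

Once the source is decoded this way, a unicode literal means the same thing regardless of the editor's encoding; what remains open is which bytes a byte string literal should become.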
> - Have a separate mechanism for specifying what encoding should be used
> for conversion to C buffers.
I don't see a reason to go that route, given the existing PEP.
> - String literals to buffers (cdef char* s = "hello") are reencoded in
> Cython compilation to the right target encoding, so that if latin1 is
> specified for the C library in question
We do not know the target library at Cython compile time.
> char* s = {-20, 54, 50, 0}
or the respective "\xAB" escape sequences. But I would generally expect 8-bit
values to pass cleanly - as long as they are correctly encoded *during* the
code generation.
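As a rough sketch of what "correctly encoded during code generation" could mean, the Python 3 snippet below renders already-encoded bytes either as a C string literal or as signed char values like the ones quoted above. Both function names are hypothetical, and this is not how Cython's code generator is written. It emits three-digit octal escapes instead of "\xAB" hex escapes, which is my own choice here: a C hex escape has no length limit and would swallow a following hex digit.

```python
def c_string_literal(data: bytes) -> str:
    """Render bytes as a C string literal, escaping anything non-printable (hypothetical helper)."""
    parts = []
    for byte in data:
        char = chr(byte)
        if 0x20 <= byte < 0x7F and char not in '"\\?':
            parts.append(char)
        else:
            # Three-digit octal escapes are always unambiguous in C.
            parts.append("\\%03o" % byte)
    return '"' + "".join(parts) + '"'


def signed_char_values(data: bytes) -> list:
    """Render bytes as the values a signed 8-bit char would hold (hypothetical helper)."""
    return [byte - 256 if byte > 127 else byte for byte in data]


plain = c_string_literal(b"hello")
escaped = c_string_literal("\xf8".encode("latin-1"))
values = signed_char_values(b"\xec62\x00")
```

Either form passes 8-bit values through unchanged; the generator only has to know which bytes the literal stands for.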
> . If there's a mismatch between input and output encoding (I defined the
> C library I'm calling as ASCII but try to use my native "øåæÅØ") then
> it's a compile-time error.
Same as above, we don't know the C environment.
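For illustration only, the check proposed above would look roughly like the Python 3 sketch below if a target encoding were declared somewhere. The `target_encoding` parameter and the `encode_literal` name are hypothetical, and that parameter is exactly the information the compiler does not have.

```python
def encode_literal(text: str, target_encoding: str) -> bytes:
    """Encode a literal for a declared target encoding, or fail early (hypothetical helper)."""
    try:
        return text.encode(target_encoding)
    except UnicodeEncodeError as exc:
        raise ValueError(
            "literal %r cannot be represented in %s" % (text, target_encoding)
        ) from exc


# "øåæÅØ", written with escapes so this sketch is ASCII-only.
native = "\xf8\xe5\xe6\xc5\xd8"

latin1_bytes = encode_literal(native, "latin-1")

try:
    encode_literal(native, "ascii")
    ascii_ok = True
except ValueError:
    # This is the mismatch that would be reported at compile time.
    ascii_ok = False
```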
I'm pretty sure the PEP is the right way to go.
Stefan