On Nov 27, 2009, at 2:23 PM, Dag Sverre Seljebotn wrote:

> Robert Bradshaw wrote:
>> Though I usually try to avoid the topic, I've been thinking a lot
>> about string handling in Cython lately. I think we've taken a great
>> step forward in terms of usability with CEP 108, especially for those
>> who never deal with external libraries, but all this explicit
>> encoding and decoding still seems too heavy (though I understand why
>> it's necessary to deal with anything but pure ASCII). For an application
>> like lxml that is all about string processing, the verbosity and
>> explicitness aren't burdensome and the issue naturally comes up, but
>> this is not true of many applications. (For example the last time I
>> had to use strings, my character set was limited to [0-9Ee+-.].) On
>> the other hand, it's clear letting users just ignore the encoding
>> issue is unacceptable and undesirable.
>>
>> I had an epiphany when I realized that I find this burdensome not
>> because the user needs to specify an encoding, but because they have
>> to manually handle it every time they deal with a char*. So, my proposal
>> is this: let the user specify via a compiler directive an encoding to
>> use for all conversions. Cython could then transparently and
>> efficiently handle all char* <-> str (a.k.a. unicode) encodings in
>> Py3, and unicode -> char* in Py2. If no encoding is specified char*
>> would still turn into bytes in Py3, and the conversions mentioned
>> above would be disallowed.
>>
>> This might be a good compromise between explicitness, safety, and
>> ease of use. Thoughts?
>
> I'm somewhat sceptical/undecided about char* being coerced to unicode
> this way, i.e. char*->unicode. I don't have a problem with the idea
> for unicode->char*

This would only apply in Py3, where bytes is usually the wrong thing to
return (and expose to the user). What the best thing to do in Py2 is
remains unclear.

> (as long as bytes->char* is still OK as well).

Yes, for sure.
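
To make the conversions under discussion concrete, here is a rough
sketch in plain Python (Py3) that only mimics the proposed behaviour.
It is not Cython code and not an agreed design: DEFAULT_ENCODING is a
hypothetical stand-in for the compiler directive, whose name and exact
semantics were not settled in this thread, and from_c_string and
to_c_string are made-up helper names, with a bytes object standing in
for a char* buffer.

```python
# Hypothetical stand-in for the proposed compiler directive.
# None models "no encoding specified".
DEFAULT_ENCODING = "utf-8"


def from_c_string(raw, encoding=DEFAULT_ENCODING):
    """Model the char* -> Python object coercion (bytes stands in for char*)."""
    if encoding is None:
        # No encoding specified: char* still turns into bytes.
        return raw
    # Encoding specified: char* is decoded to str (unicode) transparently.
    return raw.decode(encoding)


def to_c_string(obj, encoding=DEFAULT_ENCODING):
    """Model the Python object -> char* coercion (bytes stands in for char*)."""
    if isinstance(obj, bytes):
        # bytes -> char* stays allowed regardless of the directive.
        return obj
    if isinstance(obj, str):
        if encoding is None:
            # Without an encoding, unicode -> char* is disallowed.
            raise TypeError("unicode -> char* requires an encoding")
        return obj.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(obj).__name__)
```

With the default set, to_c_string("1.5e+3") and to_c_string(b"1.5e+3")
both give the same bytes, while passing encoding=None rejects the str
argument and leaves bytes untouched.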

- Robert

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
