On Fri, Nov 27, 2009 at 7:23 PM, Dag Sverre Seljebotn
<[email protected]> wrote:
> Robert Bradshaw wrote:
>> Though I usually try to avoid the topic, I've been thinking a lot
>> about string handling in Cython lately. I think we've taken a great
>> step forward in terms of usability with CEP 108, especially for those
>> who never deal with external libraries, but all this explicit encoding
>> and decoding still seems too heavy (though I understand why it's
>> necessary to deal with anything but pure ASCII). For an application
>> like lxml that is all about string processing, the verbosity and
>> explicitness aren't burdensome and the issue naturally comes up, but
>> this is not true of many applications. (For example the last time I
>> had to use strings, my character set was limited to [0-9Ee+-.].) On
>> the other hand, it's clear letting users just ignore the encoding
>> issue is unacceptable and undesirable.
>>
>> I had an epiphany when I realized that I find this burdensome not
>> because the user needs to specify an encoding, but because they have to
>> manually handle it every time they deal with a char*. So, my proposal
>> is this: let the user specify via a compiler directive an encoding to
>> use for all conversions. Cython could then transparently and
>> efficiently handle all char* <-> str (a.k.a. unicode) encodings in
>> Py3, and unicode -> char* in Py2. If no encoding is specified, char*
>> would still turn into bytes in Py3, and the conversions mentioned
>> above would be disallowed.
>>
>> This might be a good compromise between explicitness, safety, and ease
>> of use. Thoughts?
>
> I'm somewhat sceptical/undecided about char* being coerced to unicode
> this way, i.e. char*->unicode. I don't have a problem with the idea for
> unicode->char* (as long as bytes->char* is still OK as well).
>

I have the same feeling. However, I would accept having two
directives: one for unicode->char*, and another for char*->unicode.
And of course, we will need a mechanism to override the default
encoding by using explicit encode()/decode() method calls. For example,
if you have to deal with both text and filenames in a char*, you may
need to special-handle filenames (hello, ext* filesystems).
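To make the two-directive idea concrete, here is a small pure-Python model of the semantics discussed above. It is only a sketch under stated assumptions: ConversionPolicy, to_char_ptr and from_char_ptr are hypothetical names, not Cython directives or APIs, and a C char* is modelled as a Python bytes object. Each direction has its own optional default encoding. With no default, unicode->char* is refused and char*->unicode is not performed (data stays bytes). Passing bytes always bypasses the default, which is how an explicit encode() overrides it.

```python
from typing import Optional, Union


class ConversionPolicy:
    """Hypothetical model of two per-direction encoding directives.

    encode_default stands in for a unicode -> char* directive and
    decode_default for a char* -> unicode directive; None means "not set".
    C char* data is modelled as Python bytes.
    """

    def __init__(self, encode_default: Optional[str] = None,
                 decode_default: Optional[str] = None) -> None:
        self.encode_default = encode_default
        self.decode_default = decode_default

    def to_char_ptr(self, value: Union[str, bytes]) -> bytes:
        # bytes -> char* is always allowed, directive or not.
        if isinstance(value, bytes):
            return value
        # unicode -> char* is disallowed unless a default encoding is set.
        if self.encode_default is None:
            raise TypeError("unicode -> char* needs explicit encode() "
                            "or a default encoding")
        return value.encode(self.encode_default)

    def from_char_ptr(self, data: bytes) -> Union[str, bytes]:
        # Without a decode default, char* stays bytes (no implicit decoding).
        if self.decode_default is None:
            return data
        return data.decode(self.decode_default)


if __name__ == "__main__":
    policy = ConversionPolicy(encode_default="utf-8", decode_default="utf-8")
    # Text goes through the defaults transparently.
    print(policy.from_char_ptr(b"caf\xc3\xa9"))
    # A filename in a different encoding: encode explicitly, and the
    # resulting bytes bypass the default encoding.
    print(policy.to_char_ptr("café".encode("latin-1")))
```

Keeping the two directions independent means a module could, for instance, accept unicode arguments for char* parameters while still returning bytes from char* results, which is the asymmetric case raised in the quoted reply.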


-- 
Lisandro Dalcín
---------------
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
