Robert Bradshaw wrote:
> Though I usually try to avoid the topic, I've been thinking a lot  
> about string handling in Cython lately. I think we've taken a great  
> step forward in terms of usability with CEP 108, especially for those  
> who never deal with external libraries, but all this explicit encoding  
> and decoding still seems too heavy (though I understand why it's  
> necessary to deal with anything but pure ASCII). For an application  
> like lxml that is all about string processing, the verbosity and
> explicitness aren't burdensome and the issue naturally comes up, but
> this is not true of many applications. (For example the last time I  
> had to use strings, my character set was limited to [0-9Ee+-.].) On  
> the other hand, it's clear letting users just ignore the encoding  
> issue is unacceptable and undesirable.
> 
> I had an epiphany when I realized that I find this burdensome not
> because the user needs to specify an encoding, but because they have to
> manually handle it every time they deal with a char*. So, my proposal
> is this: let the user specify via a compiler directive an encoding to  
> use for all conversions. Cython could then transparently and  
> efficiently handle all char* <-> str (a.k.a. unicode) encodings in  
> Py3, and unicode -> char* in Py2. If no encoding is specified, char*
> would still turn into bytes in Py3, and the conversions mentioned  
> above would be disallowed.
> 
> This might be a good compromise between explicitness, safety, and ease  
> of use. Thoughts?

I'm somewhat sceptical, or at least undecided, about char* being coerced
to unicode this way (char* -> unicode). I don't have a problem with the
idea for unicode -> char*, as long as bytes -> char* is still OK as well.
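
To make the two directions concrete, here is a rough pure-Python sketch of
the conversion rules as I read the proposal. It is only a model, not Cython
code: a bytes object stands in for a char* buffer, and the function names
and the `encoding` argument (standing in for the proposed compiler
directive) are made up for illustration.

```python
# Hypothetical model of the proposed directive; this is not Cython's API.
# A Python bytes object stands in for a C char* buffer, and the
# `encoding` argument stands in for the proposed compiler directive.

def to_char_ptr(value, encoding=None):
    """Model of Python object -> char*."""
    if isinstance(value, bytes):
        # bytes -> char* stays allowed and needs no encoding.
        return value
    if isinstance(value, str):
        if encoding is None:
            # No directive given: the automatic conversion is disallowed.
            raise TypeError("unicode -> char* requires a configured encoding")
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


def from_char_ptr(buf, encoding=None):
    """Model of char* -> Python object."""
    if encoding is None:
        # No directive given: char* still turns into bytes.
        return buf
    # Directive given: char* is decoded to unicode (the step I am unsure about).
    return buf.decode(encoding)
```

With `encoding="ascii"`, `to_char_ptr("1.5e+3", "ascii")` gives the bytes
`b"1.5e+3"`, while `from_char_ptr(b"1.5e+3")` without an encoding hands
the bytes back unchanged. The second function is the direction I am
undecided about, since the caller can no longer tell from the code alone
whether a char* comes back as bytes or as unicode.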


-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
