> The magic of uchar("gb5")* getting translated to the above would  
> complicate the type system (both in terms of the codebase and the  
> user's perspective). Would conversion be performed when assigning  
> from a uchar("gb5") to a uchar("utf-8"), or to a uchar*? If we decide  
> to support such automatic conversions, this seems like the best  
> syntax I've seen, but I still think the default should be to accept  
> unicode objects (via utf-8).
>   
If the uchar way is convenient enough to use, one could, in situations 
where one simply "hasn't thought about encoding", use auto-conversion 
with ascii encoding instead of utf-8. One can then still do roundtrips 
for safe ASCII strings, but be warned in situations where one *should 
have* thought through the encoding issues. (One does lose the automatic 
unicode -> char* -> unicode roundtrip for non-ASCII strings, which is a 
case where one wouldn't otherwise have had to think about it; one would 
have to do unicode -> uchar("utf-8")* -> unicode instead.)
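A plain-Python sketch (not the proposed Cython syntax) of what an ascii default would mean in practice, with the hypothetical char*/uchar* side stood in for by a byte buffer:

```python
# Sketch: what "auto-conversion with ascii instead of utf-8" would mean
# for roundtrips. The byte string plays the role of the char*/uchar* buffer.

def roundtrip(text, encoding):
    # unicode -> byte buffer -> unicode
    return text.encode(encoding).decode(encoding)

# Safe ASCII strings roundtrip under either default:
assert roundtrip("hello", "ascii") == "hello"
assert roundtrip("hello", "utf-8") == "hello"

# A non-ASCII string fails loudly under ascii -- the warning that one
# *should have* thought the encoding through:
try:
    roundtrip("héllo", "ascii")
except UnicodeEncodeError:
    pass  # ascii cannot represent 'é'

# ...but roundtrips fine once an explicit utf-8 annotation is used:
assert roundtrip("héllo", "utf-8") == "héllo"
```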

I'll spell out the problems you mention with uchar. I think the most 
natural (but still not very good) behaviour would be:

cdef uchar("utf-8")* x = ...

1)
cdef uchar("gb5")* y = x
print y
# Should probably either disallow the coercion, or
# do charset conversion (the implementation could coerce
# to Python unicode and back).

2)
cdef object o = x
cdef uchar("gb5")* y = o
print y
# This is ok; the conversion goes via the Python unicode object

3)
cdef char* c = x
cdef uchar("gb5")* y = c
print y
# Here we have problems

I don't think there's a way around this. Still, case 3) is pretty 
specific; if one actively specifies an encoding and then assigns a char* 
buffer into it, it is kind of implied that one should somehow know that 
it actually has that encoding.
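A plain-Python sketch of what separates cases 2 and 3, assuming the thread's "gb5" refers to the Big5 codec ("big5" in Python): coercing through a unicode object is a well-defined recoding, while assigning a raw char* merely reinterprets bytes.

```python
# Sketch in plain Python (not the proposed Cython syntax).
# Assumption: "gb5" in this thread is taken to mean Big5.

x = "漢字".encode("utf-8")           # the uchar("utf-8")* buffer

# Case 2: coercing through a Python unicode object is a real charset
# conversion -- the text survives, the byte sequence changes.
o = x.decode("utf-8")
y = o.encode("big5")
assert y.decode("big5") == "漢字"
assert y != x                        # same text, different bytes

# Case 3: assigning the raw bytes to a uchar("big5")* just reinterprets
# them under the wrong encoding -- mojibake (or a decode error), with
# no conversion performed anywhere.
misread = x.decode("big5", errors="replace")
assert misread != "漢字"
```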

-- 
Dag Sverre

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
