> The magic of uchar("gb5")* getting translated to the above would
> complicate the type system (both in terms of the codebase and the
> user's perspective). Would a conversion be performed when assigning
> from a uchar("gb5")* to a uchar("utf-8")*, or to a uchar*? If we
> decide to support such automatic conversions, this seems like the best
> syntax I've seen, but I still think the default should be to accept
> unicode objects (via utf-8).
>
If the uchar way is convenient enough to use, then in situations where
one simply "hasn't thought about encoding" one could use auto-conversion
with ascii encoding instead of utf-8. One could then still do
round-trips for safe ASCII strings, but would be warned in situations
where one *should have* thought through the encoding issues. (Though one
loses the automatic unicode -> char* -> unicode round-trip for non-ASCII
strings, which is exactly a case where one wouldn't have had to think it
through... one would have to do unicode -> uchar("utf-8")* -> unicode
instead.)
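The ascii-by-default behaviour described above can be sketched in plain
Python, with bytes/str standing in for the proposed char*/unicode
coercions (the strings here are just illustrative):

```python
# Safe ASCII round-trip works with the implicit ascii conversion:
assert "hello".encode("ascii").decode("ascii") == "hello"

# Non-ASCII text makes the implicit ascii conversion fail loudly --
# this is the "be warned" case where one *should have* thought
# about the encoding:
try:
    "blåbær".encode("ascii")
    raised = False
except UnicodeEncodeError:
    raised = True
assert raised

# The explicit utf-8 round-trip still works for the same string:
assert "blåbær".encode("utf-8").decode("utf-8") == "blåbær"
```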
I'll spell out the problems you mention with uchar. I think the most
natural (but still not very good) behaviour would be:
cdef uchar("utf-8")* x = ...
1)
cdef uchar("gb5")* y = x
print y
# Should probably either disallow the coercion, or
# do a charset conversion (an implementation could coerce
# to Python unicode and back).
2)
cdef object o = x
cdef uchar("gb5")* y = o
print y
# This is ok; the conversion happens via the object coercion
3)
cdef char* c = x
cdef uchar("gb5")* y = c
print y
# Here we have problems
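The "coerce to Python unicode and back" conversion mentioned in cases
1) and 2) can be sketched in plain Python. (I use "big5" here as a
stand-in codec name for the "gb5" charset above, since that is what
Python's codec registry calls it.)

```python
# Recode a uchar("utf-8")* buffer into the target charset by going
# through Python unicode and back:
src = "中".encode("utf-8")                      # bytes the utf-8 buffer holds
converted = src.decode("utf-8").encode("big5")  # via Python unicode and back
assert converted == "中".encode("big5")
```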
I don't think there's a way around this. Still, case 3) is pretty
specific: if one actively specifies an encoding and then assigns a char*
buffer into it, it is kind of implied that one should somehow know that
the buffer actually has that encoding.
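To illustrate why case 3) is a problem: a plain char* carries no
encoding information, so reinterpreting its bytes under another charset
silently corrupts the text. A Python sketch (with latin-1 standing in
for the second charset, to keep the example deterministic):

```python
data = "æøå".encode("utf-8")    # what the char* buffer actually contains
wrong = data.decode("latin-1")  # reinterpreted under the wrong charset
assert wrong != "æøå"           # no error is raised -- just mojibake
```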
--
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev