> I find it easier to read
>
> def dostuff(str text):
>     cdef char* s = text.encode("UTF-8")
>     # do UTF-8 handling stuff
>     return s.decode("UTF-8")
>
> than anything you could do with internal magic.
>
Will this work though? (I'm ignorant in this area of Cython.) I.e., when
will the temporary returned from text.encode exit scope and be
collected? If that happens right after the assignment, s would be left
pointing at freed memory, so one would need two more lines (or magic
support for keeping temporaries alive to the end of the function scope
when assigning temporaries to char*).
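A rough sketch of the longer spelling, assuming the concern above is real and the encoded temporary has to be kept alive explicitly. It is written as plain Python (which Cython also accepts) so it can be run as-is; the name `encoded` is only illustrative, and the `cdef char*` line is left as a comment because it needs Cython to compile.

```python
def dostuff(text):
    # Bind the encoded bytes object to a local name so it stays
    # referenced until the function returns.
    encoded = text.encode("UTF-8")
    # In Cython, `cdef char* s = encoded` would go here; `s` would
    # point into `encoded`, which is still alive at this point.
    # ... do UTF-8 handling stuff on the bytes ...
    return encoded.decode("UTF-8")
```

The cost is one extra name per string, which is the price of not relying on undeclared magic.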

I forgot one important case in my list though: passing string constants
to C libraries. With no conversion, one cannot at the same time keep
nice language consistency and also allow

    cdef char* s = "asdf"

Robert's proposal has the advantage that it allows this notation in a
more consistent way.

Personally I'm now (forget about earlier opinions :-) ) ready to take
the "b" at this point, rather than breaking consistency or doing
undeclared magic. It's a nice reminder that using char* for strings is
not trivial, and probably avoids more bugs.
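To illustrate what taking the "b" means in practice, here is a small plain-Python sketch (Python 3 semantics and a UTF-8 source file are assumed, and the variable names are only illustrative). A b"..." literal is already a byte string, while a plain literal is text that has no byte representation until an encoding is named. The corresponding Cython spelling would be cdef char* s = b"asdf".

```python
text_literal = "asdf"     # text: no byte representation until encoded
bytes_literal = b"asdf"   # bytes: what a char* would point at

# The two literals have different types.
assert type(text_literal) is str
assert type(bytes_literal) is bytes

# Going from text to bytes requires naming an encoding.
assert text_literal.encode("ASCII") == bytes_literal

# For non-ASCII text the choice of encoding changes the bytes.
assert "å".encode("UTF-8") == b"\xc3\xa5"
assert "å".encode("Latin-1") == b"\xe5"
```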

Also, I think char* is rarely used directly for strings nowadays...
C++ has std::string, Linux GUI apps use specific Qt or GTK/GObject
strings, and so on. Console applications sometimes use char* but are
conscious about encoding matters at a low level.

You suggested using whatever encoding the source file is in for the
case above? Or have you backtracked on that now?

Dag Sverre