Robert Bradshaw wrote:
> So if I'm understanding correctly here utf8 would behave like a bytes
> object except one could assign unicode objects to it? Would
>
> def flump(utf8 s):
>     return s
>
> return a bytes object?
That's something that would require some thought. It's another
case where declaring a return type might be needed.
> The final goal is not to get a bytes object, but a
> char*, so it seems more natural to put the decoding at that spot.
The reason for the intermediate bytes object is that it neatly
solves the memory management issue that arises if you try to
go directly from str to char *, and it does it without having
to make a special case of function arguments.
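
To make the memory management point concrete, here is a rough sketch in
plain Python rather than Pyrex, with ctypes standing in for the C side.
The helper name as_char_p is made up for this illustration and is not part
of any proposal. The encoded bytes object owns the buffer, so the char *
is only safe to use while that object is kept alive:

```python
import ctypes

def as_char_p(text, encoding="utf-8"):
    # Hypothetical helper, for illustration only.
    encoded = text.encode(encoding)      # intermediate bytes object owns the buffer
    pointer = ctypes.c_char_p(encoded)   # char * view onto that buffer
    # Return the owner together with the pointer so the caller
    # can keep the buffer alive for as long as the pointer is used.
    return encoded, pointer

owner, ptr = as_char_p("na\u00efve")
# Reading the pointer back gives the same bytes the owner holds.
assert ptr.value == owner
```

Going straight from str to char * would leave nothing playing the role
of "owner" above, which is the special case the intermediate object avoids.
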
> There is kind of an odd asymmetry here, for instance if I had a
> function that both accepted and returned a char* I would have to write
>
> cdef extern from "foo.h":
>     utf8* cblarg(char*)
You can write that part more symmetrically if you want:

  cdef extern from "foo.h":
      utf8* cblarg(utf8*)

> [somewhere much later]
>
> def blarg(utf8 s):
>     return cblarg(s)
Yes, it's a bit odd having the str->bytes conversion
determined by the Python side and the bytes->str
by the C side. I'll have to think about it some more.
> My whole goal was to not have to be explicit at each point, but to be
> able to specify the encoding (or at least to use a default encoding)
> for an entire file
Yes, I realise it doesn't fully address your use case.
It's more aimed at people who think a blanket declaration
would be too implicit and error-prone.
However, it seems to be difficult to implement fully
automatic conversions directly between str and char *
except for a very few encodings -- ascii and utf8 --
and even the latter would appear to hinge on a
deprecated feature held over from Py2.
The advantages of my proposal are that it would work
for any encoding and wouldn't be restricted to function
arguments.
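
As a rough illustration of the "any encoding" point, again in plain Python
and with made-up helper names: once the conversion goes through an explicit
bytes object, the encoding is just a parameter, and nothing about the
mechanism is specific to ascii or utf8.

```python
def to_c_bytes(text, encoding):
    # Hypothetical helper: str -> bytes, as would happen on the way into C.
    return text.encode(encoding)

def from_c_bytes(data, encoding):
    # Hypothetical helper: bytes -> str, as would happen on the way back out.
    return data.decode(encoding)

word = "caf\u00e9"
for enc in ("utf-8", "latin-1", "utf-16"):
    raw = to_c_bytes(word, enc)
    # The byte representation differs per encoding, but the round trip holds.
    assert from_c_bytes(raw, enc) == word
```
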
--
Greg