Robert Bradshaw, 13.12.2009 10:51:
> On Dec 12, 2009, at 11:35 PM, Stefan Behnel wrote:
>> So I think the right solution is to support automatic conversion  
>> *only* at the Python call boundary, i.e. for Python function
>> parameters and return values.
> 
> I disagree. Most of the examples here have been very simple, but in  
> general the Python/C boundary need not be cleanly aligned with the  
> Python call boundary. Some more general examples would be
> 
>      cdef extern from "foo.h":
>          void cblarg(int i, char* name)
> 
>      def blarg(obj):
>          # I realize I'm assuming name is not a dynamically
>          # generated attribute...
>          cblarg(obj.id, obj.name)
> 
> or even
> 
>      def blarg_all(list L):
>          for i, a in enumerate(L):
>              cblarg(i, a)

I guess I'm still not used to passing arbitrary user values into a C
function call without doing some kind of parameter checking beforehand.
That's different for Python function arguments, where only the encoding
would happen automatically (and would raise an appropriate error on
failure), and the result would still be a safe Python bytes object that
users can validate in any way they want, without having to care about 0
bytes silently becoming end markers.
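In pure Python terms, the kind of check I have in mind looks like this
(validate_c_string is a hypothetical helper, not anything Cython
provides):

```python
# Hypothetical helper (not part of Cython): reject byte strings that a
# C function taking char* would silently truncate at the first 0 byte.
def validate_c_string(data):
    if not isinstance(data, bytes):
        raise TypeError("expected a bytes object")
    if b"\0" in data:
        raise ValueError("embedded 0 byte would act as an end marker in C")
    return data

validate_c_string(b"hello")      # passes through unchanged
```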

We are still talking about two different use cases here. One deals with
automatic encoding of unicode strings into byte strings on input and with
automatic decoding of byte strings (or char*) on the way out.

The other use case deals with automatic coercion of Python string objects
to char*, which is what you show above. I personally think it's good to
keep those separate.
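Spelled out in plain Python, the first use case is just this (the
function names and the utf-8 choice here are only examples):

```python
# Encoding happens once, at the Python call boundary; everything past
# that point works on a safe bytes object.
def set_title(title):                # accepts a unicode string
    raw = title.encode("utf-8")      # may raise UnicodeEncodeError
    return raw                       # what the C side would receive

def get_title(raw):                  # bytes (or char*) coming back out
    return raw.decode("utf-8")       # may raise UnicodeDecodeError
```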

Remember the performance issue you mentioned with a char* versus a
Python object parameter when the function is called from Cython code?
The only place where this matters is cpdef functions, and that case
should be rare enough to ignore and to require an explicit wrapper
function instead, as it's quite likely that user input would have to be
validated separately anyway.
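To sketch the explicit wrapper pattern I mean, again in pure Python
(_cblarg is a stand-in for what would be a cdef extern function taking
an int and a char* in Cython):

```python
def _cblarg(i, name):
    # stand-in for the C-level function; in Cython this would be a
    # cdef extern function with signature (int, char*)
    return "%d:%s" % (i, name.decode("ascii"))

def blarg(i, name):
    """Explicit def wrapper: validate the input, then cross into C."""
    if not isinstance(name, bytes):
        raise TypeError("expected bytes")
    return _cblarg(i, name)
```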

To make this clear: I don't think it's worth encouraging users to drop
input validation in favour of automatic and unsafe coercion.


> I'm all for making string encodings easier to use, though as I've said  
> encode() and decode() seem to be a clean enough solution for nearly  
> everything but argument parsing.

That seems to match my distinction above then.


> However (and maybe this belongs on the other thread), you are  
> completely skirting the issue of being able to declare the encoding  
> for a block of code in one place, rather than having to specify it  
> every single place it is used.

Yes, the above would actually be orthogonal to that feature. Although I'm
not sure simply saying

    def func(bytes s):
        ...

plus a global setting somewhere at the top of your code is really readable
enough as "this function accepts unicode strings which get converted
automatically". And, no, I don't think typing the input parameter as "str"
is what people want in most cases. I'm really leaning towards the
assumption that most people really *want* bytes as basic string input type
in their Cython code. Either that, or exactly unicode strings. Not 'str'.


> I initially thought your concern with char* <-> unicode conversion
> was the ambiguity in what character set to use, which I was proposing
> could be declared at a higher than case-by-case level. Is there
> another reason it is vital that the encoding step and/or parameters
> be reiterated at every instance they are used?

I don't like code redundancy either. But making up a default should only be
the second step after fixing the semantics of the feature that has this
default.

Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
