Re: [Cython] str arguments

Stefan Behnel Thu, 26 Nov 2009 22:46:46 -0800

Joachim Saul, 26.11.2009 22:53:
> It is not Cython's job to enforce correct (Cython) code. The 
> user (of the Cython-created modules) will normally not be aware of what 
> goes on inside.


Oh, this is absolutely not about compiler internals.


> To me, nevertheless, it would make a lot of sense to allow code like
> 
> def foo(list L):
>     cdef array arr = L
> 
> given that at Python level
> 
>     arr = array(L)
> 
> is likely to work (depending on the *values* of L of course).

-1. Constructing objects should be explicit.


>> Let's look at the example that started this thread:
>> cdef extern void c_foo(char*)
>>
>> def foo(str bar=""):
>>      c_foo(bar)
>>
>> Now I understand that your position is that this is incorrect, broken,
>> and fragile code just waiting to blow up in the user's face,
> 
> Under what conditions is this incorrect or fragile? Certainly not 
> because of the typing or the default value.

Sure, also because of that. The 'str' type does not accept unicode values
in Py2, so you will get a TypeError when users pass text as a unicode
string. That may not sound like a big deal ("just tell them not to do it"),
but Py2 will actually convert strings to unicode on the fly (e.g. on string
joining and concatenation), so you may not even be aware that you passed a
unicode string. And you will most likely have a pretty hard time debugging
your code to find out where it originated from.

This problem doesn't usually appear in Python itself, because all Python
2.x APIs are designed to handle both str and unicode. But your code will
fail for non-str input.
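
A rough pure-Python sketch of that failure mode, written so that it runs on
both Py2 and Py3. The explicit isinstance() check is only an assumed
stand-in for what a 'str'-typed argument does under Py2, and the body is a
placeholder for the real call into c_foo():

```python
# Version-neutral aliases: str/unicode on Py2, bytes/str on Py3.
byte_string = type(b"")
text_string = type(u"")


def foo(bar=b""):
    # Assumed emulation of the typed 'str' argument under Py2:
    # byte strings pass, text (unicode) input is rejected.
    if not isinstance(bar, byte_string):
        raise TypeError(
            "Argument 'bar' has incorrect type (expected %s, got %s)"
            % (byte_string.__name__, type(bar).__name__)
        )
    return len(bar)  # placeholder for c_foo(bar)
```

Calling foo(b"abc") goes through, while foo(u"abc") raises a TypeError,
even though both values look like the same text to the caller.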

Believe me, your problem is not with the oh-so-far-away Py3, where 'str' is
really what you want. It's with Py2.


> even though in Python 3 'str' is differently encoded.

<nitpick>
Unicode is not an encoding.
</nitpick>


> As a consequence, I need to 
> define a C function that takes care of the de/encoding from/to C strings 
> depending on whether this is Python 2 or 3. I am pretty sure doing this 
> at C level using #ifdef's for different Python versions should be 
> straightforward.

The preprocessor #if isn't the problem. The encoding of the string is.

But given that str->unicode coercion exists in Py2, maybe we should really
give this a helping hand here by generating automatic Py2-only input
coercion code for the 'str' type that encodes unicode input using the
platform encoding (just like Py2 does). That's not correct in many cases,
but at least you'd get exceptions for the same data that Py2 would have
trouble with, and the data that happens to work in other places of Py2
would pass silently.
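
As a minimal sketch of what such input coercion could look like if written
by hand in Python: the helper name coerce_str_arg and its 'encoding'
parameter are made up for illustration, and sys.getdefaultencoding() is
used here as an assumed approximation of "the encoding Py2 would use"
('ascii' by default on Py2, 'utf-8' on Py3). This is not generated Cython
code, only the idea:

```python
import sys

# Version-neutral aliases: str/unicode on Py2, bytes/str on Py3.
byte_string = type(b"")
text_string = type(u"")


def coerce_str_arg(value, encoding=None):
    """Return 'value' as a byte string, encoding text input if necessary."""
    if isinstance(value, byte_string):
        return value  # already a byte string, pass through unchanged
    if isinstance(value, text_string):
        # Hypothetical policy: fall back to the interpreter's default encoding.
        return value.encode(encoding or sys.getdefaultencoding())
    raise TypeError(
        "expected a byte or text string, got %s" % type(value).__name__
    )
```

With encoding="ascii", coerce_str_arg(u"abc") returns b"abc", while
non-ASCII text such as u"caf\xe9" raises a UnicodeEncodeError. Anything that
is not a string at all still gets a TypeError.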


> Manual type checking only if I expect bar to be anything other than a 
> 'str'. Which I don't :)

Too bad for the users of the API you write, who will then have to find a
work-around themselves. Putting the conversion code behind the API makes
sure it only has to be written once.


>> which is the most correct from a unicode standpoint, but may require  
>> substantial changes to an existing codebase (I know, it could be  
>> called bugfixing), is certainly less user-friendly, and could  
>> introduce a fair amount of overhead as well.
> 
> You probably mean less developer-friendly ;)

And more user friendly, although I expect the users of your library to be
developers, too.


> The changes to the existing codebase would probably only involve the 
> additional decoding step. The overhead is what I'm a bit concerned 
> about. See, often one wants to simply compare such codes, or perhaps
> sort a large list of objects with such strings as attributes; the
> overhead can be expected to be quite substantial in such cases.

That's the price you pay for a world that's not defined in ASCII.

Stefan
