Re: [Cython] str arguments

Joachim Saul Thu, 26 Nov 2009 13:53:48 -0800

Robert Bradshaw [26.11.2009 11:03]:
> (Well, I'm OK with a compile time  
> error for something like
>
> def foo(list L):
>      cdef tuple t = L # might succeed if L is None
>
> but that should almost never work.) Should we disallow the use of  
> xrange, file, or reduce because they are no longer available in Python  
> 3? (I concede that their use in Py2 is more correct than sloppy  
> handling of strings.) As another example, if someone declares
>
> cdef extern from "Python.h":
>      object PyDict_New(int foo)
>
> then I don't think it's our job to notice and raise an error, we  
> should let them use the (incorrect) definition and the C compiler will  
> choke (or, maybe not, if they're working on a hacked or pre-release  
> version of Python).
>


Exactly. It is not Cython's job to enforce correct (Cython) code. The 
user (of the Cython-created modules) will normally not be aware of what 
goes on inside.

To me, nevertheless, it would make a lot of sense to allow code like

def foo(list L):
    cdef array arr  = L

given that at Python level

    arr = array(L)

is likely to work (depending on the *values* of L of course).
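
As a rough Python-level illustration of that point, here is a minimal sketch. It assumes `array` means the standard library's `array.array`, which, unlike the call above, also needs a typecode; the helper name `to_double_array` is made up for this example.

```python
from array import array

def to_double_array(L):
    # Hypothetical helper: whether this works depends on the values in L,
    # not on the fact that L is a list.
    return array("d", L)

print(to_double_array([1, 2.5, 3]))

try:
    to_double_array([1, "ABC"])
except TypeError as exc:
    # A list holding a non-numeric value cannot be converted.
    print("conversion failed:", exc)
```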

> Let's look at the example that started this thread:
> cdef extern void c_foo(char*)
>
> def foo(str bar=""):
>      c_foo(bar)
>
> Now I understand that your position that this is incorrect, broken,  
> and fragile code just waiting to blow up in the user's face,

Under what conditions is this incorrect or fragile? Certainly not 
because of the typing or the default value. Maybe I need to add that the 
argument to c_foo() is interpreted as a 0-terminated character string. 
This will obviously give incorrect results if bar itself contains 
0-characters, which can be ruled out unless such strings are created 
deliberately. Usually it will be simple ASCII character strings like 
"ABC" or the like. In this particular case they could be sensor IDs. I 
recognize that the c_foo() interface is not the most robust, but in the 
context in which this code is used, the value of bar can be trusted.
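
The truncation at an embedded 0-character can be sketched at Python level with `ctypes`, which reads a buffer the way a C function taking `char*` would. This only stands in for c_foo(), whose real behaviour is not shown in this thread; `as_c_string` is a name invented for the sketch.

```python
import ctypes

def as_c_string(raw):
    # Hypothetical stand-in for what a char* consumer sees:
    # .value stops reading at the first 0 byte.
    return ctypes.create_string_buffer(raw).value

print(as_c_string(b"ABC"))        # plain ASCII ID passes through unchanged
print(as_c_string(b"AB\x00CD"))   # everything after the 0 byte is lost
```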

>  but from a  
> pragmatic point of view it's working code that's part of a larger  
> project and compiles and runs fine with Python 2 and Pyrex. It would  
> compile and work fine with Cython as well except we have a check for  
> just this case to disallow it [1], trying to force the user to write  
> correct code. So what are the options?
>
> def foo(bar=""):
>       c_foo(bar)
>
> is probably the easiest option, but is still incorrect in your book,  
> and lacks the static typing desired to fit in with the rest of the  
> code. (Personally, I would discourage typing bar, but perhaps foo is  
> more complicated than this two-liner, and we shouldn't prohibit it.)  
> The other proposed option
>
> def foo(bytes bar=b""):
>      c_foo(bar)
>
> doesn't maintain compatibility with Pyrex, and though I have no idea  
> what foo actually does, I would guess that this is not the API one  
> would want to actually use in Py3.

Indeed not. I must admit I haven't used Python 3 yet and am not yet very 
familiar with the new unicode str, the bytes type, etc.

What I certainly don't want is some sensor ID "ABC" encoded as bytes.

code=b"ABC"
foo(code)
print(code[1])
66

Even though formally correct, this is not what the user (at Python 
level) would consider acceptable.

The user wants, and should be able, to write

code="ABC"
foo(code)
print(code[1])
B

so 'code' must be a 'str'; there is no alternative to this, even though 
in Python 3 'str' is encoded differently. As a consequence, I need to 
define a C function that takes care of the de/encoding from/to C strings 
depending on whether this is Python 2 or 3. I am pretty sure doing this 
at C level using #ifdefs for the different Python versions should be 
straightforward.
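
The same idea can be sketched in Python rather than C, with a runtime version check standing in for the #ifdefs. This is only an illustration under assumptions: the helper names `encode_for_c` and `decode_from_c` are invented here, and ASCII is assumed because the IDs in question are simple ASCII strings.

```python
import sys

PY3 = sys.version_info[0] >= 3

def encode_for_c(code, encoding="ascii"):
    """Hypothetical helper: return the byte string to pass to a char* function."""
    if isinstance(code, bytes):
        # Python 2 'str' (and Python 3 'bytes') is already a byte string.
        return code
    # Python 3 'str' (or Python 2 'unicode') must be encoded explicitly.
    return code.encode(encoding)

def decode_from_c(raw, encoding="ascii"):
    """Hypothetical helper: turn a C byte string back into the native 'str'."""
    if PY3:
        return raw.decode(encoding)
    return raw

code = "ABC"
c_arg = encode_for_c(code)        # what would be handed to c_foo()
print(decode_from_c(c_arg)[1])    # the user still sees a one-character str
```

With helpers like these the user keeps passing and receiving plain 'str' objects on both Python versions, and only the wrapper deals with bytes.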

>  The most correct option is
>
> def foo(bar=""):
>      [manual type checking and decoding on bar, perhaps via a helper  
> function]
>   

Manual type checking only if I expect bar to be anything other than a 
'str'. Which I don't :)

>      c_foo(bar_c)
>   

I agree on the decoding.

> which is the most correct from a unicode standpoint, but may require  
> substantial changes to an existing codebase (I know, it could be  
> called bugfixing), is certainly less user-friendly, and could  
> introduce a fair amount of overhead as well.
>   

You probably mean less developer-friendly ;)

The changes to the existing codebase would probably only involve the 
additional decoding step. The overhead is what I'm a bit concerned 
about. See, often one wants to simply compare such codes, or perhaps 
sort a large list of objects with such strings as attributes; the 
overhead can be expected to be quite substantial in such cases.
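
One way to get a feeling for that overhead is to time a sort with and without an extra encoding step per key. The sketch below only measures the encoding at Python level, not a real Cython wrapper; the `Sensor` class and the generated IDs are made up, and no particular timings are implied.

```python
import timeit

class Sensor(object):
    # Hypothetical object carrying a string code as an attribute.
    def __init__(self, code):
        self.code = code

# Made-up ASCII IDs, stored in reverse order so that sorting has work to do.
sensors = [Sensor("S%05d" % i) for i in range(10000, 0, -1)]

def sort_plain():
    # Compare the codes as they are.
    return sorted(sensors, key=lambda s: s.code)

def sort_encoded():
    # Same sort, but with one extra encoding step for every key.
    return sorted(sensors, key=lambda s: s.code.encode("ascii"))

print("plain:  ", timeit.timeit(sort_plain, number=5))
print("encoded:", timeit.timeit(sort_encoded, number=5))
```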

Cheers,
Joachim