Re: [Cython] str arguments

Robert Bradshaw Thu, 26 Nov 2009 02:04:39 -0800

On Nov 26, 2009, at 12:05 AM, Stefan Behnel wrote:

> Robert Bradshaw, 25.11.2009 23:37:
>> On Nov 25, 2009, at 12:19 PM, Stefan Behnel wrote:
>>> Robert Bradshaw, 25.11.2009 19:54:
>>>> You can still use str in Python 3, it's
>>>> just that str -> char* will not happen automatically.
>>> Slight clarification: "str -> char* will not happen automatically" is
>>> also true for Py2. 'bytes' is the only type that automatically coerces
>>> from and to char*.
>>
>> So, short of a bunch of encoding code (can it be written to be Pyrex
>> and Cython compatible?), there's no way to support this?
>
> It's easy to write encoding code that works in Python, Pyrex and Cython,
> and if you encode your strings, it will also easily work in Py2.x and Py3.
> If you don't, then your code will not work well in both Py3 *and* Py2.
>
>
>> I'd rather this be allowed with a (runtime) TypeError in Py3 and
>> possibly a compile time warning than disallowed altogether.
>
> Well, I prefer disallowing it altogether, because it helps users in
> writing correct code, even if they don't care about Py3 compatibility
> for now.


I think this boils down to a philosophical difference. I think we
should encourage users to write correct code, but not require it.
That's why there are warnings as well as errors. We should only raise
errors for things that could never work. (Well, I'm OK with a compile
time error for something like

def foo(list L):
     cdef tuple t = L # might succeed if L is None

but that should almost never work.) Should we disallow the use of
xrange, file, or reduce because they are no longer builtins in Python
3? (I concede that their use in Py2 is more correct than sloppy
handling of strings.) As another example, if someone declares

cdef extern from "Python.h":
     object PyDict_New(int foo)

then I don't think it's our job to notice and raise an error; we
should let them use the (incorrect) definition, and the C compiler will
choke (or maybe not, if they're working on a hacked or pre-release
version of Python).

> Don't forget that typing variables as "str" doesn't magically solve all
> problems. Not in all cases in Py3, but certainly not in Py2. It is
> important to /not/ type your variables as long as you don't need to,
> especially for function parameters that may receive both str and unicode
> in Py2. Then, if you mean "bytes" in your code, write "bytes", and if you
> mean "unicode", write "unicode", but don't write "str" just because you
> think it's text. "str" is just there so users can work with Python text
> strings in both Py2 and Py3 without too much hassle (so it's actually for
> people who care about Py3 compatibility), although the hassle is big
> enough already in Py2 as it requires users to deal with both "str" and
> "unicode" for essentially the same thing. This is much easier in Py3.
>
> Whenever you want to pass strings into C (which is the only case where
> this restriction /really/ matters IMHO), you have to care about the
> content of the string anyway, so "str" will simply not work, as it is
> underdefined from a C level perspective. Ignoring the semantics of data
> will necessarily make you write fragile code. Supporting that should not
> be a design goal of the Cython language.
>
> I'm actually getting tired of rediscussing these things over and over
> again.

Me too; this is why for my first several posts I focused on the extern
compatibility issue.

> Please come up with a valid use case where a currently unsupported
> automatic conversion between different string types clearly helps in
> writing simple and correct code.

Let's look at the example that started this thread:

cdef extern void c_foo(char*)

def foo(str bar=""):
     c_foo(bar)

Now I understand your position that this is incorrect, broken, and
fragile code just waiting to blow up in the user's face, but from a
pragmatic point of view it's working code that's part of a larger
project and compiles and runs fine with Python 2 and Pyrex. It would
compile and work fine with Cython as well, except that we have a check
for just this case to disallow it [1], trying to force the user to
write correct code. So what are the options?

def foo(bar=""):
      c_foo(bar)

is probably the easiest option, but is still incorrect in your book,  
and lacks the static typing desired to fit in with the rest of the  
code. (Personally, I would discourage typing bar, but perhaps foo is  
more complicated than this two-liner, and we shouldn't prohibit it.)  
The other proposed option

def foo(bytes bar=b""):
     c_foo(bar)

doesn't maintain compatibility with Pyrex, and though I have no idea  
what foo actually does, I would guess that this is not the API one  
would want to actually use in Py3. The most correct option is

def foo(bar=""):
     # [manual type checking and encoding of bar, perhaps via a helper
     #  function, producing the byte string bar_c]
     c_foo(bar_c)

which is correct from a Unicode standpoint, but may require
substantial changes to an existing codebase (I know, it could be
called bugfixing), is certainly less user-friendly, and could
introduce a fair amount of overhead as well.

My modest proposal is to allow bytes <-> str, char* <-> str, and
str <-> unicode conversions *with* runtime typechecking. (Alternatively,
we could emit an explicit error at C compile time if a conversion is
attempted between incompatible types. I'm not sure whether that would
be better.)
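The runtime half of this proposal can be emulated in plain Python to show where it would fail. This is a hypothetical sketch of the check (the function name is made up), not code that Cython actually generates: a value typed as `str` would only be converted to `char*` if it really is a byte string at runtime. That holds in Py2 and fails with a TypeError in Py3.

```python
def proposed_str_to_charp(obj):
    """Hypothetical emulation of a typechecked str -> char* coercion.

    Returns the object unchanged if it is a byte string (a Py2 str or a
    Py3 bytes object); otherwise raises TypeError instead of encoding.
    """
    if isinstance(obj, bytes):
        # Py2 str is bytes, so Py2 code passing a plain str takes this path
        return obj
    # In Py3, str is text, so the same call fails here at runtime
    raise TypeError("expected bytes, got %s" % type(obj).__name__)
```

Unlike the explicit encoding approach, nothing is ever encoded implicitly here: under Python 3, passing the literal "abc" raises TypeError, while under Python 2 the same literal is a byte string and passes through.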

- Robert


[1] I'm actually surprised at

cdef list L
cdef str s
cdef char* ss0 = L # this will never work, but is accepted
cdef char* ss1 = s # this could succeed in Py2, but is prohibited for
                   # being incorrect by not explicitly encoding


_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
