Re: [Cython] str arguments

Robert Bradshaw Fri, 27 Nov 2009 13:08:31 -0800

On Nov 26, 2009, at 4:48 AM, Stefan Behnel wrote:

> Hi Robert,
>
> Robert Bradshaw, 26.11.2009 11:03:
>> I think this boils down to a philosophical difference. I think we
>> should encourage users to write correct code, but not require it.
>> That's why there's warnings vs. errors. We should only raise errors
>> for things that could never work.
>
> I'm fine with that in general. However, for the specific case of strings, I
> must note that Py3 is as much a reality as Py2, and that the semantics of
> strings in Cython dictate that certain things will not work in certain
> Python runtimes. I admit that a warning allows users to actively say "I
> don't care, my code will never run in a different environment anyway". So I
> guess there is a case for making it a warning.
>
>
>>> Please come up with a valid use case where a currently unsupported
>>> automatic conversion between different string types clearly helps in
>>> writing simple and correct code.
>>
>> Let's look at the example that started this thread:
>>
>> cdef extern void c_foo(char*)
>>
>> def foo(str bar=""):
>>     c_foo(bar)
>>
>> Now I understand that your position is that this is incorrect, broken,
>> and fragile code just waiting to blow up in the user's face, but from a
>> pragmatic point of view it's working code that's part of a larger
>> project and compiles and runs fine with Python 2 and Pyrex. It would
>> compile and work fine with Cython as well except we have a check for
>> just this case to disallow it [1], trying to force the user to write
>> correct code.
>
> My position is that "str" is not the right type here. It's just like using
> "float" where "long" would be appropriate.


Sure.

> If you want "bytes", write "bytes".

Bytes is usually the wrong API. Encoding, if necessary, should usually  
happen inside the API, not be left to the users.
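
As a rough illustration of "encoding inside the API", here is a minimal
sketch in plain Python 3 (which is also valid Cython source). It is not
what Cython generates: the helper name to_c_bytes, the UTF-8 default, and
returning the bytes instead of calling the real c_foo are all assumptions
made for the example.

```python
# Hypothetical helper: normalize text or bytes to bytes inside the API,
# so callers never have to encode by hand. UTF-8 is an assumed default.
def to_c_bytes(value, encoding="utf-8"):
    if isinstance(value, bytes):
        return value                      # already raw bytes, pass through
    if isinstance(value, str):
        return value.encode(encoding)     # text: encode at the API boundary
    raise TypeError("expected str or bytes, got %s" % type(value).__name__)


def foo(bar=""):
    bar_c = to_c_bytes(bar)
    # A real wrapper would hand bar_c to the C function here: c_foo(bar_c).
    # Returning it keeps this sketch runnable without a C library.
    return bar_c
```

The caller passes whatever string type it has, and the only TypeError
left is for arguments that are not strings at all.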

> That will also give you an appropriate
> runtime exception in both Py2 and Py3 when you pass a unicode string. No
> harm done to anyone. If you use 'str', the exception you get in Py3 later
> in the code will be much less beautiful.

I think the error would actually be better--instead of a TypeError when
you call the function, it would be a TypeError (or it could even be an
unspecified encoding error) at the point conversion to a char* is
attempted.

Another deficiency in my book is that it would reject rather than
handle unicode in Py2.

>> The other proposed option
>>
>> def foo(bytes bar=b""):
>>     c_foo(bar)
>>
>> doesn't maintain compatibility with Pyrex
>
> Note that the following works in both Cython and Pyrex:
>
>       cdef bytes b = "abcdefg"

It doesn't work with either, as the bytes type is not a builtin in  
Pyrex (and can't even be imported, as we fake it for Py2), and we  
require the b"" prefix in Cython.

My other issue with the bytes object is that most people don't know  
about it (yes, they probably should) so it's non-obvious to try (and  
also has non-intuitive semantics if it leaks out).

> This is actually specified in CEP 108, and it's supported for good reason.
> So the above could simply be written as
>
>  def foo(bytes bar=""):
>      c_foo(bar)
>
> in a portable way. (If this doesn't work in Cython, I would consider that a
> bug w.r.t. the CEP).

Looks like we decided to start out strict, but maybe this is a good
reason to loosen it:

http://www.mail-archive.com/[email protected]/msg07268.html

>> and though I have no idea
>> what foo actually does, I would guess that this is not the API one
>> would want to actually use in Py3.
>
> At least the OP mentioned the intention to pass "text", so I'd second that
> (also in Py2, BTW).
>
>
>> The most correct option is
>>
>> def foo(bar=""):
>>     [manual type checking and decoding on bar, perhaps via a helper
>> function]
>>     c_foo(bar_c)
>>
>> which is the most correct from a unicode standpoint, but may require
>> substantial changes to an existing codebase (I know, it could be
>> called bugfixing), is certainly less user-friendly, and could
>> introduce a fair amount of overhead as well.
>
> Ok, I understand that you understand that this is required anyway, both in
> Py2 and Py3, so there's not much to add here.
>
> But what about actually making this specific case easier? Not caring too
> much about syntax for now, we could support something like
>
>       def foo(bytes[encoding='UTF-8'] bar=""):
>            c_foo(bar_c)
>
> and, respectively:
>
>       def foo(unicode[encoding='UTF-8'] bar=u""):
>           assert isinstance(bar, unicode)
>
>       def foo(str[encoding='UTF-8'] bar=""):
>           assert isinstance(bar, str)
>
> and let Cython generate the argument type checking and conversion code
> internally, based on what kind of string is passed at runtime. Even 1D char
> buffers could be supported as argument here...
>
> OTOH, this is really only required in Py2.x, and Py3 is the pretty
> immediate future. Adding syntactic sugar for a soon-to-be-dead platform may
> just drop legacy stuff into the Cython language. But it would certainly
> solve the above problem.

I just got another idea for this (I'll start a new thread).

>
>
>> My modest proposal
>
> ;)

Well, you know I can come up with far more outrageous ones when it  
comes to string handling... :)

>
>
>> is to allow bytes <-> str, char* <-> str, and str <-> unicode
>> *with* typechecking.
>
> So, you would basically allow this:
>
> cdef char* s = "abcdef"
> cdef bytes bs = s
> cdef str pys = bs
> cdef unicode us = pys
>
> Not beautiful, if you ask me, and I still haven't seen a use case that
> supports this.

Of course the above would fail with a TypeError in both Py2 and Py3
(on either the last or second-to-last line).
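
To see which line fails where, here is a small sketch in plain Python
that mimics the typechecked assignments. The names checked and run_chain
are hypothetical, and the isinstance test only approximates whatever
check Cython would actually emit. Under Python 3 the str step fails
(second-to-last line). Under Python 2, where str is the bytes type, the
str step passes and the unicode step fails (last line).

```python
# Hypothetical stand-in for a typechecked assignment to a typed variable
# such as "cdef str pys = bs": accept the value only if it is an instance
# of the declared type, otherwise raise TypeError.
def checked(value, declared_type):
    if not isinstance(value, declared_type):
        raise TypeError("expected %s, got %s"
                        % (declared_type.__name__, type(value).__name__))
    return value


def run_chain():
    """Return the name of the first step that fails, or None."""
    bs = checked(b"abcdef", bytes)       # cdef bytes bs = s
    try:
        pys = checked(bs, str)           # cdef str pys = bs
    except TypeError:
        return "str"
    try:
        checked(pys, type(u""))          # cdef unicode us = pys
    except TypeError:
        return "unicode"
    return None
```

Calling run_chain() returns the name of the step that raised, so the
same file can be run under either interpreter to compare.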

> What's so bad about an explicit cast if you really think you
> need such an assignment? You can even use a safe "<bytes?>" cast to tell
> the compiler that you are not sure if this works.

The most useful is the (Py2 only) str <-> char*, which currently fails  
in Cython. A <bytes?> cast is incompatible with Pyrex in two separate  
ways, as well as not being obvious.

>> (Alternatively, we could emit an
>> explicit error at C compile time if conversion is attempted between
>> incompatible types. I'm not sure whether that would be better.)
>
> I thought about that, too, but I doubt that it would be more helpful than a
> runtime error.

I've thought of another reason it would be bad--it would prevent a  
user from using the rest of a library if an illegal conversion is  
attempted in one function, so a runtime error it is.

- Robert
