Hi Robert,
Robert Bradshaw, 26.11.2009 11:03:
> I think this boils down to a philosophical difference. I think we
> should encourage users to write correct code, but not require it.
> That's why there's warnings vs. errors. We should only raise errors
> for things that could never work.
I'm fine with that in general. However, for the specific case of strings, I
must note that Py3 is as much a reality as Py2, and that the semantics of
strings in Cython dictate that certain things will not work in certain
Python runtimes. I admit that a warning allows users to actively say "I
don't care, my code will never run in a different environment anyway". So I
guess there is a case for making it a warning.
>> Please come up with a valid use case where a currently unsupported
>> automatic conversion between different string types clearly helps in
>> writing simple and correct code.
>
> Let's look at the example that started this thread:
>
> cdef extern void c_foo(char*)
>
> def foo(str bar=""):
>     c_foo(bar)
>
> Now I understand that your position is that this is incorrect, broken,
> and fragile code just waiting to blow up in the user's face, but from a
> pragmatic point of view it's working code that's part of a larger
> project and compiles and runs fine with Python 2 and Pyrex. It would
> compile and work fine with Cython as well except we have a check for
> just this case to disallow it [1], trying to force the user to write
> correct code.
My position is that "str" is not the right type here. It's just like using
"float" where "long" would be appropriate.
If you want "bytes", write "bytes". That will also give you an appropriate
runtime exception in both Py2 and Py3 when you pass a unicode string. No
harm done to anyone. If you use 'str', the exception you get in Py3 later
in the code will be much less beautiful.
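As a rough plain-Python analogy (a sketch only, not the code Cython actually generates), a "bytes" argument declaration behaves like an isinstance() check at the function boundary. A unicode string is rejected right at the call with a clear TypeError, not somewhere deeper in the code. The function name foo_bytes_arg is hypothetical.

```python
def foo_bytes_arg(bar=b""):
    # Emulates a Cython 'bytes bar' argument: the type is checked once,
    # at the call boundary. Cython typed arguments also accept None
    # unless they are declared 'not None', so None is let through here.
    if bar is not None and not isinstance(bar, bytes):
        raise TypeError("Argument 'bar' has incorrect type "
                        "(expected bytes, got %s)" % type(bar).__name__)
    return bar
```

So foo_bytes_arg(b"abc") returns b"abc", while foo_bytes_arg("abc") fails immediately.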
> So what are the options?
>
> def foo(bar=""):
>     c_foo(bar)
>
> is probably the easiest option, but is still incorrect in your book,
> and lacks the static typing desired to fit in with the rest of the
> code.
It's incorrect, sure. However, the difference is that 'bar' is not
explicitly typed, so it's a plain Python object. The contract for the
Python object type is that it can accept any Python object. And coercing
any Python object to char* is allowed because it explicitly moves the type
check to runtime.
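The same runtime deferral can be seen in plain Python 3 with the standard ctypes module, used here only as a stand-in for Cython's object-to-char* coercion. ctypes.c_char_p accepts any object as an argument and raises TypeError only when that object turns out not to be bytes. The function name foo_untyped is hypothetical.

```python
import ctypes

def foo_untyped(bar=b""):
    # 'bar' is untyped: nothing is checked when foo_untyped() is called.
    # The type check only happens when the value is coerced to a C char*,
    # and c_char_p raises TypeError in Py3 if 'bar' is a str.
    c_string = ctypes.c_char_p(bar)
    return c_string.value
```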
So it may surprise you, but I'm actually fine with not doing anything about
the above code, and just accepting it as is, without any warning.
It's an entirely different beast to type 'bar', thus restricting it to a
specific type of Python object. This type may or may not be able to coerce
to char*, but Cython can determine that at compile time if it knows the
type. Typing is not only for speed, it's also to say: dear compiler, please
check that this variable is only used with the type I give it and tell me
if I get it wrong anywhere in my code.
Note that you can also do this (warning, unhelpful code ahead):
def foo(str bar=""):
    cdef bytes b
    if isinstance(bar, unicode):
        b = bar.encode('UTF-8')
    else:
        b = <bytes>bar
    return b
So it's not that these assignments are impossible. They are just not
allowed with *implicit* coercions.
> The other proposed option
>
> def foo(bytes bar=b""):
>     c_foo(bar)
>
> doesn't maintain compatibility with Pyrex
Note that the following works in both Cython and Pyrex:
cdef bytes b = "abcdefg"
This is actually specified in CEP 108, and it's supported for good reason.
So the above could simply be written as
def foo(bytes bar=""):
    c_foo(bar)
in a portable way. (If this doesn't work in Cython, I would consider that a
bug w.r.t. the CEP).
> and though I have no idea
> what foo actually does, I would guess that this is not the API one
> would want to actually use in Py3.
At least the OP mentioned the intention to pass "text", so I'd second that
(also in Py2, BTW).
> The most correct option is
>
> def foo(bar=""):
>     [manual type checking and decoding on bar, perhaps via a helper
>     function]
>     c_foo(bar_c)
>
> which is the most correct from a unicode standpoint, but may require
> substantial changes to an existing codebase (I know, it could be
> called bugfixing), is certainly less user-friendly, and could
> introduce a fair amount of overhead as well.
Ok, I understand that you understand that this is required anyway, both in
Py2 and Py3, so there's not much to add here.
But what about actually making this specific case easier? Not caring too
much about syntax for now, we could support something like
def foo(bytes[encoding='UTF-8'] bar=""):
    c_foo(bar)
and, respectively:
def foo(unicode[encoding='UTF-8'] bar=u""):
    assert isinstance(bar, unicode)
def foo(str[encoding='UTF-8'] bar=""):
    assert isinstance(bar, str)
and let Cython generate the argument type checking and conversion code
internally, based on what kind of string is passed at runtime. Even 1D char
buffers could be supported as argument here...
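To make the idea a bit more concrete, here is a plain-Python sketch that emulates the argument conversion Cython might generate for something like "bytes[encoding='UTF-8'] bar". The decorator encoded_args and its keyword interface are purely hypothetical; no such syntax or helper exists in Cython.

```python
import functools
import inspect

def encoded_args(**encodings):
    """Hypothetical decorator: coerce the named arguments to bytes,
    encoding unicode text with the given codec."""
    def decorator(func):
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()  # so a default of "" is converted too
            for name, encoding in encodings.items():
                value = bound.arguments[name]
                if isinstance(value, str):
                    bound.arguments[name] = value.encode(encoding)
                elif not isinstance(value, bytes):
                    raise TypeError("%s: expected str or bytes, got %s"
                                    % (name, type(value).__name__))
            return func(*bound.args, **bound.kwargs)
        return wrapper
    return decorator

@encoded_args(bar="UTF-8")
def foo(bar=""):
    # Inside the function body, 'bar' is guaranteed to be bytes.
    return bar
```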
OTOH, this is really only required in Py2.x, and Py3 is the fairly
immediate future. Adding syntactic sugar for a soon-to-be-dead platform may
just add legacy baggage to the Cython language. But it would certainly
solve the above problem.
> My modest proposal
;)
> is to allow bytes <-> str, char* <-> str, and str <-> unicode
> *with* typechecking.
So, you would basically allow this:
cdef char* s = "abcdef"
cdef bytes bs = s
cdef str pys = bs
cdef unicode us = pys
Not beautiful, if you ask me, and I still haven't seen a use case that
supports this. What's so bad about an explicit cast if you really think you
need such an assignment? You can even use a safe "<bytes?>" cast to tell
the compiler that you are not sure if this works.
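Roughly speaking, a checked cast like "<bytes?>obj" raises TypeError at runtime if obj is not a bytes object, instead of blindly reinterpreting it. A plain-Python approximation follows. The name checked_cast is hypothetical, and this sketch does not try to reproduce Cython's exact handling of None.

```python
def checked_cast(target_type, obj):
    # Hypothetical emulation of '<target_type?>obj': succeed only if obj
    # really is an instance of target_type, otherwise raise TypeError.
    if not isinstance(obj, target_type):
        raise TypeError("Cannot convert %s to %s"
                        % (type(obj).__name__, target_type.__name__))
    return obj
```

Here checked_cast(bytes, b"abcdef") returns its argument unchanged, while checked_cast(bytes, "abcdef") fails loudly.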
> (Alternatively, we could omit an
> explicit error at C compile time if conversion is attempted between
> incompatible types. I'm not sure whether that would be better.)
I thought about that, too, but I doubt that it would be more helpful than a
runtime error.
> I'm actually surprised at
>
> cdef list L
> cdef str s
> cdef char* ss0 = L # this will never work, but is accepted
> cdef char* ss1 = s  # this could succeed in Py2, but is prohibited for
>                     # being incorrect by not explicitly encoding
As I said in my answer to Greg, I would appreciate it if Cython considered
the list assignment an error. But as you see from the current behaviour for
external types, it's not trivial to set up the rules correctly.
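A toy model of the rule I am arguing for (a sketch of this thread's position, not Cython's actual implementation) makes the distinction explicit. Coercion to char* is fine for bytes, is deferred to runtime for untyped objects, and is rejected at compile time for every other declared type. The names Coercion and char_ptr_coercion are hypothetical. Extension types declared elsewhere are exactly where a table this simple stops being sufficient.

```python
from enum import Enum

class Coercion(Enum):
    ALLOWED = "allowed at compile time"
    RUNTIME = "deferred to a runtime type check"
    ERROR = "rejected at compile time"

def char_ptr_coercion(declared_type):
    # Toy rule table for coercing a value of 'declared_type' to char*.
    if declared_type == "bytes":
        return Coercion.ALLOWED
    if declared_type == "object":   # untyped: the check moves to runtime
        return Coercion.RUNTIME
    return Coercion.ERROR           # e.g. "list", "str", "unicode"
```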
Stefan
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev