On Nov 28, 2009, at 6:13 AM, Dag Sverre Seljebotn wrote:
> Robert Bradshaw wrote:
>> On Nov 27, 2009, at 10:52 PM, Stefan Behnel wrote:
>>> Currently, coercion from char*/bytes to unicode is an explicit step
>>> that is easy to do via
>>>
>>> cdef char* s = ...
>>> u = s[:length].decode('UTF-8')
>>>
>>> in 0.12. See
>>>
>>> http://trac.cython.org/cython_trac/ticket/436
>>
>> That is an improvement, though still a lot more baggage than
>>
>> cdef char* s = ...
>> u = s
>
> Hmm. Seeing it in action makes me worry even more. I'm leaning towards
> -1 for the whole proposal now.
>
> In Python "u = s" always means a strict transfer of reference. In
> Cython we diverge from this (apart from raising exceptions on
> mismatch) in some places:
> a) When converting intrinsic types. However, these are always
> immutable in Python and so the semantic mismatch isn't there.
> b) Structs
> c) When converting char*<->bytes? (If a copy is made; otherwise it
> can be considered similar to Python. I'm not sure what the case is.)
>
> Making "u = s" mean more than a pure transfer of reference/copy of
> immutable object creates problems for both pure Python mode and the
> possibility of type inference. I believe it contradicts the direction
> we've gone in -- that static types should be as optional as we can
> possibly make them. Instead, it is proposed that "u = s" is overloaded
> to mean encoding conversion, which is something quite different from
> an assignment.
I see the C <-> Python conversions (whether by assignment or casting) as
"create the best Python (or C) equivalent." The directive would flag
that the Python equivalent of char* is str in Py3, not bytes.
I'm trying to make it easy not to violate the principle of least
surprise, as I find bytes objects surprising to deal with.
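
A minimal sketch of the kind of surprise meant here, written in plain
Python 3 rather than Cython. The variable names are purely illustrative,
and `raw` only stands in for what a char* would become without a decoding
step:

```python
raw = b"abc"                  # stand-in for an undecoded char* value
text = raw.decode("ascii")    # the explicit decoding step

first_byte = raw[0]           # indexing bytes yields an int, not a 1-char string
first_char = text[0]          # indexing str yields a 1-char string

same = (raw == text)          # bytes and str never compare equal in Py3
shown = str(raw)              # str() on bytes gives the repr, not the text
```

None of these raise an error, which is part of why the mismatch is easy
to miss until the value reaches a user-visible place.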
> I believe this contradicts the Pythonic philosophy of being explicit
> (where even "self" is passed explicitly...).
>
> One solution around this would be to create a new, Cython-specific
> string class which constitutes a view on a char*, rather than a copy.
> Views are fine (as they are semantically similar to a pure reference
> assignment "u = s").
That may be as much overhead as (and less intuitive than) explicitly
decoding and encoding.
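
The view-versus-copy distinction can be sketched in plain Python 3. This
is only an analogy: the bytearray stands in for the C buffer behind a
char*, and memoryview stands in for the proposed view class; neither is
the Cython-specific string class being discussed.

```python
buf = bytearray(b"hello")       # stand-in for a mutable C buffer
view = memoryview(buf)          # a view: shares memory with buf
copied = buf.decode("ascii")    # decoding makes an independent str copy

buf[0] = ord("j")               # mutate the underlying buffer in place

seen_through_view = view.tobytes()   # reflects the mutation
seen_through_copy = copied           # still the original text
```

The view tracks later changes to the buffer, while the decoded string
does not, which is the semantic difference at issue.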
> Is proper string handling creating big problems in Sage, since the
> question keeps coming up?
With Sage we ignore the issue completely (with the exception of the
notebook, which is all in Python anyway), and it works fine. In fact,
I can't remember any complaints about it (again, except for the
notebook) and we have more non-US users than US users (extrapolating
from the latest web stats). I don't usually bring the topic up; it
just comes up in response to user inquiries.
My personal concern is the pain I see in porting Sage to Py3. I'd have
to go through the codebase, throw in encode() and decode() calls, and
change the signatures of functions that take char* arguments (which, I
just realized, will be a step backwards for cpdef functions). The
thought of mechanically going through and doing all of this,
especially when I would be surprised to see any benefit (most of the
libraries we work with would probably balk at anything but ASCII
anyway), makes me wonder if there's a better way. This is the kind
of thing that usually tells me there's a deficiency in the language
that should be fixed to ease the user's burden instead. I would also
have a hard time explaining to people (including myself) why this step
of encoding/decoding can't just be automated everywhere (unless
there's truly a technical obstruction). That's where I'm coming from.
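
As a rough illustration of what "automated everywhere" could look like,
here is a plain Python 3 sketch. The decorator `auto_text` and the
function `c_style_upper` are hypothetical names invented for this
example; this is not an existing Cython feature or directive, and the
ASCII default is an assumption based on the libraries mentioned above.

```python
import functools

def auto_text(encoding="ascii"):
    """Hypothetical helper: encode str arguments, decode bytes results."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            # Encode any str argument so the wrapped function only sees bytes.
            converted = [a.encode(encoding) if isinstance(a, str) else a
                         for a in args]
            result = func(*converted)
            # Decode a bytes result back to str for the Python caller.
            if isinstance(result, bytes):
                return result.decode(encoding)
            return result
        return wrapper
    return decorate

@auto_text()
def c_style_upper(data):
    # Stand-in for a wrapped C function that only understands bytes.
    assert isinstance(data, bytes)
    return data.upper()

upper_from_str = c_style_upper("abc")
upper_from_bytes = c_style_upper(b"abc")
```

With the ASCII default, a non-ASCII str argument raises
UnicodeEncodeError at the boundary instead of being passed through
silently.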
> I just don't mentally associate char* with
> strings at all and thus didn't ever think about this as a problem...
What do you associate with strings in C?
> I don't know why one wouldn't want to call encode/decode explicitly if
> only to make better self-documenting code about what is going on.
If explicitly encoding and decoding is irrelevant to the purpose of
the function call, I think it makes the code less clean and readable.
- Robert
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev