Re: [Cython] Another string encoding idea

Dag Sverre Seljebotn Sat, 28 Nov 2009 14:28:10 -0800

Robert Bradshaw wrote:
> On Nov 28, 2009, at 6:13 AM, Dag Sverre Seljebotn wrote:
>> In Python "u = s" always means a strict transfer of reference. In Cython
>> we diverge from this (apart from raising exceptions on mismatch) in some
>> places:
>>  a) When converting intrinsic types. However, these are always immutable
>> in Python and so the semantic mismatch isn't there.
>>  b) Structs
>>  c) When converting char*<->bytes? (If a copy is made; otherwise it can
>> be considered similar to Python. I'm not sure what the case is.)
>>
>> Making "u = s" mean more than a pure transfer of reference/copy of
>> immutable object makes problems for both pure Python mode and
>> the possibility of type inference. I believe it contradicts the direction
>> we've gone in -- that static types should be as optional as we can
>> possibly make them. Instead, it is proposed that "u = s" is overloaded
>> to mean encoding conversion, which is something quite different from an
>> assignment.
> 
> I think of the C <-> Python conversions (whether by assignment or casting)
> as "create the best Python (or C) equivalent."  The directive would flag
> that the Python equivalent of char* is str in Py3, not bytes.
> 
> I'm trying to make it easy to not violate the principle of least surprise,
> as I find bytes objects surprising to deal with.


I didn't put it very well, but my point was mainly that, whatever the 
advantages are, it does cause problems for getting the Cython language 
closer to the Python language, for type inference, etc. (because the 
assignment would copy the data of a mutable type -- char* is mutable).
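
To make the reference-versus-copy point concrete, here is a minimal sketch in plain Python. It is only an illustration: the bytearray stands in for a mutable char* buffer, and the decode call stands in for what an automatic conversion on assignment would have to do. It is not Cython's actual conversion code.

```python
# A bytearray stands in for a mutable C char* buffer (illustrative assumption).
buf = bytearray(b"abc")

# Plain Python assignment: a strict transfer of reference, no copy.
alias = buf
assert alias is buf

# What an automatic char* -> str conversion on assignment would amount to:
# decode the bytes now, which snapshots (copies) the data.
snapshot = buf.decode("ascii")

# Mutate the underlying buffer, as C code is free to do with a char*.
buf[0] = ord("x")

print(alias)     # the alias sees the change
print(snapshot)  # the decoded copy does not
```

The alias follows the mutation while the decoded string keeps the old contents, which is the semantic mismatch with ordinary assignment.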

I actually wouldn't mind so much with "const char*".

> My personal concern is the pain I see porting Sage to Py3. I'd have to
> go through the codebase and throw in encodes() and decodes() and
> change signatures of functions that take char* arguments (which, I
> just realized, will be a step backwards for cpdef functions). The
> thought of mechanically going through and doing all of this,
> especially when I would be surprised to see any benefit (most of the
> libraries we work with would probably balk at anything but ASCII
> anyway), makes me wonder if there's a better way...this is the kind
> of thing that usually tells me there's a deficiency in the language
> that should be fixed to ease the user's burden instead. I would also
> have a hard time explaining to people (including myself) why this step
> of encoding/decoding can't just be automated everywhere (unless
> there's truly a technical obstruction). That's where I'm coming from.
> 
>> I just don't mentally associate char* with
>> strings at all and thus didn't ever think about this as a problem...
> 
> What do you associate with strings in C?

C doesn't have strings. Just like it doesn't have a lot of other 
convenient types. It does have byte streams which can be in an encoding. 
Because of this it feels *very* natural to explicitly call encode/decode.
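
As a small illustration of why the explicit call feels natural, here is a sketch in plain Python. The byte string is made-up sample data standing in for what a C library might hand back through a char*; the variable names are hypothetical.

```python
# Bytes as they might arrive from a C library through a char* (made-up data).
raw = b"caf\xc3\xa9"

# The same byte stream is a different string under different encodings.
as_utf8 = raw.decode("utf-8")
as_latin1 = raw.decode("latin-1")

# A strict ASCII decode rejects the non-ASCII bytes instead of guessing.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Going back to C means choosing an encoding again.
round_trip = as_utf8.encode("utf-8")

print(len(as_utf8), len(as_latin1), ascii_ok, round_trip == raw)
```

The bytes alone do not say which of the two decoded strings was meant, so the encoding has to come from somewhere, whether that is an explicit call or a directive.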

But this isn't very constructive, just rehashing...

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
