Robert Bradshaw wrote:
> this is the kind of thing that usually tells me there's a deficiency in the language
> that should be fixed to ease the users burden instead.

Sure, but the deficiency is in C (and Py2), and that's not something
we can fix. As for the Cython language, it should really follow Python:
unicode for "text", bytes for arbitrary data.

But we need to deal with C (and Fortran) no matter how you slice it.

I wrote a similar post on the NumPy list: I think the key from a user's
perspective is that one is either working with "text" (human-readable
stuff) or with data. If text, then the natural Python 3 data type is a
unicode string. If data, then bytes. We should follow that as closely as
we can.
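To make the text/data split concrete, here is a minimal Python 3 illustration (plain Python, nothing Cython-specific): the same word is 4 characters as text but 5 bytes as UTF-8 data, and getting back from data to text requires knowing the encoding.

```python
# Text is a sequence of Unicode code points (str); data is a sequence of bytes.
text = "café"                  # human-readable text: Python 3 str
data = text.encode("utf-8")    # arbitrary data: Python 3 bytes

print(type(text).__name__, len(text))   # str 4
print(type(data).__name__, len(data))   # bytes 5 ('é' is two bytes in UTF-8)

# Going from data back to text requires knowing which encoding was used.
assert data.decode("utf-8") == text
```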

> most of the libraries we work with would probably balk at anything but ASCII
> anyways

This is key. Unicode is new, and AFAICT C still doesn't have a decent
way to deal with it (it never even had a native string type).

So a very, very common usage is for C and Fortran code and libraries to
expect a char*, encoded in ASCII (or ANSI, but in any case one byte per
character). It needs to be easy, and perhaps automatic, to write code
that crosses the Python-C border in these cases.
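A small sketch of where the friction shows up at the Python-C border, using the standard ctypes module as a stand-in for Cython's char* coercion (ctypes is my choice here, not something from the discussion): ctypes.c_char_p models a C char* and accepts bytes, but in Python 3 it rejects a str, so the caller has to pick an encoding explicitly.

```python
import ctypes

# ctypes.c_char_p models a C `char*`; it accepts bytes.
flag = ctypes.c_char_p(b"verbose")
print(flag.value)   # b'verbose'

# A Python 3 str is rejected with a TypeError.
try:
    ctypes.c_char_p("verbose")
    str_accepted = True
except TypeError:
    str_accepted = False

# The caller has to choose an encoding explicitly at the border.
flag = ctypes.c_char_p("verbose".encode("ascii"))
```

That explicit .encode() at every call site is the burden the proposal below is trying to remove.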

I've lost track of what has been proposed here, but it seems to me that 
we need a Cython type:

ANSI_string  (not that that's what it should be called)

It might be nice if there were a way to specify the encoding (ASCII,
Latin-1, etc.), though it would have to be a one-byte-per-character
encoding. I'm not sure what the syntax could be for that, but I'd like
to have it specified in the code near where it is used, rather than as a
program-wide default.
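One way to picture "encoding chosen where it is used" is a converter factory bound to a single encoding, which refuses anything that is not one byte per character. This is a plain-Python sketch, not a proposed Cython syntax; char_arg and SINGLE_BYTE_ENCODINGS are hypothetical names, and the whitelist is an assumption (real code would need a fuller list).

```python
import codecs

# Hypothetical whitelist of one-byte-per-character encodings, keyed by the
# canonical names that codecs.lookup() returns (e.g. "latin1" -> "iso8859-1").
SINGLE_BYTE_ENCODINGS = {"ascii", "iso8859-1", "cp1252"}

def char_arg(encoding):
    """Hypothetical: return a str/bytes -> bytes converter bound to `encoding`."""
    canonical = codecs.lookup(encoding).name
    if canonical not in SINGLE_BYTE_ENCODINGS:
        raise ValueError(f"{encoding!r} is not a one-byte-per-character encoding")

    def convert(value):
        if isinstance(value, bytes):
            return value                    # already data: pass it through
        return value.encode(canonical)      # text: encode, strict errors
    return convert

# The encoding is declared next to the code that needs it.
to_latin1 = char_arg("latin1")
print(to_latin1("café"))   # b'caf\xe9'
```

Asking for char_arg("utf-8") raises ValueError, since UTF-8 can use several bytes per character.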

If you declare a variable an ANSI_string, then Cython will convert it to
a char* internally, using ASCII (or another declared encoding). At the
Python level it could accept either a unicode string or a byte string,
passing the byte string right on through. A runtime error would be
raised if the input could not be ASCII-encoded.
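The coercion rule described above, sketched in plain Python as roughly what the generated code would do before handing the buffer to C (as_char_bytes is a hypothetical name): bytes pass straight through with no copy, str is encoded with strict error handling, and anything that does not fit the encoding fails at run time with UnicodeEncodeError (a subclass of ValueError).

```python
def as_char_bytes(value, encoding="ascii"):
    """Hypothetical: coerce a Python argument to bytes suitable for a C char*."""
    if isinstance(value, bytes):
        return value                    # passed through unchanged, no copy
    if isinstance(value, str):
        return value.encode(encoding)   # raises UnicodeEncodeError if it can't fit
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")

raw = b"-v"
assert as_char_bytes(raw) is raw        # same object, nothing copied
print(as_char_bytes("-v"))              # b'-v'

try:
    as_char_bytes("naïve")              # 'ï' is not ASCII
except UnicodeEncodeError as exc:
    print("rejected:", exc)
```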

It seems this would handle the very common case of libraries expecting
simple ASCII strings for flags, etc.

It would be kind of like NumPy's "asarray" call, in that it may or may
not make a copy, depending on what the input is, but I don't think that
would be a problem, as strings are immutable anyway.

Wouldn't this be much like declaring a variable a C int, and being able
to pass in Python integers that may or may not fit (which you don't know
until run time)?
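For comparison, the C int case is the same kind of run-time check. The sketch below uses the standard struct module as a stand-in for the range check Cython performs when coercing a Python int to a C int (Cython itself raises OverflowError there); fits_c_int is a hypothetical name, and "=i" means a standard-size 4-byte int.

```python
import struct

def fits_c_int(n):
    """Hypothetical: does the Python int `n` fit in a 32-bit C int?"""
    try:
        struct.pack("=i", n)    # "=i": standard-size (4-byte) signed int
        return True
    except struct.error:        # raised when n is out of range
        return False

print(fits_c_int(2**31 - 1))    # True
print(fits_c_int(2**31))        # False
```

Just like the string case, the declaration is static, but whether a given value converts is only known when it arrives.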

This is completely from a user's perspective.

-Chris
-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

[email protected]
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
