Robert Bradshaw wrote:
> this is the kind
> of thing that usually tells me there's a deficiency in the language
> that should be fixed to ease the users burden instead.
Sure, but the deficiency is in C (and py2), and that's not something we can fix. As for the Cython language, it should really follow Python: unicode for "text", bytes for arbitrary data. But we need to deal with C (and Fortran) no matter how you slice it.

I wrote a similar post on the numpy list: I think the key from a user's perspective is that one is working either with "text" (human-readable stuff) or with data. If text, then the natural Python 3 data type is a unicode string. If data, then bytes. We should really follow that as best we can.

> most of the
> libraries we work with would probably balk at anything but ASCII
> anyways

This is key. Unicode is new, and AFAICT, C still doesn't have a decent way to deal with it (it never even had a native string type). So a very common usage is for C and Fortran code and libraries to expect a char* encoded in ASCII (or ANSI, but one byte per character in any case). It needs to be easy, and perhaps automatic, to write code that crosses the Python-C border in these cases.

I've lost track of what has been proposed here, but it seems to me that we need a Cython type:

ANSI_string

(not that that's what it should be called). It might be nice if there were a way to specify the encoding (ASCII, Latin-1, etc.), though it would have to be a one-byte-per-character encoding. I'm not sure what the syntax for that could be, but I'd like to have it specified in the code near where it is used, rather than as a program-wide default.

If you declare a variable an ANSI_string, then Cython would convert it to a char* internally, using ASCII (or another specified encoding). At the Python level it could accept either a unicode string or a byte string, passing the byte string right on through. A runtime error would be raised if the input could not be ASCII-encoded. It seems this would handle the very common case of libraries expecting simple ASCII strings for flags, etc.
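To make the proposed coercion rules concrete, here is a minimal Python-level sketch. It assumes a hypothetical helper `as_ansi_string` and a hypothetical whitelist of single-byte encodings. Neither is an existing Cython feature; they only illustrate the behaviour described above. Bytes pass straight through, text is encoded, and input that can't be encoded fails at run time.

```python
import codecs

# Hypothetical whitelist of 1-byte-per-character encodings (normalized codec names).
SINGLE_BYTE_ENCODINGS = {"ascii", "iso8859-1", "cp1252"}


def as_ansi_string(value, encoding="ascii"):
    """Hypothetical helper mimicking the proposed ANSI_string coercion.

    bytes are returned unchanged (no copy), str is encoded with a
    single-byte encoding, and anything else is rejected.
    """
    # codecs.lookup raises LookupError for unknown names and normalizes
    # aliases, e.g. "latin-1" -> "iso8859-1".
    if codecs.lookup(encoding).name not in SINGLE_BYTE_ENCODINGS:
        raise ValueError(f"{encoding!r} is not a supported 1-byte-per-character encoding")
    if isinstance(value, bytes):
        return value  # already raw bytes: hand them straight through
    if isinstance(value, str):
        # Raises UnicodeEncodeError if a character has no 1-byte mapping.
        return value.encode(encoding)
    raise TypeError(f"expected str or bytes, got {type(value).__name__}")
```

Passing a bytes object returns that very object, while passing `"verbose"` produces a new `b"verbose"`. Calling `as_ansi_string("café")` raises `UnicodeEncodeError` under the ASCII default but succeeds with `encoding="latin-1"`. This "copy only when needed" behaviour is the asarray-like property discussed in the next paragraph.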
An ANSI_string would be kind of like numpy's "asarray" call, in that it may or may not make a copy depending on what the input is. I don't think that would be a problem, as strings are immutable anyway. Wouldn't this be much like declaring a variable a C int and being able to pass in Python integers that may or may not (until run time) fit?

This is completely from a user's perspective.

-Chris

-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

_______________________________________________
Cython-dev mailing list
http://codespeak.net/mailman/listinfo/cython-dev
