On Dec 20, 10:02 am, Marc 'BlackJack' Rintsch <bj_...@gmx.net> wrote:
> On Fri, 19 Dec 2008 15:20:08 -0700, Joe Strout wrote:
> > Marc 'BlackJack' Rintsch wrote:
>
> >>> And because strings in Python, unlike in (say) REALbasic, do not know
> >>> their encoding -- they're just a string of bytes. If they were a
> >>> string of bytes PLUS an encoding, then every string would know what it
> >>> is, and things like conversion to another encoding, or concatenation
> >>> of two strings that may differ in encoding, could be handled
> >>> automatically.
>
> >>> I consider this one of the great shortcomings of Python, but it's
> >>> mostly just a temporary inconvenience -- the world is moving to
> >>> Unicode, and with Python 3, we won't have to worry about it so much.
>
> >> I don't see the shortcoming in Python <3.0. If you want real strings
> >> with characters instead of just a bunch of bytes simply use `unicode`
> >> objects instead of `str`.
>
> > Fair enough -- that certainly is the best policy. But working with any
> > other encoding (sometimes necessary when interfacing with any other
> > software), it's still a bit of a PITA.
>
> But it has to be. There is no automagic guessing possible.
>
> >> And does REALbasic really use byte strings plus an encoding!?
>
> > You betcha! Works like a dream.
>
> IMHO a strange design decision. A lot more hassle compared to an opaque
> unicode string type which uses some internal encoding that makes
> operations like getting a character at a given index easy or
> concatenating without the need to reencode.

In general I quite agree with you ... however, with Unicode, "getting a
character at a given index" is fine only until you stray (or are dragged!)
outside the BMP while using a 16-bit Unicode implementation. There, a
character outside the BMP occupies two 16-bit code units (a surrogate
pair), so an index no longer corresponds one-to-one to a character.
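
To make the BMP point concrete, here is a minimal sketch of what a 16-bit implementation sees. It is written for Python 3 and simulates the 16-bit view by encoding to UTF-16; `utf16_units` is a hypothetical helper I made up for illustration, not a stdlib function, and the sample characters are my own choice.

```python
import struct


def utf16_units(text):
    """Return the 16-bit UTF-16 code units of text (hypothetical helper)."""
    raw = text.encode("utf-16-le")  # little-endian, no BOM
    return list(struct.unpack("<%dH" % (len(raw) // 2), raw))


bmp_char = "\u00e9"          # U+00E9, inside the BMP
astral_char = "\U0001D11E"   # U+1D11E MUSICAL SYMBOL G CLEF, outside the BMP

# One code unit for the BMP character, two (a surrogate pair) for the other.
bmp_units = utf16_units(bmp_char)
astral_units = utf16_units(astral_char)
print([hex(u) for u in bmp_units])     # ['0xe9']
print([hex(u) for u in astral_units])  # ['0xd834', '0xdd1e']

# In a 16-bit implementation, "index 1" of this string is only the high
# surrogate, i.e. half a character, and the length is 4 units, not 3 chars.
units = utf16_units("a" + astral_char + "b")
is_high_surrogate = 0xD800 <= units[1] <= 0xDBFF
print(len(units), is_high_surrogate)
```

On a narrow (UCS-2) build of Python 2, indexing and `len()` on a `unicode` object behave like the code-unit view above; a wide build, or Python 3.3 and later, indexes by code point instead.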