On Fri, 19 Dec 2008 08:20:07 -0700, Joe Strout wrote:

> Marc 'BlackJack' Rintsch wrote:
>
>>> The question is why the Python interpreter use the default encoding
>>> instead of "utf-8", which I explicitly declared in the source.
>>
>> Because the declaration is only for decoding unicode literals in that
>> very source file.
>
> And because strings in Python, unlike in (say) REALbasic, do not know
> their encoding -- they're just a string of bytes. If they were a string
> of bytes PLUS an encoding, then every string would know what it is, and
> things like conversion to another encoding, or concatenation of two
> strings that may differ in encoding, could be handled automatically.
>
> I consider this one of the great shortcomings of Python, but it's mostly
> just a temporary inconvenience -- the world is moving to Unicode, and
> with Python 3, we won't have to worry about it so much.
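
To make the bytes-versus-characters distinction concrete, here is a
minimal sketch. It is written for Python 3 so that it runs on a current
interpreter; there, `bytes` plays the role of Python 2's `str` and `str`
the role of `unicode`. The sample text and the variable names are made up
for illustration and do not come from the original poster's code.

```python
# Hypothetical sample text ("Gruesse" with u-umlaut and sharp s),
# written with escapes so the source file itself stays pure ASCII.
text = "Gr\u00fc\u00dfe"

# Encoding yields plain bytes. The resulting objects carry no record
# of which encoding produced them.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(len(text))          # 5 characters
print(len(utf8_bytes))    # 7 bytes
print(len(latin1_bytes))  # 5 bytes

# To get text back, the caller has to name the encoding again.
assert utf8_bytes.decode("utf-8") == text
assert latin1_bytes.decode("latin-1") == text

# Decoding with the wrong encoding raises no error in this case;
# it just produces different characters.
mojibake = utf8_bytes.decode("latin-1")
print(mojibake == text)   # False
```

The point of the sketch is only that the encoding lives with the caller,
not with the byte string; once the data is decoded to a text object, the
question of "which encoding" no longer arises until it is written out
again.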

I don't see the shortcoming in Python <3.0. If you want real strings
with characters instead of just a bunch of bytes, simply use `unicode`
objects instead of `str`.

And does REALbasic really use byte strings plus an encoding!? Sounds
strange. When concatenating, which encoding "wins"?

Ciao,
	Marc 'BlackJack' Rintsch