Re: encoding problem

Joe Strout Fri, 19 Dec 2008 14:21:23 -0800

Marc 'BlackJack' Rintsch wrote:

And because strings in Python, unlike in (say) REALbasic, do not know
their encoding -- they're just a string of bytes.  If they were a string
of bytes PLUS an encoding, then every string would know what it is, and
things like conversion to another encoding, or concatenation of two
strings that may differ in encoding, could be handled automatically.


I consider this one of the great shortcomings of Python, but it's mostly
just a temporary inconvenience -- the world is moving to Unicode, and
with Python 3, we won't have to worry about it so much.

I don't see the shortcoming in Python <3.0. If you want real strings with characters instead of just a bunch of bytes simply use `unicode` objects instead of `str`.
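
(To illustrate the point above, here is a minimal sketch of the usual pattern: decode bytes at the boundary, work with characters, and encode again on the way out. The sample bytes and variable names are made up for illustration. It should run unchanged on Python 2.6+ and on Python 3, where the character type is called `str` rather than `unicode`.)

```python
# Bytes as they might arrive from a file or socket; here "café" in UTF-8.
raw = b"caf\xc3\xa9"

# Decode once, at the boundary. In Python 2 this yields a `unicode` object,
# in Python 3 a `str`; either way it is a sequence of characters.
text = raw.decode("utf-8")

# The byte string has five bytes, the character string four characters.
byte_count = len(raw)
char_count = len(text)

# Encode explicitly when some other software expects a different encoding.
latin1_bytes = text.encode("latin-1")
```

The encoding is never stored with the data here; the caller has to know that `raw` was UTF-8, which is exactly the bookkeeping being discussed in this thread.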

Fair enough -- that certainly is the best policy. But working with any other encoding (sometimes necessary when interfacing with any other software), it's still a bit of a PITA.

And does REALbasic really use byte strings plus an encoding!?


You betcha!  Works like a dream.

Sounds strange.  When concatenating which encoding "wins"?

The one that is a superset of the other, or if neither is, then both are converted to UTF-8 (which is the "standard" encoding in RB, though it works comfily with any other too).
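
(For the curious, the rule just described could be sketched in Python roughly like this. This is not how REALbasic implements it; the `EncodedString` class and the `COVERS` table are hypothetical names invented for this example, and the table lists only a few encodings by exact codec name.)

```python
# Hypothetical table: each encoding maps to the encodings it is a superset of.
COVERS = {
    "utf-8": {"ascii", "latin-1", "cp1252"},
    "latin-1": {"ascii"},
    "cp1252": {"ascii"},
}


class EncodedString(object):
    """Illustrative only: a byte string that remembers its encoding."""

    def __init__(self, data, encoding):
        self.data = data          # the raw bytes
        self.encoding = encoding  # codec name, e.g. "latin-1"

    def text(self):
        # Decode to characters using the encoding the string carries.
        return self.data.decode(self.encoding)

    def __add__(self, other):
        if self.encoding == other.encoding:
            target = self.encoding
        elif other.encoding in COVERS.get(self.encoding, ()):
            target = self.encoding    # ours is a superset of theirs
        elif self.encoding in COVERS.get(other.encoding, ()):
            target = other.encoding   # theirs is a superset of ours
        else:
            target = "utf-8"          # neither wins: fall back to UTF-8
        combined = self.text() + other.text()
        return EncodedString(combined.encode(target), target)


plain = EncodedString(b"abc", "ascii")
accented = EncodedString(b"\xe9", "latin-1")   # e with acute accent
euro = EncodedString(b"\x80", "cp1252")        # euro sign

widened = plain + accented   # latin-1 is a superset of ascii
mixed = accented + euro      # neither covers the other
```

In this sketch `widened` keeps the Latin-1 encoding, while `mixed` ends up as UTF-8 because neither Latin-1 nor CP1252 is listed as a superset of the other.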


Cheers,
- Joe

--
http://mail.python.org/mailman/listinfo/python-list
