Marc 'BlackJack' Rintsch wrote:

And because strings in Python, unlike in (say) REALbasic, do not know
their encoding -- they're just a string of bytes.  If they were a string
of bytes PLUS an encoding, then every string would know what it is, and
things like conversion to another encoding, or concatenation of two
strings that may differ in encoding, could be handled automatically.

I consider this one of the great shortcomings of Python, but it's mostly
just a temporary inconvenience -- the world is moving to Unicode, and
with Python 3, we won't have to worry about it so much.

I don't see the shortcoming in Python <3.0. If you want real strings with characters instead of just a bunch of bytes, simply use `unicode` objects instead of `str`.

Fair enough -- that certainly is the best policy. But when working with any other encoding (sometimes necessary when interfacing with other software), it's still a bit of a PITA.
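For what it's worth, here is a minimal sketch of that policy: decode bytes as soon as they arrive, work only with text, and encode again on the way out. It is written with Python 3 names, where `bytes` plays the role of the old `str` and `str` the role of the old `unicode`. The helper names `from_wire` and `to_wire` are made up for illustration, and the Latin-1 input is just an assumed example of what "other software" might hand over.

```python
def from_wire(raw, encoding):
    """Decode incoming bytes into text (hypothetical helper)."""
    return raw.decode(encoding)


def to_wire(text, encoding="utf-8"):
    """Encode text back into bytes for output (hypothetical helper)."""
    return text.encode(encoding)


# Assumed input: Latin-1 bytes for "cafe" with an acute accent on the e.
raw = b"caf\xe9"

text = from_wire(raw, "latin-1")   # now a real character string
shout = text.upper() + "!"         # character operations are safe here
out = to_wire(shout)               # UTF-8 bytes for the outside world
```

The nuisance is that the encoding has to be carried alongside the bytes by hand, since the bytes themselves don't record it.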

And does REALbasic really use byte strings plus an encoding!?

You betcha!  Works like a dream.

Sounds strange.  When concatenating, which encoding "wins"?

The one that is a superset of the other, or if neither is, then both are converted to UTF-8 (which is the "standard" encoding in RB, though it works comfily with any other too).
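To make that rule concrete, here is a rough Python 3 sketch of a "bytes plus encoding" string. This is not how REALbasic implements it; the `TaggedString` class and the tiny `SUBSET_OF` table are made up for illustration, and a real implementation would need a much fuller table and normalized encoding names.

```python
from dataclasses import dataclass

# Hypothetical, deliberately tiny table of (narrower, wider) encoding pairs.
SUBSET_OF = {("ascii", "latin-1"), ("ascii", "utf-8")}


@dataclass(frozen=True)
class TaggedString:
    data: bytes      # the raw bytes
    encoding: str    # the encoding those bytes are in

    def convert(self, encoding):
        """Return an equivalent string re-encoded as `encoding`."""
        if encoding == self.encoding:
            return self
        text = self.data.decode(self.encoding)
        return TaggedString(text.encode(encoding), encoding)

    def __add__(self, other):
        if not isinstance(other, TaggedString):
            return NotImplemented
        if self.encoding == other.encoding:
            target = self.encoding
        elif (self.encoding, other.encoding) in SUBSET_OF:
            target = other.encoding      # other is the superset, it wins
        elif (other.encoding, self.encoding) in SUBSET_OF:
            target = self.encoding       # self is the superset, it wins
        else:
            target = "utf-8"             # neither wins, fall back to UTF-8
        joined = self.convert(target).data + other.convert(target).data
        return TaggedString(joined, target)


ascii_part = TaggedString(b"abc", "ascii")
latin_part = TaggedString(b"\xe9", "latin-1")
cyrillic_part = TaggedString("\u0416".encode("koi8-r"), "koi8-r")

superset_wins = ascii_part + latin_part     # stays Latin-1
fallback = latin_part + cyrillic_part       # both converted to UTF-8
```

Here `superset_wins` keeps the Latin-1 tag because ASCII is a subset of it, while `fallback` ends up as UTF-8 because neither Latin-1 nor KOI8-R contains the other.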

Cheers,
- Joe

--
http://mail.python.org/mailman/listinfo/python-list
