I know this is wrong, but I'm not sure just how wrong it is, or why. Using Python 2.x:

>>> s = "éâÄ"
>>> print s
éâÄ
>>> len(s)
6
>>> list(s)
['\xc3', '\xa9', '\xc3', '\xa2', '\xc3', '\x84']

Can somebody explain what happens when I put non-ASCII characters into a non-unicode string? My guess is that the result will depend on the current encoding of my terminal.

In this case, my terminal is set to UTF-8. If I change it to ISO 8859-1, and repeat the above, I get this:

>>> list("éâÄ")
['\xe9', '\xe2', '\xc4']

If I do this:

>>> s = u"éâÄ"
>>> s.encode('utf-8')
'\xc3\xa9\xc3\xa2\xc3\x84'
>>> s.encode('iso8859-1')
'\xe9\xe2\xc4'

which at least explains why the bytes have the values which they do.

Thank you,

--
Steven
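
P.S. Here is a minimal sketch of what I suspect is going on, assuming the terminal simply hands Python the already-encoded bytes and a non-unicode string stores them untouched. I've written the characters as escapes so the result doesn't depend on the terminal or source-file encoding. The variable names are only mine, for illustration.

```python
# The three characters (e-acute, a-circumflex, A-umlaut) as a unicode string,
# written with escapes so no terminal or source encoding is involved.
text = u"\xe9\xe2\xc4"

# The same text becomes different byte strings under different encodings.
utf8_bytes = text.encode("utf-8")        # two bytes per character here
latin1_bytes = text.encode("iso8859-1")  # one byte per character

print(len(text))          # number of characters
print(len(utf8_bytes))    # number of bytes in UTF-8
print(len(latin1_bytes))  # number of bytes in ISO 8859-1

# Decoding with the matching encoding recovers the original text.
roundtrip_utf8 = utf8_bytes.decode("utf-8")
roundtrip_latin1 = latin1_bytes.decode("iso8859-1")

# Decoding with the wrong encoding does not fail in this direction;
# it just produces six unrelated characters instead of three.
mojibake = utf8_bytes.decode("iso8859-1")
```

If that guess is right, len(s) in my first example is counting bytes rather than characters, which would explain the 6.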