"Ian Sparks" <[EMAIL PROTECTED]> writes:

> This is probably stupid and/or misguided but supposing I'm passed a
> byte-string value that I want to be unicode, this is what I do. I'm
> sure I'm missing something very important.

Perhaps you need to read one of the good Python Unicode tutorials,
such as:

    <URL:http://effbot.org/zone/unicode-objects.htm>

> Short version :
>
> >>> s = "José" #Start with non-unicode string

In what encoding? Once you step outside the ASCII character set, you
*must* be explicit about the encoding used for the text. Because there
is no sure way to infer it, Python refuses to guess.

If you're going to include literal non-ASCII characters in the code
(which is the simplest and most readable way), you must also tell
Python what encoding to use when it reads the source file.

    <URL:http://docs.python.org/ref/encodings.html>

> >>> unicoded = eval("u'%s'" % "José")

There's no need for eval() here; apart from being roundabout, it breaks
as soon as the input contains a quote character. Once you know the
encoding, you can simply say::

    >>> str_encoding = "iso-8859-1"
    >>> byte_str = "José"
    >>> unicode_str = byte_str.decode(str_encoding)

(Note that I didn't type this using the iso-8859-1 encoding, so it's
likely to be wrong in that respect; you'll need to change it to match
your situation.)

-- 
 \     "To me, boxing is like a ballet, except there's no music, no |
  `\    choreography, and the dancers hit each other."  -- Jack Handey |
_o__)                                                                |
Ben Finney
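
For readers on Python 3 (this thread predates it): byte strings are bytes objects and text is str, so the decode() call above becomes bytes.decode(), and source files default to UTF-8 unless a coding declaration says otherwise. The following is a minimal sketch, not part of the original reply; the helper name to_text is hypothetical. It shows that the same text produces different bytes under different encodings, and that decoding is only reliable when you supply the matching encoding.

```python
# Minimal Python 3 sketch (assumption: Python 3 semantics, where byte
# strings are `bytes` and text is `str`; the thread itself is Python 2).

def to_text(raw: bytes, encoding: str) -> str:
    """Hypothetical helper: decode bytes using an explicitly given encoding."""
    return raw.decode(encoding)

# The same four characters of text, encoded two different ways.
latin1_bytes = "José".encode("iso-8859-1")  # b'Jos\xe9'
utf8_bytes = "José".encode("utf-8")         # b'Jos\xc3\xa9'

# Decoding with the matching encoding recovers the original text.
print(to_text(latin1_bytes, "iso-8859-1") == "José")  # True
print(to_text(utf8_bytes, "utf-8") == "José")         # True

# A wrong guess may fail loudly: a lone 0xE9 byte is not valid UTF-8.
try:
    to_text(latin1_bytes, "utf-8")
except UnicodeDecodeError as exc:
    print("UnicodeDecodeError:", exc.reason)

# ...or silently: ISO-8859-1 maps every byte to some character, so UTF-8
# bytes decode without error into the wrong text (mojibake).
wrong = to_text(utf8_bytes, "iso-8859-1")
print(ascii(wrong))  # 'Jos\xc3\xa9', i.e. "JosÃ©"
```

The silent case is why Python "refuses to guess": nothing in the bytes themselves tells you which decoding was intended.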