On Apr 25, 10:01 pm, Bjoern Schliessmann <usenet- [EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > # media is a binary string (mysql escaped zipped file)
>
> > >>> print media
> > x???[? ...
> > (works)
>
> Which encoding, perhaps UTF-8 or ISO8859-1?
>
> > >>> print unicode(media)
> > UnicodeDecodeError: 'ascii' codec can't decode byte 0x9c in
> > position 1: ordinal not in range(128)
> > (ok i guess print assumes you want to print to ascii)
>
> Not at all -- unicode tries to decode the byte string you gave it,
> but doesn't know which encoding to use, so it falls back to ASCII.
>
> You should decode all "incoming" byte strings to unicode objects
> using the right encoding -- here I tried yours with UTF-8. This
> works best using string's method "decode" which returns a unicode
> object.
>
> >>> media="x???[?"
> >>> print repr(media.decode("utf-8"))
>
> u'x\u30ef\u30e6\u30ed[\u30e8'
>
But that_unicode_string.encode("utf-8") produces
'x\xe3\x83\xaf\xe3\x83\xa6\xe3\x83\xad[\xe3\x83\xa8'
which does not contain the complained-about byte 0x9c in position 1
(or any other position) -- how can that be?
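
For anyone who wants to check both observations, here is a minimal sketch. It is written for Python 3 (bytes/str) rather than the Python 2 str/unicode used in this thread, and `sample` is a hypothetical two-byte stand-in: only the leading 'x' and the byte 0x9c at position 1 are known from the quoted output and traceback, the rest of the original data is not. It re-encodes the decoded string from the reply and then tries to decode a byte string that has 0x9c at position 1.

```python
# Python 3 sketch; the thread itself uses Python 2 (str / unicode).

# The unicode string quoted in the reply.
decoded = "x\u30ef\u30e6\u30ed[\u30e8"

# Encode it back to UTF-8 and look for the byte named in the traceback.
round_trip = decoded.encode("utf-8")
print(round_trip)
print("contains 0x9c:", b"\x9c" in round_trip)

# Hypothetical stand-in for the start of the original data: 'x' followed
# by 0x9c. Everything after these two bytes is unknown and left out.
sample = b"x\x9c"

# Try the codec from the traceback (ascii) and the one from the reply (utf-8).
errors = {}
for codec in ("ascii", "utf-8"):
    try:
        sample.decode(codec)
    except UnicodeDecodeError as exc:
        errors[codec] = exc
        print(codec, "->", exc)
```

Both codecs reject the stand-in at position 1, so bytes of that shape could not have decoded to the string shown in the reply. That would suggest the string decoded there was not the byte sequence that raised the original error, though nothing in the thread confirms it.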