Hi,

Apologies first, as I am not a Unicode expert... indeed, the details probably elude me totally. Notwithstanding: how can I convert a byte string containing UTF-8 encoded data into a Python unicode string?
Cut-down example:

$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding is UTF-8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)

The last print statement fails because the bytes in "imported" are 8-bit encoded UTF-8, but Python tries to decode them as plain ASCII when they are combined with a unicode string. How do I tell Python that "imported" is actually already UTF-8 encoded?

Cheers
NevilleDNZ
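
P.S. For what it's worth, my best guess so far is that the byte string has to be decoded explicitly before it is mixed with unicode literals. Below is a minimal sketch under that assumption, not a confirmed answer. The b"" prefix and the names `text` and `quoted` are my own additions for illustration; the prefix is only there so the same snippet also runs on newer Pythons.

```python
# Sketch: turn UTF-8 encoded bytes into a unicode string by decoding explicitly.
# Assumes the bytes in "imported" really are valid UTF-8.

imported = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# decode() interprets the bytes as UTF-8 and returns a unicode string
text = imported.decode("utf-8")

# Both operands are unicode now, so no implicit ASCII decode is attempted
quoted = u"\N{RUNIC CROSS PUNCTUATION}" + text + u"\N{RUNIC CROSS PUNCTUATION}"

# Report sizes only, so the output does not depend on the terminal encoding
print("%d bytes decoded to %d characters" % (len(imported), len(text)))
```

Printing `quoted` itself should then presumably work on a UTF-8 xterm, but I would welcome corrections if explicit decoding is not the right approach.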