It was just TOO easy... After posting my message to Google Groups, I re-read the posting there and found that Google had pointed me to a Python Unicode tutorial: www.reportlab.com/i18n/python_unicode_tutorial.html - exercise one :-)
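In short, the fix is to decode the UTF-8 bytes into a unicode string before joining them with other unicode strings. A minimal sketch, assuming a Python that accepts both b"..." and u"..." literals (2.6+ or 3.3+); the names raw, cross and quoted are only illustrative and are not taken from the tutorial:

```python
# UTF-8 encoded bytes written with octal escapes, as in uc.py.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Decode the bytes into a unicode string. In Python 2 this gives the
# same result as unicode(raw, "utf-8").
imported = raw.decode("utf-8")

# Both operands are unicode now, so concatenation needs no implicit
# ASCII decoding step.
cross = u"\N{RUNIC CROSS PUNCTUATION}"
quoted = cross + imported + cross
```

Printing quoted on a terminal whose encoding is UTF-8 should then show the runic crosses around the text.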
Gosh, sometimes a google is worth so much more than 10¹⁰⁰!

Happy New Year
NevilleD

It works now:

$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ FAILS :-(
[EMAIL PROTECTED]:/root0/home/nevilled/Project/20 $ vi ./uc.py
[EMAIL PROTECTED]:/root0/home/nevilled/Project/20 $ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :-)"
$ ./uc.py
English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ Just TOO easy :-)

NevilleDNZ wrote:
> Hi,
>
> Apologies first as I am not a unicode expert.... indeed the details
> probably totally elude me. Notwithstanding: how can I convert a
> binary string containing UTF-8 binary into a python unicode string?
>
> cutdown example:
> $ cat ./uc.py
> #!/usr/bin/env python
> imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
> print "English/ASCII quoting:",'"'+imported+'"',"SUCCEEDS :-)" # xterm encoding if UTF8
> print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
> print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
>
> $ ./uc.py
> English/ASCII quoting: "ĦəίιÒ ώσŔĹĐ" SUCCEEDS :-)
> German/ALCOR quoting: ᛭test᛭ AOK :-)
> German/ALCOR quoting:
> Traceback (most recent call last):
>   File "./uc.py", line 5, in <module>
>     print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
>
> The last print statement fails because the ascii "imported" characters
> are 8 bit encoded UTF-8 and don't know it! How do I tell "imported" that
> it is actually already UTF-8 unicode?
>
> Cheers
> NevilleDNZ
--
http://mail.python.org/mailman/listinfo/python-list
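
The UnicodeDecodeError in the quoted message arises because Python 2, when asked to concatenate a byte string with a unicode string, first decodes the bytes implicitly with the default 'ascii' codec. The sketch below makes that step explicit so it behaves the same on Python 2.6+ and 3.x; the variable names are illustrative only, and only the first two characters of the original string are used to keep it short:

```python
# First two UTF-8 encoded characters of the string from uc.py.
raw = b"\304\246\311\231"

# Decoding as ASCII fails on the very first byte, just like the
# implicit conversion in the traceback.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError as exc:
    ascii_ok = False
    position = exc.start                  # index of the offending byte
    failing_byte = raw[exc.start:exc.end]  # the byte that is not ASCII

# Naming the real encoding succeeds and yields a unicode string.
text = raw.decode("utf-8")
```

Here position and failing_byte correspond to the "byte 0xc4 in position 0" part of the error message.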