It was just TOO easy... After posting my message to Google Groups, I
re-read the posting there and found that Google had pointed me to a
Python Unicode tutorial:
www.reportlab.com/i18n/python_unicode_tutorial.html - exercise one :-)
Gosh, sometimes a Google is worth so much more than ₁₀¹⁰⁰!
Happy New Year
NevilleD
It works now:
$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ FAILS :-(
$ vi ./uc.py
$ cat ./uc.py
#!/usr/bin/env python
imported=unicode("\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220","utf-8")
print "English/ASCII quoting:",''+imported+'',"SUCCEEDS :-)" # works if the xterm encoding is UTF-8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}test\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","Just TOO easy :-)"
$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting: ᛭ĦəίιÒ ώσŔĹĐ᛭ Just TOO easy :-)
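
For anyone reading this on a newer Python: the fix above is the
Python 2 spelling, unicode(bytes, "utf-8"). Below is a minimal sketch of
the same idea using bytes.decode, which should also run on Python 3.
The names raw, cross and quoted are my own, not from uc.py, and I avoid
printing the text so the sketch does not depend on the terminal encoding.

```python
# Same bytes as in uc.py, written as a bytes literal with octal escapes.
raw = b"\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"

# Decode the UTF-8 bytes into a text (unicode) string.
# In Python 2 this is equivalent to unicode(raw, "utf-8").
imported = raw.decode("utf-8")

# Once both sides are text, concatenation no longer needs an implicit decode.
cross = u"\N{RUNIC CROSS PUNCTUATION}"
quoted = cross + imported + cross

# 21 bytes decode to 11 characters: ten two-byte characters plus one space.
print(len(raw), len(imported), len(quoted))
```

The point is only that the decode happens once, explicitly, at the place
where the bytes enter the program.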
NevilleDNZ wrote:
Hi,
Apologies first, as I am not a Unicode expert; indeed, the details
probably totally elude me. Notwithstanding: how can I convert a
byte string containing UTF-8 data into a Python unicode string?
Cut-down example:
$ cat ./uc.py
#!/usr/bin/env python
imported="\304\246\311\231\316\257\316\271\303\222 \317\216\317\203\305\224\304\271\304\220"
print "English/ASCII quoting:",''+imported+'',"SUCCEEDS :-)" # works if the xterm encoding is UTF-8
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+"test"+u"\N{runic cross punctuation}","AOK :-)"
print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
$ ./uc.py
English/ASCII quoting: ĦəίιÒ ώσŔĹĐ SUCCEEDS :-)
German/ALCOR quoting: ᛭test᛭ AOK :-)
German/ALCOR quoting:
Traceback (most recent call last):
  File "./uc.py", line 5, in <module>
    print "German/ALCOR quoting:",u"\N{runic cross punctuation}"+imported+u"\N{runic cross punctuation}","FAILS :-("
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc4 in position 0: ordinal not in range(128)
The last print statement fails because the imported characters are
8-bit UTF-8 encoded bytes in a plain string and don't know it! How do I
tell imported that it is actually already UTF-8?
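
The traceback above comes from Python 2 silently decoding the byte
string as ASCII when it is concatenated with a unicode string. A small
sketch that performs that decode explicitly, so the same error shows up
without any print statement. The names raw, error and text are
illustrative only, and the code is written to run on Python 3 as well.

```python
# UTF-8 encoding of U+0126, the first character of the imported string.
raw = b"\304\246"

# Decoding as ASCII is what the implicit conversion does in Python 2;
# byte 0xc4 is outside range(128), so this raises UnicodeDecodeError.
error = None
try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    error = exc

# Naming the real encoding succeeds and yields a one-character string.
text = raw.decode("utf-8")

print(error.encoding, error.start, error.reason)
```

So the bytes never "know" their encoding; whoever decodes them has to
name it.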
Cheers
NevilleDNZ
--
http://mail.python.org/mailman/listinfo/python-list